From patchwork Wed Sep 25 16:12:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161015 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 383A61709 for ; Wed, 25 Sep 2019 16:13:58 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1580221D7E for ; Wed, 25 Sep 2019 16:13:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="pHWGu9UF"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="sVEBaPJK" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1580221D7E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=3jGcRqjcvQ9VYoDI7zULExA+idF1aDwZPkvN5HzOSRI=; b=pHWGu9UFnqg/87 H5lXRfJ9AmZTVaBYQnjH/U8c54WSPrs1a87dQogL1npR9HH33X1ucboQWxoITKyh6fhIbl0STGRW6 eqx9Kt8IT/gE9ja7fq3y0N43RzxPsz4owlg1nytUrYj0jtHJ//C3PPngBWoyK8JCdyJvjuXQVGPbb IgZ2Rm7wXHUb87RAQuMZlKDHs2LWhUAMfAC3S1HRip+OntlNj8IiPtjKqf/h84sOEgr5BQohlMmSo OGhK2QCEDtRkWvo6NJq2qbA3OayPDkMI66mXS0jv2pK+DGH+Fn6MOkgvJIDNqyqacbJEknkoSM0Bh T8t/eUd2OF6NNt5gKB5Q==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vN-0003sz-Nh; Wed, 25 Sep 2019 16:13:53 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vJ-0003s2-Nu for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:13:51 +0000 Received: by mail-wm1-x341.google.com with SMTP id i16so6392391wmd.3 for ; Wed, 25 Sep 2019 09:13:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Tn6PuyCvHzBYqw98o/h7jtBUwnaA2ySt5tvs6NRZxxw=; b=sVEBaPJKCR8iK0Ti7MEQI9T+ABca2s8phVUtojeWGaOvvOnjGzCAWbIcwzTp83jDXY m5r+f8qSrlBt5lQOpjo4PbCjfXkKE1E7AhkpXc7Gewyt0NaJ1VnNP6JhkqSfMoqmY7GF j8xyagkcnqsJPh13J/t3KvX77GFGnCn2aaaOo208psPIV4dtHsVaEJ/ehJH3zSYUCemn 4MlEFV8zbH5FzW68MLAwQC6BkiilCT/nnAaN30Tx0lJxhAMMEHA1RjS7B96k4pywgs/l E8RSZdVJ1XxGKuaBONZ5Q7tUfH03ni0EiUQRWdXTdNNWtPojWjjnjsAtJEFlLJjTztif q7sA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Tn6PuyCvHzBYqw98o/h7jtBUwnaA2ySt5tvs6NRZxxw=; b=XG7n9qHRYUEH8Y9RBBXM4Y7aJYyOv1x/aqsDBH6bGfPapDadFc8aPwe8rEfAS4RI47 Tpsw3J6F1/K7UtEr7GF9mvaWP8qmThZpbaJmVodccymwzYAUWWytSVWHqGO3WyjSXs5i xLuztyiiphloNEmVU1mMjA6RUQLXQgWBxaq0DvsxOqwQYKcVt4Y71j8WUZbbm2BL4yjx /TZk6zgnld2HUIUPi8DfiZMRjMxq/LAnjStLwMIK94uH9g9GkQcSsO+ecTyPEIO6yNfu bXZhm7cNNUHMzyLH1Sju/oxA/+iIiOVQdZJ6KsParqrFi/Ps2VavO/2I61eN+EJOC5f6 p6Ow== X-Gm-Message-State: APjAAAWo5wp/RzxKHv7rG46D7JEn4vE6K5m7yBUsfD9pdaQVSIk/hKm0 nclk7xtZ57x7RxqHZIhHy+oSJg== X-Google-Smtp-Source: APXvYqyiGoS0vuNCTz14AtTm1UOuqGwmutm4W/PM/ZgKxsJ2voQXgcF4hUOWb9WVVS3FaxjAQLdoPg== X-Received: by 2002:a1c:a516:: with SMTP id o22mr5410877wme.116.1569428028214; Wed, 25 Sep 2019 09:13:48 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:47 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 01/18] crypto: shash - add plumbing for operating on scatterlists Date: Wed, 25 Sep 2019 18:12:38 +0200 Message-Id: <20190925161255.1871-2-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091349_779370_4E818665 X-CRM114-Status: GOOD ( 16.78 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:341 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Add an internal method to the shash interface that permits templates to invoke it with a scatterlist. Drivers implementing the shash interface can opt into using this method, making it more straightforward for templates to pass down data provided via scatterlists without forcing the underlying shash to process each scatterlist entry with a discrete update() call. This will be used later in the SIMD accelerated Poly1305 to amortize SIMD begin()/end() calls over the entire input. Signed-off-by: Ard Biesheuvel --- crypto/ahash.c | 18 +++++++++++++++ crypto/shash.c | 24 ++++++++++++++++++++ include/crypto/hash.h | 3 +++ include/crypto/internal/hash.h | 19 ++++++++++++++++ 4 files changed, 64 insertions(+) diff --git a/crypto/ahash.c b/crypto/ahash.c index 3815b363a693..aecb48f0f50c 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -144,6 +144,24 @@ int crypto_hash_walk_first(struct ahash_request *req, } EXPORT_SYMBOL_GPL(crypto_hash_walk_first); +int crypto_shash_walk_sg(struct shash_desc *desc, struct scatterlist *sg, + int nbytes, struct crypto_hash_walk *walk, int flags) +{ + walk->total = nbytes; + + if (!walk->total) { + walk->entrylen = 0; + return 0; + } + + walk->alignmask = crypto_shash_alignmask(desc->tfm); + walk->sg = sg; + walk->flags = flags; + + return hash_walk_new_entry(walk); +} +EXPORT_SYMBOL_GPL(crypto_shash_walk_sg); + int crypto_ahash_walk_first(struct ahash_request *req, struct crypto_hash_walk *walk) { diff --git a/crypto/shash.c b/crypto/shash.c index e83c5124f6eb..b16ab5590dc4 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -121,6 +121,30 @@ int crypto_shash_update(struct shash_desc *desc, const u8 *data, } EXPORT_SYMBOL_GPL(crypto_shash_update); +int crypto_shash_update_from_sg(struct shash_desc *desc, struct scatterlist *sg, + unsigned int len, bool atomic) +{ + struct crypto_shash *tfm = desc->tfm; + struct shash_alg *shash = crypto_shash_alg(tfm); + struct crypto_hash_walk walk; + int flags = 0; + int nbytes; + + if (!atomic) + flags = CRYPTO_TFM_REQ_MAY_SLEEP; + + if (shash->update_from_sg) + return shash->update_from_sg(desc, sg, len, flags); + + for (nbytes = crypto_shash_walk_sg(desc, sg, len, &walk, flags); + nbytes > 0; + nbytes = crypto_hash_walk_done(&walk, nbytes)) + nbytes = crypto_shash_update(desc, walk.data, nbytes); + + return nbytes; +} +EXPORT_SYMBOL_GPL(crypto_shash_update_from_sg); + static int shash_final_unaligned(struct shash_desc *desc, u8 *out) { struct crypto_shash *tfm = desc->tfm; diff --git a/include/crypto/hash.h b/include/crypto/hash.h index ef10c370605a..0b83d85a3828 100644 --- a/include/crypto/hash.h +++ b/include/crypto/hash.h @@ -158,6 +158,7 @@ struct shash_desc { * struct shash_alg - synchronous message digest definition * @init: see struct ahash_alg * @update: see struct ahash_alg + * @update_from_sg: variant of update() taking a scatterlist as input [optional] * @final: see struct ahash_alg * @finup: see struct ahash_alg * @digest: see struct ahash_alg @@ -175,6 +176,8 @@ struct shash_alg { int (*init)(struct shash_desc *desc); int (*update)(struct shash_desc *desc, const u8 *data, unsigned int len); + int (*update_from_sg)(struct shash_desc *desc, struct scatterlist *sg, + unsigned int len, int flags); int (*final)(struct shash_desc *desc, u8 *out); int (*finup)(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out); diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index bfc9db7b100d..6f4bfa057bea 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -50,6 +50,8 @@ extern const struct crypto_type crypto_ahash_type; int crypto_hash_walk_done(struct crypto_hash_walk *walk, int err); int crypto_hash_walk_first(struct ahash_request *req, struct crypto_hash_walk *walk); +int crypto_shash_walk_sg(struct shash_desc *desc, struct scatterlist *sg, + int nbytes, struct crypto_hash_walk *walk, int flags); int crypto_ahash_walk_first(struct ahash_request *req, struct crypto_hash_walk *walk); @@ -242,5 +244,22 @@ static inline struct crypto_shash *__crypto_shash_cast(struct crypto_tfm *tfm) return container_of(tfm, struct crypto_shash, base); } +/** + * crypto_shash_update_from_sg() - add data from a scatterlist to message digest + * for processing + * @desc: operational state handle that is already initialized + * @data: scatterlist with input data to be added to the message digest + * @len: length of the input data + * @atomic: whether or not the call is permitted to sleep + * + * Updates the message digest state of the operational state handle. + * + * Context: Any context. + * Return: 0 if the message digest update was successful; < 0 if an error + * occurred + */ +int crypto_shash_update_from_sg(struct shash_desc *desc, struct scatterlist *sg, + unsigned int len, bool atomic); + #endif /* _CRYPTO_INTERNAL_HASH_H */ From patchwork Wed Sep 25 16:12:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161059 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BD0C1599 for ; Wed, 25 Sep 2019 16:15:00 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 481EB21D7B for ; Wed, 25 Sep 2019 16:15:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="kcQfUrCW"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="s3m7b9na" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 481EB21D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=8Fggy79YsgAyRyEn0xNN9qpIM84wg9TTU39OD9sqqLw=; b=kcQfUrCW+jpC9s ufxb3xIFEDTR4REkR3ONJOp9ULaFn0+/BGuJZ745zZEvdpSkSqGdw27JReGiWDobvKapYdXEZKx3g Ml/TDqULACHPlBZl1z2mm4xnHR/j2yL2qZbJqoAvVkhU0x0rBj7c9BSb6HiviT3t6Xb5ijgJARlSz 25YQdbzP4FIvzsi/cdm0D0Yjf2XLfrcch3ObbB5y3IePGmwMozEH0OWxBhAoZqQPSUcyEi3yZ76kP FF9TTygmoAxo6w6N6JSwCW5BFbJYfVIMwC/gXaQaZ8/2h7DoIiTquzTI0hrrN6yFYyoeQxUjA9Yy1 uw6F+ujLlFE8TKlHubZQ==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9wO-0004QT-DB; Wed, 25 Sep 2019 16:14:58 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vK-0003sT-OZ for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:13:52 +0000 Received: by mail-wm1-x341.google.com with SMTP id 5so6425923wmg.0 for ; Wed, 25 Sep 2019 09:13:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=D96R9BcQ8eKJUV7xjET5CNTgwensabrJ4To3CfkaNW8=; b=s3m7b9naLnNHrD0f/3+dV9tT2F26ZviX7fM8ksVZppWW7QbSmlRyhtUnKmysWyycxL QRgbfaz9jdwloFQ0zmVeGl860ldSIqTCS24yRKEnUgkxsXZQBieWx7+RsIyOabP5l1NI CaLze6rOX7Ks1YC8g2XOeNtTS5/IG33f917vYpdsKsKh+XoLQsYogu3bdNXE8TKiSMoO lVMRxjAD1pES7Xjw4dmUla/yurD8Lb3HymYsZJKnl29MJ9Ye7dH//ulN/TgabxNdsScb YMXMeYP5mQpbr4owmS5HBlq8lFvmKyx5dR9AUIJbus4U0FMeHDjZ1RVoAnMfdwEKJ7H9 JNOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=D96R9BcQ8eKJUV7xjET5CNTgwensabrJ4To3CfkaNW8=; b=rsLUiIJcC7BNEu5oaDlIzpgq7I4tbiuzDzOqeRM4Y8aQ51sYK5CfP85cL1quVgaaQB Asea6Z1rRcTUV3TH90zEuP5lO2jnUqks+Ed8/UQjDdVZRFMlk7+QIdnNug5M54XgGwJt 6XGF1jB6aGW0tZWcwLHwigF44oIip6b6ei8Nwt/xhKsmclhT4B8ZfSzPMtOesqb7k17F UAWaeXbMlw84xHkJuOyv6bPVO5dg6bIusOfQclxqHS92EIAn4ChCOWeeV+wWWZvRICgM 8XpE9N6ty77KdKmyYV9jJErqkyjBs8o/DlW9+gjDlSzLwcYaj76q/+QdFQUduzO3mKME Feng== X-Gm-Message-State: APjAAAVzQPq9P6LGWdcDqbrKhBrlorEJq/ur0C4Lx5ktz1OLO3kzQI7D P0VmMpuwrPdCqeoba5060ZBypQ== X-Google-Smtp-Source: APXvYqw7KfClP8SL65AqCCS1/nALL/cUk8vlg2o2VxH4LDsP9p1MBordH+gxH4Q/8d+71ltDn1xJjA== X-Received: by 2002:a1c:f602:: with SMTP id w2mr8378409wmc.145.1569428029595; Wed, 25 Sep 2019 09:13:49 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:48 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 02/18] crypto: x86/poly1305 - implement .update_from_sg method Date: Wed, 25 Sep 2019 18:12:39 +0200 Message-Id: <20190925161255.1871-3-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091350_799214_ABC75DE4 X-CRM114-Status: GOOD ( 14.66 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:341 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org In order to reduce the number of invocations of the RFC7539 template into the Poly1305 driver, implement the new internal .update_from_sg method that allows the driver to amortize the cost of FPU preserve/ restore sequences over a larger chunk of input. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/poly1305_glue.c | 54 ++++++++++++++++---- 1 file changed, 43 insertions(+), 11 deletions(-) diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c index 4a1c05dce950..f2afaa8e23c2 100644 --- a/arch/x86/crypto/poly1305_glue.c +++ b/arch/x86/crypto/poly1305_glue.c @@ -115,18 +115,11 @@ static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx, return srclen; } -static int poly1305_simd_update(struct shash_desc *desc, - const u8 *src, unsigned int srclen) +static void poly1305_simd_do_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) { - struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); unsigned int bytes; - /* kernel_fpu_begin/end is costly, use fallback for small updates */ - if (srclen <= 288 || !crypto_simd_usable()) - return crypto_poly1305_update(desc, src, srclen); - - kernel_fpu_begin(); - if (unlikely(dctx->buflen)) { bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen); memcpy(dctx->buf + dctx->buflen, src, bytes); @@ -147,12 +140,50 @@ static int poly1305_simd_update(struct shash_desc *desc, srclen = bytes; } - kernel_fpu_end(); - if (unlikely(srclen)) { dctx->buflen = srclen; memcpy(dctx->buf, src, srclen); } +} + +static int poly1305_simd_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + /* kernel_fpu_begin/end is costly, use fallback for small updates */ + if (srclen <= 288 || !crypto_simd_usable()) + return crypto_poly1305_update(desc, src, srclen); + + kernel_fpu_begin(); + poly1305_simd_do_update(desc, src, srclen); + kernel_fpu_end(); + + return 0; +} + +static int poly1305_simd_update_from_sg(struct shash_desc *desc, + struct scatterlist *sg, + unsigned int srclen, + int flags) +{ + bool do_simd = crypto_simd_usable() && srclen > 288; + struct crypto_hash_walk walk; + int nbytes; + + if (do_simd) { + kernel_fpu_begin(); + flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + } + + for (nbytes = crypto_shash_walk_sg(desc, sg, srclen, &walk, flags); + nbytes > 0; + nbytes = crypto_hash_walk_done(&walk, 0)) { + if (do_simd) + poly1305_simd_do_update(desc, walk.data, nbytes); + else + crypto_poly1305_update(desc, walk.data, nbytes); + } + if (do_simd) + kernel_fpu_end(); return 0; } @@ -161,6 +192,7 @@ static struct shash_alg alg = { .digestsize = POLY1305_DIGEST_SIZE, .init = poly1305_simd_init, .update = poly1305_simd_update, + .update_from_sg = poly1305_simd_update_from_sg, .final = crypto_poly1305_final, .descsize = sizeof(struct poly1305_simd_desc_ctx), .base = { From patchwork Wed Sep 25 16:12:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161061 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51208924 for ; Wed, 25 Sep 2019 16:15:24 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2041F21D7E for ; Wed, 25 Sep 2019 16:15:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="SaUEwwBP"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="FgiE7VQi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2041F21D7E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=Q1i6WMIigkHEZmyq+wirvvWNtCM+Gp764OplQnWl2s0=; b=SaUEwwBPSRy2jL 070WuEw5wInBuDhAokBVwdirVw8yKx8z+rC9TDROq8AL5QNJCurqxHVq1g0iDeRV10K8zv/LnUTnn IMR802lJ6Acz+DSIHFVRDPOTP17TFuSd//3L/vjbI9vWdbKO1GJPI8XX7f+//Rgko8m0bcDXZEMtn t6ScMS7kCh0zDNIp5csUTLy+oj+wZ4N8L7DjD4aOe79ePoUBwb2wu2YdNI0HjgLP38tnt33GZ8lm4 pExBTX6ifhMuY3OR9QEyh1maZtGGtBhIpaN1O7WPbfnI5yoUan8xIJU7aQIhoV32EnfZerEEaZ+BO uHFVmJZx/bp4DW0ejA+A==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9wp-0005uS-Nr; Wed, 25 Sep 2019 16:15:23 +0000 Received: from mail-wr1-x441.google.com ([2a00:1450:4864:20::441]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vO-0003sy-72 for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:00 +0000 Received: by mail-wr1-x441.google.com with SMTP id a11so7676643wrx.1 for ; Wed, 25 Sep 2019 09:13:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zPskUNhjzzRS3PKRjz9Sl2zmURUYr/9JZBNzrNZmnBc=; b=FgiE7VQiCwbTNewLuNwBu6B6D4y6JL7o5hrchtN+YSMXnwia2/ol6LBB5esNkM+Hkp H6uqlTDWmYQbJcclmcespiGSBfsKqdL//y1QG18S9059fYku0+SpCi8cHINwFW5VbQog U9LC8SylbfYhiTV2mBQUhw7Gmpq9KiLjgHyy7X0PinVOwY2SzYOIS3VH3cW883iNP5P9 dy45up2yZp205XgJsBryvJ5+bT9u4gwVAdrswNQdkIFLg7xkn3Thgby4+f8laysftIoi JFuKUqTF/XBuRX9jih2v+VB411wD6m8MNla9KUZLCVKSJvrzu0fz0S9SDwSZbjMEdTUU Wjjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zPskUNhjzzRS3PKRjz9Sl2zmURUYr/9JZBNzrNZmnBc=; b=A/hePfFnEU0d5cmggA6QlXxPWVnxJR5s0QxABzRH2k///TTWyddqTFwgU4ME5TV+E/ TrQmb1nvqclM9LSO4KP4960yIiFozSxjylKQ4qKSsYJTcn0DdaBHD0EOpTXarC7kOhT2 OnBPj7MrNnqpViM6A7TiCsUxIKnks5jc9ZrW4aVHSIAAswPflDfuC0D7BneEPHOnEgmE cWP3iLj/WISey6vn1juSnEHqstMcsoqeIvhMe5zAmSTZR9ksLMGlc470o8Q1Wb/EfxdK f+DS455iIvQUDFfrpVZX/pIA8C+UeCmq6BVI0jjy7HxHIHaLjTGqor1O1nuja61Ddvda IxZg== X-Gm-Message-State: APjAAAWcd2XvoHYq1cJ13KahG9VyEaeNwGMPca3b/jpH7IZCluyf0UET gRNGX2KpJ+nCRGf4D4jv2/RoMw== X-Google-Smtp-Source: APXvYqz+C+9263L/cpTXowKHCwXKRujYE6V0mWHVgN8GxAORHgQ7+pxHVvaBfA8yrSC2etstfl8Ovw== X-Received: by 2002:adf:fd10:: with SMTP id e16mr10984462wrr.17.1569428031015; Wed, 25 Sep 2019 09:13:51 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:50 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 03/18] crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation Date: Wed, 25 Sep 2019 18:12:40 +0200 Message-Id: <20190925161255.1871-4-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091354_513253_2B4E0AFE X-CRM114-Status: GOOD ( 13.96 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:441 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Andy Polyakov , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL project. The file 'poly1305-armv4.pl' is taken straight from this upstream GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f, and already contains all the changes required to build it as part of a Linux kernel module. [0] https://github.com/dot-asm/cryptogams Co-developed-by: Andy Polyakov Signed-off-by: Andy Polyakov Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 3 + arch/arm/crypto/Makefile | 7 +- arch/arm/crypto/poly1305-armv4.pl | 1236 ++++++++++++++++++++ arch/arm/crypto/poly1305-core.S_shipped | 1158 ++++++++++++++++++ arch/arm/crypto/poly1305-glue.c | 253 ++++ 5 files changed, 2656 insertions(+), 1 deletion(-) diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index b24df84a1d7a..ae7fc69585f9 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -131,6 +131,9 @@ config CRYPTO_CHACHA20_NEON select CRYPTO_BLKCIPHER select CRYPTO_CHACHA20 +config CRYPTO_POLY1305_ARM + tristate "Accelerated scalar and SIMD Poly1305 hash implementations" + config CRYPTO_NHPOLY1305_NEON tristate "NEON accelerated NHPoly1305 hash function (for Adiantum)" depends on KERNEL_MODE_NEON diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 4180f3a13512..c9d5fab8ad45 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o +obj-$(CONFIG_CRYPTO_POLY1305_ARM) += poly1305-arm.o obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o @@ -54,12 +55,16 @@ ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o +poly1305-arm-y := poly1305-core.o poly1305-glue.o nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o ifdef REGENERATE_ARM_CRYPTO quiet_cmd_perl = PERL $@ cmd_perl = $(PERL) $(<) > $(@) +$(src)/poly1305-core.S_shipped: $(src)/poly1305-armv4.pl + $(call cmd,perl) + $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl $(call cmd,perl) @@ -67,4 +72,4 @@ $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl $(call cmd,perl) endif -clean-files += sha256-core.S sha512-core.S +clean-files += poly1305-core.S sha256-core.S sha512-core.S diff --git a/arch/arm/crypto/poly1305-armv4.pl b/arch/arm/crypto/poly1305-armv4.pl new file mode 100644 index 000000000000..6d79498d3115 --- /dev/null +++ b/arch/arm/crypto/poly1305-armv4.pl @@ -0,0 +1,1236 @@ +#!/usr/bin/env perl +# SPDX-License-Identifier: GPL-1.0+ OR BSD-3-Clause +# +# ==================================================================== +# Written by Andy Polyakov, @dot-asm, initially for the OpenSSL +# project. +# ==================================================================== +# +# IALU(*)/gcc-4.4 NEON +# +# ARM11xx(ARMv6) 7.78/+100% - +# Cortex-A5 6.35/+130% 3.00 +# Cortex-A8 6.25/+115% 2.36 +# Cortex-A9 5.10/+95% 2.55 +# Cortex-A15 3.85/+85% 1.25(**) +# Snapdragon S4 5.70/+100% 1.48(**) +# +# (*) this is for -march=armv6, i.e. with bunch of ldrb loading data; +# (**) these are trade-off results, they can be improved by ~8% but at +# the cost of 15/12% regression on Cortex-A5/A7, it's even possible +# to improve Cortex-A9 result, but then A5/A7 loose more than 20%; + +$flavour = shift; +if ($flavour=~/\w[\w\-]*\.\w+$/) { $output=$flavour; undef $flavour; } +else { while (($output=shift) && ($output!~/\w[\w\-]*\.\w+$/)) {} } + +if ($flavour && $flavour ne "void") { + $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1; + ( $xlate="${dir}arm-xlate.pl" and -f $xlate ) or + ( $xlate="${dir}../../perlasm/arm-xlate.pl" and -f $xlate) or + die "can't locate arm-xlate.pl"; + + open STDOUT,"| \"$^X\" $xlate $flavour $output"; +} else { + open STDOUT,">$output"; +} + +($ctx,$inp,$len,$padbit)=map("r$_",(0..3)); + +$code.=<<___; +#ifndef __KERNEL__ +# include "arm_arch.h" +#else +# define __ARM_ARCH__ __LINUX_ARM_ARCH__ +# define __ARM_MAX_ARCH__ __LINUX_ARM_ARCH__ +# define poly1305_init poly1305_init_arm +# define poly1305_blocks poly1305_blocks_arm +# define poly1305_emit poly1305_emit_arm +.globl poly1305_blocks_neon +#endif + +#if defined(__thumb2__) +.syntax unified +.thumb +#else +.code 32 +#endif + +.text + +.globl poly1305_emit +.globl poly1305_blocks +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: +.Lpoly1305_init: + stmdb sp!,{r4-r11} + + eor r3,r3,r3 + cmp $inp,#0 + str r3,[$ctx,#0] @ zero hash value + str r3,[$ctx,#4] + str r3,[$ctx,#8] + str r3,[$ctx,#12] + str r3,[$ctx,#16] + str r3,[$ctx,#36] @ clear is_base2_26 + add $ctx,$ctx,#20 + +#ifdef __thumb2__ + it eq +#endif + moveq r0,#0 + beq .Lno_key + +#if __ARM_MAX_ARCH__>=7 + mov r3,#-1 + str r3,[$ctx,#28] @ impossible key power value +# ifndef __KERNEL__ + adr r11,.Lpoly1305_init + ldr r12,.LOPENSSL_armcap +# endif +#endif + ldrb r4,[$inp,#0] + mov r10,#0x0fffffff + ldrb r5,[$inp,#1] + and r3,r10,#-4 @ 0x0ffffffc + ldrb r6,[$inp,#2] + ldrb r7,[$inp,#3] + orr r4,r4,r5,lsl#8 + ldrb r5,[$inp,#4] + orr r4,r4,r6,lsl#16 + ldrb r6,[$inp,#5] + orr r4,r4,r7,lsl#24 + ldrb r7,[$inp,#6] + and r4,r4,r10 + +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) +# if !defined(_WIN32) + ldr r12,[r11,r12] @ OPENSSL_armcap_P +# endif +# if defined(__APPLE__) || defined(_WIN32) + ldr r12,[r12] +# endif +#endif + ldrb r8,[$inp,#7] + orr r5,r5,r6,lsl#8 + ldrb r6,[$inp,#8] + orr r5,r5,r7,lsl#16 + ldrb r7,[$inp,#9] + orr r5,r5,r8,lsl#24 + ldrb r8,[$inp,#10] + and r5,r5,r3 + +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) + tst r12,#ARMV7_NEON @ check for NEON +# ifdef __thumb2__ + adr r9,.Lpoly1305_blocks_neon + adr r11,.Lpoly1305_blocks + it ne + movne r11,r9 + adr r12,.Lpoly1305_emit + orr r11,r11,#1 @ thumb-ify addresses + orr r12,r12,#1 +# else + add r12,r11,#(.Lpoly1305_emit-.Lpoly1305_init) + ite eq + addeq r11,r11,#(.Lpoly1305_blocks-.Lpoly1305_init) + addne r11,r11,#(.Lpoly1305_blocks_neon-.Lpoly1305_init) +# endif +#endif + ldrb r9,[$inp,#11] + orr r6,r6,r7,lsl#8 + ldrb r7,[$inp,#12] + orr r6,r6,r8,lsl#16 + ldrb r8,[$inp,#13] + orr r6,r6,r9,lsl#24 + ldrb r9,[$inp,#14] + and r6,r6,r3 + + ldrb r10,[$inp,#15] + orr r7,r7,r8,lsl#8 + str r4,[$ctx,#0] + orr r7,r7,r9,lsl#16 + str r5,[$ctx,#4] + orr r7,r7,r10,lsl#24 + str r6,[$ctx,#8] + and r7,r7,r3 + str r7,[$ctx,#12] +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) + stmia r2,{r11,r12} @ fill functions table + mov r0,#1 +#else + mov r0,#0 +#endif +.Lno_key: + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + ret @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + bx lr @ interoperable with Thumb ISA:-) +#endif +.size poly1305_init,.-poly1305_init +___ +{ +my ($h0,$h1,$h2,$h3,$h4,$r0,$r1,$r2,$r3)=map("r$_",(4..12)); +my ($s1,$s2,$s3)=($r1,$r2,$r3); + +$code.=<<___; +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: +.Lpoly1305_blocks: + stmdb sp!,{r3-r11,lr} + + ands $len,$len,#-16 + beq .Lno_data + + add $len,$len,$inp @ end pointer + sub sp,sp,#32 + +#if __ARM_ARCH__<7 + ldmia $ctx,{$h0-$r3} @ load context + add $ctx,$ctx,#20 + str $len,[sp,#16] @ offload stuff + str $ctx,[sp,#12] +#else + ldr lr,[$ctx,#36] @ is_base2_26 + ldmia $ctx!,{$h0-$h4} @ load hash value + str $len,[sp,#16] @ offload stuff + str $ctx,[sp,#12] + + adds $r0,$h0,$h1,lsl#26 @ base 2^26 -> base 2^32 + mov $r1,$h1,lsr#6 + adcs $r1,$r1,$h2,lsl#20 + mov $r2,$h2,lsr#12 + adcs $r2,$r2,$h3,lsl#14 + mov $r3,$h3,lsr#18 + adcs $r3,$r3,$h4,lsl#8 + mov $len,#0 + teq lr,#0 + str $len,[$ctx,#16] @ clear is_base2_26 + adc $len,$len,$h4,lsr#24 + + itttt ne + movne $h0,$r0 @ choose between radixes + movne $h1,$r1 + movne $h2,$r2 + movne $h3,$r3 + ldmia $ctx,{$r0-$r3} @ load key + it ne + movne $h4,$len +#endif + + mov lr,$inp + cmp $padbit,#0 + str $r1,[sp,#20] + str $r2,[sp,#24] + str $r3,[sp,#28] + b .Loop + +.align 4 +.Loop: +#if __ARM_ARCH__<7 + ldrb r0,[lr],#16 @ load input +# ifdef __thumb2__ + it hi +# endif + addhi $h4,$h4,#1 @ 1<<128 + ldrb r1,[lr,#-15] + ldrb r2,[lr,#-14] + ldrb r3,[lr,#-13] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-12] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-11] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-10] + adds $h0,$h0,r3 @ accumulate input + + ldrb r3,[lr,#-9] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-8] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-7] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-6] + adcs $h1,$h1,r3 + + ldrb r3,[lr,#-5] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-4] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-3] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-2] + adcs $h2,$h2,r3 + + ldrb r3,[lr,#-1] + orr r1,r0,r1,lsl#8 + str lr,[sp,#8] @ offload input pointer + orr r2,r1,r2,lsl#16 + add $s1,$r1,$r1,lsr#2 + orr r3,r2,r3,lsl#24 +#else + ldr r0,[lr],#16 @ load input + it hi + addhi $h4,$h4,#1 @ padbit + ldr r1,[lr,#-12] + ldr r2,[lr,#-8] + ldr r3,[lr,#-4] +# ifdef __ARMEB__ + rev r0,r0 + rev r1,r1 + rev r2,r2 + rev r3,r3 +# endif + adds $h0,$h0,r0 @ accumulate input + str lr,[sp,#8] @ offload input pointer + adcs $h1,$h1,r1 + add $s1,$r1,$r1,lsr#2 + adcs $h2,$h2,r2 +#endif + add $s2,$r2,$r2,lsr#2 + adcs $h3,$h3,r3 + add $s3,$r3,$r3,lsr#2 + + umull r2,r3,$h1,$r0 + adc $h4,$h4,#0 + umull r0,r1,$h0,$r0 + umlal r2,r3,$h4,$s1 + umlal r0,r1,$h3,$s1 + ldr $r1,[sp,#20] @ reload $r1 + umlal r2,r3,$h2,$s3 + umlal r0,r1,$h1,$s3 + umlal r2,r3,$h3,$s2 + umlal r0,r1,$h2,$s2 + umlal r2,r3,$h0,$r1 + str r0,[sp,#0] @ future $h0 + mul r0,$s2,$h4 + ldr $r2,[sp,#24] @ reload $r2 + adds r2,r2,r1 @ d1+=d0>>32 + eor r1,r1,r1 + adc lr,r3,#0 @ future $h2 + str r2,[sp,#4] @ future $h1 + + mul r2,$s3,$h4 + eor r3,r3,r3 + umlal r0,r1,$h3,$s3 + ldr $r3,[sp,#28] @ reload $r3 + umlal r2,r3,$h3,$r0 + umlal r0,r1,$h2,$r0 + umlal r2,r3,$h2,$r1 + umlal r0,r1,$h1,$r1 + umlal r2,r3,$h1,$r2 + umlal r0,r1,$h0,$r2 + umlal r2,r3,$h0,$r3 + ldr $h0,[sp,#0] + mul $h4,$r0,$h4 + ldr $h1,[sp,#4] + + adds $h2,lr,r0 @ d2+=d1>>32 + ldr lr,[sp,#8] @ reload input pointer + adc r1,r1,#0 + adds $h3,r2,r1 @ d3+=d2>>32 + ldr r0,[sp,#16] @ reload end pointer + adc r3,r3,#0 + add $h4,$h4,r3 @ h4+=d3>>32 + + and r1,$h4,#-4 + and $h4,$h4,#3 + add r1,r1,r1,lsr#2 @ *=5 + adds $h0,$h0,r1 + adcs $h1,$h1,#0 + adcs $h2,$h2,#0 + adcs $h3,$h3,#0 + adc $h4,$h4,#0 + + cmp r0,lr @ done yet? + bhi .Loop + + ldr $ctx,[sp,#12] + add sp,sp,#32 + stmdb $ctx,{$h0-$h4} @ store the result + +.Lno_data: +#if __ARM_ARCH__>=5 + ldmia sp!,{r3-r11,pc} +#else + ldmia sp!,{r3-r11,lr} + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + bx lr @ interoperable with Thumb ISA:-) +#endif +.size poly1305_blocks,.-poly1305_blocks +___ +} +{ +my ($ctx,$mac,$nonce)=map("r$_",(0..2)); +my ($h0,$h1,$h2,$h3,$h4,$g0,$g1,$g2,$g3)=map("r$_",(3..11)); +my $g4=$ctx; + +$code.=<<___; +.type poly1305_emit,%function +.align 5 +poly1305_emit: +.Lpoly1305_emit: + stmdb sp!,{r4-r11} + + ldmia $ctx,{$h0-$h4} + +#if __ARM_ARCH__>=7 + ldr ip,[$ctx,#36] @ is_base2_26 + + adds $g0,$h0,$h1,lsl#26 @ base 2^26 -> base 2^32 + mov $g1,$h1,lsr#6 + adcs $g1,$g1,$h2,lsl#20 + mov $g2,$h2,lsr#12 + adcs $g2,$g2,$h3,lsl#14 + mov $g3,$h3,lsr#18 + adcs $g3,$g3,$h4,lsl#8 + mov $g4,#0 + adc $g4,$g4,$h4,lsr#24 + + tst ip,ip + itttt ne + movne $h0,$g0 + movne $h1,$g1 + movne $h2,$g2 + movne $h3,$g3 + it ne + movne $h4,$g4 +#endif + + adds $g0,$h0,#5 @ compare to modulus + adcs $g1,$h1,#0 + adcs $g2,$h2,#0 + adcs $g3,$h3,#0 + adc $g4,$h4,#0 + tst $g4,#4 @ did it carry/borrow? + +#ifdef __thumb2__ + it ne +#endif + movne $h0,$g0 + ldr $g0,[$nonce,#0] +#ifdef __thumb2__ + it ne +#endif + movne $h1,$g1 + ldr $g1,[$nonce,#4] +#ifdef __thumb2__ + it ne +#endif + movne $h2,$g2 + ldr $g2,[$nonce,#8] +#ifdef __thumb2__ + it ne +#endif + movne $h3,$g3 + ldr $g3,[$nonce,#12] + + adds $h0,$h0,$g0 + adcs $h1,$h1,$g1 + adcs $h2,$h2,$g2 + adc $h3,$h3,$g3 + +#if __ARM_ARCH__>=7 +# ifdef __ARMEB__ + rev $h0,$h0 + rev $h1,$h1 + rev $h2,$h2 + rev $h3,$h3 +# endif + str $h0,[$mac,#0] + str $h1,[$mac,#4] + str $h2,[$mac,#8] + str $h3,[$mac,#12] +#else + strb $h0,[$mac,#0] + mov $h0,$h0,lsr#8 + strb $h1,[$mac,#4] + mov $h1,$h1,lsr#8 + strb $h2,[$mac,#8] + mov $h2,$h2,lsr#8 + strb $h3,[$mac,#12] + mov $h3,$h3,lsr#8 + + strb $h0,[$mac,#1] + mov $h0,$h0,lsr#8 + strb $h1,[$mac,#5] + mov $h1,$h1,lsr#8 + strb $h2,[$mac,#9] + mov $h2,$h2,lsr#8 + strb $h3,[$mac,#13] + mov $h3,$h3,lsr#8 + + strb $h0,[$mac,#2] + mov $h0,$h0,lsr#8 + strb $h1,[$mac,#6] + mov $h1,$h1,lsr#8 + strb $h2,[$mac,#10] + mov $h2,$h2,lsr#8 + strb $h3,[$mac,#14] + mov $h3,$h3,lsr#8 + + strb $h0,[$mac,#3] + strb $h1,[$mac,#7] + strb $h2,[$mac,#11] + strb $h3,[$mac,#15] +#endif + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + ret @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + bx lr @ interoperable with Thumb ISA:-) +#endif +.size poly1305_emit,.-poly1305_emit +___ +{ +my ($R0,$R1,$S1,$R2,$S2,$R3,$S3,$R4,$S4) = map("d$_",(0..9)); +my ($D0,$D1,$D2,$D3,$D4, $H0,$H1,$H2,$H3,$H4) = map("q$_",(5..14)); +my ($T0,$T1,$MASK) = map("q$_",(15,4,0)); + +my ($in2,$zeros,$tbl0,$tbl1) = map("r$_",(4..7)); + +$code.=<<___; +#if __ARM_MAX_ARCH__>=7 +.fpu neon + +.type poly1305_init_neon,%function +.align 5 +poly1305_init_neon: +.Lpoly1305_init_neon: + ldr r3,[$ctx,#48] @ first table element + cmp r3,#-1 @ is value impossible? + bne .Lno_init_neon + + ldr r4,[$ctx,#20] @ load key base 2^32 + ldr r5,[$ctx,#24] + ldr r6,[$ctx,#28] + ldr r7,[$ctx,#32] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + and r3,r3,#0x03ffffff + and r4,r4,#0x03ffffff + and r5,r5,#0x03ffffff + + vdup.32 $R0,r2 @ r^1 in both lanes + add r2,r3,r3,lsl#2 @ *5 + vdup.32 $R1,r3 + add r3,r4,r4,lsl#2 + vdup.32 $S1,r2 + vdup.32 $R2,r4 + add r4,r5,r5,lsl#2 + vdup.32 $S2,r3 + vdup.32 $R3,r5 + add r5,r6,r6,lsl#2 + vdup.32 $S3,r4 + vdup.32 $R4,r6 + vdup.32 $S4,r5 + + mov $zeros,#2 @ counter + +.Lsquare_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + + vmull.u32 $D0,$R0,${R0}[1] + vmull.u32 $D1,$R1,${R0}[1] + vmull.u32 $D2,$R2,${R0}[1] + vmull.u32 $D3,$R3,${R0}[1] + vmull.u32 $D4,$R4,${R0}[1] + + vmlal.u32 $D0,$R4,${S1}[1] + vmlal.u32 $D1,$R0,${R1}[1] + vmlal.u32 $D2,$R1,${R1}[1] + vmlal.u32 $D3,$R2,${R1}[1] + vmlal.u32 $D4,$R3,${R1}[1] + + vmlal.u32 $D0,$R3,${S2}[1] + vmlal.u32 $D1,$R4,${S2}[1] + vmlal.u32 $D3,$R1,${R2}[1] + vmlal.u32 $D2,$R0,${R2}[1] + vmlal.u32 $D4,$R2,${R2}[1] + + vmlal.u32 $D0,$R2,${S3}[1] + vmlal.u32 $D3,$R0,${R3}[1] + vmlal.u32 $D1,$R3,${S3}[1] + vmlal.u32 $D2,$R4,${S3}[1] + vmlal.u32 $D4,$R1,${R3}[1] + + vmlal.u32 $D3,$R4,${S4}[1] + vmlal.u32 $D0,$R1,${S4}[1] + vmlal.u32 $D1,$R2,${S4}[1] + vmlal.u32 $D2,$R3,${S4}[1] + vmlal.u32 $D4,$R0,${R4}[1] + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + @ and P. Schwabe + @ + @ H0>>+H1>>+H2>>+H3>>+H4 + @ H3>>+H4>>*5+H0>>+H1 + @ + @ Trivia. + @ + @ Result of multiplication of n-bit number by m-bit number is + @ n+m bits wide. However! Even though 2^n is a n+1-bit number, + @ m-bit number multiplied by 2^n is still n+m bits wide. + @ + @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2, + @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit + @ one is n+1 bits wide. + @ + @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that + @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4 + @ can be 27. However! In cases when their width exceeds 26 bits + @ they are limited by 2^26+2^6. This in turn means that *sum* + @ of the products with these values can still be viewed as sum + @ of 52-bit numbers as long as the amount of addends is not a + @ power of 2. For example, + @ + @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4, + @ + @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or + @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than + @ 8 * (2^52) or 2^55. However, the value is then multiplied by + @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12), + @ which is less than 32 * (2^52) or 2^57. And when processing + @ data we are looking at triple as many addends... + @ + @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and + @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the + @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while + @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32 + @ instruction accepts 2x32-bit input and writes 2x64-bit result. + @ This means that result of reduction have to be compressed upon + @ loop wrap-around. This can be done in the process of reduction + @ to minimize amount of instructions [as well as amount of + @ 128-bit instructions, which benefits low-end processors], but + @ one has to watch for H2 (which is narrower than H0) and 5*H4 + @ not being wider than 58 bits, so that result of right shift + @ by 26 bits fits in 32 bits. This is also useful on x86, + @ because it allows to use paddd in place for paddq, which + @ benefits Atom, where paddq is ridiculously slow. + + vshr.u64 $T0,$D3,#26 + vmovn.i64 $D3#lo,$D3 + vshr.u64 $T1,$D0,#26 + vmovn.i64 $D0#lo,$D0 + vadd.i64 $D4,$D4,$T0 @ h3 -> h4 + vbic.i32 $D3#lo,#0xfc000000 @ &=0x03ffffff + vadd.i64 $D1,$D1,$T1 @ h0 -> h1 + vbic.i32 $D0#lo,#0xfc000000 + + vshrn.u64 $T0#lo,$D4,#26 + vmovn.i64 $D4#lo,$D4 + vshr.u64 $T1,$D1,#26 + vmovn.i64 $D1#lo,$D1 + vadd.i64 $D2,$D2,$T1 @ h1 -> h2 + vbic.i32 $D4#lo,#0xfc000000 + vbic.i32 $D1#lo,#0xfc000000 + + vadd.i32 $D0#lo,$D0#lo,$T0#lo + vshl.u32 $T0#lo,$T0#lo,#2 + vshrn.u64 $T1#lo,$D2,#26 + vmovn.i64 $D2#lo,$D2 + vadd.i32 $D0#lo,$D0#lo,$T0#lo @ h4 -> h0 + vadd.i32 $D3#lo,$D3#lo,$T1#lo @ h2 -> h3 + vbic.i32 $D2#lo,#0xfc000000 + + vshr.u32 $T0#lo,$D0#lo,#26 + vbic.i32 $D0#lo,#0xfc000000 + vshr.u32 $T1#lo,$D3#lo,#26 + vbic.i32 $D3#lo,#0xfc000000 + vadd.i32 $D1#lo,$D1#lo,$T0#lo @ h0 -> h1 + vadd.i32 $D4#lo,$D4#lo,$T1#lo @ h3 -> h4 + + subs $zeros,$zeros,#1 + beq .Lsquare_break_neon + + add $tbl0,$ctx,#(48+0*9*4) + add $tbl1,$ctx,#(48+1*9*4) + + vtrn.32 $R0,$D0#lo @ r^2:r^1 + vtrn.32 $R2,$D2#lo + vtrn.32 $R3,$D3#lo + vtrn.32 $R1,$D1#lo + vtrn.32 $R4,$D4#lo + + vshl.u32 $S2,$R2,#2 @ *5 + vshl.u32 $S3,$R3,#2 + vshl.u32 $S1,$R1,#2 + vshl.u32 $S4,$R4,#2 + vadd.i32 $S2,$S2,$R2 + vadd.i32 $S1,$S1,$R1 + vadd.i32 $S3,$S3,$R3 + vadd.i32 $S4,$S4,$R4 + + vst4.32 {${R0}[0],${R1}[0],${S1}[0],${R2}[0]},[$tbl0]! + vst4.32 {${R0}[1],${R1}[1],${S1}[1],${R2}[1]},[$tbl1]! + vst4.32 {${S2}[0],${R3}[0],${S3}[0],${R4}[0]},[$tbl0]! + vst4.32 {${S2}[1],${R3}[1],${S3}[1],${R4}[1]},[$tbl1]! + vst1.32 {${S4}[0]},[$tbl0,:32] + vst1.32 {${S4}[1]},[$tbl1,:32] + + b .Lsquare_neon + +.align 4 +.Lsquare_break_neon: + add $tbl0,$ctx,#(48+2*4*9) + add $tbl1,$ctx,#(48+3*4*9) + + vmov $R0,$D0#lo @ r^4:r^3 + vshl.u32 $S1,$D1#lo,#2 @ *5 + vmov $R1,$D1#lo + vshl.u32 $S2,$D2#lo,#2 + vmov $R2,$D2#lo + vshl.u32 $S3,$D3#lo,#2 + vmov $R3,$D3#lo + vshl.u32 $S4,$D4#lo,#2 + vmov $R4,$D4#lo + vadd.i32 $S1,$S1,$D1#lo + vadd.i32 $S2,$S2,$D2#lo + vadd.i32 $S3,$S3,$D3#lo + vadd.i32 $S4,$S4,$D4#lo + + vst4.32 {${R0}[0],${R1}[0],${S1}[0],${R2}[0]},[$tbl0]! + vst4.32 {${R0}[1],${R1}[1],${S1}[1],${R2}[1]},[$tbl1]! + vst4.32 {${S2}[0],${R3}[0],${S3}[0],${R4}[0]},[$tbl0]! + vst4.32 {${S2}[1],${R3}[1],${S3}[1],${R4}[1]},[$tbl1]! + vst1.32 {${S4}[0]},[$tbl0] + vst1.32 {${S4}[1]},[$tbl1] + +.Lno_init_neon: + ret @ bx lr +.size poly1305_init_neon,.-poly1305_init_neon + +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: +.Lpoly1305_blocks_neon: + ldr ip,[$ctx,#36] @ is_base2_26 + + cmp $len,#64 + blo .Lpoly1305_blocks + + stmdb sp!,{r4-r7} + vstmdb sp!,{d8-d15} @ ABI specification says so + + tst ip,ip @ is_base2_26? + bne .Lbase2_26_neon + + stmdb sp!,{r1-r3,lr} + bl .Lpoly1305_init_neon + + ldr r4,[$ctx,#0] @ load hash value base 2^32 + ldr r5,[$ctx,#4] + ldr r6,[$ctx,#8] + ldr r7,[$ctx,#12] + ldr ip,[$ctx,#16] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + veor $D0#lo,$D0#lo,$D0#lo + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + veor $D1#lo,$D1#lo,$D1#lo + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + veor $D2#lo,$D2#lo,$D2#lo + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + veor $D3#lo,$D3#lo,$D3#lo + and r3,r3,#0x03ffffff + orr r6,r6,ip,lsl#24 + veor $D4#lo,$D4#lo,$D4#lo + and r4,r4,#0x03ffffff + mov r1,#1 + and r5,r5,#0x03ffffff + str r1,[$ctx,#36] @ set is_base2_26 + + vmov.32 $D0#lo[0],r2 + vmov.32 $D1#lo[0],r3 + vmov.32 $D2#lo[0],r4 + vmov.32 $D3#lo[0],r5 + vmov.32 $D4#lo[0],r6 + adr $zeros,.Lzeros + + ldmia sp!,{r1-r3,lr} + b .Lhash_loaded + +.align 4 +.Lbase2_26_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ load hash value + + veor $D0#lo,$D0#lo,$D0#lo + veor $D1#lo,$D1#lo,$D1#lo + veor $D2#lo,$D2#lo,$D2#lo + veor $D3#lo,$D3#lo,$D3#lo + veor $D4#lo,$D4#lo,$D4#lo + vld4.32 {$D0#lo[0],$D1#lo[0],$D2#lo[0],$D3#lo[0]},[$ctx]! + adr $zeros,.Lzeros + vld1.32 {$D4#lo[0]},[$ctx] + sub $ctx,$ctx,#16 @ rewind + +.Lhash_loaded: + add $in2,$inp,#32 + mov $padbit,$padbit,lsl#24 + tst $len,#31 + beq .Leven + + vld4.32 {$H0#lo[0],$H1#lo[0],$H2#lo[0],$H3#lo[0]},[$inp]! + vmov.32 $H4#lo[0],$padbit + sub $len,$len,#16 + add $in2,$inp,#32 + +# ifdef __ARMEB__ + vrev32.8 $H0,$H0 + vrev32.8 $H3,$H3 + vrev32.8 $H1,$H1 + vrev32.8 $H2,$H2 +# endif + vsri.u32 $H4#lo,$H3#lo,#8 @ base 2^32 -> base 2^26 + vshl.u32 $H3#lo,$H3#lo,#18 + + vsri.u32 $H3#lo,$H2#lo,#14 + vshl.u32 $H2#lo,$H2#lo,#12 + vadd.i32 $H4#hi,$H4#lo,$D4#lo @ add hash value and move to #hi + + vbic.i32 $H3#lo,#0xfc000000 + vsri.u32 $H2#lo,$H1#lo,#20 + vshl.u32 $H1#lo,$H1#lo,#6 + + vbic.i32 $H2#lo,#0xfc000000 + vsri.u32 $H1#lo,$H0#lo,#26 + vadd.i32 $H3#hi,$H3#lo,$D3#lo + + vbic.i32 $H0#lo,#0xfc000000 + vbic.i32 $H1#lo,#0xfc000000 + vadd.i32 $H2#hi,$H2#lo,$D2#lo + + vadd.i32 $H0#hi,$H0#lo,$D0#lo + vadd.i32 $H1#hi,$H1#lo,$D1#lo + + mov $tbl1,$zeros + add $tbl0,$ctx,#48 + + cmp $len,$len + b .Long_tail + +.align 4 +.Leven: + subs $len,$len,#64 + it lo + movlo $in2,$zeros + + vmov.i32 $H4,#1<<24 @ padbit, yes, always + vld4.32 {$H0#lo,$H1#lo,$H2#lo,$H3#lo},[$inp] @ inp[0:1] + add $inp,$inp,#64 + vld4.32 {$H0#hi,$H1#hi,$H2#hi,$H3#hi},[$in2] @ inp[2:3] (or 0) + add $in2,$in2,#64 + itt hi + addhi $tbl1,$ctx,#(48+1*9*4) + addhi $tbl0,$ctx,#(48+3*9*4) + +# ifdef __ARMEB__ + vrev32.8 $H0,$H0 + vrev32.8 $H3,$H3 + vrev32.8 $H1,$H1 + vrev32.8 $H2,$H2 +# endif + vsri.u32 $H4,$H3,#8 @ base 2^32 -> base 2^26 + vshl.u32 $H3,$H3,#18 + + vsri.u32 $H3,$H2,#14 + vshl.u32 $H2,$H2,#12 + + vbic.i32 $H3,#0xfc000000 + vsri.u32 $H2,$H1,#20 + vshl.u32 $H1,$H1,#6 + + vbic.i32 $H2,#0xfc000000 + vsri.u32 $H1,$H0,#26 + + vbic.i32 $H0,#0xfc000000 + vbic.i32 $H1,#0xfc000000 + + bls .Lskip_loop + + vld4.32 {${R0}[1],${R1}[1],${S1}[1],${R2}[1]},[$tbl1]! @ load r^2 + vld4.32 {${R0}[0],${R1}[0],${S1}[0],${R2}[0]},[$tbl0]! @ load r^4 + vld4.32 {${S2}[1],${R3}[1],${S3}[1],${R4}[1]},[$tbl1]! + vld4.32 {${S2}[0],${R3}[0],${S3}[0],${R4}[0]},[$tbl0]! + b .Loop_neon + +.align 5 +.Loop_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + @ \___________________/ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + @ \___________________/ \____________________/ + @ + @ Note that we start with inp[2:3]*r^2. This is because it + @ doesn't depend on reduction in previous iteration. + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ inp[2:3]*r^2 + + vadd.i32 $H2#lo,$H2#lo,$D2#lo @ accumulate inp[0:1] + vmull.u32 $D2,$H2#hi,${R0}[1] + vadd.i32 $H0#lo,$H0#lo,$D0#lo + vmull.u32 $D0,$H0#hi,${R0}[1] + vadd.i32 $H3#lo,$H3#lo,$D3#lo + vmull.u32 $D3,$H3#hi,${R0}[1] + vmlal.u32 $D2,$H1#hi,${R1}[1] + vadd.i32 $H1#lo,$H1#lo,$D1#lo + vmull.u32 $D1,$H1#hi,${R0}[1] + + vadd.i32 $H4#lo,$H4#lo,$D4#lo + vmull.u32 $D4,$H4#hi,${R0}[1] + subs $len,$len,#64 + vmlal.u32 $D0,$H4#hi,${S1}[1] + it lo + movlo $in2,$zeros + vmlal.u32 $D3,$H2#hi,${R1}[1] + vld1.32 ${S4}[1],[$tbl1,:32] + vmlal.u32 $D1,$H0#hi,${R1}[1] + vmlal.u32 $D4,$H3#hi,${R1}[1] + + vmlal.u32 $D0,$H3#hi,${S2}[1] + vmlal.u32 $D3,$H1#hi,${R2}[1] + vmlal.u32 $D4,$H2#hi,${R2}[1] + vmlal.u32 $D1,$H4#hi,${S2}[1] + vmlal.u32 $D2,$H0#hi,${R2}[1] + + vmlal.u32 $D3,$H0#hi,${R3}[1] + vmlal.u32 $D0,$H2#hi,${S3}[1] + vmlal.u32 $D4,$H1#hi,${R3}[1] + vmlal.u32 $D1,$H3#hi,${S3}[1] + vmlal.u32 $D2,$H4#hi,${S3}[1] + + vmlal.u32 $D3,$H4#hi,${S4}[1] + vmlal.u32 $D0,$H1#hi,${S4}[1] + vmlal.u32 $D4,$H0#hi,${R4}[1] + vmlal.u32 $D1,$H2#hi,${S4}[1] + vmlal.u32 $D2,$H3#hi,${S4}[1] + + vld4.32 {$H0#hi,$H1#hi,$H2#hi,$H3#hi},[$in2] @ inp[2:3] (or 0) + add $in2,$in2,#64 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4 and accumulate + + vmlal.u32 $D3,$H3#lo,${R0}[0] + vmlal.u32 $D0,$H0#lo,${R0}[0] + vmlal.u32 $D4,$H4#lo,${R0}[0] + vmlal.u32 $D1,$H1#lo,${R0}[0] + vmlal.u32 $D2,$H2#lo,${R0}[0] + vld1.32 ${S4}[0],[$tbl0,:32] + + vmlal.u32 $D3,$H2#lo,${R1}[0] + vmlal.u32 $D0,$H4#lo,${S1}[0] + vmlal.u32 $D4,$H3#lo,${R1}[0] + vmlal.u32 $D1,$H0#lo,${R1}[0] + vmlal.u32 $D2,$H1#lo,${R1}[0] + + vmlal.u32 $D3,$H1#lo,${R2}[0] + vmlal.u32 $D0,$H3#lo,${S2}[0] + vmlal.u32 $D4,$H2#lo,${R2}[0] + vmlal.u32 $D1,$H4#lo,${S2}[0] + vmlal.u32 $D2,$H0#lo,${R2}[0] + + vmlal.u32 $D3,$H0#lo,${R3}[0] + vmlal.u32 $D0,$H2#lo,${S3}[0] + vmlal.u32 $D4,$H1#lo,${R3}[0] + vmlal.u32 $D1,$H3#lo,${S3}[0] + vmlal.u32 $D3,$H4#lo,${S4}[0] + + vmlal.u32 $D2,$H4#lo,${S3}[0] + vmlal.u32 $D0,$H1#lo,${S4}[0] + vmlal.u32 $D4,$H0#lo,${R4}[0] + vmov.i32 $H4,#1<<24 @ padbit, yes, always + vmlal.u32 $D1,$H2#lo,${S4}[0] + vmlal.u32 $D2,$H3#lo,${S4}[0] + + vld4.32 {$H0#lo,$H1#lo,$H2#lo,$H3#lo},[$inp] @ inp[0:1] + add $inp,$inp,#64 +# ifdef __ARMEB__ + vrev32.8 $H0,$H0 + vrev32.8 $H1,$H1 + vrev32.8 $H2,$H2 + vrev32.8 $H3,$H3 +# endif + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction interleaved with base 2^32 -> base 2^26 of + @ inp[0:3] previously loaded to $H0-$H3 and smashed to $H0-$H4. + + vshr.u64 $T0,$D3,#26 + vmovn.i64 $D3#lo,$D3 + vshr.u64 $T1,$D0,#26 + vmovn.i64 $D0#lo,$D0 + vadd.i64 $D4,$D4,$T0 @ h3 -> h4 + vbic.i32 $D3#lo,#0xfc000000 + vsri.u32 $H4,$H3,#8 @ base 2^32 -> base 2^26 + vadd.i64 $D1,$D1,$T1 @ h0 -> h1 + vshl.u32 $H3,$H3,#18 + vbic.i32 $D0#lo,#0xfc000000 + + vshrn.u64 $T0#lo,$D4,#26 + vmovn.i64 $D4#lo,$D4 + vshr.u64 $T1,$D1,#26 + vmovn.i64 $D1#lo,$D1 + vadd.i64 $D2,$D2,$T1 @ h1 -> h2 + vsri.u32 $H3,$H2,#14 + vbic.i32 $D4#lo,#0xfc000000 + vshl.u32 $H2,$H2,#12 + vbic.i32 $D1#lo,#0xfc000000 + + vadd.i32 $D0#lo,$D0#lo,$T0#lo + vshl.u32 $T0#lo,$T0#lo,#2 + vbic.i32 $H3,#0xfc000000 + vshrn.u64 $T1#lo,$D2,#26 + vmovn.i64 $D2#lo,$D2 + vaddl.u32 $D0,$D0#lo,$T0#lo @ h4 -> h0 [widen for a sec] + vsri.u32 $H2,$H1,#20 + vadd.i32 $D3#lo,$D3#lo,$T1#lo @ h2 -> h3 + vshl.u32 $H1,$H1,#6 + vbic.i32 $D2#lo,#0xfc000000 + vbic.i32 $H2,#0xfc000000 + + vshrn.u64 $T0#lo,$D0,#26 @ re-narrow + vmovn.i64 $D0#lo,$D0 + vsri.u32 $H1,$H0,#26 + vbic.i32 $H0,#0xfc000000 + vshr.u32 $T1#lo,$D3#lo,#26 + vbic.i32 $D3#lo,#0xfc000000 + vbic.i32 $D0#lo,#0xfc000000 + vadd.i32 $D1#lo,$D1#lo,$T0#lo @ h0 -> h1 + vadd.i32 $D4#lo,$D4#lo,$T1#lo @ h3 -> h4 + vbic.i32 $H1,#0xfc000000 + + bhi .Loop_neon + +.Lskip_loop: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + add $tbl1,$ctx,#(48+0*9*4) + add $tbl0,$ctx,#(48+1*9*4) + adds $len,$len,#32 + it ne + movne $len,#0 + bne .Long_tail + + vadd.i32 $H2#hi,$H2#lo,$D2#lo @ add hash value and move to #hi + vadd.i32 $H0#hi,$H0#lo,$D0#lo + vadd.i32 $H3#hi,$H3#lo,$D3#lo + vadd.i32 $H1#hi,$H1#lo,$D1#lo + vadd.i32 $H4#hi,$H4#lo,$D4#lo + +.Long_tail: + vld4.32 {${R0}[1],${R1}[1],${S1}[1],${R2}[1]},[$tbl1]! @ load r^1 + vld4.32 {${R0}[0],${R1}[0],${S1}[0],${R2}[0]},[$tbl0]! @ load r^2 + + vadd.i32 $H2#lo,$H2#lo,$D2#lo @ can be redundant + vmull.u32 $D2,$H2#hi,$R0 + vadd.i32 $H0#lo,$H0#lo,$D0#lo + vmull.u32 $D0,$H0#hi,$R0 + vadd.i32 $H3#lo,$H3#lo,$D3#lo + vmull.u32 $D3,$H3#hi,$R0 + vadd.i32 $H1#lo,$H1#lo,$D1#lo + vmull.u32 $D1,$H1#hi,$R0 + vadd.i32 $H4#lo,$H4#lo,$D4#lo + vmull.u32 $D4,$H4#hi,$R0 + + vmlal.u32 $D0,$H4#hi,$S1 + vld4.32 {${S2}[1],${R3}[1],${S3}[1],${R4}[1]},[$tbl1]! + vmlal.u32 $D3,$H2#hi,$R1 + vld4.32 {${S2}[0],${R3}[0],${S3}[0],${R4}[0]},[$tbl0]! + vmlal.u32 $D1,$H0#hi,$R1 + vmlal.u32 $D4,$H3#hi,$R1 + vmlal.u32 $D2,$H1#hi,$R1 + + vmlal.u32 $D3,$H1#hi,$R2 + vld1.32 ${S4}[1],[$tbl1,:32] + vmlal.u32 $D0,$H3#hi,$S2 + vld1.32 ${S4}[0],[$tbl0,:32] + vmlal.u32 $D4,$H2#hi,$R2 + vmlal.u32 $D1,$H4#hi,$S2 + vmlal.u32 $D2,$H0#hi,$R2 + + vmlal.u32 $D3,$H0#hi,$R3 + it ne + addne $tbl1,$ctx,#(48+2*9*4) + vmlal.u32 $D0,$H2#hi,$S3 + it ne + addne $tbl0,$ctx,#(48+3*9*4) + vmlal.u32 $D4,$H1#hi,$R3 + vmlal.u32 $D1,$H3#hi,$S3 + vmlal.u32 $D2,$H4#hi,$S3 + + vmlal.u32 $D3,$H4#hi,$S4 + vorn $MASK,$MASK,$MASK @ all-ones, can be redundant + vmlal.u32 $D0,$H1#hi,$S4 + vshr.u64 $MASK,$MASK,#38 + vmlal.u32 $D4,$H0#hi,$R4 + vmlal.u32 $D1,$H2#hi,$S4 + vmlal.u32 $D2,$H3#hi,$S4 + + beq .Lshort_tail + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4:r^3 and accumulate + + vld4.32 {${R0}[1],${R1}[1],${S1}[1],${R2}[1]},[$tbl1]! @ load r^3 + vld4.32 {${R0}[0],${R1}[0],${S1}[0],${R2}[0]},[$tbl0]! @ load r^4 + + vmlal.u32 $D2,$H2#lo,$R0 + vmlal.u32 $D0,$H0#lo,$R0 + vmlal.u32 $D3,$H3#lo,$R0 + vmlal.u32 $D1,$H1#lo,$R0 + vmlal.u32 $D4,$H4#lo,$R0 + + vmlal.u32 $D0,$H4#lo,$S1 + vld4.32 {${S2}[1],${R3}[1],${S3}[1],${R4}[1]},[$tbl1]! + vmlal.u32 $D3,$H2#lo,$R1 + vld4.32 {${S2}[0],${R3}[0],${S3}[0],${R4}[0]},[$tbl0]! + vmlal.u32 $D1,$H0#lo,$R1 + vmlal.u32 $D4,$H3#lo,$R1 + vmlal.u32 $D2,$H1#lo,$R1 + + vmlal.u32 $D3,$H1#lo,$R2 + vld1.32 ${S4}[1],[$tbl1,:32] + vmlal.u32 $D0,$H3#lo,$S2 + vld1.32 ${S4}[0],[$tbl0,:32] + vmlal.u32 $D4,$H2#lo,$R2 + vmlal.u32 $D1,$H4#lo,$S2 + vmlal.u32 $D2,$H0#lo,$R2 + + vmlal.u32 $D3,$H0#lo,$R3 + vmlal.u32 $D0,$H2#lo,$S3 + vmlal.u32 $D4,$H1#lo,$R3 + vmlal.u32 $D1,$H3#lo,$S3 + vmlal.u32 $D2,$H4#lo,$S3 + + vmlal.u32 $D3,$H4#lo,$S4 + vorn $MASK,$MASK,$MASK @ all-ones + vmlal.u32 $D0,$H1#lo,$S4 + vshr.u64 $MASK,$MASK,#38 + vmlal.u32 $D4,$H0#lo,$R4 + vmlal.u32 $D1,$H2#lo,$S4 + vmlal.u32 $D2,$H3#lo,$S4 + +.Lshort_tail: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ horizontal addition + + vadd.i64 $D3#lo,$D3#lo,$D3#hi + vadd.i64 $D0#lo,$D0#lo,$D0#hi + vadd.i64 $D4#lo,$D4#lo,$D4#hi + vadd.i64 $D1#lo,$D1#lo,$D1#hi + vadd.i64 $D2#lo,$D2#lo,$D2#hi + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction, but without narrowing + + vshr.u64 $T0,$D3,#26 + vand.i64 $D3,$D3,$MASK + vshr.u64 $T1,$D0,#26 + vand.i64 $D0,$D0,$MASK + vadd.i64 $D4,$D4,$T0 @ h3 -> h4 + vadd.i64 $D1,$D1,$T1 @ h0 -> h1 + + vshr.u64 $T0,$D4,#26 + vand.i64 $D4,$D4,$MASK + vshr.u64 $T1,$D1,#26 + vand.i64 $D1,$D1,$MASK + vadd.i64 $D2,$D2,$T1 @ h1 -> h2 + + vadd.i64 $D0,$D0,$T0 + vshl.u64 $T0,$T0,#2 + vshr.u64 $T1,$D2,#26 + vand.i64 $D2,$D2,$MASK + vadd.i64 $D0,$D0,$T0 @ h4 -> h0 + vadd.i64 $D3,$D3,$T1 @ h2 -> h3 + + vshr.u64 $T0,$D0,#26 + vand.i64 $D0,$D0,$MASK + vshr.u64 $T1,$D3,#26 + vand.i64 $D3,$D3,$MASK + vadd.i64 $D1,$D1,$T0 @ h0 -> h1 + vadd.i64 $D4,$D4,$T1 @ h3 -> h4 + + cmp $len,#0 + bne .Leven + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ store hash value + + vst4.32 {$D0#lo[0],$D1#lo[0],$D2#lo[0],$D3#lo[0]},[$ctx]! + vst1.32 {$D4#lo[0]},[$ctx] + + vldmia sp!,{d8-d15} @ epilogue + ldmia sp!,{r4-r7} + ret @ bx lr +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +#ifndef __KERNEL__ +.LOPENSSL_armcap: +# ifdef _WIN32 +.word OPENSSL_armcap_P +# else +.word OPENSSL_armcap_P-.Lpoly1305_init +# endif +.comm OPENSSL_armcap_P,4,4 +.hidden OPENSSL_armcap_P +#endif +#endif +___ +} } +$code.=<<___; +.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by \@dot-asm" +.align 2 +___ + +foreach (split("\n",$code)) { + s/\`([^\`]*)\`/eval $1/geo; + + s/\bq([0-9]+)#(lo|hi)/sprintf "d%d",2*$1+($2 eq "hi")/geo or + s/\bret\b/bx lr/go or + s/\bbx\s+lr\b/.word\t0xe12fff1e/go; # make it possible to compile with -march=armv4 + + print $_,"\n"; +} +close STDOUT; # enforce flush diff --git a/arch/arm/crypto/poly1305-core.S_shipped b/arch/arm/crypto/poly1305-core.S_shipped new file mode 100644 index 000000000000..37b71d990293 --- /dev/null +++ b/arch/arm/crypto/poly1305-core.S_shipped @@ -0,0 +1,1158 @@ +#ifndef __KERNEL__ +# include "arm_arch.h" +#else +# define __ARM_ARCH__ __LINUX_ARM_ARCH__ +# define __ARM_MAX_ARCH__ __LINUX_ARM_ARCH__ +# define poly1305_init poly1305_init_arm +# define poly1305_blocks poly1305_blocks_arm +# define poly1305_emit poly1305_emit_arm +.globl poly1305_blocks_neon +#endif + +#if defined(__thumb2__) +.syntax unified +.thumb +#else +.code 32 +#endif + +.text + +.globl poly1305_emit +.globl poly1305_blocks +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: +.Lpoly1305_init: + stmdb sp!,{r4-r11} + + eor r3,r3,r3 + cmp r1,#0 + str r3,[r0,#0] @ zero hash value + str r3,[r0,#4] + str r3,[r0,#8] + str r3,[r0,#12] + str r3,[r0,#16] + str r3,[r0,#36] @ clear is_base2_26 + add r0,r0,#20 + +#ifdef __thumb2__ + it eq +#endif + moveq r0,#0 + beq .Lno_key + +#if __ARM_MAX_ARCH__>=7 + mov r3,#-1 + str r3,[r0,#28] @ impossible key power value +# ifndef __KERNEL__ + adr r11,.Lpoly1305_init + ldr r12,.LOPENSSL_armcap +# endif +#endif + ldrb r4,[r1,#0] + mov r10,#0x0fffffff + ldrb r5,[r1,#1] + and r3,r10,#-4 @ 0x0ffffffc + ldrb r6,[r1,#2] + ldrb r7,[r1,#3] + orr r4,r4,r5,lsl#8 + ldrb r5,[r1,#4] + orr r4,r4,r6,lsl#16 + ldrb r6,[r1,#5] + orr r4,r4,r7,lsl#24 + ldrb r7,[r1,#6] + and r4,r4,r10 + +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) +# if !defined(_WIN32) + ldr r12,[r11,r12] @ OPENSSL_armcap_P +# endif +# if defined(__APPLE__) || defined(_WIN32) + ldr r12,[r12] +# endif +#endif + ldrb r8,[r1,#7] + orr r5,r5,r6,lsl#8 + ldrb r6,[r1,#8] + orr r5,r5,r7,lsl#16 + ldrb r7,[r1,#9] + orr r5,r5,r8,lsl#24 + ldrb r8,[r1,#10] + and r5,r5,r3 + +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) + tst r12,#ARMV7_NEON @ check for NEON +# ifdef __thumb2__ + adr r9,.Lpoly1305_blocks_neon + adr r11,.Lpoly1305_blocks + it ne + movne r11,r9 + adr r12,.Lpoly1305_emit + orr r11,r11,#1 @ thumb-ify addresses + orr r12,r12,#1 +# else + add r12,r11,#(.Lpoly1305_emit-.Lpoly1305_init) + ite eq + addeq r11,r11,#(.Lpoly1305_blocks-.Lpoly1305_init) + addne r11,r11,#(.Lpoly1305_blocks_neon-.Lpoly1305_init) +# endif +#endif + ldrb r9,[r1,#11] + orr r6,r6,r7,lsl#8 + ldrb r7,[r1,#12] + orr r6,r6,r8,lsl#16 + ldrb r8,[r1,#13] + orr r6,r6,r9,lsl#24 + ldrb r9,[r1,#14] + and r6,r6,r3 + + ldrb r10,[r1,#15] + orr r7,r7,r8,lsl#8 + str r4,[r0,#0] + orr r7,r7,r9,lsl#16 + str r5,[r0,#4] + orr r7,r7,r10,lsl#24 + str r6,[r0,#8] + and r7,r7,r3 + str r7,[r0,#12] +#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__) + stmia r2,{r11,r12} @ fill functions table + mov r0,#1 +#else + mov r0,#0 +#endif +.Lno_key: + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_init,.-poly1305_init +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: +.Lpoly1305_blocks: + stmdb sp!,{r3-r11,lr} + + ands r2,r2,#-16 + beq .Lno_data + + add r2,r2,r1 @ end pointer + sub sp,sp,#32 + +#if __ARM_ARCH__<7 + ldmia r0,{r4-r12} @ load context + add r0,r0,#20 + str r2,[sp,#16] @ offload stuff + str r0,[sp,#12] +#else + ldr lr,[r0,#36] @ is_base2_26 + ldmia r0!,{r4-r8} @ load hash value + str r2,[sp,#16] @ offload stuff + str r0,[sp,#12] + + adds r9,r4,r5,lsl#26 @ base 2^26 -> base 2^32 + mov r10,r5,lsr#6 + adcs r10,r10,r6,lsl#20 + mov r11,r6,lsr#12 + adcs r11,r11,r7,lsl#14 + mov r12,r7,lsr#18 + adcs r12,r12,r8,lsl#8 + mov r2,#0 + teq lr,#0 + str r2,[r0,#16] @ clear is_base2_26 + adc r2,r2,r8,lsr#24 + + itttt ne + movne r4,r9 @ choose between radixes + movne r5,r10 + movne r6,r11 + movne r7,r12 + ldmia r0,{r9-r12} @ load key + it ne + movne r8,r2 +#endif + + mov lr,r1 + cmp r3,#0 + str r10,[sp,#20] + str r11,[sp,#24] + str r12,[sp,#28] + b .Loop + +.align 4 +.Loop: +#if __ARM_ARCH__<7 + ldrb r0,[lr],#16 @ load input +# ifdef __thumb2__ + it hi +# endif + addhi r8,r8,#1 @ 1<<128 + ldrb r1,[lr,#-15] + ldrb r2,[lr,#-14] + ldrb r3,[lr,#-13] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-12] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-11] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-10] + adds r4,r4,r3 @ accumulate input + + ldrb r3,[lr,#-9] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-8] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-7] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-6] + adcs r5,r5,r3 + + ldrb r3,[lr,#-5] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-4] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-3] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-2] + adcs r6,r6,r3 + + ldrb r3,[lr,#-1] + orr r1,r0,r1,lsl#8 + str lr,[sp,#8] @ offload input pointer + orr r2,r1,r2,lsl#16 + add r10,r10,r10,lsr#2 + orr r3,r2,r3,lsl#24 +#else + ldr r0,[lr],#16 @ load input + it hi + addhi r8,r8,#1 @ padbit + ldr r1,[lr,#-12] + ldr r2,[lr,#-8] + ldr r3,[lr,#-4] +# ifdef __ARMEB__ + rev r0,r0 + rev r1,r1 + rev r2,r2 + rev r3,r3 +# endif + adds r4,r4,r0 @ accumulate input + str lr,[sp,#8] @ offload input pointer + adcs r5,r5,r1 + add r10,r10,r10,lsr#2 + adcs r6,r6,r2 +#endif + add r11,r11,r11,lsr#2 + adcs r7,r7,r3 + add r12,r12,r12,lsr#2 + + umull r2,r3,r5,r9 + adc r8,r8,#0 + umull r0,r1,r4,r9 + umlal r2,r3,r8,r10 + umlal r0,r1,r7,r10 + ldr r10,[sp,#20] @ reload r10 + umlal r2,r3,r6,r12 + umlal r0,r1,r5,r12 + umlal r2,r3,r7,r11 + umlal r0,r1,r6,r11 + umlal r2,r3,r4,r10 + str r0,[sp,#0] @ future r4 + mul r0,r11,r8 + ldr r11,[sp,#24] @ reload r11 + adds r2,r2,r1 @ d1+=d0>>32 + eor r1,r1,r1 + adc lr,r3,#0 @ future r6 + str r2,[sp,#4] @ future r5 + + mul r2,r12,r8 + eor r3,r3,r3 + umlal r0,r1,r7,r12 + ldr r12,[sp,#28] @ reload r12 + umlal r2,r3,r7,r9 + umlal r0,r1,r6,r9 + umlal r2,r3,r6,r10 + umlal r0,r1,r5,r10 + umlal r2,r3,r5,r11 + umlal r0,r1,r4,r11 + umlal r2,r3,r4,r12 + ldr r4,[sp,#0] + mul r8,r9,r8 + ldr r5,[sp,#4] + + adds r6,lr,r0 @ d2+=d1>>32 + ldr lr,[sp,#8] @ reload input pointer + adc r1,r1,#0 + adds r7,r2,r1 @ d3+=d2>>32 + ldr r0,[sp,#16] @ reload end pointer + adc r3,r3,#0 + add r8,r8,r3 @ h4+=d3>>32 + + and r1,r8,#-4 + and r8,r8,#3 + add r1,r1,r1,lsr#2 @ *=5 + adds r4,r4,r1 + adcs r5,r5,#0 + adcs r6,r6,#0 + adcs r7,r7,#0 + adc r8,r8,#0 + + cmp r0,lr @ done yet? + bhi .Loop + + ldr r0,[sp,#12] + add sp,sp,#32 + stmdb r0,{r4-r8} @ store the result + +.Lno_data: +#if __ARM_ARCH__>=5 + ldmia sp!,{r3-r11,pc} +#else + ldmia sp!,{r3-r11,lr} + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_blocks,.-poly1305_blocks +.type poly1305_emit,%function +.align 5 +poly1305_emit: +.Lpoly1305_emit: + stmdb sp!,{r4-r11} + + ldmia r0,{r3-r7} + +#if __ARM_ARCH__>=7 + ldr ip,[r0,#36] @ is_base2_26 + + adds r8,r3,r4,lsl#26 @ base 2^26 -> base 2^32 + mov r9,r4,lsr#6 + adcs r9,r9,r5,lsl#20 + mov r10,r5,lsr#12 + adcs r10,r10,r6,lsl#14 + mov r11,r6,lsr#18 + adcs r11,r11,r7,lsl#8 + mov r0,#0 + adc r0,r0,r7,lsr#24 + + tst ip,ip + itttt ne + movne r3,r8 + movne r4,r9 + movne r5,r10 + movne r6,r11 + it ne + movne r7,r0 +#endif + + adds r8,r3,#5 @ compare to modulus + adcs r9,r4,#0 + adcs r10,r5,#0 + adcs r11,r6,#0 + adc r0,r7,#0 + tst r0,#4 @ did it carry/borrow? + +#ifdef __thumb2__ + it ne +#endif + movne r3,r8 + ldr r8,[r2,#0] +#ifdef __thumb2__ + it ne +#endif + movne r4,r9 + ldr r9,[r2,#4] +#ifdef __thumb2__ + it ne +#endif + movne r5,r10 + ldr r10,[r2,#8] +#ifdef __thumb2__ + it ne +#endif + movne r6,r11 + ldr r11,[r2,#12] + + adds r3,r3,r8 + adcs r4,r4,r9 + adcs r5,r5,r10 + adc r6,r6,r11 + +#if __ARM_ARCH__>=7 +# ifdef __ARMEB__ + rev r3,r3 + rev r4,r4 + rev r5,r5 + rev r6,r6 +# endif + str r3,[r1,#0] + str r4,[r1,#4] + str r5,[r1,#8] + str r6,[r1,#12] +#else + strb r3,[r1,#0] + mov r3,r3,lsr#8 + strb r4,[r1,#4] + mov r4,r4,lsr#8 + strb r5,[r1,#8] + mov r5,r5,lsr#8 + strb r6,[r1,#12] + mov r6,r6,lsr#8 + + strb r3,[r1,#1] + mov r3,r3,lsr#8 + strb r4,[r1,#5] + mov r4,r4,lsr#8 + strb r5,[r1,#9] + mov r5,r5,lsr#8 + strb r6,[r1,#13] + mov r6,r6,lsr#8 + + strb r3,[r1,#2] + mov r3,r3,lsr#8 + strb r4,[r1,#6] + mov r4,r4,lsr#8 + strb r5,[r1,#10] + mov r5,r5,lsr#8 + strb r6,[r1,#14] + mov r6,r6,lsr#8 + + strb r3,[r1,#3] + strb r4,[r1,#7] + strb r5,[r1,#11] + strb r6,[r1,#15] +#endif + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_emit,.-poly1305_emit +#if __ARM_MAX_ARCH__>=7 +.fpu neon + +.type poly1305_init_neon,%function +.align 5 +poly1305_init_neon: +.Lpoly1305_init_neon: + ldr r3,[r0,#48] @ first table element + cmp r3,#-1 @ is value impossible? + bne .Lno_init_neon + + ldr r4,[r0,#20] @ load key base 2^32 + ldr r5,[r0,#24] + ldr r6,[r0,#28] + ldr r7,[r0,#32] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + and r3,r3,#0x03ffffff + and r4,r4,#0x03ffffff + and r5,r5,#0x03ffffff + + vdup.32 d0,r2 @ r^1 in both lanes + add r2,r3,r3,lsl#2 @ *5 + vdup.32 d1,r3 + add r3,r4,r4,lsl#2 + vdup.32 d2,r2 + vdup.32 d3,r4 + add r4,r5,r5,lsl#2 + vdup.32 d4,r3 + vdup.32 d5,r5 + add r5,r6,r6,lsl#2 + vdup.32 d6,r4 + vdup.32 d7,r6 + vdup.32 d8,r5 + + mov r5,#2 @ counter + +.Lsquare_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + + vmull.u32 q5,d0,d0[1] + vmull.u32 q6,d1,d0[1] + vmull.u32 q7,d3,d0[1] + vmull.u32 q8,d5,d0[1] + vmull.u32 q9,d7,d0[1] + + vmlal.u32 q5,d7,d2[1] + vmlal.u32 q6,d0,d1[1] + vmlal.u32 q7,d1,d1[1] + vmlal.u32 q8,d3,d1[1] + vmlal.u32 q9,d5,d1[1] + + vmlal.u32 q5,d5,d4[1] + vmlal.u32 q6,d7,d4[1] + vmlal.u32 q8,d1,d3[1] + vmlal.u32 q7,d0,d3[1] + vmlal.u32 q9,d3,d3[1] + + vmlal.u32 q5,d3,d6[1] + vmlal.u32 q8,d0,d5[1] + vmlal.u32 q6,d5,d6[1] + vmlal.u32 q7,d7,d6[1] + vmlal.u32 q9,d1,d5[1] + + vmlal.u32 q8,d7,d8[1] + vmlal.u32 q5,d1,d8[1] + vmlal.u32 q6,d3,d8[1] + vmlal.u32 q7,d5,d8[1] + vmlal.u32 q9,d0,d7[1] + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + @ and P. Schwabe + @ + @ H0>>+H1>>+H2>>+H3>>+H4 + @ H3>>+H4>>*5+H0>>+H1 + @ + @ Trivia. + @ + @ Result of multiplication of n-bit number by m-bit number is + @ n+m bits wide. However! Even though 2^n is a n+1-bit number, + @ m-bit number multiplied by 2^n is still n+m bits wide. + @ + @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2, + @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit + @ one is n+1 bits wide. + @ + @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that + @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4 + @ can be 27. However! In cases when their width exceeds 26 bits + @ they are limited by 2^26+2^6. This in turn means that *sum* + @ of the products with these values can still be viewed as sum + @ of 52-bit numbers as long as the amount of addends is not a + @ power of 2. For example, + @ + @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4, + @ + @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or + @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than + @ 8 * (2^52) or 2^55. However, the value is then multiplied by + @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12), + @ which is less than 32 * (2^52) or 2^57. And when processing + @ data we are looking at triple as many addends... + @ + @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and + @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the + @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while + @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32 + @ instruction accepts 2x32-bit input and writes 2x64-bit result. + @ This means that result of reduction have to be compressed upon + @ loop wrap-around. This can be done in the process of reduction + @ to minimize amount of instructions [as well as amount of + @ 128-bit instructions, which benefits low-end processors], but + @ one has to watch for H2 (which is narrower than H0) and 5*H4 + @ not being wider than 58 bits, so that result of right shift + @ by 26 bits fits in 32 bits. This is also useful on x86, + @ because it allows to use paddd in place for paddq, which + @ benefits Atom, where paddq is ridiculously slow. + + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 @ &=0x03ffffff + vadd.i64 q6,q6,q4 @ h0 -> h1 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vbic.i32 d18,#0xfc000000 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vadd.i32 d10,d10,d30 @ h4 -> h0 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vbic.i32 d14,#0xfc000000 + + vshr.u32 d30,d10,#26 + vbic.i32 d10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + + subs r5,r5,#1 + beq .Lsquare_break_neon + + add r6,r0,#(48+0*9*4) + add r7,r0,#(48+1*9*4) + + vtrn.32 d0,d10 @ r^2:r^1 + vtrn.32 d3,d14 + vtrn.32 d5,d16 + vtrn.32 d1,d12 + vtrn.32 d7,d18 + + vshl.u32 d4,d3,#2 @ *5 + vshl.u32 d6,d5,#2 + vshl.u32 d2,d1,#2 + vshl.u32 d8,d7,#2 + vadd.i32 d4,d4,d3 + vadd.i32 d2,d2,d1 + vadd.i32 d6,d6,d5 + vadd.i32 d8,d8,d7 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6,:32] + vst1.32 {d8[1]},[r7,:32] + + b .Lsquare_neon + +.align 4 +.Lsquare_break_neon: + add r6,r0,#(48+2*4*9) + add r7,r0,#(48+3*4*9) + + vmov d0,d10 @ r^4:r^3 + vshl.u32 d2,d12,#2 @ *5 + vmov d1,d12 + vshl.u32 d4,d14,#2 + vmov d3,d14 + vshl.u32 d6,d16,#2 + vmov d5,d16 + vshl.u32 d8,d18,#2 + vmov d7,d18 + vadd.i32 d2,d2,d12 + vadd.i32 d4,d4,d14 + vadd.i32 d6,d6,d16 + vadd.i32 d8,d8,d18 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6] + vst1.32 {d8[1]},[r7] + +.Lno_init_neon: + bx lr @ bx lr +.size poly1305_init_neon,.-poly1305_init_neon + +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: +.Lpoly1305_blocks_neon: + ldr ip,[r0,#36] @ is_base2_26 + + cmp r2,#64 + blo .Lpoly1305_blocks + + stmdb sp!,{r4-r7} + vstmdb sp!,{d8-d15} @ ABI specification says so + + tst ip,ip @ is_base2_26? + bne .Lbase2_26_neon + + stmdb sp!,{r1-r3,lr} + bl .Lpoly1305_init_neon + + ldr r4,[r0,#0] @ load hash value base 2^32 + ldr r5,[r0,#4] + ldr r6,[r0,#8] + ldr r7,[r0,#12] + ldr ip,[r0,#16] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + veor d10,d10,d10 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + veor d12,d12,d12 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + veor d14,d14,d14 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + veor d16,d16,d16 + and r3,r3,#0x03ffffff + orr r6,r6,ip,lsl#24 + veor d18,d18,d18 + and r4,r4,#0x03ffffff + mov r1,#1 + and r5,r5,#0x03ffffff + str r1,[r0,#36] @ set is_base2_26 + + vmov.32 d10[0],r2 + vmov.32 d12[0],r3 + vmov.32 d14[0],r4 + vmov.32 d16[0],r5 + vmov.32 d18[0],r6 + adr r5,.Lzeros + + ldmia sp!,{r1-r3,lr} + b .Lhash_loaded + +.align 4 +.Lbase2_26_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ load hash value + + veor d10,d10,d10 + veor d12,d12,d12 + veor d14,d14,d14 + veor d16,d16,d16 + veor d18,d18,d18 + vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + adr r5,.Lzeros + vld1.32 {d18[0]},[r0] + sub r0,r0,#16 @ rewind + +.Lhash_loaded: + add r4,r1,#32 + mov r3,r3,lsl#24 + tst r2,#31 + beq .Leven + + vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]! + vmov.32 d28[0],r3 + sub r2,r2,#16 + add r4,r1,#32 + +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +# endif + vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26 + vshl.u32 d26,d26,#18 + + vsri.u32 d26,d24,#14 + vshl.u32 d24,d24,#12 + vadd.i32 d29,d28,d18 @ add hash value and move to #hi + + vbic.i32 d26,#0xfc000000 + vsri.u32 d24,d22,#20 + vshl.u32 d22,d22,#6 + + vbic.i32 d24,#0xfc000000 + vsri.u32 d22,d20,#26 + vadd.i32 d27,d26,d16 + + vbic.i32 d20,#0xfc000000 + vbic.i32 d22,#0xfc000000 + vadd.i32 d25,d24,d14 + + vadd.i32 d21,d20,d10 + vadd.i32 d23,d22,d12 + + mov r7,r5 + add r6,r0,#48 + + cmp r2,r2 + b .Long_tail + +.align 4 +.Leven: + subs r2,r2,#64 + it lo + movlo r4,r5 + + vmov.i32 q14,#1<<24 @ padbit, yes, always + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + itt hi + addhi r7,r0,#(48+1*9*4) + addhi r6,r0,#(48+3*9*4) + +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +# endif + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vshl.u32 q13,q13,#18 + + vsri.u32 q13,q12,#14 + vshl.u32 q12,q12,#12 + + vbic.i32 q13,#0xfc000000 + vsri.u32 q12,q11,#20 + vshl.u32 q11,q11,#6 + + vbic.i32 q12,#0xfc000000 + vsri.u32 q11,q10,#26 + + vbic.i32 q10,#0xfc000000 + vbic.i32 q11,#0xfc000000 + + bls .Lskip_loop + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + b .Loop_neon + +.align 5 +.Loop_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + @ ___________________/ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + @ ___________________/ ____________________/ + @ + @ Note that we start with inp[2:3]*r^2. This is because it + @ doesn't depend on reduction in previous iteration. + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ inp[2:3]*r^2 + + vadd.i32 d24,d24,d14 @ accumulate inp[0:1] + vmull.u32 q7,d25,d0[1] + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0[1] + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0[1] + vmlal.u32 q7,d23,d1[1] + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0[1] + + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0[1] + subs r2,r2,#64 + vmlal.u32 q5,d29,d2[1] + it lo + movlo r4,r5 + vmlal.u32 q8,d25,d1[1] + vld1.32 d8[1],[r7,:32] + vmlal.u32 q6,d21,d1[1] + vmlal.u32 q9,d27,d1[1] + + vmlal.u32 q5,d27,d4[1] + vmlal.u32 q8,d23,d3[1] + vmlal.u32 q9,d25,d3[1] + vmlal.u32 q6,d29,d4[1] + vmlal.u32 q7,d21,d3[1] + + vmlal.u32 q8,d21,d5[1] + vmlal.u32 q5,d25,d6[1] + vmlal.u32 q9,d23,d5[1] + vmlal.u32 q6,d27,d6[1] + vmlal.u32 q7,d29,d6[1] + + vmlal.u32 q8,d29,d8[1] + vmlal.u32 q5,d23,d8[1] + vmlal.u32 q9,d21,d7[1] + vmlal.u32 q6,d25,d8[1] + vmlal.u32 q7,d27,d8[1] + + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4 and accumulate + + vmlal.u32 q8,d26,d0[0] + vmlal.u32 q5,d20,d0[0] + vmlal.u32 q9,d28,d0[0] + vmlal.u32 q6,d22,d0[0] + vmlal.u32 q7,d24,d0[0] + vld1.32 d8[0],[r6,:32] + + vmlal.u32 q8,d24,d1[0] + vmlal.u32 q5,d28,d2[0] + vmlal.u32 q9,d26,d1[0] + vmlal.u32 q6,d20,d1[0] + vmlal.u32 q7,d22,d1[0] + + vmlal.u32 q8,d22,d3[0] + vmlal.u32 q5,d26,d4[0] + vmlal.u32 q9,d24,d3[0] + vmlal.u32 q6,d28,d4[0] + vmlal.u32 q7,d20,d3[0] + + vmlal.u32 q8,d20,d5[0] + vmlal.u32 q5,d24,d6[0] + vmlal.u32 q9,d22,d5[0] + vmlal.u32 q6,d26,d6[0] + vmlal.u32 q8,d28,d8[0] + + vmlal.u32 q7,d28,d6[0] + vmlal.u32 q5,d22,d8[0] + vmlal.u32 q9,d20,d7[0] + vmov.i32 q14,#1<<24 @ padbit, yes, always + vmlal.u32 q6,d24,d8[0] + vmlal.u32 q7,d26,d8[0] + + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q11,q11 + vrev32.8 q12,q12 + vrev32.8 q13,q13 +# endif + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction interleaved with base 2^32 -> base 2^26 of + @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14. + + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vadd.i64 q6,q6,q4 @ h0 -> h1 + vshl.u32 q13,q13,#18 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vsri.u32 q13,q12,#14 + vbic.i32 d18,#0xfc000000 + vshl.u32 q12,q12,#12 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vbic.i32 q13,#0xfc000000 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec] + vsri.u32 q12,q11,#20 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vshl.u32 q11,q11,#6 + vbic.i32 d14,#0xfc000000 + vbic.i32 q12,#0xfc000000 + + vshrn.u64 d30,q5,#26 @ re-narrow + vmovn.i64 d10,q5 + vsri.u32 q11,q10,#26 + vbic.i32 q10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vbic.i32 d10,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + vbic.i32 q11,#0xfc000000 + + bhi .Loop_neon + +.Lskip_loop: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + add r7,r0,#(48+0*9*4) + add r6,r0,#(48+1*9*4) + adds r2,r2,#32 + it ne + movne r2,#0 + bne .Long_tail + + vadd.i32 d25,d24,d14 @ add hash value and move to #hi + vadd.i32 d21,d20,d10 + vadd.i32 d27,d26,d16 + vadd.i32 d23,d22,d12 + vadd.i32 d29,d28,d18 + +.Long_tail: + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2 + + vadd.i32 d24,d24,d14 @ can be redundant + vmull.u32 q7,d25,d0 + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0 + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0 + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0 + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0 + + vmlal.u32 q5,d29,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d25,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vmlal.u32 q6,d21,d1 + vmlal.u32 q9,d27,d1 + vmlal.u32 q7,d23,d1 + + vmlal.u32 q8,d23,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d27,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d25,d3 + vmlal.u32 q6,d29,d4 + vmlal.u32 q7,d21,d3 + + vmlal.u32 q8,d21,d5 + it ne + addne r7,r0,#(48+2*9*4) + vmlal.u32 q5,d25,d6 + it ne + addne r6,r0,#(48+3*9*4) + vmlal.u32 q9,d23,d5 + vmlal.u32 q6,d27,d6 + vmlal.u32 q7,d29,d6 + + vmlal.u32 q8,d29,d8 + vorn q0,q0,q0 @ all-ones, can be redundant + vmlal.u32 q5,d23,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d21,d7 + vmlal.u32 q6,d25,d8 + vmlal.u32 q7,d27,d8 + + beq .Lshort_tail + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4:r^3 and accumulate + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + + vmlal.u32 q7,d24,d0 + vmlal.u32 q5,d20,d0 + vmlal.u32 q8,d26,d0 + vmlal.u32 q6,d22,d0 + vmlal.u32 q9,d28,d0 + + vmlal.u32 q5,d28,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d24,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vmlal.u32 q6,d20,d1 + vmlal.u32 q9,d26,d1 + vmlal.u32 q7,d22,d1 + + vmlal.u32 q8,d22,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d26,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d24,d3 + vmlal.u32 q6,d28,d4 + vmlal.u32 q7,d20,d3 + + vmlal.u32 q8,d20,d5 + vmlal.u32 q5,d24,d6 + vmlal.u32 q9,d22,d5 + vmlal.u32 q6,d26,d6 + vmlal.u32 q7,d28,d6 + + vmlal.u32 q8,d28,d8 + vorn q0,q0,q0 @ all-ones + vmlal.u32 q5,d22,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d20,d7 + vmlal.u32 q6,d24,d8 + vmlal.u32 q7,d26,d8 + +.Lshort_tail: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ horizontal addition + + vadd.i64 d16,d16,d17 + vadd.i64 d10,d10,d11 + vadd.i64 d18,d18,d19 + vadd.i64 d12,d12,d13 + vadd.i64 d14,d14,d15 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction, but without narrowing + + vshr.u64 q15,q8,#26 + vand.i64 q8,q8,q0 + vshr.u64 q4,q5,#26 + vand.i64 q5,q5,q0 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vadd.i64 q6,q6,q4 @ h0 -> h1 + + vshr.u64 q15,q9,#26 + vand.i64 q9,q9,q0 + vshr.u64 q4,q6,#26 + vand.i64 q6,q6,q0 + vadd.i64 q7,q7,q4 @ h1 -> h2 + + vadd.i64 q5,q5,q15 + vshl.u64 q15,q15,#2 + vshr.u64 q4,q7,#26 + vand.i64 q7,q7,q0 + vadd.i64 q5,q5,q15 @ h4 -> h0 + vadd.i64 q8,q8,q4 @ h2 -> h3 + + vshr.u64 q15,q5,#26 + vand.i64 q5,q5,q0 + vshr.u64 q4,q8,#26 + vand.i64 q8,q8,q0 + vadd.i64 q6,q6,q15 @ h0 -> h1 + vadd.i64 q9,q9,q4 @ h3 -> h4 + + cmp r2,#0 + bne .Leven + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ store hash value + + vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + vst1.32 {d18[0]},[r0] + + vldmia sp!,{d8-d15} @ epilogue + ldmia sp!,{r4-r7} + bx lr @ bx lr +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +#ifndef __KERNEL__ +.LOPENSSL_armcap: +# ifdef _WIN32 +.word OPENSSL_armcap_P +# else +.word OPENSSL_armcap_P-.Lpoly1305_init +# endif +.comm OPENSSL_armcap_P,4,4 +.hidden OPENSSL_armcap_P +#endif +#endif +.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by @dot-asm" +.align 2 diff --git a/arch/arm/crypto/poly1305-glue.c b/arch/arm/crypto/poly1305-glue.c new file mode 100644 index 000000000000..adff7d7865bc --- /dev/null +++ b/arch/arm/crypto/poly1305-glue.c @@ -0,0 +1,253 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * OpenSSL/Cryptogams accelerated Poly1305 transform for arm64 + * + * Copyright (C) 2019 Linaro Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define POLY1305_BLOCK_SIZE 16 +#define POLY1305_DIGEST_SIZE 16 + +struct arm_poly1305_ctx { + /* the state owned by the accelerated code */ + u64 state[24]; + /* finalize key */ + u32 s[4]; + /* partial buffer */ + u8 buf[POLY1305_BLOCK_SIZE]; + /* bytes used in partial buffer */ + unsigned int buflen; + /* r key has been set */ + bool rset; + /* s key has been set */ + bool sset; +}; + +asmlinkage void poly1305_init_arm(u64 *state, const u8 *key); +asmlinkage void poly1305_blocks_arm(u64 *state, const u8 *src, u32 len, u32 hibit); +asmlinkage void poly1305_blocks_neon(u64 *state, const u8 *src, u32 len, u32 hibit); +asmlinkage void poly1305_emit_arm(u64 state[], __le32 *digest, const u32 *nonce); + +static int arm_poly1305_init(struct shash_desc *desc) +{ + struct arm_poly1305_ctx *dctx = shash_desc_ctx(desc); + + dctx->buflen = 0; + dctx->rset = false; + dctx->sset = false; + + return 0; +} + +static void arm_poly1305_blocks(struct arm_poly1305_ctx *dctx, const u8 *src, + u32 len, u32 hibit, bool do_neon) +{ + if (unlikely(!dctx->sset)) { + if (!dctx->rset) { + poly1305_init_arm(dctx->state, src); + src += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + dctx->rset = true; + } + if (len >= POLY1305_BLOCK_SIZE) { + dctx->s[0] = get_unaligned_le32(src + 0); + dctx->s[1] = get_unaligned_le32(src + 4); + dctx->s[2] = get_unaligned_le32(src + 8); + dctx->s[3] = get_unaligned_le32(src + 12); + src += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + dctx->sset = true; + } + if (len < POLY1305_BLOCK_SIZE) + return; + } + + len &= ~(POLY1305_BLOCK_SIZE - 1); + + if (likely(do_neon)) + poly1305_blocks_neon(dctx->state, src, len, hibit); + else + poly1305_blocks_arm(dctx->state, src, len, hibit); +} + +static void arm_poly1305_do_update(struct arm_poly1305_ctx *dctx, + const u8 *src, u32 len, bool do_neon) +{ + if (unlikely(dctx->buflen)) { + u32 bytes = min(len, POLY1305_BLOCK_SIZE - dctx->buflen); + + memcpy(dctx->buf + dctx->buflen, src, bytes); + src += bytes; + len -= bytes; + dctx->buflen += bytes; + + if (dctx->buflen == POLY1305_BLOCK_SIZE) { + arm_poly1305_blocks(dctx, dctx->buf, + POLY1305_BLOCK_SIZE, 1, false); + dctx->buflen = 0; + } + } + + if (likely(len >= POLY1305_BLOCK_SIZE)) { + arm_poly1305_blocks(dctx, src, len, 1, do_neon); + src += round_down(len, POLY1305_BLOCK_SIZE); + len %= POLY1305_BLOCK_SIZE; + } + + if (unlikely(len)) { + dctx->buflen = len; + memcpy(dctx->buf, src, len); + } +} + +static int arm_poly1305_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + struct arm_poly1305_ctx *dctx = shash_desc_ctx(desc); + + arm_poly1305_do_update(dctx, src, srclen, false); + return 0; +} + +static int __maybe_unused arm_poly1305_update_neon(struct shash_desc *desc, + const u8 *src, + unsigned int srclen) +{ + struct arm_poly1305_ctx *dctx = shash_desc_ctx(desc); + bool do_neon = crypto_simd_usable() && srclen > 128; + + if (do_neon) + kernel_neon_begin(); + arm_poly1305_do_update(dctx, src, srclen, do_neon); + if (do_neon) + kernel_neon_end(); + return 0; +} + +static +int __maybe_unused arm_poly1305_update_from_sg_neon(struct shash_desc *desc, + struct scatterlist *sg, + unsigned int srclen, + int flags) +{ + struct arm_poly1305_ctx *dctx = shash_desc_ctx(desc); + bool do_neon = crypto_simd_usable() && srclen > 128; + struct crypto_hash_walk walk; + int nbytes; + + if (do_neon) { + kernel_neon_begin(); + flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + } + + for (nbytes = crypto_shash_walk_sg(desc, sg, srclen, &walk, flags); + nbytes > 0; + nbytes = crypto_hash_walk_done(&walk, 0)) + arm_poly1305_do_update(dctx, walk.data, nbytes, do_neon); + + if (do_neon) + kernel_neon_end(); + + return 0; +} + +static int arm_poly1305_final(struct shash_desc *desc, u8 *dst) +{ + struct arm_poly1305_ctx *dctx = shash_desc_ctx(desc); + __le32 digest[4]; + u64 f = 0; + + if (unlikely(!dctx->sset)) + return -ENOKEY; + + if (unlikely(dctx->buflen)) { + dctx->buf[dctx->buflen++] = 1; + memset(dctx->buf + dctx->buflen, 0, + POLY1305_BLOCK_SIZE - dctx->buflen); + poly1305_blocks_arm(dctx->state, dctx->buf, POLY1305_BLOCK_SIZE, 0); + } + + poly1305_emit_arm(dctx->state, digest, dctx->s); + + /* mac = (h + s) % (2^128) */ + f = (f >> 32) + le32_to_cpu(digest[0]); + put_unaligned_le32(f, dst); + f = (f >> 32) + le32_to_cpu(digest[1]); + put_unaligned_le32(f, dst + 4); + f = (f >> 32) + le32_to_cpu(digest[2]); + put_unaligned_le32(f, dst + 8); + f = (f >> 32) + le32_to_cpu(digest[3]); + put_unaligned_le32(f, dst + 12); + + return 0; +} + +static struct shash_alg arm_poly1305_algs[] = {{ + .init = arm_poly1305_init, + .update = arm_poly1305_update, + .final = arm_poly1305_final, + .digestsize = POLY1305_DIGEST_SIZE, + .descsize = sizeof(struct arm_poly1305_ctx), + + .base.cra_name = "poly1305", + .base.cra_driver_name = "poly1305-arm", + .base.cra_priority = 150, + .base.cra_blocksize = POLY1305_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +#ifdef CONFIG_KERNEL_MODE_NEON +}, { + .init = arm_poly1305_init, + .update = arm_poly1305_update_neon, + .update_from_sg = arm_poly1305_update_from_sg_neon, + .final = arm_poly1305_final, + .digestsize = POLY1305_DIGEST_SIZE, + .descsize = sizeof(struct arm_poly1305_ctx), + + .base.cra_name = "poly1305", + .base.cra_driver_name = "poly1305-neon", + .base.cra_priority = 200, + .base.cra_blocksize = POLY1305_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +#endif +}}; + +static int __init arm_poly1305_mod_init(void) +{ + if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && !(elf_hwcap & HWCAP_NEON)) + /* register only the first entry */ + return crypto_register_shash(&arm_poly1305_algs[0]); + + return crypto_register_shashes(arm_poly1305_algs, + ARRAY_SIZE(arm_poly1305_algs)); + + +} + +static void __exit arm_poly1305_mod_exit(void) +{ + if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && !(elf_hwcap & HWCAP_NEON)) { + crypto_unregister_shash(&arm_poly1305_algs[0]); + return; + } + crypto_unregister_shashes(arm_poly1305_algs, + ARRAY_SIZE(arm_poly1305_algs)); +} + +module_init(arm_poly1305_mod_init); +module_exit(arm_poly1305_mod_exit); + +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("poly1305"); +MODULE_ALIAS_CRYPTO("poly1305-arm"); +MODULE_ALIAS_CRYPTO("poly1305-neon"); From patchwork Wed Sep 25 16:12:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161071 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14385924 for ; Wed, 25 Sep 2019 16:17:02 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DAB8D21D7F for ; Wed, 25 Sep 2019 16:17:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="mVz6nMJf"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="Qp0FkhpY" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DAB8D21D7F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=eyLLHKoNDTsI0hJqX9FoTUUrmpo3YD0VGpDchtR8KhI=; b=mVz6nMJfoAtx9a H+iKRB3cglVwphtKkFBLtMA8kvTP01hK2Puks54u3e9vFjGNHL0fOhV5hPKmBF/IGhwzvNoiBIhRm Tn61OmK56IDgKrAlWLEBAmmtu0+rZmsNTgGNFjr25n0LTu1gK+rff7GOsfGp0Tor0LtyAjdJuQwNg ej7wujPqBufoPEIRP3WO6WonoXiejodhb8mrMUqpEb/MJlVDE5GXUOrrAPpF/Qh0gvhq1TJ2fzC1S zT+ed6Axsd3n03iuuMy6hZdAW8e9MwKZaGbV8bSDCK0sO7dmtzhljmk4FFTsrJZuR4hae5y4vfJeQ agaBXUVWOjB4iFG0GOMQ==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9yM-00073E-K7; Wed, 25 Sep 2019 16:16:58 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vO-0003tT-VU for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:04 +0000 Received: by mail-wm1-x341.google.com with SMTP id 3so5622576wmi.3 for ; Wed, 25 Sep 2019 09:13:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uBWKnqhE8fyahgPkhqt+j0IcW7qWJLRRdpKLMMQW86E=; b=Qp0FkhpYOsx997N4Zh3/U+JF9Zo4W8WFkYDiwwDmSVpBCttzjutMP6bcuwhuxOoJW4 9doLlaLphDUmBVFm4ppHXfRDHhXjuzi/wlmgu6RGuT5T0PtKDpU8YE7b5Gr1zmB6Uj2L fbJhkWFDumqhL6pu6EV80Bw7NRGaKKTajYbT3LCrGf8iWMvniM7O9faf2fijsudnUNXH hMeXlgk7gjedBIWzcTL2DzwMTEzzaegfV8eM8g120pqJ2NA0pXNrMOzAZ78tzdSrwtGg AXbNRHZZeq9lTwg+19jXR/+JAhQ1YYadJlcGzk2PYIfEhwrZa7nOotqP7udP6p/cl7bq TYAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uBWKnqhE8fyahgPkhqt+j0IcW7qWJLRRdpKLMMQW86E=; b=IQdzvI7HRfx1+fIgU6BOxxl6L5YsSG09tArQqzeVXle2XbKy/JQafSXQbUrjVaKrXo q0uWubouCKJfZhADyw62Vx68hVLJSH2gfFg5cQWk9mmBzE79GfjutRML2ooQF9zD2xua mYwF4SQDESIi/FxanZHsne1eeWtpFz5NqPpXgLWhOJ4sN//XFvy6ZP5cBpifrnLlvY1t KFi26LnRHEbNbTR00qSr1u85S31m3YvPqAV8WaHdYI7Pb8PxxLFioU/hXRYdnVzjIH6G NuLQwGO++KkmIWVCX2LiOJ1kJiWMzXSxwTBvRm25bt4m7oOA1s0T0oQhL9VTXWbdkgkW ksCg== X-Gm-Message-State: APjAAAVQhFqnm/lVftsj9hVNp0FatW06SJtZIon/jQ0VfpaEdvKyjAkM btwXwPxKMUPVxbcP30xbRk5ptg== X-Google-Smtp-Source: APXvYqy2uYVJgbhx2u3tvRPH5CpU+GCsLR7VJgtUWWLdOCVK9be8MAJNZZGc0G/p0Y4bGc9/tPwBeA== X-Received: by 2002:a7b:c761:: with SMTP id x1mr8697485wmk.47.1569428032430; Wed, 25 Sep 2019 09:13:52 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:51 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 04/18] crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation Date: Wed, 25 Sep 2019 18:12:41 +0200 Message-Id: <20190925161255.1871-5-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091355_480711_5AF844CE X-CRM114-Status: GOOD ( 13.74 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:341 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Andy Polyakov , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL project. The file 'poly1305-armv8.pl' is taken straight from this upstream GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f, and already contains all the changes required to build it as part of a Linux kernel module. [0] https://github.com/dot-asm/cryptogams Co-developed-by: Andy Polyakov Signed-off-by: Andy Polyakov Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 4 + arch/arm64/crypto/Makefile | 9 +- arch/arm64/crypto/poly1305-armv8.pl | 913 ++++++++++++++++++++ arch/arm64/crypto/poly1305-core.S_shipped | 835 ++++++++++++++++++ arch/arm64/crypto/poly1305-glue.c | 215 +++++ 5 files changed, 1975 insertions(+), 1 deletion(-) diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 4922c4451e7c..6ee2fdfd84aa 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -105,6 +105,10 @@ config CRYPTO_CHACHA20_NEON select CRYPTO_BLKCIPHER select CRYPTO_CHACHA20 +config CRYPTO_POLY1305_NEON + tristate "Poly1305 hash function using NEON instructions" + depends on KERNEL_MODE_NEON + config CRYPTO_NHPOLY1305_NEON tristate "NHPoly1305 hash function using NEON instructions (for Adiantum)" depends on KERNEL_MODE_NEON diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index 0435f2a0610e..164d554422fe 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -50,6 +50,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o +obj-$(CONFIG_CRYPTO_POLY1305_NEON) += poly1305-neon.o +poly1305-neon-y := poly1305-core.o poly1305-glue.o + obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o @@ -68,11 +71,15 @@ ifdef REGENERATE_ARM64_CRYPTO quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) +$(src)/poly1305-core.S_shipped: $(src)/poly1305-armv8.pl + $(call cmd,perlasm) + $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) + endif -clean-files += sha256-core.S sha512-core.S +clean-files += poly1305-core.S sha256-core.S sha512-core.S diff --git a/arch/arm64/crypto/poly1305-armv8.pl b/arch/arm64/crypto/poly1305-armv8.pl new file mode 100644 index 000000000000..6e5576d19af8 --- /dev/null +++ b/arch/arm64/crypto/poly1305-armv8.pl @@ -0,0 +1,913 @@ +#!/usr/bin/env perl +# SPDX-License-Identifier: GPL-1.0+ OR BSD-3-Clause +# +# ==================================================================== +# Written by Andy Polyakov, @dot-asm, initially for the OpenSSL +# project. +# ==================================================================== +# +# This module implements Poly1305 hash for ARMv8. +# +# June 2015 +# +# Numbers are cycles per processed byte with poly1305_blocks alone. +# +# IALU/gcc-4.9 NEON +# +# Apple A7 1.86/+5% 0.72 +# Cortex-A53 2.69/+58% 1.47 +# Cortex-A57 2.70/+7% 1.14 +# Denver 1.64/+50% 1.18(*) +# X-Gene 2.13/+68% 2.27 +# Mongoose 1.77/+75% 1.12 +# Kryo 2.70/+55% 1.13 +# ThunderX2 1.17/+95% 1.36 +# +# (*) estimate based on resources availability is less than 1.0, +# i.e. measured result is worse than expected, presumably binary +# translator is not almighty; + +$flavour=shift; +$output=shift; + +if ($flavour && $flavour ne "void") { + $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1; + ( $xlate="${dir}arm-xlate.pl" and -f $xlate ) or + ( $xlate="${dir}../../perlasm/arm-xlate.pl" and -f $xlate) or + die "can't locate arm-xlate.pl"; + + open STDOUT,"| \"$^X\" $xlate $flavour $output"; +} else { + open STDOUT,">$output"; +} + +my ($ctx,$inp,$len,$padbit) = map("x$_",(0..3)); +my ($mac,$nonce)=($inp,$len); + +my ($h0,$h1,$h2,$r0,$r1,$s1,$t0,$t1,$d0,$d1,$d2) = map("x$_",(4..14)); + +$code.=<<___; +#ifndef __KERNEL__ +# include "arm_arch.h" +.extern OPENSSL_armcap_P +#endif + +.text + +// forward "declarations" are required for Apple +.globl poly1305_blocks +.globl poly1305_emit + +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: + cmp $inp,xzr + stp xzr,xzr,[$ctx] // zero hash value + stp xzr,xzr,[$ctx,#16] // [along with is_base2_26] + + csel x0,xzr,x0,eq + b.eq .Lno_key + +#ifndef __KERNEL__ + adrp x17,OPENSSL_armcap_P + ldr w17,[x17,#:lo12:OPENSSL_armcap_P] +#endif + + ldp $r0,$r1,[$inp] // load key + mov $s1,#0xfffffffc0fffffff + movk $s1,#0x0fff,lsl#48 +#ifdef __AARCH64EB__ + rev $r0,$r0 // flip bytes + rev $r1,$r1 +#endif + and $r0,$r0,$s1 // &=0ffffffc0fffffff + and $s1,$s1,#-4 + and $r1,$r1,$s1 // &=0ffffffc0ffffffc + mov w#$s1,#-1 + stp $r0,$r1,[$ctx,#32] // save key value + str w#$s1,[$ctx,#48] // impossible key power value + +#ifndef __KERNEL__ + tst w17,#ARMV7_NEON + + adr $d0,.Lpoly1305_blocks + adr $r0,.Lpoly1305_blocks_neon + adr $d1,.Lpoly1305_emit + + csel $d0,$d0,$r0,eq + +# ifdef __ILP32__ + stp w#$d0,w#$d1,[$len] +# else + stp $d0,$d1,[$len] +# endif +#endif + mov x0,#1 +.Lno_key: + ret +.size poly1305_init,.-poly1305_init + +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: +.Lpoly1305_blocks: + ands $len,$len,#-16 + b.eq .Lno_data + + ldp $h0,$h1,[$ctx] // load hash value + ldp $h2,x17,[$ctx,#16] // [along with is_base2_26] + ldp $r0,$r1,[$ctx,#32] // load key value + +#ifdef __AARCH64EB__ + lsr $d0,$h0,#32 + mov w#$d1,w#$h0 + lsr $d2,$h1,#32 + mov w15,w#$h1 + lsr x16,$h2,#32 +#else + mov w#$d0,w#$h0 + lsr $d1,$h0,#32 + mov w#$d2,w#$h1 + lsr x15,$h1,#32 + mov w16,w#$h2 +#endif + + add $d0,$d0,$d1,lsl#26 // base 2^26 -> base 2^64 + lsr $d1,$d2,#12 + adds $d0,$d0,$d2,lsl#52 + add $d1,$d1,x15,lsl#14 + adc $d1,$d1,xzr + lsr $d2,x16,#24 + adds $d1,$d1,x16,lsl#40 + adc $d2,$d2,xzr + + cmp x17,#0 // is_base2_26? + add $s1,$r1,$r1,lsr#2 // s1 = r1 + (r1 >> 2) + csel $h0,$h0,$d0,eq // choose between radixes + csel $h1,$h1,$d1,eq + csel $h2,$h2,$d2,eq + +.Loop: + ldp $t0,$t1,[$inp],#16 // load input + sub $len,$len,#16 +#ifdef __AARCH64EB__ + rev $t0,$t0 + rev $t1,$t1 +#endif + adds $h0,$h0,$t0 // accumulate input + adcs $h1,$h1,$t1 + + mul $d0,$h0,$r0 // h0*r0 + adc $h2,$h2,$padbit + umulh $d1,$h0,$r0 + + mul $t0,$h1,$s1 // h1*5*r1 + umulh $t1,$h1,$s1 + + adds $d0,$d0,$t0 + mul $t0,$h0,$r1 // h0*r1 + adc $d1,$d1,$t1 + umulh $d2,$h0,$r1 + + adds $d1,$d1,$t0 + mul $t0,$h1,$r0 // h1*r0 + adc $d2,$d2,xzr + umulh $t1,$h1,$r0 + + adds $d1,$d1,$t0 + mul $t0,$h2,$s1 // h2*5*r1 + adc $d2,$d2,$t1 + mul $t1,$h2,$r0 // h2*r0 + + adds $d1,$d1,$t0 + adc $d2,$d2,$t1 + + and $t0,$d2,#-4 // final reduction + and $h2,$d2,#3 + add $t0,$t0,$d2,lsr#2 + adds $h0,$d0,$t0 + adcs $h1,$d1,xzr + adc $h2,$h2,xzr + + cbnz $len,.Loop + + stp $h0,$h1,[$ctx] // store hash value + stp $h2,xzr,[$ctx,#16] // [and clear is_base2_26] + +.Lno_data: + ret +.size poly1305_blocks,.-poly1305_blocks + +.type poly1305_emit,%function +.align 5 +poly1305_emit: +.Lpoly1305_emit: + ldp $h0,$h1,[$ctx] // load hash base 2^64 + ldp $h2,$r0,[$ctx,#16] // [along with is_base2_26] + ldp $t0,$t1,[$nonce] // load nonce + +#ifdef __AARCH64EB__ + lsr $d0,$h0,#32 + mov w#$d1,w#$h0 + lsr $d2,$h1,#32 + mov w15,w#$h1 + lsr x16,$h2,#32 +#else + mov w#$d0,w#$h0 + lsr $d1,$h0,#32 + mov w#$d2,w#$h1 + lsr x15,$h1,#32 + mov w16,w#$h2 +#endif + + add $d0,$d0,$d1,lsl#26 // base 2^26 -> base 2^64 + lsr $d1,$d2,#12 + adds $d0,$d0,$d2,lsl#52 + add $d1,$d1,x15,lsl#14 + adc $d1,$d1,xzr + lsr $d2,x16,#24 + adds $d1,$d1,x16,lsl#40 + adc $d2,$d2,xzr + + cmp $r0,#0 // is_base2_26? + csel $h0,$h0,$d0,eq // choose between radixes + csel $h1,$h1,$d1,eq + csel $h2,$h2,$d2,eq + + adds $d0,$h0,#5 // compare to modulus + adcs $d1,$h1,xzr + adc $d2,$h2,xzr + + tst $d2,#-4 // see if it's carried/borrowed + + csel $h0,$h0,$d0,eq + csel $h1,$h1,$d1,eq + +#ifdef __AARCH64EB__ + ror $t0,$t0,#32 // flip nonce words + ror $t1,$t1,#32 +#endif + adds $h0,$h0,$t0 // accumulate nonce + adc $h1,$h1,$t1 +#ifdef __AARCH64EB__ + rev $h0,$h0 // flip output bytes + rev $h1,$h1 +#endif + stp $h0,$h1,[$mac] // write result + + ret +.size poly1305_emit,.-poly1305_emit +___ +my ($R0,$R1,$S1,$R2,$S2,$R3,$S3,$R4,$S4) = map("v$_.4s",(0..8)); +my ($IN01_0,$IN01_1,$IN01_2,$IN01_3,$IN01_4) = map("v$_.2s",(9..13)); +my ($IN23_0,$IN23_1,$IN23_2,$IN23_3,$IN23_4) = map("v$_.2s",(14..18)); +my ($ACC0,$ACC1,$ACC2,$ACC3,$ACC4) = map("v$_.2d",(19..23)); +my ($H0,$H1,$H2,$H3,$H4) = map("v$_.2s",(24..28)); +my ($T0,$T1,$MASK) = map("v$_",(29..31)); + +my ($in2,$zeros)=("x16","x17"); +my $is_base2_26 = $zeros; # borrow + +$code.=<<___; +.type poly1305_mult,%function +.align 5 +poly1305_mult: + mul $d0,$h0,$r0 // h0*r0 + umulh $d1,$h0,$r0 + + mul $t0,$h1,$s1 // h1*5*r1 + umulh $t1,$h1,$s1 + + adds $d0,$d0,$t0 + mul $t0,$h0,$r1 // h0*r1 + adc $d1,$d1,$t1 + umulh $d2,$h0,$r1 + + adds $d1,$d1,$t0 + mul $t0,$h1,$r0 // h1*r0 + adc $d2,$d2,xzr + umulh $t1,$h1,$r0 + + adds $d1,$d1,$t0 + mul $t0,$h2,$s1 // h2*5*r1 + adc $d2,$d2,$t1 + mul $t1,$h2,$r0 // h2*r0 + + adds $d1,$d1,$t0 + adc $d2,$d2,$t1 + + and $t0,$d2,#-4 // final reduction + and $h2,$d2,#3 + add $t0,$t0,$d2,lsr#2 + adds $h0,$d0,$t0 + adcs $h1,$d1,xzr + adc $h2,$h2,xzr + + ret +.size poly1305_mult,.-poly1305_mult + +.type poly1305_splat,%function +.align 4 +poly1305_splat: + and x12,$h0,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x13,$h0,#26,#26 + extr x14,$h1,$h0,#52 + and x14,x14,#0x03ffffff + ubfx x15,$h1,#14,#26 + extr x16,$h2,$h1,#40 + + str w12,[$ctx,#16*0] // r0 + add w12,w13,w13,lsl#2 // r1*5 + str w13,[$ctx,#16*1] // r1 + add w13,w14,w14,lsl#2 // r2*5 + str w12,[$ctx,#16*2] // s1 + str w14,[$ctx,#16*3] // r2 + add w14,w15,w15,lsl#2 // r3*5 + str w13,[$ctx,#16*4] // s2 + str w15,[$ctx,#16*5] // r3 + add w15,w16,w16,lsl#2 // r4*5 + str w14,[$ctx,#16*6] // s3 + str w16,[$ctx,#16*7] // r4 + str w15,[$ctx,#16*8] // s4 + + ret +.size poly1305_splat,.-poly1305_splat + +#ifdef __KERNEL__ +.globl poly1305_blocks_neon +#endif +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: +.Lpoly1305_blocks_neon: + ldr $is_base2_26,[$ctx,#24] + cmp $len,#128 + b.lo .Lpoly1305_blocks + + .inst 0xd503233f // paciasp + stp x29,x30,[sp,#-80]! + add x29,sp,#0 + + stp d8,d9,[sp,#16] // meet ABI requirements + stp d10,d11,[sp,#32] + stp d12,d13,[sp,#48] + stp d14,d15,[sp,#64] + + cbz $is_base2_26,.Lbase2_64_neon + + ldp w10,w11,[$ctx] // load hash value base 2^26 + ldp w12,w13,[$ctx,#8] + ldr w14,[$ctx,#16] + + tst $len,#31 + b.eq .Leven_neon + + ldp $r0,$r1,[$ctx,#32] // load key value + + add $h0,x10,x11,lsl#26 // base 2^26 -> base 2^64 + lsr $h1,x12,#12 + adds $h0,$h0,x12,lsl#52 + add $h1,$h1,x13,lsl#14 + adc $h1,$h1,xzr + lsr $h2,x14,#24 + adds $h1,$h1,x14,lsl#40 + adc $d2,$h2,xzr // can be partially reduced... + + ldp $d0,$d1,[$inp],#16 // load input + sub $len,$len,#16 + add $s1,$r1,$r1,lsr#2 // s1 = r1 + (r1 >> 2) + +#ifdef __AARCH64EB__ + rev $d0,$d0 + rev $d1,$d1 +#endif + adds $h0,$h0,$d0 // accumulate input + adcs $h1,$h1,$d1 + adc $h2,$h2,$padbit + + bl poly1305_mult + + and x10,$h0,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,$h0,#26,#26 + extr x12,$h1,$h0,#52 + and x12,x12,#0x03ffffff + ubfx x13,$h1,#14,#26 + extr x14,$h2,$h1,#40 + + b .Leven_neon + +.align 4 +.Lbase2_64_neon: + ldp $r0,$r1,[$ctx,#32] // load key value + + ldp $h0,$h1,[$ctx] // load hash value base 2^64 + ldr $h2,[$ctx,#16] + + tst $len,#31 + b.eq .Linit_neon + + ldp $d0,$d1,[$inp],#16 // load input + sub $len,$len,#16 + add $s1,$r1,$r1,lsr#2 // s1 = r1 + (r1 >> 2) +#ifdef __AARCH64EB__ + rev $d0,$d0 + rev $d1,$d1 +#endif + adds $h0,$h0,$d0 // accumulate input + adcs $h1,$h1,$d1 + adc $h2,$h2,$padbit + + bl poly1305_mult + +.Linit_neon: + ldr w17,[$ctx,#48] // first table element + and x10,$h0,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,$h0,#26,#26 + extr x12,$h1,$h0,#52 + and x12,x12,#0x03ffffff + ubfx x13,$h1,#14,#26 + extr x14,$h2,$h1,#40 + + cmp w17,#-1 // is value impossible? + b.ne .Leven_neon + + fmov ${H0},x10 + fmov ${H1},x11 + fmov ${H2},x12 + fmov ${H3},x13 + fmov ${H4},x14 + + ////////////////////////////////// initialize r^n table + mov $h0,$r0 // r^1 + add $s1,$r1,$r1,lsr#2 // s1 = r1 + (r1 >> 2) + mov $h1,$r1 + mov $h2,xzr + add $ctx,$ctx,#48+12 + bl poly1305_splat + + bl poly1305_mult // r^2 + sub $ctx,$ctx,#4 + bl poly1305_splat + + bl poly1305_mult // r^3 + sub $ctx,$ctx,#4 + bl poly1305_splat + + bl poly1305_mult // r^4 + sub $ctx,$ctx,#4 + bl poly1305_splat + sub $ctx,$ctx,#48 // restore original $ctx + b .Ldo_neon + +.align 4 +.Leven_neon: + fmov ${H0},x10 + fmov ${H1},x11 + fmov ${H2},x12 + fmov ${H3},x13 + fmov ${H4},x14 + +.Ldo_neon: + ldp x8,x12,[$inp,#32] // inp[2:3] + subs $len,$len,#64 + ldp x9,x13,[$inp,#48] + add $in2,$inp,#96 + adr $zeros,.Lzeros + + lsl $padbit,$padbit,#24 + add x15,$ctx,#48 + +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov $IN23_0,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,$padbit,x12,lsr#40 + add x13,$padbit,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov $IN23_1,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + fmov $IN23_2,x8 + fmov $IN23_3,x10 + fmov $IN23_4,x12 + + ldp x8,x12,[$inp],#16 // inp[0:1] + ldp x9,x13,[$inp],#48 + + ld1 {$R0,$R1,$S1,$R2},[x15],#64 + ld1 {$S2,$R3,$S3,$R4},[x15],#64 + ld1 {$S4},[x15] + +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov $IN01_0,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,$padbit,x12,lsr#40 + add x13,$padbit,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov $IN01_1,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + movi $MASK.2d,#-1 + fmov $IN01_2,x8 + fmov $IN01_3,x10 + fmov $IN01_4,x12 + ushr $MASK.2d,$MASK.2d,#38 + + b.ls .Lskip_loop + +.align 4 +.Loop_neon: + //////////////////////////////////////////////////////////////// + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + // \___________________/ + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + // \___________________/ \____________________/ + // + // Note that we start with inp[2:3]*r^2. This is because it + // doesn't depend on reduction in previous iteration. + //////////////////////////////////////////////////////////////// + // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0 + // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4 + // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3 + // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2 + // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1 + + subs $len,$len,#64 + umull $ACC4,$IN23_0,${R4}[2] + csel $in2,$zeros,$in2,lo + umull $ACC3,$IN23_0,${R3}[2] + umull $ACC2,$IN23_0,${R2}[2] + ldp x8,x12,[$in2],#16 // inp[2:3] (or zero) + umull $ACC1,$IN23_0,${R1}[2] + ldp x9,x13,[$in2],#48 + umull $ACC0,$IN23_0,${R0}[2] +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + umlal $ACC4,$IN23_1,${R3}[2] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal $ACC3,$IN23_1,${R2}[2] + and x5,x9,#0x03ffffff + umlal $ACC2,$IN23_1,${R1}[2] + ubfx x6,x8,#26,#26 + umlal $ACC1,$IN23_1,${R0}[2] + ubfx x7,x9,#26,#26 + umlal $ACC0,$IN23_1,${S4}[2] + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + + umlal $ACC4,$IN23_2,${R2}[2] + extr x8,x12,x8,#52 + umlal $ACC3,$IN23_2,${R1}[2] + extr x9,x13,x9,#52 + umlal $ACC2,$IN23_2,${R0}[2] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal $ACC1,$IN23_2,${S4}[2] + fmov $IN23_0,x4 + umlal $ACC0,$IN23_2,${S3}[2] + and x8,x8,#0x03ffffff + + umlal $ACC4,$IN23_3,${R1}[2] + and x9,x9,#0x03ffffff + umlal $ACC3,$IN23_3,${R0}[2] + ubfx x10,x12,#14,#26 + umlal $ACC2,$IN23_3,${S4}[2] + ubfx x11,x13,#14,#26 + umlal $ACC1,$IN23_3,${S3}[2] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal $ACC0,$IN23_3,${S2}[2] + fmov $IN23_1,x6 + + add $IN01_2,$IN01_2,$H2 + add x12,$padbit,x12,lsr#40 + umlal $ACC4,$IN23_4,${R0}[2] + add x13,$padbit,x13,lsr#40 + umlal $ACC3,$IN23_4,${S4}[2] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal $ACC2,$IN23_4,${S3}[2] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal $ACC1,$IN23_4,${S2}[2] + fmov $IN23_2,x8 + umlal $ACC0,$IN23_4,${S1}[2] + fmov $IN23_3,x10 + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4 and accumulate + + add $IN01_0,$IN01_0,$H0 + fmov $IN23_4,x12 + umlal $ACC3,$IN01_2,${R1}[0] + ldp x8,x12,[$inp],#16 // inp[0:1] + umlal $ACC0,$IN01_2,${S3}[0] + ldp x9,x13,[$inp],#48 + umlal $ACC4,$IN01_2,${R2}[0] + umlal $ACC1,$IN01_2,${S4}[0] + umlal $ACC2,$IN01_2,${R0}[0] +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + add $IN01_1,$IN01_1,$H1 + umlal $ACC3,$IN01_0,${R3}[0] + umlal $ACC4,$IN01_0,${R4}[0] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal $ACC2,$IN01_0,${R2}[0] + and x5,x9,#0x03ffffff + umlal $ACC0,$IN01_0,${R0}[0] + ubfx x6,x8,#26,#26 + umlal $ACC1,$IN01_0,${R1}[0] + ubfx x7,x9,#26,#26 + + add $IN01_3,$IN01_3,$H3 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + umlal $ACC3,$IN01_1,${R2}[0] + extr x8,x12,x8,#52 + umlal $ACC4,$IN01_1,${R3}[0] + extr x9,x13,x9,#52 + umlal $ACC0,$IN01_1,${S4}[0] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal $ACC2,$IN01_1,${R1}[0] + fmov $IN01_0,x4 + umlal $ACC1,$IN01_1,${R0}[0] + and x8,x8,#0x03ffffff + + add $IN01_4,$IN01_4,$H4 + and x9,x9,#0x03ffffff + umlal $ACC3,$IN01_3,${R0}[0] + ubfx x10,x12,#14,#26 + umlal $ACC0,$IN01_3,${S2}[0] + ubfx x11,x13,#14,#26 + umlal $ACC4,$IN01_3,${R1}[0] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal $ACC1,$IN01_3,${S3}[0] + fmov $IN01_1,x6 + umlal $ACC2,$IN01_3,${S4}[0] + add x12,$padbit,x12,lsr#40 + + umlal $ACC3,$IN01_4,${S4}[0] + add x13,$padbit,x13,lsr#40 + umlal $ACC0,$IN01_4,${S1}[0] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal $ACC4,$IN01_4,${R0}[0] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal $ACC1,$IN01_4,${S2}[0] + fmov $IN01_2,x8 + umlal $ACC2,$IN01_4,${S3}[0] + fmov $IN01_3,x10 + fmov $IN01_4,x12 + + ///////////////////////////////////////////////////////////////// + // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + // and P. Schwabe + // + // [see discussion in poly1305-armv4 module] + + ushr $T0.2d,$ACC3,#26 + xtn $H3,$ACC3 + ushr $T1.2d,$ACC0,#26 + and $ACC0,$ACC0,$MASK.2d + add $ACC4,$ACC4,$T0.2d // h3 -> h4 + bic $H3,#0xfc,lsl#24 // &=0x03ffffff + add $ACC1,$ACC1,$T1.2d // h0 -> h1 + + ushr $T0.2d,$ACC4,#26 + xtn $H4,$ACC4 + ushr $T1.2d,$ACC1,#26 + xtn $H1,$ACC1 + bic $H4,#0xfc,lsl#24 + add $ACC2,$ACC2,$T1.2d // h1 -> h2 + + add $ACC0,$ACC0,$T0.2d + shl $T0.2d,$T0.2d,#2 + shrn $T1.2s,$ACC2,#26 + xtn $H2,$ACC2 + add $ACC0,$ACC0,$T0.2d // h4 -> h0 + bic $H1,#0xfc,lsl#24 + add $H3,$H3,$T1.2s // h2 -> h3 + bic $H2,#0xfc,lsl#24 + + shrn $T0.2s,$ACC0,#26 + xtn $H0,$ACC0 + ushr $T1.2s,$H3,#26 + bic $H3,#0xfc,lsl#24 + bic $H0,#0xfc,lsl#24 + add $H1,$H1,$T0.2s // h0 -> h1 + add $H4,$H4,$T1.2s // h3 -> h4 + + b.hi .Loop_neon + +.Lskip_loop: + dup $IN23_2,${IN23_2}[0] + add $IN01_2,$IN01_2,$H2 + + //////////////////////////////////////////////////////////////// + // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + adds $len,$len,#32 + b.ne .Long_tail + + dup $IN23_2,${IN01_2}[0] + add $IN23_0,$IN01_0,$H0 + add $IN23_3,$IN01_3,$H3 + add $IN23_1,$IN01_1,$H1 + add $IN23_4,$IN01_4,$H4 + +.Long_tail: + dup $IN23_0,${IN23_0}[0] + umull2 $ACC0,$IN23_2,${S3} + umull2 $ACC3,$IN23_2,${R1} + umull2 $ACC4,$IN23_2,${R2} + umull2 $ACC2,$IN23_2,${R0} + umull2 $ACC1,$IN23_2,${S4} + + dup $IN23_1,${IN23_1}[0] + umlal2 $ACC0,$IN23_0,${R0} + umlal2 $ACC2,$IN23_0,${R2} + umlal2 $ACC3,$IN23_0,${R3} + umlal2 $ACC4,$IN23_0,${R4} + umlal2 $ACC1,$IN23_0,${R1} + + dup $IN23_3,${IN23_3}[0] + umlal2 $ACC0,$IN23_1,${S4} + umlal2 $ACC3,$IN23_1,${R2} + umlal2 $ACC2,$IN23_1,${R1} + umlal2 $ACC4,$IN23_1,${R3} + umlal2 $ACC1,$IN23_1,${R0} + + dup $IN23_4,${IN23_4}[0] + umlal2 $ACC3,$IN23_3,${R0} + umlal2 $ACC4,$IN23_3,${R1} + umlal2 $ACC0,$IN23_3,${S2} + umlal2 $ACC1,$IN23_3,${S3} + umlal2 $ACC2,$IN23_3,${S4} + + umlal2 $ACC3,$IN23_4,${S4} + umlal2 $ACC0,$IN23_4,${S1} + umlal2 $ACC4,$IN23_4,${R0} + umlal2 $ACC1,$IN23_4,${S2} + umlal2 $ACC2,$IN23_4,${S3} + + b.eq .Lshort_tail + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4:r^3 and accumulate + + add $IN01_0,$IN01_0,$H0 + umlal $ACC3,$IN01_2,${R1} + umlal $ACC0,$IN01_2,${S3} + umlal $ACC4,$IN01_2,${R2} + umlal $ACC1,$IN01_2,${S4} + umlal $ACC2,$IN01_2,${R0} + + add $IN01_1,$IN01_1,$H1 + umlal $ACC3,$IN01_0,${R3} + umlal $ACC0,$IN01_0,${R0} + umlal $ACC4,$IN01_0,${R4} + umlal $ACC1,$IN01_0,${R1} + umlal $ACC2,$IN01_0,${R2} + + add $IN01_3,$IN01_3,$H3 + umlal $ACC3,$IN01_1,${R2} + umlal $ACC0,$IN01_1,${S4} + umlal $ACC4,$IN01_1,${R3} + umlal $ACC1,$IN01_1,${R0} + umlal $ACC2,$IN01_1,${R1} + + add $IN01_4,$IN01_4,$H4 + umlal $ACC3,$IN01_3,${R0} + umlal $ACC0,$IN01_3,${S2} + umlal $ACC4,$IN01_3,${R1} + umlal $ACC1,$IN01_3,${S3} + umlal $ACC2,$IN01_3,${S4} + + umlal $ACC3,$IN01_4,${S4} + umlal $ACC0,$IN01_4,${S1} + umlal $ACC4,$IN01_4,${R0} + umlal $ACC1,$IN01_4,${S2} + umlal $ACC2,$IN01_4,${S3} + +.Lshort_tail: + //////////////////////////////////////////////////////////////// + // horizontal add + + addp $ACC3,$ACC3,$ACC3 + ldp d8,d9,[sp,#16] // meet ABI requirements + addp $ACC0,$ACC0,$ACC0 + ldp d10,d11,[sp,#32] + addp $ACC4,$ACC4,$ACC4 + ldp d12,d13,[sp,#48] + addp $ACC1,$ACC1,$ACC1 + ldp d14,d15,[sp,#64] + addp $ACC2,$ACC2,$ACC2 + ldr x30,[sp,#8] + .inst 0xd50323bf // autiasp + + //////////////////////////////////////////////////////////////// + // lazy reduction, but without narrowing + + ushr $T0.2d,$ACC3,#26 + and $ACC3,$ACC3,$MASK.2d + ushr $T1.2d,$ACC0,#26 + and $ACC0,$ACC0,$MASK.2d + + add $ACC4,$ACC4,$T0.2d // h3 -> h4 + add $ACC1,$ACC1,$T1.2d // h0 -> h1 + + ushr $T0.2d,$ACC4,#26 + and $ACC4,$ACC4,$MASK.2d + ushr $T1.2d,$ACC1,#26 + and $ACC1,$ACC1,$MASK.2d + add $ACC2,$ACC2,$T1.2d // h1 -> h2 + + add $ACC0,$ACC0,$T0.2d + shl $T0.2d,$T0.2d,#2 + ushr $T1.2d,$ACC2,#26 + and $ACC2,$ACC2,$MASK.2d + add $ACC0,$ACC0,$T0.2d // h4 -> h0 + add $ACC3,$ACC3,$T1.2d // h2 -> h3 + + ushr $T0.2d,$ACC0,#26 + and $ACC0,$ACC0,$MASK.2d + ushr $T1.2d,$ACC3,#26 + and $ACC3,$ACC3,$MASK.2d + add $ACC1,$ACC1,$T0.2d // h0 -> h1 + add $ACC4,$ACC4,$T1.2d // h3 -> h4 + + //////////////////////////////////////////////////////////////// + // write the result, can be partially reduced + + st4 {$ACC0,$ACC1,$ACC2,$ACC3}[0],[$ctx],#16 + mov x4,#1 + st1 {$ACC4}[0],[$ctx] + str x4,[$ctx,#8] // set is_base2_26 + + ldr x29,[sp],#80 + ret +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0 +.asciz "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm" +.align 2 +#if !defined(__KERNEL__) && !defined(_WIN64) +.comm OPENSSL_armcap_P,4,4 +.hidden OPENSSL_armcap_P +#endif +___ + +foreach (split("\n",$code)) { + s/\b(shrn\s+v[0-9]+)\.[24]d/$1.2s/ or + s/\b(fmov\s+)v([0-9]+)[^,]*,\s*x([0-9]+)/$1d$2,x$3/ or + (m/\bdup\b/ and (s/\.[24]s/.2d/g or 1)) or + (m/\b(eor|and)/ and (s/\.[248][sdh]/.16b/g or 1)) or + (m/\bum(ul|la)l\b/ and (s/\.4s/.2s/g or 1)) or + (m/\bum(ul|la)l2\b/ and (s/\.2s/.4s/g or 1)) or + (m/\bst[1-4]\s+{[^}]+}\[/ and (s/\.[24]d/.s/g or 1)); + + s/\.[124]([sd])\[/.$1\[/; + s/w#x([0-9]+)/w$1/g; + + print $_,"\n"; +} +close STDOUT; diff --git a/arch/arm64/crypto/poly1305-core.S_shipped b/arch/arm64/crypto/poly1305-core.S_shipped new file mode 100644 index 000000000000..8d1c4e420ccd --- /dev/null +++ b/arch/arm64/crypto/poly1305-core.S_shipped @@ -0,0 +1,835 @@ +#ifndef __KERNEL__ +# include "arm_arch.h" +.extern OPENSSL_armcap_P +#endif + +.text + +// forward "declarations" are required for Apple +.globl poly1305_blocks +.globl poly1305_emit + +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: + cmp x1,xzr + stp xzr,xzr,[x0] // zero hash value + stp xzr,xzr,[x0,#16] // [along with is_base2_26] + + csel x0,xzr,x0,eq + b.eq .Lno_key + +#ifndef __KERNEL__ + adrp x17,OPENSSL_armcap_P + ldr w17,[x17,#:lo12:OPENSSL_armcap_P] +#endif + + ldp x7,x8,[x1] // load key + mov x9,#0xfffffffc0fffffff + movk x9,#0x0fff,lsl#48 +#ifdef __AARCH64EB__ + rev x7,x7 // flip bytes + rev x8,x8 +#endif + and x7,x7,x9 // &=0ffffffc0fffffff + and x9,x9,#-4 + and x8,x8,x9 // &=0ffffffc0ffffffc + mov w9,#-1 + stp x7,x8,[x0,#32] // save key value + str w9,[x0,#48] // impossible key power value + +#ifndef __KERNEL__ + tst w17,#ARMV7_NEON + + adr x12,.Lpoly1305_blocks + adr x7,.Lpoly1305_blocks_neon + adr x13,.Lpoly1305_emit + + csel x12,x12,x7,eq + +# ifdef __ILP32__ + stp w12,w13,[x2] +# else + stp x12,x13,[x2] +# endif +#endif + mov x0,#1 +.Lno_key: + ret +.size poly1305_init,.-poly1305_init + +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: +.Lpoly1305_blocks: + ands x2,x2,#-16 + b.eq .Lno_data + + ldp x4,x5,[x0] // load hash value + ldp x6,x17,[x0,#16] // [along with is_base2_26] + ldp x7,x8,[x0,#32] // load key value + +#ifdef __AARCH64EB__ + lsr x12,x4,#32 + mov w13,w4 + lsr x14,x5,#32 + mov w15,w5 + lsr x16,x6,#32 +#else + mov w12,w4 + lsr x13,x4,#32 + mov w14,w5 + lsr x15,x5,#32 + mov w16,w6 +#endif + + add x12,x12,x13,lsl#26 // base 2^26 -> base 2^64 + lsr x13,x14,#12 + adds x12,x12,x14,lsl#52 + add x13,x13,x15,lsl#14 + adc x13,x13,xzr + lsr x14,x16,#24 + adds x13,x13,x16,lsl#40 + adc x14,x14,xzr + + cmp x17,#0 // is_base2_26? + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + csel x4,x4,x12,eq // choose between radixes + csel x5,x5,x13,eq + csel x6,x6,x14,eq + +.Loop: + ldp x10,x11,[x1],#16 // load input + sub x2,x2,#16 +#ifdef __AARCH64EB__ + rev x10,x10 + rev x11,x11 +#endif + adds x4,x4,x10 // accumulate input + adcs x5,x5,x11 + + mul x12,x4,x7 // h0*r0 + adc x6,x6,x3 + umulh x13,x4,x7 + + mul x10,x5,x9 // h1*5*r1 + umulh x11,x5,x9 + + adds x12,x12,x10 + mul x10,x4,x8 // h0*r1 + adc x13,x13,x11 + umulh x14,x4,x8 + + adds x13,x13,x10 + mul x10,x5,x7 // h1*r0 + adc x14,x14,xzr + umulh x11,x5,x7 + + adds x13,x13,x10 + mul x10,x6,x9 // h2*5*r1 + adc x14,x14,x11 + mul x11,x6,x7 // h2*r0 + + adds x13,x13,x10 + adc x14,x14,x11 + + and x10,x14,#-4 // final reduction + and x6,x14,#3 + add x10,x10,x14,lsr#2 + adds x4,x12,x10 + adcs x5,x13,xzr + adc x6,x6,xzr + + cbnz x2,.Loop + + stp x4,x5,[x0] // store hash value + stp x6,xzr,[x0,#16] // [and clear is_base2_26] + +.Lno_data: + ret +.size poly1305_blocks,.-poly1305_blocks + +.type poly1305_emit,%function +.align 5 +poly1305_emit: +.Lpoly1305_emit: + ldp x4,x5,[x0] // load hash base 2^64 + ldp x6,x7,[x0,#16] // [along with is_base2_26] + ldp x10,x11,[x2] // load nonce + +#ifdef __AARCH64EB__ + lsr x12,x4,#32 + mov w13,w4 + lsr x14,x5,#32 + mov w15,w5 + lsr x16,x6,#32 +#else + mov w12,w4 + lsr x13,x4,#32 + mov w14,w5 + lsr x15,x5,#32 + mov w16,w6 +#endif + + add x12,x12,x13,lsl#26 // base 2^26 -> base 2^64 + lsr x13,x14,#12 + adds x12,x12,x14,lsl#52 + add x13,x13,x15,lsl#14 + adc x13,x13,xzr + lsr x14,x16,#24 + adds x13,x13,x16,lsl#40 + adc x14,x14,xzr + + cmp x7,#0 // is_base2_26? + csel x4,x4,x12,eq // choose between radixes + csel x5,x5,x13,eq + csel x6,x6,x14,eq + + adds x12,x4,#5 // compare to modulus + adcs x13,x5,xzr + adc x14,x6,xzr + + tst x14,#-4 // see if it's carried/borrowed + + csel x4,x4,x12,eq + csel x5,x5,x13,eq + +#ifdef __AARCH64EB__ + ror x10,x10,#32 // flip nonce words + ror x11,x11,#32 +#endif + adds x4,x4,x10 // accumulate nonce + adc x5,x5,x11 +#ifdef __AARCH64EB__ + rev x4,x4 // flip output bytes + rev x5,x5 +#endif + stp x4,x5,[x1] // write result + + ret +.size poly1305_emit,.-poly1305_emit +.type poly1305_mult,%function +.align 5 +poly1305_mult: + mul x12,x4,x7 // h0*r0 + umulh x13,x4,x7 + + mul x10,x5,x9 // h1*5*r1 + umulh x11,x5,x9 + + adds x12,x12,x10 + mul x10,x4,x8 // h0*r1 + adc x13,x13,x11 + umulh x14,x4,x8 + + adds x13,x13,x10 + mul x10,x5,x7 // h1*r0 + adc x14,x14,xzr + umulh x11,x5,x7 + + adds x13,x13,x10 + mul x10,x6,x9 // h2*5*r1 + adc x14,x14,x11 + mul x11,x6,x7 // h2*r0 + + adds x13,x13,x10 + adc x14,x14,x11 + + and x10,x14,#-4 // final reduction + and x6,x14,#3 + add x10,x10,x14,lsr#2 + adds x4,x12,x10 + adcs x5,x13,xzr + adc x6,x6,xzr + + ret +.size poly1305_mult,.-poly1305_mult + +.type poly1305_splat,%function +.align 4 +poly1305_splat: + and x12,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x13,x4,#26,#26 + extr x14,x5,x4,#52 + and x14,x14,#0x03ffffff + ubfx x15,x5,#14,#26 + extr x16,x6,x5,#40 + + str w12,[x0,#16*0] // r0 + add w12,w13,w13,lsl#2 // r1*5 + str w13,[x0,#16*1] // r1 + add w13,w14,w14,lsl#2 // r2*5 + str w12,[x0,#16*2] // s1 + str w14,[x0,#16*3] // r2 + add w14,w15,w15,lsl#2 // r3*5 + str w13,[x0,#16*4] // s2 + str w15,[x0,#16*5] // r3 + add w15,w16,w16,lsl#2 // r4*5 + str w14,[x0,#16*6] // s3 + str w16,[x0,#16*7] // r4 + str w15,[x0,#16*8] // s4 + + ret +.size poly1305_splat,.-poly1305_splat + +#ifdef __KERNEL__ +.globl poly1305_blocks_neon +#endif +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: +.Lpoly1305_blocks_neon: + ldr x17,[x0,#24] + cmp x2,#128 + b.lo .Lpoly1305_blocks + + .inst 0xd503233f // paciasp + stp x29,x30,[sp,#-80]! + add x29,sp,#0 + + stp d8,d9,[sp,#16] // meet ABI requirements + stp d10,d11,[sp,#32] + stp d12,d13,[sp,#48] + stp d14,d15,[sp,#64] + + cbz x17,.Lbase2_64_neon + + ldp w10,w11,[x0] // load hash value base 2^26 + ldp w12,w13,[x0,#8] + ldr w14,[x0,#16] + + tst x2,#31 + b.eq .Leven_neon + + ldp x7,x8,[x0,#32] // load key value + + add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64 + lsr x5,x12,#12 + adds x4,x4,x12,lsl#52 + add x5,x5,x13,lsl#14 + adc x5,x5,xzr + lsr x6,x14,#24 + adds x5,x5,x14,lsl#40 + adc x14,x6,xzr // can be partially reduced... + + ldp x12,x13,[x1],#16 // load input + sub x2,x2,#16 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + +#ifdef __AARCH64EB__ + rev x12,x12 + rev x13,x13 +#endif + adds x4,x4,x12 // accumulate input + adcs x5,x5,x13 + adc x6,x6,x3 + + bl poly1305_mult + + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,x4,#26,#26 + extr x12,x5,x4,#52 + and x12,x12,#0x03ffffff + ubfx x13,x5,#14,#26 + extr x14,x6,x5,#40 + + b .Leven_neon + +.align 4 +.Lbase2_64_neon: + ldp x7,x8,[x0,#32] // load key value + + ldp x4,x5,[x0] // load hash value base 2^64 + ldr x6,[x0,#16] + + tst x2,#31 + b.eq .Linit_neon + + ldp x12,x13,[x1],#16 // load input + sub x2,x2,#16 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) +#ifdef __AARCH64EB__ + rev x12,x12 + rev x13,x13 +#endif + adds x4,x4,x12 // accumulate input + adcs x5,x5,x13 + adc x6,x6,x3 + + bl poly1305_mult + +.Linit_neon: + ldr w17,[x0,#48] // first table element + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,x4,#26,#26 + extr x12,x5,x4,#52 + and x12,x12,#0x03ffffff + ubfx x13,x5,#14,#26 + extr x14,x6,x5,#40 + + cmp w17,#-1 // is value impossible? + b.ne .Leven_neon + + fmov d24,x10 + fmov d25,x11 + fmov d26,x12 + fmov d27,x13 + fmov d28,x14 + + ////////////////////////////////// initialize r^n table + mov x4,x7 // r^1 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + mov x5,x8 + mov x6,xzr + add x0,x0,#48+12 + bl poly1305_splat + + bl poly1305_mult // r^2 + sub x0,x0,#4 + bl poly1305_splat + + bl poly1305_mult // r^3 + sub x0,x0,#4 + bl poly1305_splat + + bl poly1305_mult // r^4 + sub x0,x0,#4 + bl poly1305_splat + sub x0,x0,#48 // restore original x0 + b .Ldo_neon + +.align 4 +.Leven_neon: + fmov d24,x10 + fmov d25,x11 + fmov d26,x12 + fmov d27,x13 + fmov d28,x14 + +.Ldo_neon: + ldp x8,x12,[x1,#32] // inp[2:3] + subs x2,x2,#64 + ldp x9,x13,[x1,#48] + add x16,x1,#96 + adr x17,.Lzeros + + lsl x3,x3,#24 + add x15,x0,#48 + +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov d14,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,x3,x12,lsr#40 + add x13,x3,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov d15,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + fmov d16,x8 + fmov d17,x10 + fmov d18,x12 + + ldp x8,x12,[x1],#16 // inp[0:1] + ldp x9,x13,[x1],#48 + + ld1 {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64 + ld1 {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64 + ld1 {v8.4s},[x15] + +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov d9,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,x3,x12,lsr#40 + add x13,x3,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov d10,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + movi v31.2d,#-1 + fmov d11,x8 + fmov d12,x10 + fmov d13,x12 + ushr v31.2d,v31.2d,#38 + + b.ls .Lskip_loop + +.align 4 +.Loop_neon: + //////////////////////////////////////////////////////////////// + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + // ___________________/ + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + // ___________________/ ____________________/ + // + // Note that we start with inp[2:3]*r^2. This is because it + // doesn't depend on reduction in previous iteration. + //////////////////////////////////////////////////////////////// + // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0 + // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4 + // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3 + // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2 + // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1 + + subs x2,x2,#64 + umull v23.2d,v14.2s,v7.s[2] + csel x16,x17,x16,lo + umull v22.2d,v14.2s,v5.s[2] + umull v21.2d,v14.2s,v3.s[2] + ldp x8,x12,[x16],#16 // inp[2:3] (or zero) + umull v20.2d,v14.2s,v1.s[2] + ldp x9,x13,[x16],#48 + umull v19.2d,v14.2s,v0.s[2] +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + umlal v23.2d,v15.2s,v5.s[2] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal v22.2d,v15.2s,v3.s[2] + and x5,x9,#0x03ffffff + umlal v21.2d,v15.2s,v1.s[2] + ubfx x6,x8,#26,#26 + umlal v20.2d,v15.2s,v0.s[2] + ubfx x7,x9,#26,#26 + umlal v19.2d,v15.2s,v8.s[2] + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + + umlal v23.2d,v16.2s,v3.s[2] + extr x8,x12,x8,#52 + umlal v22.2d,v16.2s,v1.s[2] + extr x9,x13,x9,#52 + umlal v21.2d,v16.2s,v0.s[2] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal v20.2d,v16.2s,v8.s[2] + fmov d14,x4 + umlal v19.2d,v16.2s,v6.s[2] + and x8,x8,#0x03ffffff + + umlal v23.2d,v17.2s,v1.s[2] + and x9,x9,#0x03ffffff + umlal v22.2d,v17.2s,v0.s[2] + ubfx x10,x12,#14,#26 + umlal v21.2d,v17.2s,v8.s[2] + ubfx x11,x13,#14,#26 + umlal v20.2d,v17.2s,v6.s[2] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal v19.2d,v17.2s,v4.s[2] + fmov d15,x6 + + add v11.2s,v11.2s,v26.2s + add x12,x3,x12,lsr#40 + umlal v23.2d,v18.2s,v0.s[2] + add x13,x3,x13,lsr#40 + umlal v22.2d,v18.2s,v8.s[2] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal v21.2d,v18.2s,v6.s[2] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal v20.2d,v18.2s,v4.s[2] + fmov d16,x8 + umlal v19.2d,v18.2s,v2.s[2] + fmov d17,x10 + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4 and accumulate + + add v9.2s,v9.2s,v24.2s + fmov d18,x12 + umlal v22.2d,v11.2s,v1.s[0] + ldp x8,x12,[x1],#16 // inp[0:1] + umlal v19.2d,v11.2s,v6.s[0] + ldp x9,x13,[x1],#48 + umlal v23.2d,v11.2s,v3.s[0] + umlal v20.2d,v11.2s,v8.s[0] + umlal v21.2d,v11.2s,v0.s[0] +#ifdef __AARCH64EB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + add v10.2s,v10.2s,v25.2s + umlal v22.2d,v9.2s,v5.s[0] + umlal v23.2d,v9.2s,v7.s[0] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal v21.2d,v9.2s,v3.s[0] + and x5,x9,#0x03ffffff + umlal v19.2d,v9.2s,v0.s[0] + ubfx x6,x8,#26,#26 + umlal v20.2d,v9.2s,v1.s[0] + ubfx x7,x9,#26,#26 + + add v12.2s,v12.2s,v27.2s + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + umlal v22.2d,v10.2s,v3.s[0] + extr x8,x12,x8,#52 + umlal v23.2d,v10.2s,v5.s[0] + extr x9,x13,x9,#52 + umlal v19.2d,v10.2s,v8.s[0] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal v21.2d,v10.2s,v1.s[0] + fmov d9,x4 + umlal v20.2d,v10.2s,v0.s[0] + and x8,x8,#0x03ffffff + + add v13.2s,v13.2s,v28.2s + and x9,x9,#0x03ffffff + umlal v22.2d,v12.2s,v0.s[0] + ubfx x10,x12,#14,#26 + umlal v19.2d,v12.2s,v4.s[0] + ubfx x11,x13,#14,#26 + umlal v23.2d,v12.2s,v1.s[0] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal v20.2d,v12.2s,v6.s[0] + fmov d10,x6 + umlal v21.2d,v12.2s,v8.s[0] + add x12,x3,x12,lsr#40 + + umlal v22.2d,v13.2s,v8.s[0] + add x13,x3,x13,lsr#40 + umlal v19.2d,v13.2s,v2.s[0] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal v23.2d,v13.2s,v0.s[0] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal v20.2d,v13.2s,v4.s[0] + fmov d11,x8 + umlal v21.2d,v13.2s,v6.s[0] + fmov d12,x10 + fmov d13,x12 + + ///////////////////////////////////////////////////////////////// + // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + // and P. Schwabe + // + // [see discussion in poly1305-armv4 module] + + ushr v29.2d,v22.2d,#26 + xtn v27.2s,v22.2d + ushr v30.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + add v23.2d,v23.2d,v29.2d // h3 -> h4 + bic v27.2s,#0xfc,lsl#24 // &=0x03ffffff + add v20.2d,v20.2d,v30.2d // h0 -> h1 + + ushr v29.2d,v23.2d,#26 + xtn v28.2s,v23.2d + ushr v30.2d,v20.2d,#26 + xtn v25.2s,v20.2d + bic v28.2s,#0xfc,lsl#24 + add v21.2d,v21.2d,v30.2d // h1 -> h2 + + add v19.2d,v19.2d,v29.2d + shl v29.2d,v29.2d,#2 + shrn v30.2s,v21.2d,#26 + xtn v26.2s,v21.2d + add v19.2d,v19.2d,v29.2d // h4 -> h0 + bic v25.2s,#0xfc,lsl#24 + add v27.2s,v27.2s,v30.2s // h2 -> h3 + bic v26.2s,#0xfc,lsl#24 + + shrn v29.2s,v19.2d,#26 + xtn v24.2s,v19.2d + ushr v30.2s,v27.2s,#26 + bic v27.2s,#0xfc,lsl#24 + bic v24.2s,#0xfc,lsl#24 + add v25.2s,v25.2s,v29.2s // h0 -> h1 + add v28.2s,v28.2s,v30.2s // h3 -> h4 + + b.hi .Loop_neon + +.Lskip_loop: + dup v16.2d,v16.d[0] + add v11.2s,v11.2s,v26.2s + + //////////////////////////////////////////////////////////////// + // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + adds x2,x2,#32 + b.ne .Long_tail + + dup v16.2d,v11.d[0] + add v14.2s,v9.2s,v24.2s + add v17.2s,v12.2s,v27.2s + add v15.2s,v10.2s,v25.2s + add v18.2s,v13.2s,v28.2s + +.Long_tail: + dup v14.2d,v14.d[0] + umull2 v19.2d,v16.4s,v6.4s + umull2 v22.2d,v16.4s,v1.4s + umull2 v23.2d,v16.4s,v3.4s + umull2 v21.2d,v16.4s,v0.4s + umull2 v20.2d,v16.4s,v8.4s + + dup v15.2d,v15.d[0] + umlal2 v19.2d,v14.4s,v0.4s + umlal2 v21.2d,v14.4s,v3.4s + umlal2 v22.2d,v14.4s,v5.4s + umlal2 v23.2d,v14.4s,v7.4s + umlal2 v20.2d,v14.4s,v1.4s + + dup v17.2d,v17.d[0] + umlal2 v19.2d,v15.4s,v8.4s + umlal2 v22.2d,v15.4s,v3.4s + umlal2 v21.2d,v15.4s,v1.4s + umlal2 v23.2d,v15.4s,v5.4s + umlal2 v20.2d,v15.4s,v0.4s + + dup v18.2d,v18.d[0] + umlal2 v22.2d,v17.4s,v0.4s + umlal2 v23.2d,v17.4s,v1.4s + umlal2 v19.2d,v17.4s,v4.4s + umlal2 v20.2d,v17.4s,v6.4s + umlal2 v21.2d,v17.4s,v8.4s + + umlal2 v22.2d,v18.4s,v8.4s + umlal2 v19.2d,v18.4s,v2.4s + umlal2 v23.2d,v18.4s,v0.4s + umlal2 v20.2d,v18.4s,v4.4s + umlal2 v21.2d,v18.4s,v6.4s + + b.eq .Lshort_tail + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4:r^3 and accumulate + + add v9.2s,v9.2s,v24.2s + umlal v22.2d,v11.2s,v1.2s + umlal v19.2d,v11.2s,v6.2s + umlal v23.2d,v11.2s,v3.2s + umlal v20.2d,v11.2s,v8.2s + umlal v21.2d,v11.2s,v0.2s + + add v10.2s,v10.2s,v25.2s + umlal v22.2d,v9.2s,v5.2s + umlal v19.2d,v9.2s,v0.2s + umlal v23.2d,v9.2s,v7.2s + umlal v20.2d,v9.2s,v1.2s + umlal v21.2d,v9.2s,v3.2s + + add v12.2s,v12.2s,v27.2s + umlal v22.2d,v10.2s,v3.2s + umlal v19.2d,v10.2s,v8.2s + umlal v23.2d,v10.2s,v5.2s + umlal v20.2d,v10.2s,v0.2s + umlal v21.2d,v10.2s,v1.2s + + add v13.2s,v13.2s,v28.2s + umlal v22.2d,v12.2s,v0.2s + umlal v19.2d,v12.2s,v4.2s + umlal v23.2d,v12.2s,v1.2s + umlal v20.2d,v12.2s,v6.2s + umlal v21.2d,v12.2s,v8.2s + + umlal v22.2d,v13.2s,v8.2s + umlal v19.2d,v13.2s,v2.2s + umlal v23.2d,v13.2s,v0.2s + umlal v20.2d,v13.2s,v4.2s + umlal v21.2d,v13.2s,v6.2s + +.Lshort_tail: + //////////////////////////////////////////////////////////////// + // horizontal add + + addp v22.2d,v22.2d,v22.2d + ldp d8,d9,[sp,#16] // meet ABI requirements + addp v19.2d,v19.2d,v19.2d + ldp d10,d11,[sp,#32] + addp v23.2d,v23.2d,v23.2d + ldp d12,d13,[sp,#48] + addp v20.2d,v20.2d,v20.2d + ldp d14,d15,[sp,#64] + addp v21.2d,v21.2d,v21.2d + ldr x30,[sp,#8] + .inst 0xd50323bf // autiasp + + //////////////////////////////////////////////////////////////// + // lazy reduction, but without narrowing + + ushr v29.2d,v22.2d,#26 + and v22.16b,v22.16b,v31.16b + ushr v30.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + + add v23.2d,v23.2d,v29.2d // h3 -> h4 + add v20.2d,v20.2d,v30.2d // h0 -> h1 + + ushr v29.2d,v23.2d,#26 + and v23.16b,v23.16b,v31.16b + ushr v30.2d,v20.2d,#26 + and v20.16b,v20.16b,v31.16b + add v21.2d,v21.2d,v30.2d // h1 -> h2 + + add v19.2d,v19.2d,v29.2d + shl v29.2d,v29.2d,#2 + ushr v30.2d,v21.2d,#26 + and v21.16b,v21.16b,v31.16b + add v19.2d,v19.2d,v29.2d // h4 -> h0 + add v22.2d,v22.2d,v30.2d // h2 -> h3 + + ushr v29.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + ushr v30.2d,v22.2d,#26 + and v22.16b,v22.16b,v31.16b + add v20.2d,v20.2d,v29.2d // h0 -> h1 + add v23.2d,v23.2d,v30.2d // h3 -> h4 + + //////////////////////////////////////////////////////////////// + // write the result, can be partially reduced + + st4 {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16 + mov x4,#1 + st1 {v23.s}[0],[x0] + str x4,[x0,#8] // set is_base2_26 + + ldr x29,[sp],#80 + ret +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0 +.asciz "Poly1305 for ARMv8, CRYPTOGAMS by @dot-asm" +.align 2 +#if !defined(__KERNEL__) && !defined(_WIN64) +.comm OPENSSL_armcap_P,4,4 +.hidden OPENSSL_armcap_P +#endif diff --git a/arch/arm64/crypto/poly1305-glue.c b/arch/arm64/crypto/poly1305-glue.c new file mode 100644 index 000000000000..cbf3eaca487e --- /dev/null +++ b/arch/arm64/crypto/poly1305-glue.c @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * OpenSSL/Cryptogams accelerated Poly1305 transform for arm64 + * + * Copyright (C) 2019 Linaro Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define POLY1305_BLOCK_SIZE 16 +#define POLY1305_DIGEST_SIZE 16 + +struct neon_poly1305_ctx { + /* the state owned by the accelerated code */ + u64 state[24]; + /* finalize key */ + u32 s[4]; + /* partial buffer */ + u8 buf[POLY1305_BLOCK_SIZE]; + /* bytes used in partial buffer */ + unsigned int buflen; + /* r key has been set */ + bool rset; + /* s key has been set */ + bool sset; +}; + +asmlinkage void poly1305_init(u64 *state, const u8 *key); +asmlinkage void poly1305_blocks(u64 *state, const u8 *src, u32 len, u32 hibit); +asmlinkage void poly1305_blocks_neon(u64 *state, const u8 *src, u32 len, u32 hibit); +asmlinkage void poly1305_emit(u64 state[], __le32 *digest, const u32 *nonce); + +static int neon_poly1305_init(struct shash_desc *desc) +{ + struct neon_poly1305_ctx *dctx = shash_desc_ctx(desc); + + dctx->buflen = 0; + dctx->rset = false; + dctx->sset = false; + + return 0; +} + +static void neon_poly1305_blocks(struct neon_poly1305_ctx *dctx, const u8 *src, + u32 len, u32 hibit, bool do_neon) +{ + if (unlikely(!dctx->sset)) { + if (!dctx->rset) { + poly1305_init(dctx->state, src); + src += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + dctx->rset = true; + } + if (len >= POLY1305_BLOCK_SIZE) { + dctx->s[0] = get_unaligned_le32(src + 0); + dctx->s[1] = get_unaligned_le32(src + 4); + dctx->s[2] = get_unaligned_le32(src + 8); + dctx->s[3] = get_unaligned_le32(src + 12); + src += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + dctx->sset = true; + } + if (len < POLY1305_BLOCK_SIZE) + return; + } + + len &= ~(POLY1305_BLOCK_SIZE - 1); + + if (likely(do_neon)) + poly1305_blocks_neon(dctx->state, src, len, hibit); + else + poly1305_blocks(dctx->state, src, len, hibit); +} + +static void neon_poly1305_do_update(struct neon_poly1305_ctx *dctx, + const u8 *src, u32 len, bool do_neon) +{ + if (unlikely(dctx->buflen)) { + u32 bytes = min(len, POLY1305_BLOCK_SIZE - dctx->buflen); + + memcpy(dctx->buf + dctx->buflen, src, bytes); + src += bytes; + len -= bytes; + dctx->buflen += bytes; + + if (dctx->buflen == POLY1305_BLOCK_SIZE) { + neon_poly1305_blocks(dctx, dctx->buf, + POLY1305_BLOCK_SIZE, 1, false); + dctx->buflen = 0; + } + } + + if (likely(len >= POLY1305_BLOCK_SIZE)) { + neon_poly1305_blocks(dctx, src, len, 1, do_neon); + src += round_down(len, POLY1305_BLOCK_SIZE); + len %= POLY1305_BLOCK_SIZE; + } + + if (unlikely(len)) { + dctx->buflen = len; + memcpy(dctx->buf, src, len); + } +} + +static int neon_poly1305_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + bool do_neon = crypto_simd_usable() && srclen > 128; + struct neon_poly1305_ctx *dctx = shash_desc_ctx(desc); + + if (do_neon) + kernel_neon_begin(); + neon_poly1305_do_update(dctx, src, srclen, do_neon); + if (do_neon) + kernel_neon_end(); + return 0; +} + +static int neon_poly1305_update_from_sg(struct shash_desc *desc, + struct scatterlist *sg, + unsigned int srclen, int flags) +{ + bool do_neon = crypto_simd_usable() && srclen > 128; + struct neon_poly1305_ctx *dctx = shash_desc_ctx(desc); + struct crypto_hash_walk walk; + int nbytes; + + if (do_neon) { + kernel_neon_begin(); + flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + } + + for (nbytes = crypto_shash_walk_sg(desc, sg, srclen, &walk, flags); + nbytes > 0; + nbytes = crypto_hash_walk_done(&walk, 0)) + neon_poly1305_do_update(dctx, walk.data, nbytes, do_neon); + + if (do_neon) + kernel_neon_end(); + + return 0; +} + +static int neon_poly1305_final(struct shash_desc *desc, u8 *dst) +{ + struct neon_poly1305_ctx *dctx = shash_desc_ctx(desc); + __le32 digest[4]; + u64 f = 0; + + if (unlikely(!dctx->sset)) + return -ENOKEY; + + if (unlikely(dctx->buflen)) { + dctx->buf[dctx->buflen++] = 1; + memset(dctx->buf + dctx->buflen, 0, + POLY1305_BLOCK_SIZE - dctx->buflen); + poly1305_blocks(dctx->state, dctx->buf, POLY1305_BLOCK_SIZE, 0); + } + + poly1305_emit(dctx->state, digest, dctx->s); + + /* mac = (h + s) % (2^128) */ + f = (f >> 32) + le32_to_cpu(digest[0]); + put_unaligned_le32(f, dst); + f = (f >> 32) + le32_to_cpu(digest[1]); + put_unaligned_le32(f, dst + 4); + f = (f >> 32) + le32_to_cpu(digest[2]); + put_unaligned_le32(f, dst + 8); + f = (f >> 32) + le32_to_cpu(digest[3]); + put_unaligned_le32(f, dst + 12); + + return 0; +} + +static struct shash_alg neon_poly1305_alg = { + .init = neon_poly1305_init, + .update = neon_poly1305_update, + .update_from_sg = neon_poly1305_update_from_sg, + .final = neon_poly1305_final, + .digestsize = POLY1305_DIGEST_SIZE, + .descsize = sizeof(struct neon_poly1305_ctx), + + .base.cra_name = "poly1305", + .base.cra_driver_name = "poly1305-neon", + .base.cra_priority = 200, + .base.cra_blocksize = POLY1305_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static int __init neon_poly1305_mod_init(void) +{ + if (!cpu_have_named_feature(ASIMD)) + return -ENODEV; + return crypto_register_shash(&neon_poly1305_alg); +} + +static void __exit neon_poly1305_mod_exit(void) +{ + crypto_unregister_shash(&neon_poly1305_alg); +} + +module_init(neon_poly1305_mod_init); +module_exit(neon_poly1305_mod_exit); + +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("poly1305"); +MODULE_ALIAS_CRYPTO("poly1305-neon"); From patchwork Wed Sep 25 16:12:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161067 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B89E924 for ; Wed, 25 Sep 2019 16:16:16 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7644620665 for ; Wed, 25 Sep 2019 16:16:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="JFmqX9vT"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="Kk76jONf" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7644620665 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=JcgHSvq621QvC0oM60pjdatfQ7XK0Op9B94nsdcJ11k=; b=JFmqX9vTWGs9pC X2ZrmZ2xagUGCCUnfTmMuXndV34R3Ac3q2bZFQALZIbFzqP7EnUwQc2P68n9hg4FXSq1Tmw3U1zLg WG7Vmzk70RADUA3i+rlCXun79F2EQoA6bkIqoAVTyUp0ZBdIKxQwWHMJSWWJo+c/F4BGlP6aRMugU XrE8pbmmTdusfSFKqzjq/vT9KEu6eU9jG2lS56W/Mj/yTYZkHzF313+WKVmmrMJqy6xK5RkMxtobs EL0NRumEgKpZm275Oc0CiJEb1v8EpcSucvU5v+kfnJhhZsJeUAHya+WegjIhF6x682ppH9ekAeF5I 3mGXBMxzOpESuCpnHhkA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9xf-0006Zd-M7; Wed, 25 Sep 2019 16:16:15 +0000 Received: from mail-wm1-x344.google.com ([2a00:1450:4864:20::344]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vP-0003tq-HZ for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:02 +0000 Received: by mail-wm1-x344.google.com with SMTP id r17so4563266wme.0 for ; Wed, 25 Sep 2019 09:13:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PGDTNzIZ6y6016+pkZuy7eopyTkRy+b3+Rm6GuFwl+o=; b=Kk76jONfB7bYtNj5FLDtcUv3/0s91W51rv+Ls26I7fkC/vvPK+9mdMx9Jg8Gdn0Tcn VmGbHLqGOgcc6adUbG6WtJTOvrL7VcNdk4v8qXMawr+zsMV+rl2/H6D65p2AQACNgVJm 96eKIYWp95lJSTAh1bUeI+do/goY0hFe+CieySDxUSRiduz3B2wlctEOXfsW/5XVnTp9 kH4PC/QifwyhXM92pdY2lJB5BxiT113GYS5ygPyowWDRvD0YYxODbbRovbZ0uN+hJEaS HDXKvO4yCCgMNSKxhUgbdS1Xi/YdjCExPj56j7m9DwoP60fme5iwmF+DEY1GnXx5NgLh ALpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PGDTNzIZ6y6016+pkZuy7eopyTkRy+b3+Rm6GuFwl+o=; b=W5NWJc8uKeJjYlEBBKkfn7xWVoJYp4/SsiE7gidoRLZiCzSuFdeOgoBcmeS2KL7GNe jxq7Z655BdGgu/97OXlef5JaDFnyRFozswn09NdMVQOPsNpxFcv2yOZM7S+Zm5fpo/Il g9nFSajqP9xytcv2vknHqQmF8QZ7sy2f7q99QCRC+LSsby5dhuTxl9cl++dGrtBucb6s 6sWnuT+yHhm7Si5T6uPcSCTqv6LxLO+sCwGao3ZW9Tmcg7VK5CPNJWCAG/Erxs78figo k+ZeQ6K92g/nhdSYBv5zXikQBhDSLmw/zoguQwldpU7gnr5VtLE6TQLoxkZ27jOH15IT ZgBA== X-Gm-Message-State: APjAAAV5nW4lTz/FsziFVtruOP7eSKVPxQOLO4le0TrbBzOpQOuhpk4j YrLUFlCpyDYuaFvgGE2FboS3ww== X-Google-Smtp-Source: APXvYqwQoMwvrb5AFrXI7RvbmhhqvWPNR93S1WiOoTVJ3TGzKCNodQ0gbSq7O7lLYwggpA5BuvkJIA== X-Received: by 2002:a7b:cb8b:: with SMTP id m11mr3357719wmi.145.1569428033925; Wed, 25 Sep 2019 09:13:53 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:52 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 05/18] crypto: chacha - move existing library code into lib/crypto Date: Wed, 25 Sep 2019 18:12:42 +0200 Message-Id: <20190925161255.1871-6-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091355_769245_ABE8FD46 X-CRM114-Status: GOOD ( 18.57 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:344 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Move the existing shared ChaCha code into lib/crypto, and at the same time, split the support header into a public version, and an internal version that is only intended for consumption by crypto implementations. While at it, tidy up lib/crypto/Makefile a bit so we are ready for some new arrivals. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/chacha-neon-glue.c | 2 +- arch/arm64/crypto/chacha-neon-glue.c | 2 +- arch/x86/crypto/chacha_glue.c | 2 +- crypto/chacha_generic.c | 42 ++------------------ include/crypto/chacha.h | 37 ++++++++++------- include/crypto/internal/chacha.h | 25 ++++++++++++ lib/Makefile | 3 +- lib/crypto/Makefile | 19 +++++---- lib/{ => crypto}/chacha.c | 23 +++++++++++ 9 files changed, 89 insertions(+), 66 deletions(-) diff --git a/arch/arm/crypto/chacha-neon-glue.c b/arch/arm/crypto/chacha-neon-glue.c index a8e9b534c8da..26576772f18b 100644 --- a/arch/arm/crypto/chacha-neon-glue.c +++ b/arch/arm/crypto/chacha-neon-glue.c @@ -20,7 +20,7 @@ */ #include -#include +#include #include #include #include diff --git a/arch/arm64/crypto/chacha-neon-glue.c b/arch/arm64/crypto/chacha-neon-glue.c index 1495d2b18518..d4cc61bfe79d 100644 --- a/arch/arm64/crypto/chacha-neon-glue.c +++ b/arch/arm64/crypto/chacha-neon-glue.c @@ -20,7 +20,7 @@ */ #include -#include +#include #include #include #include diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c index 388f95a4ec24..bc62daa8dafd 100644 --- a/arch/x86/crypto/chacha_glue.c +++ b/arch/x86/crypto/chacha_glue.c @@ -7,7 +7,7 @@ */ #include -#include +#include #include #include #include diff --git a/crypto/chacha_generic.c b/crypto/chacha_generic.c index 085d8d219987..0a0847eacfc8 100644 --- a/crypto/chacha_generic.c +++ b/crypto/chacha_generic.c @@ -8,29 +8,10 @@ #include #include -#include +#include #include #include -static void chacha_docrypt(u32 *state, u8 *dst, const u8 *src, - unsigned int bytes, int nrounds) -{ - /* aligned to potentially speed up crypto_xor() */ - u8 stream[CHACHA_BLOCK_SIZE] __aligned(sizeof(long)); - - while (bytes >= CHACHA_BLOCK_SIZE) { - chacha_block(state, stream, nrounds); - crypto_xor_cpy(dst, src, stream, CHACHA_BLOCK_SIZE); - bytes -= CHACHA_BLOCK_SIZE; - dst += CHACHA_BLOCK_SIZE; - src += CHACHA_BLOCK_SIZE; - } - if (bytes) { - chacha_block(state, stream, nrounds); - crypto_xor_cpy(dst, src, stream, bytes); - } -} - static int chacha_stream_xor(struct skcipher_request *req, const struct chacha_ctx *ctx, const u8 *iv) { @@ -48,8 +29,8 @@ static int chacha_stream_xor(struct skcipher_request *req, if (nbytes < walk.total) nbytes = round_down(nbytes, CHACHA_BLOCK_SIZE); - chacha_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, - nbytes, ctx->nrounds); + chacha_crypt(state, walk.dst.virt.addr, walk.src.virt.addr, + nbytes, ctx->nrounds); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } @@ -58,22 +39,7 @@ static int chacha_stream_xor(struct skcipher_request *req, void crypto_chacha_init(u32 *state, const struct chacha_ctx *ctx, const u8 *iv) { - state[0] = 0x61707865; /* "expa" */ - state[1] = 0x3320646e; /* "nd 3" */ - state[2] = 0x79622d32; /* "2-by" */ - state[3] = 0x6b206574; /* "te k" */ - state[4] = ctx->key[0]; - state[5] = ctx->key[1]; - state[6] = ctx->key[2]; - state[7] = ctx->key[3]; - state[8] = ctx->key[4]; - state[9] = ctx->key[5]; - state[10] = ctx->key[6]; - state[11] = ctx->key[7]; - state[12] = get_unaligned_le32(iv + 0); - state[13] = get_unaligned_le32(iv + 4); - state[14] = get_unaligned_le32(iv + 8); - state[15] = get_unaligned_le32(iv + 12); + chacha_init(state, ctx->key, iv); } EXPORT_SYMBOL_GPL(crypto_chacha_init); diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h index d1e723c6a37d..b2aaf711f2e7 100644 --- a/include/crypto/chacha.h +++ b/include/crypto/chacha.h @@ -15,9 +15,8 @@ #ifndef _CRYPTO_CHACHA_H #define _CRYPTO_CHACHA_H -#include +#include #include -#include /* 32-bit stream position, then 96-bit nonce (RFC7539 convention) */ #define CHACHA_IV_SIZE 16 @@ -29,11 +28,6 @@ /* 192-bit nonce, then 64-bit stream position */ #define XCHACHA_IV_SIZE 32 -struct chacha_ctx { - u32 key[8]; - int nrounds; -}; - void chacha_block(u32 *state, u8 *stream, int nrounds); static inline void chacha20_block(u32 *state, u8 *stream) { @@ -41,14 +35,27 @@ static inline void chacha20_block(u32 *state, u8 *stream) } void hchacha_block(const u32 *in, u32 *out, int nrounds); -void crypto_chacha_init(u32 *state, const struct chacha_ctx *ctx, const u8 *iv); - -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, - unsigned int keysize); -int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8 *key, - unsigned int keysize); +static inline void chacha_init(u32 *state, const u32 *key, const u8 *iv) +{ + state[0] = 0x61707865; /* "expa" */ + state[1] = 0x3320646e; /* "nd 3" */ + state[2] = 0x79622d32; /* "2-by" */ + state[3] = 0x6b206574; /* "te k" */ + state[4] = key[0]; + state[5] = key[1]; + state[6] = key[2]; + state[7] = key[3]; + state[8] = key[4]; + state[9] = key[5]; + state[10] = key[6]; + state[11] = key[7]; + state[12] = get_unaligned_le32(iv + 0); + state[13] = get_unaligned_le32(iv + 4); + state[14] = get_unaligned_le32(iv + 8); + state[15] = get_unaligned_le32(iv + 12); +} -int crypto_chacha_crypt(struct skcipher_request *req); -int crypto_xchacha_crypt(struct skcipher_request *req); +void chacha_crypt(u32 *state, u8 *dst, const u8 *src, unsigned int bytes, + int nrounds); #endif /* _CRYPTO_CHACHA_H */ diff --git a/include/crypto/internal/chacha.h b/include/crypto/internal/chacha.h new file mode 100644 index 000000000000..f7ffe0f3fa47 --- /dev/null +++ b/include/crypto/internal/chacha.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _CRYPTO_INTERNAL_CHACHA_H +#define _CRYPTO_INTERNAL_CHACHA_H + +#include +#include +#include + +struct chacha_ctx { + u32 key[8]; + int nrounds; +}; + +void crypto_chacha_init(u32 *state, const struct chacha_ctx *ctx, const u8 *iv); + +int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize); +int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize); + +int crypto_chacha_crypt(struct skcipher_request *req); +int crypto_xchacha_crypt(struct skcipher_request *req); + +#endif /* _CRYPTO_CHACHA_H */ diff --git a/lib/Makefile b/lib/Makefile index 29c02a924973..1436c9608fdb 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -30,8 +30,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o timerqueue.o xarray.o \ - idr.o extable.o \ - sha1.o chacha.o irq_regs.o argv_split.o \ + idr.o extable.o sha1.o irq_regs.o argv_split.o \ flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ earlycpio.o seq_buf.o siphash.o dec_and_lock.o \ diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index cbe0b6a6450d..24dad058f2ae 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -1,13 +1,16 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_CRYPTO_LIB_AES) += libaes.o -libaes-y := aes.o +# chacha is used by the /dev/random driver which is always builtin +obj-y += chacha.o -obj-$(CONFIG_CRYPTO_LIB_ARC4) += libarc4.o -libarc4-y := arc4.o +obj-$(CONFIG_CRYPTO_LIB_AES) += libaes.o +libaes-y := aes.o -obj-$(CONFIG_CRYPTO_LIB_DES) += libdes.o -libdes-y := des.o +obj-$(CONFIG_CRYPTO_LIB_ARC4) += libarc4.o +libarc4-y := arc4.o -obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o -libsha256-y := sha256.o +obj-$(CONFIG_CRYPTO_LIB_DES) += libdes.o +libdes-y := des.o + +obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o +libsha256-y := sha256.o diff --git a/lib/chacha.c b/lib/crypto/chacha.c similarity index 85% rename from lib/chacha.c rename to lib/crypto/chacha.c index c7c9826564d3..429adac51f66 100644 --- a/lib/chacha.c +++ b/lib/crypto/chacha.c @@ -5,11 +5,14 @@ * Copyright (C) 2015 Martin Willi */ +#include #include #include #include +#include #include #include +#include // for crypto_xor_cpy #include static void chacha_permute(u32 *x, int nrounds) @@ -111,3 +114,23 @@ void hchacha_block(const u32 *in, u32 *out, int nrounds) memcpy(&out[4], &x[12], 16); } EXPORT_SYMBOL(hchacha_block); + +void chacha_crypt(u32 *state, u8 *dst, const u8 *src, unsigned int bytes, + int nrounds) +{ + /* aligned to potentially speed up crypto_xor() */ + u8 stream[CHACHA_BLOCK_SIZE] __aligned(sizeof(long)); + + while (bytes >= CHACHA_BLOCK_SIZE) { + chacha_block(state, stream, nrounds); + crypto_xor_cpy(dst, src, stream, CHACHA_BLOCK_SIZE); + bytes -= CHACHA_BLOCK_SIZE; + dst += CHACHA_BLOCK_SIZE; + src += CHACHA_BLOCK_SIZE; + } + if (bytes) { + chacha_block(state, stream, nrounds); + crypto_xor_cpy(dst, src, stream, bytes); + } +} +EXPORT_SYMBOL(chacha_crypt); From patchwork Wed Sep 25 16:12:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161069 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5170B924 for ; Wed, 25 Sep 2019 16:16:43 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2A4C321D7E for ; Wed, 25 Sep 2019 16:16:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="L72LEmcJ"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="X8mFmd4y" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2A4C321D7E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=2nP6748XjrWS2dyN3uBetkx+lc1aZjBedmUNICfdhQ8=; b=L72LEmcJ2xijRy Eo4gDehC3wae673sbe9nZYreLfnmqsr0LaQML19jMUKstgpFHZ+56hixSA1opCZRYyrRHivhIi+ir cEBLcVB5ZfkvwaMjWzH5kwYvijnA73oTp1HLGMhGu1Kqq8KIKI2RqpA1EC7rx9Z1l/mEzuKA+rthA jm1IJ05KsnIs6s6GeJ93sFPynXfCMnwC6l0ahTjFdW4rsbuKaA/B+Mg/ZPEABSogX3sFcg/Ky5Ndl zldJ8crhx7pmm24IlkXARESpcugawr8QL5q9PiHLL3kjisQ+AAz94IJPW7rfVXIv/ru+6uFY40znz BJst8oszR4Z723LI7Whg==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9y6-0006pV-LP; Wed, 25 Sep 2019 16:16:42 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vR-0003vR-0s for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:03 +0000 Received: by mail-wm1-x341.google.com with SMTP id i16so6392926wmd.3 for ; Wed, 25 Sep 2019 09:13:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jexsTmRw4CcMhLW8uoTjvTlHKL6BoxHbV8Ic7banI6Y=; b=X8mFmd4yLVNY4oJiYyoojIfqERqJgfrNbslNSa6qiRdZuu52PPgOipL4/TapfpnoXV Gd4QKCJL4rt9CU5owm2CZPGZ3ZkVz/ALcAokE4W2esJICrSkGlJKLICaUw3eHJHAkXiY SHuUYGDildBgadb8Ql/bcmsjs6IAAjM9PBoF9B3SHtJemrn7zXyssy/5BQVgKdarAt6n 847KhgVDMxkPCoLWrB5m3152Z+RdUF+dCHFnBctXRIeKeMFefvaTfm3rrBq5VJ8/Korj jteoF8IIgLkfWN82uVEinwWmn3buqdplcWFX39HbO59xJMQ+hFaynNyQJ2qLQ0xvkwHq BpSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jexsTmRw4CcMhLW8uoTjvTlHKL6BoxHbV8Ic7banI6Y=; b=XA5472Br7ACHDq5U700o33mvXGNK7nXhLxxi/WRLFIrUg4aQjIOrrGNGegAcuhcgt8 t38E0Is0R3jwYqq8JOB2aFBQL2tvbhzZcSVamSmtaMqp3koRXjDZR0C6gCpPtHCY9MrQ /a42gL9kIN6i4a8NdHjyAT0WEmrEJnef2GYy8nm4pJiVswX+buf5Ec79KECl3NnId6LX 2WYg6y8JGcVgFKn/JBji9jTIUiSsCFqXb1kl4aVxii90Kswi//SibYxdyQgpe455J3Pc Ni2QDMapxD/khXdmuxy8ok1RwJ0nMF+6b2LyJxP12JwaSPPyE4ZLfAqkEUdJjgWkQPCD HtUQ== X-Gm-Message-State: APjAAAVyabXRuNBzV1bWsmo8K5aWFpRdq0fnYrevybRq2n9Pf2fgOE9l EcepTAQ44lBffjxTFzT2u0do3w== X-Google-Smtp-Source: APXvYqyAibRAkTx8Cjw4OYEuyloTPcD0FvaxY4tAD/Tft/xEGSQHog13ymV800nTEmF+zF8thWteFg== X-Received: by 2002:a1c:3182:: with SMTP id x124mr9125694wmx.168.1569428035523; Wed, 25 Sep 2019 09:13:55 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:54 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 06/18] crypto: rfc7539 - switch to shash for Poly1305 Date: Wed, 25 Sep 2019 18:12:43 +0200 Message-Id: <20190925161255.1871-7-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091357_586028_3F50E123 X-CRM114-Status: GOOD ( 20.17 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:341 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org The RFC7539 template uses an ahash Poly1305 transformation to implement the authentication part of the algorithm. Since ahashes operate on scatterlists only, this forces the RFC7539 driver to allocate scratch buffers in the request context, to ensure that they are allocated from the heap. However, in practice, all Poly1305 implementations available today are shashes wrapped by the generic ahash->shash wrapper, which means we are jumping through hoops unnecessarily, especially considering that the way this driver invokes the ahash (6 consecutive requests for the key, associated data, padding, ciphertext, more padding and the tail) will make it very difficult for a true asynchronous implementation to ever materialize that can operate efficiently in this context. So now that shashes can operate on scatterlists as well as virtually mapped buffers, switch the RFC7539 template from ahash to shash for the Poly1305 transformation. At the same time, switch to using the ChaCha library to generate the Poly1305 key so that we don't have to call into the [potentially asynchronous] skcipher twice, with one call only operating on 32 bytes of data. Signed-off-by: Ard Biesheuvel --- crypto/chacha20poly1305.c | 513 ++++++-------------- 1 file changed, 145 insertions(+), 368 deletions(-) diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c index 74e824e537e6..71496a8107f5 100644 --- a/crypto/chacha20poly1305.c +++ b/crypto/chacha20poly1305.c @@ -20,53 +20,32 @@ struct chachapoly_instance_ctx { struct crypto_skcipher_spawn chacha; - struct crypto_ahash_spawn poly; + struct crypto_shash_spawn poly; unsigned int saltlen; }; struct chachapoly_ctx { struct crypto_skcipher *chacha; - struct crypto_ahash *poly; + struct crypto_shash *poly; + u32 chacha_key[CHACHA_KEY_SIZE / sizeof(u32)]; /* key bytes we use for the ChaCha20 IV */ unsigned int saltlen; u8 salt[]; }; -struct poly_req { - /* zero byte padding for AD/ciphertext, as needed */ - u8 pad[POLY1305_BLOCK_SIZE]; - /* tail data with AD/ciphertext lengths */ - struct { - __le64 assoclen; - __le64 cryptlen; - } tail; - struct scatterlist src[1]; - struct ahash_request req; /* must be last member */ -}; - struct chacha_req { u8 iv[CHACHA_IV_SIZE]; - struct scatterlist src[1]; struct skcipher_request req; /* must be last member */ }; struct chachapoly_req_ctx { struct scatterlist src[2]; struct scatterlist dst[2]; - /* the key we generate for Poly1305 using Chacha20 */ - u8 key[POLY1305_KEY_SIZE]; - /* calculated Poly1305 tag */ - u8 tag[POLY1305_DIGEST_SIZE]; /* length of data to en/decrypt, without ICV */ unsigned int cryptlen; - /* Actual AD, excluding IV */ - unsigned int assoclen; /* request flags, with MAY_SLEEP cleared if needed */ u32 flags; - union { - struct poly_req poly; - struct chacha_req chacha; - } u; + struct chacha_req chacha; }; static inline void async_done_continue(struct aead_request *req, int err, @@ -94,43 +73,114 @@ static void chacha_iv(u8 *iv, struct aead_request *req, u32 icb) CHACHA_IV_SIZE - sizeof(leicb) - ctx->saltlen); } -static int poly_verify_tag(struct aead_request *req) +static int poly_generate_tag(struct aead_request *req, u8 *poly_tag, + struct scatterlist *crypt) { + struct crypto_aead *tfm = crypto_aead_reqtfm(req); + struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - u8 tag[sizeof(rctx->tag)]; + u32 chacha_state[CHACHA_BLOCK_SIZE / sizeof(u32)]; + SHASH_DESC_ON_STACK(desc, ctx->poly); + u8 poly_key[POLY1305_KEY_SIZE]; + bool skip_ad_pad, atomic; + u8 iv[CHACHA_IV_SIZE]; + unsigned int assoclen; + unsigned int padlen; + __le64 tail[4]; + + /* + * Take the Poly1305 hash of the entire AD plus ciphertext in one go if + * a) we are not running in ESP mode (which puts data between the AD + * and the ciphertext in the input scatterlist), and + * b) no padding is required between the AD and the ciphertext, and + * c) the source buffer points to the ciphertext, either because we + * are decrypting, or because we are encrypting in place. + */ + if (crypto_aead_ivsize(tfm) == 8) { + if (req->assoclen < 8) + return -EINVAL; + assoclen = req->assoclen - 8; + skip_ad_pad = false; + } else { + assoclen = req->assoclen; + skip_ad_pad = !(assoclen % POLY1305_BLOCK_SIZE) && + (crypt == req->src); + } - scatterwalk_map_and_copy(tag, req->src, - req->assoclen + rctx->cryptlen, - sizeof(tag), 0); - if (crypto_memneq(tag, rctx->tag, sizeof(tag))) - return -EBADMSG; + /* derive the Poly1305 key */ + chacha_iv(iv, req, 0); + chacha_init(chacha_state, ctx->chacha_key, iv); + chacha_crypt(chacha_state, poly_key, page_address(ZERO_PAGE(0)), + sizeof(poly_key), 20); + + desc->tfm = ctx->poly; + crypto_shash_init(desc); + crypto_shash_update(desc, poly_key, sizeof(poly_key)); + + atomic = !(rctx->flags & CRYPTO_TFM_REQ_MAY_SLEEP); + if (skip_ad_pad) { + crypto_shash_update_from_sg(desc, crypt, + assoclen + rctx->cryptlen, + atomic); + } else { + struct scatterlist sg[2]; + + crypto_shash_update_from_sg(desc, req->src, assoclen, atomic); + + padlen = -assoclen % POLY1305_BLOCK_SIZE; + if (padlen) + crypto_shash_update(desc, page_address(ZERO_PAGE(0)), + padlen); + + crypt = scatterwalk_ffwd(sg, crypt, req->assoclen); + crypto_shash_update_from_sg(desc, crypt, rctx->cryptlen, + atomic); + } + + tail[0] = 0; + tail[1] = 0; + tail[2] = cpu_to_le64(assoclen); + tail[3] = cpu_to_le64(rctx->cryptlen); + + padlen = -rctx->cryptlen % POLY1305_BLOCK_SIZE; + crypto_shash_finup(desc, (u8 *)tail + (2 * sizeof(__le64) - padlen), + padlen + 2 * sizeof(__le64), poly_tag); return 0; } -static int poly_copy_tag(struct aead_request *req) +static int poly_append_tag(struct aead_request *req) { struct chachapoly_req_ctx *rctx = aead_request_ctx(req); + u8 poly_tag[POLY1305_DIGEST_SIZE]; + int err; - scatterwalk_map_and_copy(rctx->tag, req->dst, + err = poly_generate_tag(req, poly_tag, req->dst); + if (err) + return err; + + scatterwalk_map_and_copy(poly_tag, req->dst, req->assoclen + rctx->cryptlen, - sizeof(rctx->tag), 1); + sizeof(poly_tag), 1); return 0; } -static void chacha_decrypt_done(struct crypto_async_request *areq, int err) +static void chacha_encrypt_done(struct crypto_async_request *areq, int err) { - async_done_continue(areq->data, err, poly_verify_tag); + async_done_continue(areq->data, err, poly_append_tag); } -static int chacha_decrypt(struct aead_request *req) +static int chachapoly_encrypt(struct aead_request *req) { struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct chacha_req *creq = &rctx->u.chacha; + struct chacha_req *creq = &rctx->chacha; struct scatterlist *src, *dst; int err; - if (rctx->cryptlen == 0) + rctx->cryptlen = req->cryptlen; + rctx->flags = aead_request_flags(req); + + if (req->cryptlen == 0) goto skip; chacha_iv(creq->iv, req, 1); @@ -141,273 +191,48 @@ static int chacha_decrypt(struct aead_request *req) dst = scatterwalk_ffwd(rctx->dst, req->dst, req->assoclen); skcipher_request_set_callback(&creq->req, rctx->flags, - chacha_decrypt_done, req); + chacha_encrypt_done, req); skcipher_request_set_tfm(&creq->req, ctx->chacha); skcipher_request_set_crypt(&creq->req, src, dst, - rctx->cryptlen, creq->iv); - err = crypto_skcipher_decrypt(&creq->req); + req->cryptlen, creq->iv); + err = crypto_skcipher_encrypt(&creq->req); if (err) return err; skip: - return poly_verify_tag(req); -} - -static int poly_tail_continue(struct aead_request *req) -{ - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - - if (rctx->cryptlen == req->cryptlen) /* encrypting */ - return poly_copy_tag(req); - - return chacha_decrypt(req); -} - -static void poly_tail_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_tail_continue); -} - -static int poly_tail(struct aead_request *req) -{ - struct crypto_aead *tfm = crypto_aead_reqtfm(req); - struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - int err; - - preq->tail.assoclen = cpu_to_le64(rctx->assoclen); - preq->tail.cryptlen = cpu_to_le64(rctx->cryptlen); - sg_init_one(preq->src, &preq->tail, sizeof(preq->tail)); - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_tail_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, preq->src, - rctx->tag, sizeof(preq->tail)); - - err = crypto_ahash_finup(&preq->req); - if (err) - return err; - - return poly_tail_continue(req); -} - -static void poly_cipherpad_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_tail); -} - -static int poly_cipherpad(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - unsigned int padlen; - int err; - - padlen = -rctx->cryptlen % POLY1305_BLOCK_SIZE; - memset(preq->pad, 0, sizeof(preq->pad)); - sg_init_one(preq->src, preq->pad, padlen); - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_cipherpad_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, preq->src, NULL, padlen); - - err = crypto_ahash_update(&preq->req); - if (err) - return err; - - return poly_tail(req); -} - -static void poly_cipher_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_cipherpad); -} - -static int poly_cipher(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - struct scatterlist *crypt = req->src; - int err; - - if (rctx->cryptlen == req->cryptlen) /* encrypting */ - crypt = req->dst; - - crypt = scatterwalk_ffwd(rctx->src, crypt, req->assoclen); - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_cipher_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, crypt, NULL, rctx->cryptlen); - - err = crypto_ahash_update(&preq->req); - if (err) - return err; - - return poly_cipherpad(req); -} - -static void poly_adpad_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_cipher); -} - -static int poly_adpad(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - unsigned int padlen; - int err; - - padlen = -rctx->assoclen % POLY1305_BLOCK_SIZE; - memset(preq->pad, 0, sizeof(preq->pad)); - sg_init_one(preq->src, preq->pad, padlen); - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_adpad_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, preq->src, NULL, padlen); - - err = crypto_ahash_update(&preq->req); - if (err) - return err; - - return poly_cipher(req); -} - -static void poly_ad_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_adpad); -} - -static int poly_ad(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - int err; - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_ad_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, req->src, NULL, rctx->assoclen); - - err = crypto_ahash_update(&preq->req); - if (err) - return err; - - return poly_adpad(req); -} - -static void poly_setkey_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_ad); -} - -static int poly_setkey(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - int err; - - sg_init_one(preq->src, rctx->key, sizeof(rctx->key)); - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_setkey_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - ahash_request_set_crypt(&preq->req, preq->src, NULL, sizeof(rctx->key)); - - err = crypto_ahash_update(&preq->req); - if (err) - return err; - - return poly_ad(req); + return poly_append_tag(req); } -static void poly_init_done(struct crypto_async_request *areq, int err) +static void chacha_decrypt_done(struct crypto_async_request *areq, int err) { - async_done_continue(areq->data, err, poly_setkey); + aead_request_complete(areq->data, err); } -static int poly_init(struct aead_request *req) +static int chachapoly_decrypt(struct aead_request *req) { struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct poly_req *preq = &rctx->u.poly; - int err; - - ahash_request_set_callback(&preq->req, rctx->flags, - poly_init_done, req); - ahash_request_set_tfm(&preq->req, ctx->poly); - - err = crypto_ahash_init(&preq->req); - if (err) - return err; - - return poly_setkey(req); -} - -static void poly_genkey_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_init); -} - -static int poly_genkey(struct aead_request *req) -{ - struct crypto_aead *tfm = crypto_aead_reqtfm(req); - struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct chacha_req *creq = &rctx->u.chacha; + struct chacha_req *creq = &rctx->chacha; + u8 calculated_tag[POLY1305_DIGEST_SIZE]; + u8 provided_tag[POLY1305_DIGEST_SIZE]; + struct scatterlist *src, *dst; int err; - rctx->assoclen = req->assoclen; - - if (crypto_aead_ivsize(tfm) == 8) { - if (rctx->assoclen < 8) - return -EINVAL; - rctx->assoclen -= 8; - } - - memset(rctx->key, 0, sizeof(rctx->key)); - sg_init_one(creq->src, rctx->key, sizeof(rctx->key)); - - chacha_iv(creq->iv, req, 0); - - skcipher_request_set_callback(&creq->req, rctx->flags, - poly_genkey_done, req); - skcipher_request_set_tfm(&creq->req, ctx->chacha); - skcipher_request_set_crypt(&creq->req, creq->src, creq->src, - POLY1305_KEY_SIZE, creq->iv); + rctx->cryptlen = req->cryptlen - POLY1305_DIGEST_SIZE; + rctx->flags = aead_request_flags(req); - err = crypto_skcipher_decrypt(&creq->req); + err = poly_generate_tag(req, calculated_tag, req->src); if (err) return err; + scatterwalk_map_and_copy(provided_tag, req->src, + req->assoclen + rctx->cryptlen, + sizeof(provided_tag), 0); - return poly_init(req); -} - -static void chacha_encrypt_done(struct crypto_async_request *areq, int err) -{ - async_done_continue(areq->data, err, poly_genkey); -} - -static int chacha_encrypt(struct aead_request *req) -{ - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - struct chacha_req *creq = &rctx->u.chacha; - struct scatterlist *src, *dst; - int err; + if (crypto_memneq(calculated_tag, provided_tag, sizeof(provided_tag))) + return -EBADMSG; - if (req->cryptlen == 0) - goto skip; + if (rctx->cryptlen == 0) + return 0; chacha_iv(creq->iv, req, 1); @@ -417,60 +242,11 @@ static int chacha_encrypt(struct aead_request *req) dst = scatterwalk_ffwd(rctx->dst, req->dst, req->assoclen); skcipher_request_set_callback(&creq->req, rctx->flags, - chacha_encrypt_done, req); + chacha_decrypt_done, req); skcipher_request_set_tfm(&creq->req, ctx->chacha); skcipher_request_set_crypt(&creq->req, src, dst, - req->cryptlen, creq->iv); - err = crypto_skcipher_encrypt(&creq->req); - if (err) - return err; - -skip: - return poly_genkey(req); -} - -static int chachapoly_encrypt(struct aead_request *req) -{ - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - - rctx->cryptlen = req->cryptlen; - rctx->flags = aead_request_flags(req); - - /* encrypt call chain: - * - chacha_encrypt/done() - * - poly_genkey/done() - * - poly_init/done() - * - poly_setkey/done() - * - poly_ad/done() - * - poly_adpad/done() - * - poly_cipher/done() - * - poly_cipherpad/done() - * - poly_tail/done/continue() - * - poly_copy_tag() - */ - return chacha_encrypt(req); -} - -static int chachapoly_decrypt(struct aead_request *req) -{ - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); - - rctx->cryptlen = req->cryptlen - POLY1305_DIGEST_SIZE; - rctx->flags = aead_request_flags(req); - - /* decrypt call chain: - * - poly_genkey/done() - * - poly_init/done() - * - poly_setkey/done() - * - poly_ad/done() - * - poly_adpad/done() - * - poly_cipher/done() - * - poly_cipherpad/done() - * - poly_tail/done/continue() - * - chacha_decrypt/done() - * - poly_verify_tag() - */ - return poly_genkey(req); + rctx->cryptlen, creq->iv); + return crypto_skcipher_decrypt(&creq->req); } static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key, @@ -482,6 +258,15 @@ static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key, if (keylen != ctx->saltlen + CHACHA_KEY_SIZE) return -EINVAL; + ctx->chacha_key[0] = get_unaligned_le32(key); + ctx->chacha_key[1] = get_unaligned_le32(key + 4); + ctx->chacha_key[2] = get_unaligned_le32(key + 8); + ctx->chacha_key[3] = get_unaligned_le32(key + 12); + ctx->chacha_key[4] = get_unaligned_le32(key + 16); + ctx->chacha_key[5] = get_unaligned_le32(key + 20); + ctx->chacha_key[6] = get_unaligned_le32(key + 24); + ctx->chacha_key[7] = get_unaligned_le32(key + 28); + keylen -= ctx->saltlen; memcpy(ctx->salt, key + keylen, ctx->saltlen); @@ -510,16 +295,16 @@ static int chachapoly_init(struct crypto_aead *tfm) struct chachapoly_instance_ctx *ictx = aead_instance_ctx(inst); struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); struct crypto_skcipher *chacha; - struct crypto_ahash *poly; + struct crypto_shash *poly; unsigned long align; - poly = crypto_spawn_ahash(&ictx->poly); + poly = crypto_spawn_shash(&ictx->poly); if (IS_ERR(poly)) return PTR_ERR(poly); chacha = crypto_spawn_skcipher(&ictx->chacha); if (IS_ERR(chacha)) { - crypto_free_ahash(poly); + crypto_free_shash(poly); return PTR_ERR(chacha); } @@ -531,13 +316,10 @@ static int chachapoly_init(struct crypto_aead *tfm) align &= ~(crypto_tfm_ctx_alignment() - 1); crypto_aead_set_reqsize( tfm, - align + offsetof(struct chachapoly_req_ctx, u) + - max(offsetof(struct chacha_req, req) + - sizeof(struct skcipher_request) + - crypto_skcipher_reqsize(chacha), - offsetof(struct poly_req, req) + - sizeof(struct ahash_request) + - crypto_ahash_reqsize(poly))); + align + + offsetof(struct chachapoly_req_ctx, chacha.req) + + sizeof(struct skcipher_request) + + crypto_skcipher_reqsize(chacha)); return 0; } @@ -546,7 +328,7 @@ static void chachapoly_exit(struct crypto_aead *tfm) { struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); - crypto_free_ahash(ctx->poly); + crypto_free_shash(ctx->poly); crypto_free_skcipher(ctx->chacha); } @@ -555,7 +337,7 @@ static void chachapoly_free(struct aead_instance *inst) struct chachapoly_instance_ctx *ctx = aead_instance_ctx(inst); crypto_drop_skcipher(&ctx->chacha); - crypto_drop_ahash(&ctx->poly); + crypto_drop_shash(&ctx->poly); kfree(inst); } @@ -566,9 +348,9 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, struct aead_instance *inst; struct skcipher_alg *chacha; struct crypto_alg *poly; - struct hash_alg_common *poly_hash; + struct shash_alg *poly_hash; struct chachapoly_instance_ctx *ctx; - const char *chacha_name, *poly_name; + const char *chacha_name; int err; if (ivsize > CHACHAPOLY_IV_SIZE) @@ -584,18 +366,10 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, chacha_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(chacha_name)) return PTR_ERR(chacha_name); - poly_name = crypto_attr_alg_name(tb[2]); - if (IS_ERR(poly_name)) - return PTR_ERR(poly_name); - - poly = crypto_find_alg(poly_name, &crypto_ahash_type, - CRYPTO_ALG_TYPE_HASH, - CRYPTO_ALG_TYPE_AHASH_MASK | - crypto_requires_sync(algt->type, - algt->mask)); - if (IS_ERR(poly)) - return PTR_ERR(poly); - poly_hash = __crypto_hash_alg_common(poly); + poly_hash = shash_attr_alg(tb[2], 0, 0); + if (IS_ERR(poly_hash)) + return PTR_ERR(poly_hash); + poly = &poly_hash->base; err = -EINVAL; if (poly_hash->digestsize != POLY1305_DIGEST_SIZE) @@ -608,7 +382,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, ctx = aead_instance_ctx(inst); ctx->saltlen = CHACHAPOLY_IV_SIZE - ivsize; - err = crypto_init_ahash_spawn(&ctx->poly, poly_hash, + err = crypto_init_shash_spawn(&ctx->poly, poly_hash, aead_crypto_instance(inst)); if (err) goto err_free_inst; @@ -630,10 +404,13 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, if (chacha->base.cra_blocksize != 1) goto out_drop_chacha; + if (strcmp(chacha->base.cra_name, "chacha20") || + strcmp(poly->cra_name, "poly1305")) + goto out_drop_chacha; + err = -ENAMETOOLONG; if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, - "%s(%s,%s)", name, chacha->base.cra_name, - poly->cra_name) >= CRYPTO_MAX_ALG_NAME) + "%s(chacha20,poly1305)", name) >= CRYPTO_MAX_ALG_NAME) goto out_drop_chacha; if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s(%s,%s)", name, chacha->base.cra_driver_name, @@ -672,7 +449,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, out_drop_chacha: crypto_drop_skcipher(&ctx->chacha); err_drop_poly: - crypto_drop_ahash(&ctx->poly); + crypto_drop_shash(&ctx->poly); err_free_inst: kfree(inst); goto out_put_poly; From patchwork Wed Sep 25 16:12:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161065 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C8D46924 for ; Wed, 25 Sep 2019 16:15:59 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9689820665 for ; Wed, 25 Sep 2019 16:15:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="ANi7hXIx"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="SYIsA/jL" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9689820665 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=JVJdDaRr6pUgt0fIKe5lt8yfZ6SwWnLBTRE9fOLd/6c=; b=ANi7hXIx5qqKpG dIzxydoWqP+QKAUOe4xiOb/U25xWRi7sWMYuvSuqr+jZF0CPGK8KJJ9GIUCdRQKxbVoFCdrsXt4hY quXdKkPzNvPV9oPijv//s2F6H4H1cGxtaHqRRi6S5YZLTMRpgeZRry6UwTzxpF5OqgJNdemw4y7Ti tY4dcZtP26RYB3Om97plUTnI3sV1zEcWy4YRtil2eB4yor2MyEVU1GZHjYRUcU9ZUeynjL+OHAD18 33+xRnRYqWCo6DsTZf4Im5jPltSU7yabGPFqlAcYSHnC/3ZQW0/OBdSHPyge9jwLUfmz927MUGZ6j 98Yu6tRuNKYUiRooqD2g==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9xO-0006Lt-PX; Wed, 25 Sep 2019 16:15:58 +0000 Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vS-0003w3-Ja for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:02 +0000 Received: by mail-wr1-x443.google.com with SMTP id v8so7644097wrt.2 for ; Wed, 25 Sep 2019 09:13:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xupbsCtUMBq0dKLLZcgZZz8OoL1zNg5tE6xYiAmXGow=; b=SYIsA/jLyOT0qlEO7cDTv2aW9SXsCwRwELfU1LJSTg6ZtBdPf/Th6yPkKEhj/J2adr Y2tUqkSPh4P0nJFGCaab9LSoY9FwcwWbEFVGm9sOpeRekjpfxZurAIa5tW9A6QbzDJC0 h/JSGGDGnDXTJ95eEqNSHUarUCnxy4vRJyibdJuwzgoGERTNXJC6oAxTYpUHSLFzLBn4 cU2WQx4k6LgfVmdjuC5I5IplDoZHPC78rRIiQVHHTaFxfVPjLpzk2dNe/vfd04/019i4 YOWgSgvsxyZfu2v1O/drgMNdQfe+7oLX+jhjG24oKtDk0u73q0WP/4tcdBBTc9yMMNs1 LZZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xupbsCtUMBq0dKLLZcgZZz8OoL1zNg5tE6xYiAmXGow=; b=exHfGjAdc/6DGvI9vo5S+khKp5LfBoWXsW1Dw23bsVfOsxUNk01z/x7ZoU6lMcbW8R a8lDzg75KNhEsuoQxaAfgUxt8LxAkDB45DubteXIimTg7z61t+4kord2+oILmYwnWHuk w3A8Lxrt0iTuGRi6HRsjm9iFDlHapapAkG890D7EXFDGdyQZrlUsDcxV+SyjKWIDec9G res+ACWQwcB/H7dSE9OwGHnQwKOgWHRVyw8w8e9w1QOA8C0Oyw8b5u5UodslQAy6+zuQ xJPdEssAM4271MKCpbuHXBiikbia2Cywqr4nUiq79JFsRG93APpudE/Qtj2HE2zpN1Ec iIyw== X-Gm-Message-State: APjAAAUwbHYtt/m2yv0ERKyvP5NfPVtHgQ3+GTaVC0ltOJ7zzcqBUdd1 uccGIK2dXK2yjgYuBdtjq6iuMw== X-Google-Smtp-Source: APXvYqwC8EG0AIcwFUrSnHZLRvuiWV5oBEZ24YsBf/vED1PlaF+hl22X2hkyEdzgxov49IyWVFvLvg== X-Received: by 2002:adf:f547:: with SMTP id j7mr10623741wrp.119.1569428037024; Wed, 25 Sep 2019 09:13:57 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:56 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 07/18] crypto: rfc7539 - use zero reqsize for sync instantiations without alignmask Date: Wed, 25 Sep 2019 18:12:44 +0200 Message-Id: <20190925161255.1871-8-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091358_704792_F57B4EDF X-CRM114-Status: GOOD ( 15.66 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:443 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Now that we have moved all the scratch buffers that must be allocated on the heap out of the request context, we can move the request context itself to the stack if we are instantiating a synchronous version of the chacha20poly1305 transformation. This allows users of the AEAD to allocate the request structure on the stack, removing the need for per-packet heap allocations on the en/decryption hot path. Signed-off-by: Ard Biesheuvel --- crypto/chacha20poly1305.c | 51 ++++++++++++-------- 1 file changed, 32 insertions(+), 19 deletions(-) diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c index 71496a8107f5..d171a0c9e837 100644 --- a/crypto/chacha20poly1305.c +++ b/crypto/chacha20poly1305.c @@ -49,13 +49,14 @@ struct chachapoly_req_ctx { }; static inline void async_done_continue(struct aead_request *req, int err, - int (*cont)(struct aead_request *)) + int (*cont)(struct aead_request *, + struct chachapoly_req_ctx *)) { if (!err) { struct chachapoly_req_ctx *rctx = aead_request_ctx(req); rctx->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; - err = cont(req); + err = cont(req, rctx); } if (err != -EINPROGRESS && err != -EBUSY) @@ -74,11 +75,11 @@ static void chacha_iv(u8 *iv, struct aead_request *req, u32 icb) } static int poly_generate_tag(struct aead_request *req, u8 *poly_tag, - struct scatterlist *crypt) + struct scatterlist *crypt, + struct chachapoly_req_ctx *rctx) { struct crypto_aead *tfm = crypto_aead_reqtfm(req); struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); u32 chacha_state[CHACHA_BLOCK_SIZE / sizeof(u32)]; SHASH_DESC_ON_STACK(desc, ctx->poly); u8 poly_key[POLY1305_KEY_SIZE]; @@ -148,13 +149,13 @@ static int poly_generate_tag(struct aead_request *req, u8 *poly_tag, return 0; } -static int poly_append_tag(struct aead_request *req) +static int poly_append_tag(struct aead_request *req, + struct chachapoly_req_ctx *rctx) { - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); u8 poly_tag[POLY1305_DIGEST_SIZE]; int err; - err = poly_generate_tag(req, poly_tag, req->dst); + err = poly_generate_tag(req, poly_tag, req->dst, rctx); if (err) return err; @@ -171,12 +172,17 @@ static void chacha_encrypt_done(struct crypto_async_request *areq, int err) static int chachapoly_encrypt(struct aead_request *req) { - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); + struct chachapoly_req_ctx stack_rctx CRYPTO_MINALIGN_ATTR; + struct crypto_aead *tfm = crypto_aead_reqtfm(req); + struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); + struct chachapoly_req_ctx *rctx = &stack_rctx; struct chacha_req *creq = &rctx->chacha; struct scatterlist *src, *dst; int err; + if (unlikely(crypto_aead_reqsize(tfm) > 0)) + rctx = aead_request_ctx(req); + rctx->cryptlen = req->cryptlen; rctx->flags = aead_request_flags(req); @@ -200,7 +206,7 @@ static int chachapoly_encrypt(struct aead_request *req) return err; skip: - return poly_append_tag(req); + return poly_append_tag(req, rctx); } static void chacha_decrypt_done(struct crypto_async_request *areq, int err) @@ -210,18 +216,23 @@ static void chacha_decrypt_done(struct crypto_async_request *areq, int err) static int chachapoly_decrypt(struct aead_request *req) { - struct chachapoly_ctx *ctx = crypto_aead_ctx(crypto_aead_reqtfm(req)); - struct chachapoly_req_ctx *rctx = aead_request_ctx(req); + struct chachapoly_req_ctx stack_rctx CRYPTO_MINALIGN_ATTR; + struct crypto_aead *tfm = crypto_aead_reqtfm(req); + struct chachapoly_ctx *ctx = crypto_aead_ctx(tfm); + struct chachapoly_req_ctx *rctx = &stack_rctx; struct chacha_req *creq = &rctx->chacha; u8 calculated_tag[POLY1305_DIGEST_SIZE]; u8 provided_tag[POLY1305_DIGEST_SIZE]; struct scatterlist *src, *dst; int err; + if (unlikely(crypto_aead_reqsize(tfm) > 0)) + rctx = aead_request_ctx(req); + rctx->cryptlen = req->cryptlen - POLY1305_DIGEST_SIZE; rctx->flags = aead_request_flags(req); - err = poly_generate_tag(req, calculated_tag, req->src); + err = poly_generate_tag(req, calculated_tag, req->src, rctx); if (err) return err; scatterwalk_map_and_copy(provided_tag, req->src, @@ -314,12 +325,14 @@ static int chachapoly_init(struct crypto_aead *tfm) align = crypto_aead_alignmask(tfm); align &= ~(crypto_tfm_ctx_alignment() - 1); - crypto_aead_set_reqsize( - tfm, - align + - offsetof(struct chachapoly_req_ctx, chacha.req) + - sizeof(struct skcipher_request) + - crypto_skcipher_reqsize(chacha)); + if (crypto_aead_alignmask(tfm) > 0 || + (crypto_aead_get_flags(tfm) & CRYPTO_ALG_ASYNC)) + crypto_aead_set_reqsize( + tfm, + align + + offsetof(struct chachapoly_req_ctx, chacha.req) + + sizeof(struct skcipher_request) + + crypto_skcipher_reqsize(chacha)); return 0; } From patchwork Wed Sep 25 16:12:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161063 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51FF7924 for ; Wed, 25 Sep 2019 16:15:42 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2FBB921D79 for ; Wed, 25 Sep 2019 16:15:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="LzUeo3kf"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="yF7nxogd" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2FBB921D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=19v6jz+KFVY+Al7b3o5V/ZwiYLe1rPEG8YuW6XJgrXI=; b=LzUeo3kfGFDUbW Bc1kO2kbaONzshiJu0TNP0uei89jiBEv0reX5wJihXs83buCov5hRN9Kpr0BZx0En2jL7yjIND2jb uw/igyrJ8utXZGDQhXZ72CLpNIep5yU8syJmG9m9uIkUkJzVR+UknxFflnrwmDYbomVNIazbdDS8y 0JtJJP2Dir+UKMJXnu77Zzq/a99pgVcYHAspGFDEplYtMisPNdm5R1kRGAEhug1TQC3BoEhNu6Wjg SilQDUp/oyczt6tjXz6mgjfnoUgjLWb5SVm1m+aECCZd8Vb+SzvW/SUdVRUocX18k0jN4EjQ6Oh0K MVx8WWQzg1t/evkrw9cA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9x6-00066Z-A5; Wed, 25 Sep 2019 16:15:40 +0000 Received: from mail-wr1-x441.google.com ([2a00:1450:4864:20::441]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vT-0003ws-IA for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:02 +0000 Received: by mail-wr1-x441.google.com with SMTP id o18so7642973wrv.13 for ; Wed, 25 Sep 2019 09:13:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oTkLJDHd3Opt8Q7gQbNEv9bluAfNAUwUozJN7kv8cIw=; b=yF7nxogdOddiZ7APJUWvv8ZOZvf7wdIBsXoh5+aa06OqQerRa+w3xtndhPDTvbyeTs SxCDjPcJJ5qj5HXRHMljeXqQhUcT5u8DVC/xiVUQWc/xL7+i5gE2JMdr8DvOBarsLX5B EawpOjrIVlrozZgzbaV+7ympMMIMlrDR+vcF8xD93tAcRteh3dAQUd6LOhepRf1abFOU kJvyoTfNL3aOgWUg9OLo15VCRpFO8RzefWgCE9X6Zu++89AoZlW3h1MyeCH2qsIp0cPV P92XYhzshGFCxB/VVpNrKiPnXbbjfbGLkRrHxm15wdGgtvLZ667XLMgIGKtHMIEw6LVd 7teg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oTkLJDHd3Opt8Q7gQbNEv9bluAfNAUwUozJN7kv8cIw=; b=rKge88bt2dJQPcOgpjpLMgEPV3HwF/UY0QbmBX1ltMQhwqX2aB+KfFxEGFINgSxDCm oP28JJT9Nm3ibZnOWPAVaVq2LMXUKmmdfW2BVnuSC0XMfxf/qcNFWyqbukCbMsJEo8j0 r+6ZmBr0Jp9iQfBg5ymsrY53DrTbbBw7O9nnPJW+eKuX/f3gQcw3vdyfWAPYaaOlc/R7 RGfWAz+5HTJuUtlLBoY328cCa8dYTDZGFC1579w5cmRiKFNqy45UrEG1ra++5+ynfOa3 MwjECJ89mesUCO8yciU/1KT5v13V+m6j6vt41oB9Ert3FDZ/k2nhzPHqAe6xg0r34CwC ERZw== X-Gm-Message-State: APjAAAWtWmfV2ieJ0KduCEGwod2ccm05Ier7HLJejukO4+RGh8j8Lx/G bMATipZNDMPqN7IEnW9Vyq2fGg== X-Google-Smtp-Source: APXvYqzc7zDHo2pym+GG6Iqyv/Iy0z/3d/GbL+d0TyOdn6ZMUTam9B5o+BxEoxNyjw7X4aDT9QAIsg== X-Received: by 2002:adf:afed:: with SMTP id y45mr10259853wrd.347.1569428038355; Wed, 25 Sep 2019 09:13:58 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:57 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 08/18] crypto: testmgr - add a chacha20poly1305 test case Date: Wed, 25 Sep 2019 18:12:45 +0200 Message-Id: <20190925161255.1871-9-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091359_702201_35AFFF5B X-CRM114-Status: GOOD ( 11.12 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:441 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Add a test case to the RFC7539 (non-ESP) test vector array that exercises the newly added code path that may optimize away one invocation of the shash when the assoclen is a multiple of the Poly1305 block size. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.h | 45 ++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index ef7d21f39d4a..5439b37f2b9f 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -18950,6 +18950,51 @@ static const struct aead_testvec rfc7539_tv_template[] = { "\x22\x39\x23\x36\xfe\xa1\x85\x1f" "\x38", .clen = 281, + }, { + .key = "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f", + .klen = 32, + .iv = "\x07\x00\x00\x00\x40\x41\x42\x43" + "\x44\x45\x46\x47", + .assoc = "\x50\x51\x52\x53\xc0\xc1\xc2\xc3" + "\xc4\xc5\xc6\xc7\x44\x45\x46\x47", + .alen = 16, + .ptext = "\x4c\x61\x64\x69\x65\x73\x20\x61" + "\x6e\x64\x20\x47\x65\x6e\x74\x6c" + "\x65\x6d\x65\x6e\x20\x6f\x66\x20" + "\x74\x68\x65\x20\x63\x6c\x61\x73" + "\x73\x20\x6f\x66\x20\x27\x39\x39" + "\x3a\x20\x49\x66\x20\x49\x20\x63" + "\x6f\x75\x6c\x64\x20\x6f\x66\x66" + "\x65\x72\x20\x79\x6f\x75\x20\x6f" + "\x6e\x6c\x79\x20\x6f\x6e\x65\x20" + "\x74\x69\x70\x20\x66\x6f\x72\x20" + "\x74\x68\x65\x20\x66\x75\x74\x75" + "\x72\x65\x2c\x20\x73\x75\x6e\x73" + "\x63\x72\x65\x65\x6e\x20\x77\x6f" + "\x75\x6c\x64\x20\x62\x65\x20\x69" + "\x74\x2e", + .plen = 114, + .ctext = "\xd3\x1a\x8d\x34\x64\x8e\x60\xdb" + "\x7b\x86\xaf\xbc\x53\xef\x7e\xc2" + "\xa4\xad\xed\x51\x29\x6e\x08\xfe" + "\xa9\xe2\xb5\xa7\x36\xee\x62\xd6" + "\x3d\xbe\xa4\x5e\x8c\xa9\x67\x12" + "\x82\xfa\xfb\x69\xda\x92\x72\x8b" + "\x1a\x71\xde\x0a\x9e\x06\x0b\x29" + "\x05\xd6\xa5\xb6\x7e\xcd\x3b\x36" + "\x92\xdd\xbd\x7f\x2d\x77\x8b\x8c" + "\x98\x03\xae\xe3\x28\x09\x1b\x58" + "\xfa\xb3\x24\xe4\xfa\xd6\x75\x94" + "\x55\x85\x80\x8b\x48\x31\xd7\xbc" + "\x3f\xf4\xde\xf0\x8e\x4b\x7a\x9d" + "\xe5\x76\xd2\x65\x86\xce\xc6\x4b" + "\x61\x16\xb3\xb8\x82\x76\x1f\x39" + "\x35\x6f\x26\x8d\x28\x0f\xac\x45" + "\x02\x5d", + .clen = 130, }, }; From patchwork Wed Sep 25 16:12:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161073 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 50E171599 for ; Wed, 25 Sep 2019 16:17:23 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 29CA321D79 for ; Wed, 25 Sep 2019 16:17:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="IRKEdXLZ"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="z8OuhkVl" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 29CA321D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=TrY4VoadqR3UdBTol34HtYt8aou/04dk0lAtPqyTXag=; b=IRKEdXLZdf4MC+ 5l/Pu5DNoUy6/mumx7EAq4gEMZd/gYYIh494glnDWWw/MJj3UbUVrHYzeASipg7rrBbtgECmcbRYB WVVd+yMNmV+A+6mkaKtz6ndwuQCd88EFH6rMeNSvEGzO5uDZUenho/Mx/FCOTtvsqKSXgVJjdmTbc 4WBeiKGmW87xnyVcChlboMRiVyOAxGk8y2Korjlim0LNgIyn3A2Y4V1A3xrJAPCDqs9VrUuVtRJGG DOxxwe2bWAzN7jybg7mXb0xT+ZctjwZWSdIDVPsJyT8o16120Tc++uXNNhlMVk7RsTHqHQHY58X31 TMGSfbzo4cxkSgLeolRA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9yk-0007Ly-GX; Wed, 25 Sep 2019 16:17:22 +0000 Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vV-0003y9-4J for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:05 +0000 Received: by mail-wr1-x443.google.com with SMTP id r5so7609427wrm.12 for ; Wed, 25 Sep 2019 09:14:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kSUfY6BEXBgm8PBl4NM38bq4FokXR/M8kMBF5SPnY6c=; b=z8OuhkVl6QAmyOjrvoADZg2g8ZZ/bTV/zZe0uz6H2zCwTuGSqzGinURbhc7gN4Qt6U EJXwo/vv7zQXtQT11EqcMj9fP2tIJC7yywEbI9J3SLLnBcdthlZxVl8BuaIlWbm6THek mIGaCgC7gh7xHoTZqmPirRtsSEVK65aMj76fLUXOUq1RiSMqMoaXtFGzl3barUwM845R 4Ruc+RDx+ppcOWzR4e5DRfpVb4ixuwEOsvI9Kl2dl91jQ5rTWDsgUBK7xDuolrh9Mokc 1cZ/4G/MxxVDzbEmSPxWXdsP1EMCdMHeHHrWJn/fAPiz22Uj7v0QRDdzR9QfthDOXmbU 1OKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kSUfY6BEXBgm8PBl4NM38bq4FokXR/M8kMBF5SPnY6c=; b=ckvJecm1h81C0U++nPTk/tXJa9JmjypHvi+Rwo7jRkG+fKiIWsj2JpG7CTIa1xG6bg ZuBexbxuELLSa6SSeSJwq5v/LOR6tFin/+nbPp/LanbJK+sHgIOcbacXfiqDMANtuort NVB8uNbhq0APZ2gVHZ2fOzpyINC+HwtD46+/hUTi3r2cNKH6SKbX6wmrUqGw4GYvfbqP muoyRF9h7Cj1ffmowL2rsFyTIFllX6BUQ6kkdKEpY8gZ1DBIcfMezOyFnbIgxXr/m2IJ tnZjZs9RIA8LOHJVD4SiezQYyTtn7+iSDWUmyupK6iR7p3AxfZDuw+PMcly8NM7czO8T T+SQ== X-Gm-Message-State: APjAAAVQI/7+KtTStk7XOnn6D1tBSRCcvW3Eh5+Z+t41wXM2fquvupYI Y5Nb7UW5n6QgI3CWWKFVOjVFfQ== X-Google-Smtp-Source: APXvYqyQZoCaMaVbtnCPbJVIleM5/XJDZjTPa6qND3Axok+ArgSJ7mXujP2VX/FI5Q6b+r/K1Ww8QQ== X-Received: by 2002:a5d:4ed0:: with SMTP id s16mr10339744wrv.248.1569428039638; Wed, 25 Sep 2019 09:13:59 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:13:58 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 09/18] crypto: poly1305 - move core algorithm into lib/crypto Date: Wed, 25 Sep 2019 18:12:46 +0200 Message-Id: <20190925161255.1871-10-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091401_508319_F971A721 X-CRM114-Status: GOOD ( 21.07 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:443 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Move the core Poly1305 transformation into a separate library in lib/crypto so it can be used by other subsystems without going through the entire crypto API. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/poly1305_glue.c | 2 +- crypto/Kconfig | 4 + crypto/adiantum.c | 4 +- crypto/nhpoly1305.c | 2 +- crypto/poly1305_generic.c | 167 +------------------- include/crypto/internal/poly1305.h | 37 +++++ include/crypto/poly1305.h | 32 +--- lib/crypto/Makefile | 3 + lib/crypto/poly1305.c | 158 ++++++++++++++++++ 9 files changed, 217 insertions(+), 192 deletions(-) diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c index f2afaa8e23c2..9291cfea799d 100644 --- a/arch/x86/crypto/poly1305_glue.c +++ b/arch/x86/crypto/poly1305_glue.c @@ -7,8 +7,8 @@ #include #include +#include #include -#include #include #include #include diff --git a/crypto/Kconfig b/crypto/Kconfig index 25a06cf49a5d..45589110bf80 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -654,9 +654,13 @@ config CRYPTO_GHASH GHASH is the hash function used in GCM (Galois/Counter Mode). It is not a general-purpose cryptographic hash function. +config CRYPTO_LIB_POLY1305 + tristate "Poly1305 authenticator library" + config CRYPTO_POLY1305 tristate "Poly1305 authenticator algorithm" select CRYPTO_HASH + select CRYPTO_LIB_POLY1305 help Poly1305 authenticator algorithm, RFC7539. diff --git a/crypto/adiantum.c b/crypto/adiantum.c index 395a3ddd3707..775ed1418e2b 100644 --- a/crypto/adiantum.c +++ b/crypto/adiantum.c @@ -242,11 +242,11 @@ static void adiantum_hash_header(struct skcipher_request *req) BUILD_BUG_ON(sizeof(header) % POLY1305_BLOCK_SIZE != 0); poly1305_core_blocks(&state, &tctx->header_hash_key, - &header, sizeof(header) / POLY1305_BLOCK_SIZE); + &header, sizeof(header) / POLY1305_BLOCK_SIZE, 1); BUILD_BUG_ON(TWEAK_SIZE % POLY1305_BLOCK_SIZE != 0); poly1305_core_blocks(&state, &tctx->header_hash_key, req->iv, - TWEAK_SIZE / POLY1305_BLOCK_SIZE); + TWEAK_SIZE / POLY1305_BLOCK_SIZE, 1); poly1305_core_emit(&state, &rctx->header_hash); } diff --git a/crypto/nhpoly1305.c b/crypto/nhpoly1305.c index 9ab4e07cde4d..b88a6a71e3e2 100644 --- a/crypto/nhpoly1305.c +++ b/crypto/nhpoly1305.c @@ -78,7 +78,7 @@ static void process_nh_hash_value(struct nhpoly1305_state *state, BUILD_BUG_ON(NH_HASH_BYTES % POLY1305_BLOCK_SIZE != 0); poly1305_core_blocks(&state->poly_state, &key->poly_key, state->nh_hash, - NH_HASH_BYTES / POLY1305_BLOCK_SIZE); + NH_HASH_BYTES / POLY1305_BLOCK_SIZE, 1); } /* diff --git a/crypto/poly1305_generic.c b/crypto/poly1305_generic.c index adc40298c749..46a2da1abac7 100644 --- a/crypto/poly1305_generic.c +++ b/crypto/poly1305_generic.c @@ -13,27 +13,12 @@ #include #include -#include +#include #include #include #include #include -static inline u64 mlt(u64 a, u64 b) -{ - return a * b; -} - -static inline u32 sr(u64 v, u_char n) -{ - return v >> n; -} - -static inline u32 and(u32 v, u32 mask) -{ - return v & mask; -} - int crypto_poly1305_init(struct shash_desc *desc) { struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); @@ -47,17 +32,6 @@ int crypto_poly1305_init(struct shash_desc *desc) } EXPORT_SYMBOL_GPL(crypto_poly1305_init); -void poly1305_core_setkey(struct poly1305_key *key, const u8 *raw_key) -{ - /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */ - key->r[0] = (get_unaligned_le32(raw_key + 0) >> 0) & 0x3ffffff; - key->r[1] = (get_unaligned_le32(raw_key + 3) >> 2) & 0x3ffff03; - key->r[2] = (get_unaligned_le32(raw_key + 6) >> 4) & 0x3ffc0ff; - key->r[3] = (get_unaligned_le32(raw_key + 9) >> 6) & 0x3f03fff; - key->r[4] = (get_unaligned_le32(raw_key + 12) >> 8) & 0x00fffff; -} -EXPORT_SYMBOL_GPL(poly1305_core_setkey); - /* * Poly1305 requires a unique key for each tag, which implies that we can't set * it on the tfm that gets accessed by multiple users simultaneously. Instead we @@ -87,84 +61,8 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx, } EXPORT_SYMBOL_GPL(crypto_poly1305_setdesckey); -static void poly1305_blocks_internal(struct poly1305_state *state, - const struct poly1305_key *key, - const void *src, unsigned int nblocks, - u32 hibit) -{ - u32 r0, r1, r2, r3, r4; - u32 s1, s2, s3, s4; - u32 h0, h1, h2, h3, h4; - u64 d0, d1, d2, d3, d4; - - if (!nblocks) - return; - - r0 = key->r[0]; - r1 = key->r[1]; - r2 = key->r[2]; - r3 = key->r[3]; - r4 = key->r[4]; - - s1 = r1 * 5; - s2 = r2 * 5; - s3 = r3 * 5; - s4 = r4 * 5; - - h0 = state->h[0]; - h1 = state->h[1]; - h2 = state->h[2]; - h3 = state->h[3]; - h4 = state->h[4]; - - do { - /* h += m[i] */ - h0 += (get_unaligned_le32(src + 0) >> 0) & 0x3ffffff; - h1 += (get_unaligned_le32(src + 3) >> 2) & 0x3ffffff; - h2 += (get_unaligned_le32(src + 6) >> 4) & 0x3ffffff; - h3 += (get_unaligned_le32(src + 9) >> 6) & 0x3ffffff; - h4 += (get_unaligned_le32(src + 12) >> 8) | hibit; - - /* h *= r */ - d0 = mlt(h0, r0) + mlt(h1, s4) + mlt(h2, s3) + - mlt(h3, s2) + mlt(h4, s1); - d1 = mlt(h0, r1) + mlt(h1, r0) + mlt(h2, s4) + - mlt(h3, s3) + mlt(h4, s2); - d2 = mlt(h0, r2) + mlt(h1, r1) + mlt(h2, r0) + - mlt(h3, s4) + mlt(h4, s3); - d3 = mlt(h0, r3) + mlt(h1, r2) + mlt(h2, r1) + - mlt(h3, r0) + mlt(h4, s4); - d4 = mlt(h0, r4) + mlt(h1, r3) + mlt(h2, r2) + - mlt(h3, r1) + mlt(h4, r0); - - /* (partial) h %= p */ - d1 += sr(d0, 26); h0 = and(d0, 0x3ffffff); - d2 += sr(d1, 26); h1 = and(d1, 0x3ffffff); - d3 += sr(d2, 26); h2 = and(d2, 0x3ffffff); - d4 += sr(d3, 26); h3 = and(d3, 0x3ffffff); - h0 += sr(d4, 26) * 5; h4 = and(d4, 0x3ffffff); - h1 += h0 >> 26; h0 = h0 & 0x3ffffff; - - src += POLY1305_BLOCK_SIZE; - } while (--nblocks); - - state->h[0] = h0; - state->h[1] = h1; - state->h[2] = h2; - state->h[3] = h3; - state->h[4] = h4; -} - -void poly1305_core_blocks(struct poly1305_state *state, - const struct poly1305_key *key, - const void *src, unsigned int nblocks) -{ - poly1305_blocks_internal(state, key, src, nblocks, 1 << 24); -} -EXPORT_SYMBOL_GPL(poly1305_core_blocks); - -static void poly1305_blocks(struct poly1305_desc_ctx *dctx, - const u8 *src, unsigned int srclen, u32 hibit) +static void poly1305_blocks(struct poly1305_desc_ctx *dctx, const u8 *src, + unsigned int srclen) { unsigned int datalen; @@ -174,8 +72,8 @@ static void poly1305_blocks(struct poly1305_desc_ctx *dctx, srclen = datalen; } - poly1305_blocks_internal(&dctx->h, &dctx->r, - src, srclen / POLY1305_BLOCK_SIZE, hibit); + poly1305_core_blocks(&dctx->h, &dctx->r, src, + srclen / POLY1305_BLOCK_SIZE, 1); } int crypto_poly1305_update(struct shash_desc *desc, @@ -192,14 +90,13 @@ int crypto_poly1305_update(struct shash_desc *desc, dctx->buflen += bytes; if (dctx->buflen == POLY1305_BLOCK_SIZE) { - poly1305_blocks(dctx, dctx->buf, - POLY1305_BLOCK_SIZE, 1 << 24); + poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE); dctx->buflen = 0; } } if (likely(srclen >= POLY1305_BLOCK_SIZE)) { - poly1305_blocks(dctx, src, srclen, 1 << 24); + poly1305_blocks(dctx, src, srclen); src += srclen - (srclen % POLY1305_BLOCK_SIZE); srclen %= POLY1305_BLOCK_SIZE; } @@ -213,54 +110,6 @@ int crypto_poly1305_update(struct shash_desc *desc, } EXPORT_SYMBOL_GPL(crypto_poly1305_update); -void poly1305_core_emit(const struct poly1305_state *state, void *dst) -{ - u32 h0, h1, h2, h3, h4; - u32 g0, g1, g2, g3, g4; - u32 mask; - - /* fully carry h */ - h0 = state->h[0]; - h1 = state->h[1]; - h2 = state->h[2]; - h3 = state->h[3]; - h4 = state->h[4]; - - h2 += (h1 >> 26); h1 = h1 & 0x3ffffff; - h3 += (h2 >> 26); h2 = h2 & 0x3ffffff; - h4 += (h3 >> 26); h3 = h3 & 0x3ffffff; - h0 += (h4 >> 26) * 5; h4 = h4 & 0x3ffffff; - h1 += (h0 >> 26); h0 = h0 & 0x3ffffff; - - /* compute h + -p */ - g0 = h0 + 5; - g1 = h1 + (g0 >> 26); g0 &= 0x3ffffff; - g2 = h2 + (g1 >> 26); g1 &= 0x3ffffff; - g3 = h3 + (g2 >> 26); g2 &= 0x3ffffff; - g4 = h4 + (g3 >> 26) - (1 << 26); g3 &= 0x3ffffff; - - /* select h if h < p, or h + -p if h >= p */ - mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1; - g0 &= mask; - g1 &= mask; - g2 &= mask; - g3 &= mask; - g4 &= mask; - mask = ~mask; - h0 = (h0 & mask) | g0; - h1 = (h1 & mask) | g1; - h2 = (h2 & mask) | g2; - h3 = (h3 & mask) | g3; - h4 = (h4 & mask) | g4; - - /* h = h % (2^128) */ - put_unaligned_le32((h0 >> 0) | (h1 << 26), dst + 0); - put_unaligned_le32((h1 >> 6) | (h2 << 20), dst + 4); - put_unaligned_le32((h2 >> 12) | (h3 << 14), dst + 8); - put_unaligned_le32((h3 >> 18) | (h4 << 8), dst + 12); -} -EXPORT_SYMBOL_GPL(poly1305_core_emit); - int crypto_poly1305_final(struct shash_desc *desc, u8 *dst) { struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); @@ -274,7 +123,7 @@ int crypto_poly1305_final(struct shash_desc *desc, u8 *dst) dctx->buf[dctx->buflen++] = 1; memset(dctx->buf + dctx->buflen, 0, POLY1305_BLOCK_SIZE - dctx->buflen); - poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0); + poly1305_core_blocks(&dctx->h, &dctx->r, dctx->buf, 1, 0); } poly1305_core_emit(&dctx->h, digest); diff --git a/include/crypto/internal/poly1305.h b/include/crypto/internal/poly1305.h new file mode 100644 index 000000000000..ae0466c4ce59 --- /dev/null +++ b/include/crypto/internal/poly1305.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Common values for the Poly1305 algorithm + */ + +#ifndef _CRYPTO_INTERNAL_POLY1305_H +#define _CRYPTO_INTERNAL_POLY1305_H + +#include +#include + +struct poly1305_desc_ctx { + /* key */ + struct poly1305_key r; + /* finalize key */ + u32 s[4]; + /* accumulator */ + struct poly1305_state h; + /* partial buffer */ + u8 buf[POLY1305_BLOCK_SIZE]; + /* bytes used in partial buffer */ + unsigned int buflen; + /* r key has been set */ + bool rset; + /* s key has been set */ + bool sset; +}; + +/* Crypto API helper functions for the Poly1305 MAC */ +int crypto_poly1305_init(struct shash_desc *desc); +unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx, + const u8 *src, unsigned int srclen); +int crypto_poly1305_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen); +int crypto_poly1305_final(struct shash_desc *desc, u8 *dst); + +#endif diff --git a/include/crypto/poly1305.h b/include/crypto/poly1305.h index 34317ed2071e..83e4b4c69a5a 100644 --- a/include/crypto/poly1305.h +++ b/include/crypto/poly1305.h @@ -21,23 +21,6 @@ struct poly1305_state { u32 h[5]; /* accumulator, base 2^26 */ }; -struct poly1305_desc_ctx { - /* key */ - struct poly1305_key r; - /* finalize key */ - u32 s[4]; - /* accumulator */ - struct poly1305_state h; - /* partial buffer */ - u8 buf[POLY1305_BLOCK_SIZE]; - /* bytes used in partial buffer */ - unsigned int buflen; - /* r key has been set */ - bool rset; - /* s key has been set */ - bool sset; -}; - /* * Poly1305 core functions. These implement the ε-almost-∆-universal hash * function underlying the Poly1305 MAC, i.e. they don't add an encrypted nonce @@ -46,19 +29,10 @@ struct poly1305_desc_ctx { void poly1305_core_setkey(struct poly1305_key *key, const u8 *raw_key); static inline void poly1305_core_init(struct poly1305_state *state) { - memset(state->h, 0, sizeof(state->h)); + *state = (struct poly1305_state){}; } void poly1305_core_blocks(struct poly1305_state *state, - const struct poly1305_key *key, - const void *src, unsigned int nblocks); + const struct poly1305_key *key, const void *src, + unsigned int nblocks, u32 hibit); void poly1305_core_emit(const struct poly1305_state *state, void *dst); - -/* Crypto API helper functions for the Poly1305 MAC */ -int crypto_poly1305_init(struct shash_desc *desc); -unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx, - const u8 *src, unsigned int srclen); -int crypto_poly1305_update(struct shash_desc *desc, - const u8 *src, unsigned int srclen); -int crypto_poly1305_final(struct shash_desc *desc, u8 *dst); - #endif diff --git a/lib/crypto/Makefile b/lib/crypto/Makefile index 24dad058f2ae..6bf8a0a4ee0e 100644 --- a/lib/crypto/Makefile +++ b/lib/crypto/Makefile @@ -12,5 +12,8 @@ libarc4-y := arc4.o obj-$(CONFIG_CRYPTO_LIB_DES) += libdes.o libdes-y := des.o +obj-$(CONFIG_CRYPTO_LIB_POLY1305) += libpoly1305.o +libpoly1305-y := poly1305.o + obj-$(CONFIG_CRYPTO_LIB_SHA256) += libsha256.o libsha256-y := sha256.o diff --git a/lib/crypto/poly1305.c b/lib/crypto/poly1305.c new file mode 100644 index 000000000000..abe6fccf7b9c --- /dev/null +++ b/lib/crypto/poly1305.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Poly1305 authenticator algorithm, RFC7539 + * + * Copyright (C) 2015 Martin Willi + * + * Based on public domain code by Andrew Moon and Daniel J. Bernstein. + */ + +#include +#include +#include +#include + +static inline u64 mlt(u64 a, u64 b) +{ + return a * b; +} + +static inline u32 sr(u64 v, u_char n) +{ + return v >> n; +} + +static inline u32 and(u32 v, u32 mask) +{ + return v & mask; +} + +void poly1305_core_setkey(struct poly1305_key *key, const u8 *raw_key) +{ + /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */ + key->r[0] = (get_unaligned_le32(raw_key + 0) >> 0) & 0x3ffffff; + key->r[1] = (get_unaligned_le32(raw_key + 3) >> 2) & 0x3ffff03; + key->r[2] = (get_unaligned_le32(raw_key + 6) >> 4) & 0x3ffc0ff; + key->r[3] = (get_unaligned_le32(raw_key + 9) >> 6) & 0x3f03fff; + key->r[4] = (get_unaligned_le32(raw_key + 12) >> 8) & 0x00fffff; +} +EXPORT_SYMBOL_GPL(poly1305_core_setkey); + +void poly1305_core_blocks(struct poly1305_state *state, + const struct poly1305_key *key, const void *src, + unsigned int nblocks, u32 hibit) +{ + u32 r0, r1, r2, r3, r4; + u32 s1, s2, s3, s4; + u32 h0, h1, h2, h3, h4; + u64 d0, d1, d2, d3, d4; + + if (!nblocks) + return; + + r0 = key->r[0]; + r1 = key->r[1]; + r2 = key->r[2]; + r3 = key->r[3]; + r4 = key->r[4]; + + s1 = r1 * 5; + s2 = r2 * 5; + s3 = r3 * 5; + s4 = r4 * 5; + + h0 = state->h[0]; + h1 = state->h[1]; + h2 = state->h[2]; + h3 = state->h[3]; + h4 = state->h[4]; + + do { + /* h += m[i] */ + h0 += (get_unaligned_le32(src + 0) >> 0) & 0x3ffffff; + h1 += (get_unaligned_le32(src + 3) >> 2) & 0x3ffffff; + h2 += (get_unaligned_le32(src + 6) >> 4) & 0x3ffffff; + h3 += (get_unaligned_le32(src + 9) >> 6) & 0x3ffffff; + h4 += (get_unaligned_le32(src + 12) >> 8) | (hibit << 24); + + /* h *= r */ + d0 = mlt(h0, r0) + mlt(h1, s4) + mlt(h2, s3) + + mlt(h3, s2) + mlt(h4, s1); + d1 = mlt(h0, r1) + mlt(h1, r0) + mlt(h2, s4) + + mlt(h3, s3) + mlt(h4, s2); + d2 = mlt(h0, r2) + mlt(h1, r1) + mlt(h2, r0) + + mlt(h3, s4) + mlt(h4, s3); + d3 = mlt(h0, r3) + mlt(h1, r2) + mlt(h2, r1) + + mlt(h3, r0) + mlt(h4, s4); + d4 = mlt(h0, r4) + mlt(h1, r3) + mlt(h2, r2) + + mlt(h3, r1) + mlt(h4, r0); + + /* (partial) h %= p */ + d1 += sr(d0, 26); h0 = and(d0, 0x3ffffff); + d2 += sr(d1, 26); h1 = and(d1, 0x3ffffff); + d3 += sr(d2, 26); h2 = and(d2, 0x3ffffff); + d4 += sr(d3, 26); h3 = and(d3, 0x3ffffff); + h0 += sr(d4, 26) * 5; h4 = and(d4, 0x3ffffff); + h1 += h0 >> 26; h0 = h0 & 0x3ffffff; + + src += POLY1305_BLOCK_SIZE; + } while (--nblocks); + + state->h[0] = h0; + state->h[1] = h1; + state->h[2] = h2; + state->h[3] = h3; + state->h[4] = h4; +} +EXPORT_SYMBOL_GPL(poly1305_core_blocks); + +void poly1305_core_emit(const struct poly1305_state *state, void *dst) +{ + u32 h0, h1, h2, h3, h4; + u32 g0, g1, g2, g3, g4; + u32 mask; + + /* fully carry h */ + h0 = state->h[0]; + h1 = state->h[1]; + h2 = state->h[2]; + h3 = state->h[3]; + h4 = state->h[4]; + + h2 += (h1 >> 26); h1 = h1 & 0x3ffffff; + h3 += (h2 >> 26); h2 = h2 & 0x3ffffff; + h4 += (h3 >> 26); h3 = h3 & 0x3ffffff; + h0 += (h4 >> 26) * 5; h4 = h4 & 0x3ffffff; + h1 += (h0 >> 26); h0 = h0 & 0x3ffffff; + + /* compute h + -p */ + g0 = h0 + 5; + g1 = h1 + (g0 >> 26); g0 &= 0x3ffffff; + g2 = h2 + (g1 >> 26); g1 &= 0x3ffffff; + g3 = h3 + (g2 >> 26); g2 &= 0x3ffffff; + g4 = h4 + (g3 >> 26) - (1 << 26); g3 &= 0x3ffffff; + + /* select h if h < p, or h + -p if h >= p */ + mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1; + g0 &= mask; + g1 &= mask; + g2 &= mask; + g3 &= mask; + g4 &= mask; + mask = ~mask; + h0 = (h0 & mask) | g0; + h1 = (h1 & mask) | g1; + h2 = (h2 & mask) | g2; + h3 = (h3 & mask) | g3; + h4 = (h4 & mask) | g4; + + /* h = h % (2^128) */ + put_unaligned_le32((h0 >> 0) | (h1 << 26), dst + 0); + put_unaligned_le32((h1 >> 6) | (h2 << 20), dst + 4); + put_unaligned_le32((h2 >> 12) | (h3 << 14), dst + 8); + put_unaligned_le32((h3 >> 18) | (h4 << 8), dst + 12); +} +EXPORT_SYMBOL_GPL(poly1305_core_emit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Martin Willi "); From patchwork Wed Sep 25 16:12:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161075 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63576924 for ; Wed, 25 Sep 2019 16:17:40 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3EBE021D79 for ; Wed, 25 Sep 2019 16:17:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="j27uKYfF"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="CAnW7qsq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3EBE021D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=7jB5AdcdarTrvDn8u1zWbgUDMRLmR0BnE9tGC6/aLDM=; b=j27uKYfFytuaDL WFfvzW85fRIxWmllnASh5MWUrYrYeVrcvjzjp8qVS1rwQjsRFpjoRj1rTx1huo0Mv1Abo8vxg4FeD TQrf99CZ5PdI4VdGxRiGuvW+ZxRsd/vkiykKA3/1idD+GlcuZpn6t4juOdUAQmymyrUTHENr5MAlG rOP/ShPm9MGZ5VrML282VQt+/AUA+ZUTX/Gn5A6jd/M+3YrNr2D4KGFvvTQhBJVzN7PS2qah4Dum9 tYz9XQnLEC4zZ+9pbYEEpT/CSX5M1gcb6HhOc2XfH2Ht0xRbyWw5CzgjFhK0Tdg//0bRdiP94gWJd bu43jEbmhjjdRmCcx4kQ==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9z1-0007bl-LX; Wed, 25 Sep 2019 16:17:39 +0000 Received: from mail-wm1-x343.google.com ([2a00:1450:4864:20::343]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vW-0003z0-ME for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:05 +0000 Received: by mail-wm1-x343.google.com with SMTP id b24so5597325wmj.5 for ; Wed, 25 Sep 2019 09:14:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xpVJhurqboKmy8zjq8NwWmF/2cOoT7sCwN/7BICzDQc=; b=CAnW7qsqfrk5KFVvsKsDcgra7HlMJjjL2tpzu+fodKYoLQDhcCrInovzV4qj/eASai ROhGsosp9+zoYMmAiL7lA45JmrJWYK0TqGMRZDRBciGaidScg1iKhQdEAmgkieeDIWXZ JL5s8XsQU42tg8wpyuiAuFAEcuyVWFjAy3leU5/kX/IyBF4sAvKQtp3CHt5B7ZvnBQUs 7y1XO507W5aBNQo46ccGXQBjrrDCJ3LNHNF3QTG3Ham4ymf4ixhVpKQYRB0gVwJuOMgL 1us8T+MeoWba7+PVMVgfW7xU+5siLN0AeEAhvisjNVnnslWCJPlLqKgFgIjK4Ajsf4Fh YFmg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xpVJhurqboKmy8zjq8NwWmF/2cOoT7sCwN/7BICzDQc=; b=NUGOR9AFC69j/ATHOLBQYx00uje64kGwd3QCf098q3uXXdIrLCFTnEZd4tMNoT+skG uu5Me7OqmDYpZrWSAq/pkPKvI8gqcQ55TwPu2E3robGs4GuCbkTgveDjOPfrLA+JzMbl asRoGtS0EPCfzGZDQUO+jdU2TGMyhXPzJVkve0QR+6ZUTmTxinBHSQ9/phD7nw1dnzZu JjsxzuF9KfUjT7JJZtZhnrGs3IZBZzHL9isWxxeMLMccNo4bJ3XMnf1mt1sPLNBjg79Z s0a4sUJn110iyZeymSEQH3mGsG9VOlRfQbJo6HEDW5TumuNyklK8SGgWdqMK0la5PFHY GtoQ== X-Gm-Message-State: APjAAAUmbt+zLNbFPAA2C8RrGLduxA5rwT5pUaeRzmZXSmYZ4wA/oveN jTYfw6Ir4avskB2IgvYjF7yRl2mjB0YcrjQf X-Google-Smtp-Source: APXvYqzjKIx5BU0X+hSOnuLm57ITShsKdJWzENe3xNpvJbLScKbHsUEDABMKZdHgw9OoOMLqLWiLEw== X-Received: by 2002:a7b:cb8b:: with SMTP id m11mr3358192wmi.145.1569428041353; Wed, 25 Sep 2019 09:14:01 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.13.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:14:00 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 10/18] crypto: poly1305 - add init/update/final library routines Date: Wed, 25 Sep 2019 18:12:47 +0200 Message-Id: <20190925161255.1871-11-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091402_925103_B91E4FE5 X-CRM114-Status: GOOD ( 18.55 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:343 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal information X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Add the usual init/update/final library routines for the Poly1305 keyed hash library. Since this will be the external interface of the library, move the poly1305_core_* routines to the internal header (and update the users to refer to it where needed) Signed-off-by: Ard Biesheuvel --- crypto/adiantum.c | 1 + crypto/nhpoly1305.c | 1 + crypto/poly1305_generic.c | 57 +++++++------------ include/crypto/internal/poly1305.h | 16 ++---- include/crypto/poly1305.h | 34 ++++++++++-- lib/crypto/poly1305.c | 58 ++++++++++++++++++++ 6 files changed, 115 insertions(+), 52 deletions(-) diff --git a/crypto/adiantum.c b/crypto/adiantum.c index 775ed1418e2b..aded26092268 100644 --- a/crypto/adiantum.c +++ b/crypto/adiantum.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include diff --git a/crypto/nhpoly1305.c b/crypto/nhpoly1305.c index b88a6a71e3e2..f6b6a52092b4 100644 --- a/crypto/nhpoly1305.c +++ b/crypto/nhpoly1305.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include diff --git a/crypto/poly1305_generic.c b/crypto/poly1305_generic.c index 46a2da1abac7..69241e7e85be 100644 --- a/crypto/poly1305_generic.c +++ b/crypto/poly1305_generic.c @@ -23,8 +23,8 @@ int crypto_poly1305_init(struct shash_desc *desc) { struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); - poly1305_core_init(&dctx->h); - dctx->buflen = 0; + poly1305_core_init(&dctx->desc.h); + dctx->desc.buflen = 0; dctx->rset = false; dctx->sset = false; @@ -42,16 +42,16 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx, { if (!dctx->sset) { if (!dctx->rset && srclen >= POLY1305_BLOCK_SIZE) { - poly1305_core_setkey(&dctx->r, src); + poly1305_core_setkey(&dctx->desc.r, src); src += POLY1305_BLOCK_SIZE; srclen -= POLY1305_BLOCK_SIZE; dctx->rset = true; } if (srclen >= POLY1305_BLOCK_SIZE) { - dctx->s[0] = get_unaligned_le32(src + 0); - dctx->s[1] = get_unaligned_le32(src + 4); - dctx->s[2] = get_unaligned_le32(src + 8); - dctx->s[3] = get_unaligned_le32(src + 12); + dctx->desc.s[0] = get_unaligned_le32(src + 0); + dctx->desc.s[1] = get_unaligned_le32(src + 4); + dctx->desc.s[2] = get_unaligned_le32(src + 8); + dctx->desc.s[3] = get_unaligned_le32(src + 12); src += POLY1305_BLOCK_SIZE; srclen -= POLY1305_BLOCK_SIZE; dctx->sset = true; @@ -72,7 +72,7 @@ static void poly1305_blocks(struct poly1305_desc_ctx *dctx, const u8 *src, srclen = datalen; } - poly1305_core_blocks(&dctx->h, &dctx->r, src, + poly1305_core_blocks(&dctx->desc.h, &dctx->desc.r, src, srclen / POLY1305_BLOCK_SIZE, 1); } @@ -82,16 +82,17 @@ int crypto_poly1305_update(struct shash_desc *desc, struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); unsigned int bytes; - if (unlikely(dctx->buflen)) { - bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen); - memcpy(dctx->buf + dctx->buflen, src, bytes); + if (unlikely(dctx->desc.buflen)) { + bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->desc.buflen); + memcpy(dctx->desc.buf + dctx->desc.buflen, src, bytes); src += bytes; srclen -= bytes; - dctx->buflen += bytes; + dctx->desc.buflen += bytes; - if (dctx->buflen == POLY1305_BLOCK_SIZE) { - poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE); - dctx->buflen = 0; + if (dctx->desc.buflen == POLY1305_BLOCK_SIZE) { + poly1305_blocks(dctx, dctx->desc.buf, + POLY1305_BLOCK_SIZE); + dctx->desc.buflen = 0; } } @@ -102,8 +103,8 @@ int crypto_poly1305_update(struct shash_desc *desc, } if (unlikely(srclen)) { - dctx->buflen = srclen; - memcpy(dctx->buf, src, srclen); + dctx->desc.buflen = srclen; + memcpy(dctx->desc.buf, src, srclen); } return 0; @@ -113,31 +114,11 @@ EXPORT_SYMBOL_GPL(crypto_poly1305_update); int crypto_poly1305_final(struct shash_desc *desc, u8 *dst) { struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc); - __le32 digest[4]; - u64 f = 0; if (unlikely(!dctx->sset)) return -ENOKEY; - if (unlikely(dctx->buflen)) { - dctx->buf[dctx->buflen++] = 1; - memset(dctx->buf + dctx->buflen, 0, - POLY1305_BLOCK_SIZE - dctx->buflen); - poly1305_core_blocks(&dctx->h, &dctx->r, dctx->buf, 1, 0); - } - - poly1305_core_emit(&dctx->h, digest); - - /* mac = (h + s) % (2^128) */ - f = (f >> 32) + le32_to_cpu(digest[0]) + dctx->s[0]; - put_unaligned_le32(f, dst + 0); - f = (f >> 32) + le32_to_cpu(digest[1]) + dctx->s[1]; - put_unaligned_le32(f, dst + 4); - f = (f >> 32) + le32_to_cpu(digest[2]) + dctx->s[2]; - put_unaligned_le32(f, dst + 8); - f = (f >> 32) + le32_to_cpu(digest[3]) + dctx->s[3]; - put_unaligned_le32(f, dst + 12); - + poly1305_final(&dctx->desc, dst); return 0; } EXPORT_SYMBOL_GPL(crypto_poly1305_final); diff --git a/include/crypto/internal/poly1305.h b/include/crypto/internal/poly1305.h index ae0466c4ce59..619705200e70 100644 --- a/include/crypto/internal/poly1305.h +++ b/include/crypto/internal/poly1305.h @@ -10,22 +10,18 @@ #include struct poly1305_desc_ctx { - /* key */ - struct poly1305_key r; - /* finalize key */ - u32 s[4]; - /* accumulator */ - struct poly1305_state h; - /* partial buffer */ - u8 buf[POLY1305_BLOCK_SIZE]; - /* bytes used in partial buffer */ - unsigned int buflen; + struct poly1305_desc desc; /* r key has been set */ bool rset; /* s key has been set */ bool sset; }; +void poly1305_core_blocks(struct poly1305_state *state, + const struct poly1305_key *key, const void *src, + unsigned int nblocks, u32 hibit); +void poly1305_core_emit(const struct poly1305_state *state, void *dst); + /* Crypto API helper functions for the Poly1305 MAC */ int crypto_poly1305_init(struct shash_desc *desc); unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx, diff --git a/include/crypto/poly1305.h b/include/crypto/poly1305.h index 83e4b4c69a5a..148d9049b906 100644 --- a/include/crypto/poly1305.h +++ b/include/crypto/poly1305.h @@ -6,6 +6,7 @@ #ifndef _CRYPTO_POLY1305_H #define _CRYPTO_POLY1305_H +#include #include #include @@ -21,6 +22,19 @@ struct poly1305_state { u32 h[5]; /* accumulator, base 2^26 */ }; +struct poly1305_desc { + /* key */ + struct poly1305_key r; + /* finalize key */ + u32 s[4]; + /* accumulator */ + struct poly1305_state h; + /* partial buffer */ + u8 buf[POLY1305_BLOCK_SIZE]; + /* bytes used in partial buffer */ + unsigned int buflen; +}; + /* * Poly1305 core functions. These implement the ε-almost-∆-universal hash * function underlying the Poly1305 MAC, i.e. they don't add an encrypted nonce @@ -31,8 +45,20 @@ static inline void poly1305_core_init(struct poly1305_state *state) { *state = (struct poly1305_state){}; } -void poly1305_core_blocks(struct poly1305_state *state, - const struct poly1305_key *key, const void *src, - unsigned int nblocks, u32 hibit); -void poly1305_core_emit(const struct poly1305_state *state, void *dst); + +static inline void poly1305_init(struct poly1305_desc *desc, const u8 *key) +{ + poly1305_core_setkey(&desc->r, key); + desc->s[0] = get_unaligned_le32(key + 16); + desc->s[1] = get_unaligned_le32(key + 20); + desc->s[2] = get_unaligned_le32(key + 24); + desc->s[3] = get_unaligned_le32(key + 28); + poly1305_core_init(&desc->h); + desc->buflen = 0; +} + +void poly1305_update(struct poly1305_desc *desc, const u8 *src, + unsigned int nbytes); +void poly1305_final(struct poly1305_desc *desc, u8 *digest); + #endif diff --git a/lib/crypto/poly1305.c b/lib/crypto/poly1305.c index abe6fccf7b9c..9af7cb5364af 100644 --- a/lib/crypto/poly1305.c +++ b/lib/crypto/poly1305.c @@ -154,5 +154,63 @@ void poly1305_core_emit(const struct poly1305_state *state, void *dst) } EXPORT_SYMBOL_GPL(poly1305_core_emit); +void poly1305_update(struct poly1305_desc *desc, const u8 *src, + unsigned int nbytes) +{ + unsigned int bytes; + + if (unlikely(desc->buflen)) { + bytes = min(nbytes, POLY1305_BLOCK_SIZE - desc->buflen); + memcpy(desc->buf + desc->buflen, src, bytes); + src += bytes; + nbytes -= bytes; + desc->buflen += bytes; + + if (desc->buflen == POLY1305_BLOCK_SIZE) { + poly1305_core_blocks(&desc->h, &desc->r, desc->buf, 1, 1); + desc->buflen = 0; + } + } + + if (likely(nbytes >= POLY1305_BLOCK_SIZE)) { + poly1305_core_blocks(&desc->h, &desc->r, src, + nbytes / POLY1305_BLOCK_SIZE, 1); + src += nbytes - (nbytes % POLY1305_BLOCK_SIZE); + nbytes %= POLY1305_BLOCK_SIZE; + } + + if (unlikely(nbytes)) { + desc->buflen = nbytes; + memcpy(desc->buf, src, nbytes); + } +} +EXPORT_SYMBOL_GPL(poly1305_update); + +void poly1305_final(struct poly1305_desc *desc, u8 *dst) +{ + __le32 digest[4]; + u64 f = 0; + + if (unlikely(desc->buflen)) { + desc->buf[desc->buflen++] = 1; + memset(desc->buf + desc->buflen, 0, + POLY1305_BLOCK_SIZE - desc->buflen); + poly1305_core_blocks(&desc->h, &desc->r, desc->buf, 1, 0); + } + + poly1305_core_emit(&desc->h, digest); + + /* mac = (h + s) % (2^128) */ + f = (f >> 32) + le32_to_cpu(digest[0]) + desc->s[0]; + put_unaligned_le32(f, dst + 0); + f = (f >> 32) + le32_to_cpu(digest[1]) + desc->s[1]; + put_unaligned_le32(f, dst + 4); + f = (f >> 32) + le32_to_cpu(digest[2]) + desc->s[2]; + put_unaligned_le32(f, dst + 8); + f = (f >> 32) + le32_to_cpu(digest[3]) + desc->s[3]; + put_unaligned_le32(f, dst + 12); +} +EXPORT_SYMBOL_GPL(poly1305_final); + MODULE_LICENSE("GPL"); MODULE_AUTHOR("Martin Willi "); From patchwork Wed Sep 25 16:12:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161077 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D0CE924 for ; Wed, 25 Sep 2019 16:18:16 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6ACA921D79 for ; Wed, 25 Sep 2019 16:18:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="gt33bye3"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="EZelB1bm" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6ACA921D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=lyuL7ah7G0JF2I4iSZoZxgi+2LVhfyPxHhMeSUM7gPM=; b=gt33bye3/w4fjH mlY80cyyxvYTbviEfDZJBNQe46WRizvAEDAQA9NbEa01yHfFtQLi9uQYmlG7KvPS/1u/pSFF198l6 poR0DPQE12Crgiv0i2eIFyKLyJafew6ZSThnYVnpwC7hr80aYvGh4fmNfubdggq+FZ5Im94A0a1To 3M90Wj170AbmGIJYAQzDZinWzzGoa9jw3x8QctVzl0wBIMsBI/KzkJBpzPPOZ2dxweJ3uYl34fsbN Of+TaCPfpHRlJmPMFAl4+52jdJqTqn6s/sWbTUU9tx+kOOmKwtp2Jbb++TqjTG7Y7LzVRMrEwCXWe NmZb1N7e5HjBXwKaOCaw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9za-0007wm-FF; Wed, 25 Sep 2019 16:18:14 +0000 Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vY-000401-8J for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:06 +0000 Received: by mail-wr1-x443.google.com with SMTP id n14so7640722wrw.9 for ; Wed, 25 Sep 2019 09:14:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GHZTUsgGCCxhaUqhh6GFmynzyGA2xAGilsSBrwWL8Ec=; b=EZelB1bmBbVqkj69RFWzhZDiyl+QrRS1Ld/4jAK9o8wa+U3Y4C9KupSUIqj2czhl7k ITiHGURXEqXrMJmGJNB9KMIUGwxFnx7bIhXwPlyP3otwQoRuaqODihXzE3AhyE+SuMOe Eupyf7hjZZlON1Utlzp/RmQMPvLJn2rjU8LmbXlYIzGZ+eXKz+3lmREZnr2u+zWgdOnC DnAE1RwkIUI90BBXqLfP2SQt/xl177ryb9XkpwBkxrUMcHJChPVU9vFbBntefNklJgCR eMbWcs4UmklFy7RjjpAUH6gGYaPbUv9a9jmIlTOejjFpzVISicNuFbq7E+U5L0HF88Yg IqzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GHZTUsgGCCxhaUqhh6GFmynzyGA2xAGilsSBrwWL8Ec=; b=T5ABYLXmO3W9T9GXjrpSXV3Y9vTq1A4mtXoYWyG56WZWuGYQ8ut4iTHG7a55U1XMXn KQUtKknqDbxglHDthQQvznO0j5T4qrSJnq3FJMtOAiyqI1UZ0GqEH8S0Zh/j07IdSo5J 3oI6Fad2G+4LuG8SNu9fWPqwf+TJG+ZETxK+K9MgIMVNYT/ryH9i4fFohlFqR53bbEpd Ao/Iw6jXGDcXTppdiR8Teph5RQxdv8a7seBZQqShVLS0A4sATivkjyymg4JaVZL4pvy7 Z1x5Od/h86gOtS3UbLAewkDt83XxvRlHD4voDirkBD6ngnLFzpO+KKPCex4bZa8Qnhrh 7Siw== X-Gm-Message-State: APjAAAV/W72IiUFTRLEwlxjJVZspm3W1PPARV2brrfEHK4EytW8XPfTu dwWvZnUhzjelCZkOEB7lYNb24Q== X-Google-Smtp-Source: APXvYqz1sBIY8jpDffs/GzFrwhS+OK125T+2rgBRUTsh9BLcog3Q1msw/qOOPlbx/V81RfwvRmI7vA== X-Received: by 2002:a5d:62c6:: with SMTP id o6mr9945348wrv.243.1569428043029; Wed, 25 Sep 2019 09:14:03 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.14.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:14:02 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 11/18] int128: move __uint128_t compiler test to Kconfig Date: Wed, 25 Sep 2019 18:12:48 +0200 Message-Id: <20190925161255.1871-12-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091404_575543_BC3701D1 X-CRM114-Status: GOOD ( 15.09 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:443 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org In order to use 128-bit integer arithmetic in C code, the architecture needs to have declared support for it by setting ARCH_SUPPORTS_INT128, and it requires a version of the toolchain that supports this at build time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test whether __SIZEOF_INT128__ is defined, since this is only the case for compilers that can support 128-bit integers. Let's fold this additional test into the Kconfig declaration of ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles, e.g., to decide whether a certain object needs to be included in the first place. Signed-off-by: Ard Biesheuvel --- crypto/ecc.c | 2 +- init/Kconfig | 1 + lib/ubsan.c | 2 +- lib/ubsan.h | 2 +- 4 files changed, 4 insertions(+), 3 deletions(-) diff --git a/crypto/ecc.c b/crypto/ecc.c index dfe114bc0c4a..6e6aab6c987c 100644 --- a/crypto/ecc.c +++ b/crypto/ecc.c @@ -336,7 +336,7 @@ static u64 vli_usub(u64 *result, const u64 *left, u64 right, static uint128_t mul_64_64(u64 left, u64 right) { uint128_t result; -#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) +#if defined(CONFIG_ARCH_SUPPORTS_INT128) unsigned __int128 m = (unsigned __int128)left * right; result.m_low = m; diff --git a/init/Kconfig b/init/Kconfig index bd7d650d4a99..e66f64a26d7d 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -785,6 +785,7 @@ config ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH # config ARCH_SUPPORTS_INT128 bool + depends on !$(cc-option,-D__SIZEOF_INT128__=0) # For architectures that (ab)use NUMA to represent different memory regions # all cpu-local but of different latencies, such as SuperH. diff --git a/lib/ubsan.c b/lib/ubsan.c index e7d31735950d..b652cc14dd60 100644 --- a/lib/ubsan.c +++ b/lib/ubsan.c @@ -119,7 +119,7 @@ static void val_to_string(char *str, size_t size, struct type_descriptor *type, { if (type_is_int(type)) { if (type_bit_width(type) == 128) { -#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) +#if defined(CONFIG_ARCH_SUPPORTS_INT128) u_max val = get_unsigned_val(type, value); scnprintf(str, size, "0x%08x%08x%08x%08x", diff --git a/lib/ubsan.h b/lib/ubsan.h index b8fa83864467..7b56c09473a9 100644 --- a/lib/ubsan.h +++ b/lib/ubsan.h @@ -78,7 +78,7 @@ struct invalid_value_data { struct type_descriptor *type; }; -#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) +#if defined(CONFIG_ARCH_SUPPORTS_INT128) typedef __int128 s_max; typedef unsigned __int128 u_max; #else From patchwork Wed Sep 25 16:12:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161081 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CB1D1709 for ; Wed, 25 Sep 2019 16:18:46 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 13CB6222C7 for ; Wed, 25 Sep 2019 16:18:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="s49F3tVE"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="XRKs2NYw" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 13CB6222C7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=svJiKKXQbZhcBqaMCYOR+QJq4Xyfh8vS8XOFavJ1xN4=; b=s49F3tVEudLkst idKyEYasrezDr3hwIbn1vG6bwthx/QN/ubdUH0eb9yHF5lntq9QkkURA3PWuw9yUZdsVqbuAMVHiu RQlN9SnHmNvEBYss7Mbq8OpZ9Np6QtL0RiUBDvQfnusHwMqEEfuz9pu+sfaSP/RJjo3ll+AUHA6lx mZbP84fc35uIp6TJL2pkZfVSEHrDNG6gId0gvW/IUDvrfhEtNK/13MEm0nNfVVjLEi5gr6bf+x82F 2br2B2vlE9hhns2zo6UC4feN5CiRf85OzFqP9fpg/GKoTDhuEsVQMQFrqP03FG0yl4woJo9Lq4Q/j QCvoPOqHNYnwGcEVBacw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iDA03-0008I8-6F; Wed, 25 Sep 2019 16:18:43 +0000 Received: from mail-wr1-x441.google.com ([2a00:1450:4864:20::441]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vh-00045V-PV for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:17 +0000 Received: by mail-wr1-x441.google.com with SMTP id h7so7644519wrw.8 for ; Wed, 25 Sep 2019 09:14:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0Jn67cgHpegjUHsejza3rXvNu+oyRcsIfVJ14lUdJdc=; b=XRKs2NYw1E8U9jJNC8URLm1v/D0E45ZGzjruLuYncb+se4PRgodkTOSbJKomErYA8A aBx3lV2el3sxyCNKIL4eXbNQPuR1xtPQ7AqCKcspizIUydpWF8XSp1d/12gAYylUfZkW Vq2CGQalEf6bYg4NWNLgpr4GZbbJ3I7nj8HX65C6pUaO256ht0C6HhxLS98Arc0Bdtbf t9NJmmRz17ZRwZ2KtyMCn2begsYmiEYQfixNVxYVzjqwsxv2xGCLr4vQAJz+w5NX5tiV 0aMtvtcgmSHOgQautTSanq/Mc/wFnViTtqZPwXuEVF+Ycc9wGZRGOk4F+AEA8/eJCIF0 qasQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0Jn67cgHpegjUHsejza3rXvNu+oyRcsIfVJ14lUdJdc=; b=TGUDq4oGi0Bq1BLRGB5WbLfViAtOpOAxYw4PDGmuaVDOCOag+apY0cKzjKHxjahvgX pwD6c1XbM07JTWSW/98M5FsAcB2qR5ynh8W3RraOnL8uH72MWtuA9Efnk3QPBQHvVlMb rPPDj1R84c0tUMgt14h08GYB65KjXMA7geyUhH6OGoOc/13KKIjHPSX2fi5RHZ1k6pR4 kXKn1VfqlZ3Uwdv/w6kDQK10DyodITwVoF1S/HIYDB83X5+DlUuYWvHQfoDErBzhQX9j vAfjBGEM10dLwJeMk/aYca4LVGKInBLmmCqhBZmG4bU1aBreoiOYHs+TXb2knWLWVL6Y doHg== X-Gm-Message-State: APjAAAUFNVJ08PYOKVDUODzWfZnE1U0G8YVC7FbXcGtIqc2zXvfUCbrh kUYxJV33O3VHq7ybh7B/Knagjg== X-Google-Smtp-Source: APXvYqxG3ZsPD1p2O9Mgdd9cJrxp7TwXatNXcNgf+8foDbIum7T7zIqBBIHsTAaJyjCie+BmcAs/Ug== X-Received: by 2002:adf:e908:: with SMTP id f8mr9903669wrm.210.1569428052551; Wed, 25 Sep 2019 09:14:12 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.14.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:14:11 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 16/18] netlink: use new strict length types in policy for 5.2 Date: Wed, 25 Sep 2019 18:12:53 +0200 Message-Id: <20190925161255.1871-17-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091414_132361_26CCDF43 X-CRM114-Status: GOOD ( 10.98 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:441 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bruno Wolff III , "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Taken from https://git.zx2c4.com/WireGuard/commit/src?id=3120425f69003be287cb2d308f89c7a6a0335ff0 Reported-by: Bruno Wolff III --- drivers/net/wireguard/netlink.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c index 3763e8c14ea5..676d36725120 100644 --- a/drivers/net/wireguard/netlink.c +++ b/drivers/net/wireguard/netlink.c @@ -21,8 +21,8 @@ static struct genl_family genl_family; static const struct nla_policy device_policy[WGDEVICE_A_MAX + 1] = { [WGDEVICE_A_IFINDEX] = { .type = NLA_U32 }, [WGDEVICE_A_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 }, - [WGDEVICE_A_PRIVATE_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, - [WGDEVICE_A_PUBLIC_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, + [WGDEVICE_A_PRIVATE_KEY] = { .type = NLA_EXACT_LEN, .len = NOISE_PUBLIC_KEY_LEN }, + [WGDEVICE_A_PUBLIC_KEY] = { .type = NLA_EXACT_LEN, .len = NOISE_PUBLIC_KEY_LEN }, [WGDEVICE_A_FLAGS] = { .type = NLA_U32 }, [WGDEVICE_A_LISTEN_PORT] = { .type = NLA_U16 }, [WGDEVICE_A_FWMARK] = { .type = NLA_U32 }, @@ -30,12 +30,12 @@ static const struct nla_policy device_policy[WGDEVICE_A_MAX + 1] = { }; static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = { - [WGPEER_A_PUBLIC_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, - [WGPEER_A_PRESHARED_KEY] = { .len = NOISE_SYMMETRIC_KEY_LEN }, + [WGPEER_A_PUBLIC_KEY] = { .type = NLA_EXACT_LEN, .len = NOISE_PUBLIC_KEY_LEN }, + [WGPEER_A_PRESHARED_KEY] = { .type = NLA_EXACT_LEN, .len = NOISE_SYMMETRIC_KEY_LEN }, [WGPEER_A_FLAGS] = { .type = NLA_U32 }, - [WGPEER_A_ENDPOINT] = { .len = sizeof(struct sockaddr) }, + [WGPEER_A_ENDPOINT] = { .type = NLA_MIN_LEN, .len = sizeof(struct sockaddr) }, [WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL] = { .type = NLA_U16 }, - [WGPEER_A_LAST_HANDSHAKE_TIME] = { .len = sizeof(struct __kernel_timespec) }, + [WGPEER_A_LAST_HANDSHAKE_TIME] = { .type = NLA_EXACT_LEN, .len = sizeof(struct __kernel_timespec) }, [WGPEER_A_RX_BYTES] = { .type = NLA_U64 }, [WGPEER_A_TX_BYTES] = { .type = NLA_U64 }, [WGPEER_A_ALLOWEDIPS] = { .type = NLA_NESTED }, @@ -44,7 +44,7 @@ static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = { static const struct nla_policy allowedip_policy[WGALLOWEDIP_A_MAX + 1] = { [WGALLOWEDIP_A_FAMILY] = { .type = NLA_U16 }, - [WGALLOWEDIP_A_IPADDR] = { .len = sizeof(struct in_addr) }, + [WGALLOWEDIP_A_IPADDR] = { .type = NLA_MIN_LEN, .len = sizeof(struct in_addr) }, [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 } }; @@ -591,12 +591,10 @@ static const struct genl_ops genl_ops[] = { .start = wg_get_device_start, .dumpit = wg_get_device_dump, .done = wg_get_device_done, - .policy = device_policy, .flags = GENL_UNS_ADMIN_PERM }, { .cmd = WG_CMD_SET_DEVICE, .doit = wg_set_device, - .policy = device_policy, .flags = GENL_UNS_ADMIN_PERM } }; @@ -608,6 +606,7 @@ static struct genl_family genl_family __ro_after_init = { .version = WG_GENL_VERSION, .maxattr = WGDEVICE_A_MAX, .module = THIS_MODULE, + .policy = device_policy, .netnsok = true }; From patchwork Wed Sep 25 16:12:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161095 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4367D1709 for ; Wed, 25 Sep 2019 16:19:04 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 214B421D79 for ; Wed, 25 Sep 2019 16:19:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="SM7Y728r"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="EoCQ8h10" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 214B421D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=KPb6CGN+kaxnjB7RIgxzpFnSBHa7IsdvP/KQX6C/7GI=; b=SM7Y728rntrl3b BWBrQo66phm5sbQmHRO0YKqfS0bcJANS1xXDdEgi4diBW7G5G8w+mH0EaItDsy216veetOBs1yfv+ vot+k3ysy1AzIjoDy/6PZhs6/Liigtj7ShFZhflACee9jxdsf+BlXAVVWsBRG1hpxSfZ9M9fbN2vY GUBvvXYUlcCGbJutxPo3oep23O+78aFa0lizQIFC/YsKmqALi6WyP51os2X5x8+KlaMp5f1kuUptY syvA0KM+K5JDfTao2yhNCE2CNpFpPJifVeTwzmVRO5hFHCt1V8KDdH8rTFXDG3dzP4imDOggPiBY+ ElCKMCP45Hl4EIlWOjNw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iDA0N-00009W-4T; Wed, 25 Sep 2019 16:19:03 +0000 Received: from mail-wm1-x344.google.com ([2a00:1450:4864:20::344]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vj-00046c-PR for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:19 +0000 Received: by mail-wm1-x344.google.com with SMTP id 5so6427777wmg.0 for ; Wed, 25 Sep 2019 09:14:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=vpYX5apOE5ovWt54wDvLpgCFg3w7Y3yfqWQuF42X3Ys=; b=EoCQ8h108o9ud79LK9qo/lEUOOMuQ+8osdvhOyjax1WhpI2184srTIyoMNIe740+DL PUcEG2HHc8fCdWvJb0SSJwMZIKVyRKNTMVEsFB/8IU8VDms3Ne0Ef83bLBd2cjCrYtCR SRjrANqSyHMKwUfeqL39Bkzo2NRv8UpRkVkvQL1q6pPRCeKZJ0h9J2+OnNwtpoZnCsk4 QZXfquoD3+tk/HriLcgx5buUFhjTYbb1OVamAfkLtzqaItM7tx+AGUk4nIO73VVcZS1E j9+B+x5xSA4YGcEfUl9jsKx6VPygEvioFVenr5YkCjIlbkiIo9ejOO4tZXy3lJXs2H0N lYqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vpYX5apOE5ovWt54wDvLpgCFg3w7Y3yfqWQuF42X3Ys=; b=fFhfxdAtC2Ow+qB+RjYViQ2h8a87/StIxLzIdEcJm9Pi6QfSssWflLJCGZRwp8IUoH H+ZBhlKrkBXxoAwk5vJfWuTBeVcTJw4bQApscof+arwTfu+faRjONmRSwKPcbydvoBMw FUFrdySJHijC8b5ZqicRmycPCcbaVsKIfkP2NIVjvbOvOKxUSPTGXrRlJgqKsjFjfYDY 8pEl4/lxE8IBytA3zHAS5n1jMIEQC5p8IP3c8AWNu/kLfLHpzqlGX/th+DldFK3v759y wSln/5FWIiit0F81Ty05gyyhYtWBywoIKhbAJF2UkLlHu1LkiONJMxxP7RazHx7Y8Xmn VUQw== X-Gm-Message-State: APjAAAX6D5olEbUnnYoMANi3w+8Usx8Xnqa9FFDAlp/mOPrK9NgbIlI9 5L8R3YDrKXm8bl97bHAPwlVDcQ== X-Google-Smtp-Source: APXvYqwNRaBDeDani6z4F3/B9eCXG5mzVV5VhoPg/CU7b5DVTv+fWOO/iUVm1vGry6yb/vakYpaOYQ== X-Received: by 2002:a1c:7ed7:: with SMTP id z206mr9232689wmc.124.1569428054603; Wed, 25 Sep 2019 09:14:14 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.14.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:14:13 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 17/18] wg switch to lib/crypto algos Date: Wed, 25 Sep 2019 18:12:54 +0200 Message-Id: <20190925161255.1871-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091416_017797_15F6A684 X-CRM114-Status: GOOD ( 11.52 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:344 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org --- drivers/net/Kconfig | 6 +++--- drivers/net/wireguard/cookie.c | 4 ++-- drivers/net/wireguard/messages.h | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index c26aef673538..3bd4dc662392 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -77,9 +77,9 @@ config WIREGUARD depends on IPV6 || !IPV6 select NET_UDP_TUNNEL select DST_CACHE - select ZINC_CHACHA20POLY1305 - select ZINC_BLAKE2S - select ZINC_CURVE25519 + select CRYPTO_LIB_CHACHA20POLY1305 + select CRYPTO_LIB_BLAKE2S + select CRYPTO_LIB_CURVE25519 help WireGuard is a secure, fast, and easy to use replacement for IPSec that uses modern cryptography and clever networking tricks. It's diff --git a/drivers/net/wireguard/cookie.c b/drivers/net/wireguard/cookie.c index bd23a14ff87f..104b739c327f 100644 --- a/drivers/net/wireguard/cookie.c +++ b/drivers/net/wireguard/cookie.c @@ -10,8 +10,8 @@ #include "ratelimiter.h" #include "timers.h" -#include -#include +#include +#include #include #include diff --git a/drivers/net/wireguard/messages.h b/drivers/net/wireguard/messages.h index 3cfd1c5e9b02..4bbb1f97af04 100644 --- a/drivers/net/wireguard/messages.h +++ b/drivers/net/wireguard/messages.h @@ -6,9 +6,9 @@ #ifndef _WG_MESSAGES_H #define _WG_MESSAGES_H -#include -#include -#include +#include +#include +#include #include #include From patchwork Wed Sep 25 16:12:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11161097 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 05BCB14DB for ; Wed, 25 Sep 2019 16:19:27 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D37432146E for ; Wed, 25 Sep 2019 16:19:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="qNoXsgVz"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="RH6HBNEq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D37432146E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=WpV3qhP3U5LyKD8a5gC/wMrTs9E+Z4SHbrn43Mn8Pg0=; b=qNoXsgVzRwQ6V6 3+7Nao5vNsgJV3E20POumHp+3RRXmbTPELXbX2P3q3DJCNhtyVhtI29z4ID3YEIQEXshqSviOlcHT dgLOZ1HxFxjBU7OPItZ3Aqplxww8XcOhppc1M48EFrWZ9xslYQQq9tBXYj71Tit9nIqgml5WkezlC BlCVogTYr/2djzazxdPxChdf+leDreaUsmD9NgGID+QF8HABCRSKk/j6wCXUZtFBxZNEGIzkIKmTw WWO6jEEDGdsvrY39QHdB/x66PiZcf9oD3P2YNu569A7RvW58tFF6JryzJujZnQZntgOtHCBO1xxUP es7idAM7IUGQlSrZaJhw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.2 #3 (Red Hat Linux)) id 1iDA0j-0000PY-So; Wed, 25 Sep 2019 16:19:26 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.92.2 #3 (Red Hat Linux)) id 1iD9vl-00047X-CH for linux-arm-kernel@lists.infradead.org; Wed, 25 Sep 2019 16:14:23 +0000 Received: by mail-wm1-x341.google.com with SMTP id f22so5644551wmc.2 for ; Wed, 25 Sep 2019 09:14:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=m3ulTEA/3BA67wNTxSZXJ33CnsNx57q64URFh48GI0c=; b=RH6HBNEqhY9LzPWnJDN+VkfzutAmKJiEYfrFXcT53blsHxoSt8XhsEPWedRVq8q4zk /HZWYXnp77Nk6H2pJ4fpH2NdHr2VnDsnMrx/vDU+zNUsbMBg7Gxl/hE1zNtCW3O3Mvqk g9ljunjvYU1SLGgYv/AEBz1CqRXTPEMv62Dk4Yu1rVPgiafUREqItXmoNLFbZYkrHB2D hvtHtpJ3fc/sGUMqJSu+TAEBiZxAegXNqJLDQg0mDOrI57LOzbBbGe0/Xu5f8c3yC+WQ zQ9umJj8ZZQJ8h+IeBEizzUGbfO2rjVkBaBZ+6qrqHRMgFMCqOdATCAQ+CEp8VPQpOW3 dFlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=m3ulTEA/3BA67wNTxSZXJ33CnsNx57q64URFh48GI0c=; b=L/DNmpWXyXhSAXty2rB2Ecqe3hxWM2w/wLrhQ/DXPrOODKu6LTuoKNIM/uA/h09sLX EMQzxnmi6NLHsbFHSfn1SomJvsxZrkMiOKXLLoteCB1msPpiBAALY1mZNuZLwmOi8U/c 1/CSOXknk5J8OaIotxm/h7trwxreAOjdTdpL1NiBXJD2dT2jkgeXYDeC2XkxPwSLJi2M oo/8o1tue0N+r0SnIcRKGNHNbVQgS6CXw0vvJ1H/x/r64tOvKfDMqbTlUX56uKdeARTa OyMd/rj9uoBnOZBi5rVlRk+X/P7hWww+/Vu9eXGzTWsobO09IWefm/j6g6EumhY/xYUa xylQ== X-Gm-Message-State: APjAAAUmaWlDDdB7YQceorNUOGCH8P5OENDDPAyFr02rdD8f378xBBcx 02hOiP6bH5ZqTrIF/4GImn8s1w== X-Google-Smtp-Source: APXvYqzMsBhNH5UNFZF1bzfPqeNCYYxO/juqm3X9vngxBz2Vy7X7aY24toGtRKqi2Vq0pqbMe1eaSA== X-Received: by 2002:a1c:c5c3:: with SMTP id v186mr8453265wmf.125.1569428055951; Wed, 25 Sep 2019 09:14:15 -0700 (PDT) Received: from localhost.localdomain (laubervilliers-657-1-83-120.w92-154.abo.wanadoo.fr. [92.154.90.120]) by smtp.gmail.com with ESMTPSA id o70sm4991085wme.29.2019.09.25.09.14.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Sep 2019 09:14:15 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption Date: Wed, 25 Sep 2019 18:12:55 +0200 Message-Id: <20190925161255.1871-19-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190925161255.1871-1-ard.biesheuvel@linaro.org> References: <20190925161255.1871-1-ard.biesheuvel@linaro.org> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190925_091417_537331_2AA51B70 X-CRM114-Status: GOOD ( 21.33 ) X-Spam-Score: -0.2 (/) X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary: Content analysis details: (-0.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:341 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Jason A . Donenfeld" , Catalin Marinas , Herbert Xu , Arnd Bergmann , Ard Biesheuvel , Greg KH , Eric Biggers , Samuel Neves , Will Deacon , Dan Carpenter , Andy Lutomirski , Marc Zyngier , Linus Torvalds , David Miller , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Replace the chacha20poly1305() library calls with invocations of the RFC7539 AEAD, as implemented by the generic chacha20poly1305 template. For now, only synchronous AEADs are supported, but looking at the code, it does not look terribly complicated to add support for async versions of rfc7539(chacha20,poly1305) as well, some of which already exist in the drivers/crypto tree. The nonce related changes are there to address the mismatch between the 96-bit nonce (aka IV) that the rfc7539() template expects, and the 64-bit nonce that WireGuard uses. Note that these changes take advantage of the fact that synchronous instantiations of the generic rfc7539() template will use a zero reqsize if possible, removing the need for heap allocations for the request structures. This code was tested using the included netns.sh script, and by connecting to the WireGuard demo server (https://www.wireguard.com/quickstart/#demo-server) Signed-off-by: Ard Biesheuvel --- drivers/net/wireguard/noise.c | 34 ++++++++++++- drivers/net/wireguard/noise.h | 3 +- drivers/net/wireguard/queueing.h | 5 +- drivers/net/wireguard/receive.c | 51 ++++++++++++-------- drivers/net/wireguard/send.c | 45 ++++++++++------- 5 files changed, 97 insertions(+), 41 deletions(-) diff --git a/drivers/net/wireguard/noise.c b/drivers/net/wireguard/noise.c index bf0b8c5ab298..0e7aab9f645d 100644 --- a/drivers/net/wireguard/noise.c +++ b/drivers/net/wireguard/noise.c @@ -109,16 +109,37 @@ static struct noise_keypair *keypair_create(struct wg_peer *peer) if (unlikely(!keypair)) return NULL; + + keypair->sending.tfm = crypto_alloc_aead("rfc7539(chacha20,poly1305)", + 0, CRYPTO_ALG_ASYNC); + if (unlikely(IS_ERR(keypair->sending.tfm))) + goto free_keypair; + keypair->receiving.tfm = crypto_alloc_aead("rfc7539(chacha20,poly1305)", + 0, CRYPTO_ALG_ASYNC); + if (unlikely(IS_ERR(keypair->receiving.tfm))) + goto free_sending_tfm; + keypair->internal_id = atomic64_inc_return(&keypair_counter); keypair->entry.type = INDEX_HASHTABLE_KEYPAIR; keypair->entry.peer = peer; kref_init(&keypair->refcount); return keypair; + +free_sending_tfm: + crypto_free_aead(keypair->sending.tfm); +free_keypair: + kzfree(keypair); + return NULL; } static void keypair_free_rcu(struct rcu_head *rcu) { - kzfree(container_of(rcu, struct noise_keypair, rcu)); + struct noise_keypair *keypair = + container_of(rcu, struct noise_keypair, rcu); + + crypto_free_aead(keypair->sending.tfm); + crypto_free_aead(keypair->receiving.tfm); + kzfree(keypair); } static void keypair_free_kref(struct kref *kref) @@ -360,11 +381,20 @@ static void derive_keys(struct noise_symmetric_key *first_dst, struct noise_symmetric_key *second_dst, const u8 chaining_key[NOISE_HASH_LEN]) { - kdf(first_dst->key, second_dst->key, NULL, NULL, + u8 key[2][NOISE_SYMMETRIC_KEY_LEN]; + int err; + + kdf(key[0], key[1], NULL, NULL, NOISE_SYMMETRIC_KEY_LEN, NOISE_SYMMETRIC_KEY_LEN, 0, 0, chaining_key); symmetric_key_init(first_dst); symmetric_key_init(second_dst); + + err = crypto_aead_setkey(first_dst->tfm, key[0], sizeof(key[0])) ?: + crypto_aead_setkey(second_dst->tfm, key[1], sizeof(key[1])); + memzero_explicit(key, sizeof(key)); + if (unlikely(err)) + pr_warn_once("crypto_aead_setkey() failed (%d)\n", err); } static bool __must_check mix_dh(u8 chaining_key[NOISE_HASH_LEN], diff --git a/drivers/net/wireguard/noise.h b/drivers/net/wireguard/noise.h index 9c2cc62dc11e..6f033d2ea52c 100644 --- a/drivers/net/wireguard/noise.h +++ b/drivers/net/wireguard/noise.h @@ -8,6 +8,7 @@ #include "messages.h" #include "peerlookup.h" +#include #include #include #include @@ -26,7 +27,7 @@ union noise_counter { }; struct noise_symmetric_key { - u8 key[NOISE_SYMMETRIC_KEY_LEN]; + struct crypto_aead *tfm; union noise_counter counter; u64 birthdate; bool is_valid; diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h index f8de703dff97..593971edf8a3 100644 --- a/drivers/net/wireguard/queueing.h +++ b/drivers/net/wireguard/queueing.h @@ -55,9 +55,10 @@ enum packet_state { }; struct packet_cb { - u64 nonce; - struct noise_keypair *keypair; atomic_t state; + __le32 ivpad; /* pad 64-bit nonce to 96 bits */ + __le64 nonce; + struct noise_keypair *keypair; u32 mtu; u8 ds; }; diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c index 900c76edb9d6..395089e7e3a6 100644 --- a/drivers/net/wireguard/receive.c +++ b/drivers/net/wireguard/receive.c @@ -11,7 +11,7 @@ #include "cookie.h" #include "socket.h" -#include +#include #include #include #include @@ -244,13 +244,14 @@ static void keep_key_fresh(struct wg_peer *peer) } } -static bool decrypt_packet(struct sk_buff *skb, struct noise_symmetric_key *key, - simd_context_t *simd_context) +static bool decrypt_packet(struct sk_buff *skb, struct noise_symmetric_key *key) { struct scatterlist sg[MAX_SKB_FRAGS + 8]; + struct aead_request *req, stackreq; struct sk_buff *trailer; unsigned int offset; int num_frags; + int err; if (unlikely(!key)) return false; @@ -262,8 +263,8 @@ static bool decrypt_packet(struct sk_buff *skb, struct noise_symmetric_key *key, return false; } - PACKET_CB(skb)->nonce = - le64_to_cpu(((struct message_data *)skb->data)->counter); + PACKET_CB(skb)->ivpad = 0; + PACKET_CB(skb)->nonce = ((struct message_data *)skb->data)->counter; /* We ensure that the network header is part of the packet before we * call skb_cow_data, so that there's no chance that data is removed @@ -281,9 +282,23 @@ static bool decrypt_packet(struct sk_buff *skb, struct noise_symmetric_key *key, if (skb_to_sgvec(skb, sg, 0, skb->len) <= 0) return false; - if (!chacha20poly1305_decrypt_sg(sg, sg, skb->len, NULL, 0, - PACKET_CB(skb)->nonce, key->key, - simd_context)) + if (unlikely(crypto_aead_reqsize(key->tfm) > 0)) { + req = aead_request_alloc(key->tfm, GFP_ATOMIC); + if (!req) + return false; + } else { + req = &stackreq; + aead_request_set_tfm(req, key->tfm); + } + + aead_request_set_ad(req, 0); + aead_request_set_callback(req, 0, NULL, NULL); + aead_request_set_crypt(req, sg, sg, skb->len, + (u8 *)&PACKET_CB(skb)->ivpad); + err = crypto_aead_decrypt(req); + if (unlikely(req != &stackreq)) + aead_request_free(req); + if (err) return false; /* Another ugly situation of pushing and pulling the header so as to @@ -475,10 +490,10 @@ int wg_packet_rx_poll(struct napi_struct *napi, int budget) goto next; if (unlikely(!counter_validate(&keypair->receiving.counter, - PACKET_CB(skb)->nonce))) { + le64_to_cpu(PACKET_CB(skb)->nonce)))) { net_dbg_ratelimited("%s: Packet has invalid nonce %llu (max %llu)\n", peer->device->dev->name, - PACKET_CB(skb)->nonce, + le64_to_cpu(PACKET_CB(skb)->nonce), keypair->receiving.counter.receive.counter); goto next; } @@ -510,21 +525,19 @@ void wg_packet_decrypt_worker(struct work_struct *work) { struct crypt_queue *queue = container_of(work, struct multicore_worker, work)->ptr; - simd_context_t simd_context; struct sk_buff *skb; - simd_get(&simd_context); while ((skb = ptr_ring_consume_bh(&queue->ring)) != NULL) { - enum packet_state state = likely(decrypt_packet(skb, - &PACKET_CB(skb)->keypair->receiving, - &simd_context)) ? - PACKET_STATE_CRYPTED : PACKET_STATE_DEAD; + enum packet_state state; + + if (likely(decrypt_packet(skb, + &PACKET_CB(skb)->keypair->receiving))) + state = PACKET_STATE_CRYPTED; + else + state = PACKET_STATE_DEAD; wg_queue_enqueue_per_peer_napi(&PACKET_PEER(skb)->rx_queue, skb, state); - simd_relax(&simd_context); } - - simd_put(&simd_context); } static void wg_packet_consume_data(struct wg_device *wg, struct sk_buff *skb) diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c index b0df5c717502..48d1fb02f575 100644 --- a/drivers/net/wireguard/send.c +++ b/drivers/net/wireguard/send.c @@ -11,7 +11,7 @@ #include "messages.h" #include "cookie.h" -#include +#include #include #include #include @@ -157,11 +157,11 @@ static unsigned int calculate_skb_padding(struct sk_buff *skb) return padded_size - last_unit; } -static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, - simd_context_t *simd_context) +static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair) { unsigned int padding_len, plaintext_len, trailer_len; struct scatterlist sg[MAX_SKB_FRAGS + 8]; + struct aead_request *req, stackreq; struct message_data *header; struct sk_buff *trailer; int num_frags; @@ -199,7 +199,7 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, header = (struct message_data *)skb_push(skb, sizeof(*header)); header->header.type = cpu_to_le32(MESSAGE_DATA); header->key_idx = keypair->remote_index; - header->counter = cpu_to_le64(PACKET_CB(skb)->nonce); + header->counter = PACKET_CB(skb)->nonce; pskb_put(skb, trailer, trailer_len); /* Now we can encrypt the scattergather segments */ @@ -207,9 +207,24 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, if (skb_to_sgvec(skb, sg, sizeof(struct message_data), noise_encrypted_len(plaintext_len)) <= 0) return false; - return chacha20poly1305_encrypt_sg(sg, sg, plaintext_len, NULL, 0, - PACKET_CB(skb)->nonce, - keypair->sending.key, simd_context); + + if (unlikely(crypto_aead_reqsize(keypair->sending.tfm) > 0)) { + req = aead_request_alloc(keypair->sending.tfm, GFP_ATOMIC); + if (!req) + return false; + } else { + req = &stackreq; + aead_request_set_tfm(req, keypair->sending.tfm); + } + + aead_request_set_ad(req, 0); + aead_request_set_callback(req, 0, NULL, NULL); + aead_request_set_crypt(req, sg, sg, plaintext_len, + (u8 *)&PACKET_CB(skb)->ivpad); + crypto_aead_encrypt(req); + if (unlikely(req != &stackreq)) + aead_request_free(req); + return true; } void wg_packet_send_keepalive(struct wg_peer *peer) @@ -296,16 +311,13 @@ void wg_packet_encrypt_worker(struct work_struct *work) struct crypt_queue *queue = container_of(work, struct multicore_worker, work)->ptr; struct sk_buff *first, *skb, *next; - simd_context_t simd_context; - simd_get(&simd_context); while ((first = ptr_ring_consume_bh(&queue->ring)) != NULL) { enum packet_state state = PACKET_STATE_CRYPTED; skb_walk_null_queue_safe(first, skb, next) { if (likely(encrypt_packet(skb, - PACKET_CB(first)->keypair, - &simd_context))) { + PACKET_CB(first)->keypair))) { wg_reset_packet(skb); } else { state = PACKET_STATE_DEAD; @@ -314,10 +326,7 @@ void wg_packet_encrypt_worker(struct work_struct *work) } wg_queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first, state); - - simd_relax(&simd_context); } - simd_put(&simd_context); } static void wg_packet_create_data(struct sk_buff *first) @@ -389,13 +398,15 @@ void wg_packet_send_staged_packets(struct wg_peer *peer) * handshake. */ skb_queue_walk(&packets, skb) { + u64 counter = atomic64_inc_return(&key->counter.counter) - 1; + /* 0 for no outer TOS: no leak. TODO: at some later point, we * might consider using flowi->tos as outer instead. */ PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); - PACKET_CB(skb)->nonce = - atomic64_inc_return(&key->counter.counter) - 1; - if (unlikely(PACKET_CB(skb)->nonce >= REJECT_AFTER_MESSAGES)) + PACKET_CB(skb)->ivpad = 0; + PACKET_CB(skb)->nonce = cpu_to_le64(counter); + if (unlikely(counter >= REJECT_AFTER_MESSAGES)) goto out_invalid; }