From patchwork Mon Aug 6 22:32:52 2018
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10558065
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers
Subject: [RFC PATCH 1/9] crypto: chacha20-generic - add HChaCha20 library function
Date: Mon, 6 Aug 2018 15:32:52 -0700
Message-Id: <20180806223300.113891-2-ebiggers@kernel.org>
In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org>
References: <20180806223300.113891-1-ebiggers@kernel.org>

From: Eric Biggers

Refactor the unkeyed permutation part of chacha20_block() into its own
function, then add hchacha20_block() which is the ChaCha equivalent of
HSalsa20 and is an intermediate step towards XChaCha20 (see
https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha20 skips the final
addition of the initial state, and outputs only certain words of the
state. It should not be used for streaming directly.
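For illustration, here is a minimal, self-contained C sketch of what
hchacha20_block() computes. This is not the kernel code (which follows in
the diff below); the names ROL32, quarterround(), chacha_permute(), and
hchacha20() are illustrative stand-ins, and the permutation is a plain
transcription of the double-round loop that the patch factors out of
chacha20_block():

/*
 * Standalone sketch of HChaCha20: run the 20-round ChaCha permutation
 * over the usual 16-word input state, then output words 0..3 and 12..15
 * with no final feed-forward addition of the input state.
 */
#include <stdint.h>
#include <string.h>

#define ROL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

static void quarterround(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] = ROL32(x[d] ^ x[a], 16);
	x[c] += x[d]; x[b] = ROL32(x[b] ^ x[c], 12);
	x[a] += x[b]; x[d] = ROL32(x[d] ^ x[a], 8);
	x[c] += x[d]; x[b] = ROL32(x[b] ^ x[c], 7);
}

static void chacha_permute(uint32_t x[16])
{
	for (int i = 0; i < 20; i += 2) {
		/* column round */
		quarterround(x, 0, 4,  8, 12);
		quarterround(x, 1, 5,  9, 13);
		quarterround(x, 2, 6, 10, 14);
		quarterround(x, 3, 7, 11, 15);
		/* diagonal round */
		quarterround(x, 0, 5, 10, 15);
		quarterround(x, 1, 6, 11, 12);
		quarterround(x, 2, 7,  8, 13);
		quarterround(x, 3, 4,  9, 14);
	}
}

/* in: constants || key || first 128 nonce bits; out: 256-bit subkey */
void hchacha20(const uint32_t in[16], uint32_t out[8])
{
	uint32_t x[16];

	memcpy(x, in, 64);
	chacha_permute(x);
	memcpy(&out[0], &x[0], 16);	/* words 0..3 */
	memcpy(&out[4], &x[12], 16);	/* words 12..15 */
}

Without the final feed-forward addition, the function is an invertible
permutation of its input, so its output is suitable only as a derived key
and must never be exposed as keystream; that is why the commit message
says it should not be used for streaming directly.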
Signed-off-by: Eric Biggers
---
 include/crypto/chacha20.h |  2 ++
 lib/chacha20.c            | 52 +++++++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
index b83d66073db0..f00052137942 100644
--- a/include/crypto/chacha20.h
+++ b/include/crypto/chacha20.h
@@ -20,6 +20,8 @@ struct chacha20_ctx {
 };
 
 void chacha20_block(u32 *state, u32 *stream);
+void hchacha20_block(const u32 *in, u32 *out);
+
 void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
 int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
			   unsigned int keysize);

diff --git a/lib/chacha20.c b/lib/chacha20.c
index c1cc50fb68c9..13a0bdcb1604 100644
--- a/lib/chacha20.c
+++ b/lib/chacha20.c
@@ -1,5 +1,5 @@
 /*
- * ChaCha20 256-bit cipher algorithm, RFC7539
+ * The "hash function" used as the core of the ChaCha20 stream cipher (RFC7539)
  *
  * Copyright (C) 2015 Martin Willi
  *
@@ -16,14 +16,10 @@
 #include
 #include
 
-void chacha20_block(u32 *state, u32 *stream)
+static void chacha20_permute(u32 *x)
 {
-	u32 x[16], *out = stream;
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(x); i++)
-		x[i] = state[i];
-
 	for (i = 0; i < 20; i += 2) {
 		x[0] += x[4];    x[12] = rol32(x[12] ^ x[0], 16);
 		x[1] += x[5];    x[13] = rol32(x[13] ^ x[1], 16);
@@ -65,10 +61,52 @@ void chacha20_block(u32 *state, u32 *stream)
 		x[8] += x[13];   x[7] = rol32(x[7] ^ x[8], 7);
 		x[9] += x[14];   x[4] = rol32(x[4] ^ x[9], 7);
 	}
+}
+
+/**
+ * chacha20_block - generate one keystream block and increment block counter
+ * @state: input state matrix (16 32-bit words)
+ * @stream: output keystream block (64 bytes)
+ *
+ * This is the ChaCha20 core, a function from 64-byte strings to 64-byte
+ * strings. The caller has already converted the endianness of the input. This
+ * function also handles incrementing the block counter in the input matrix.
+ */
+void chacha20_block(u32 *state, u32 *stream)
+{
+	u32 x[16];
+	int i;
+
+	memcpy(x, state, 64);
+
+	chacha20_permute(x);
 
 	for (i = 0; i < ARRAY_SIZE(x); i++)
-		out[i] = cpu_to_le32(x[i] + state[i]);
+		stream[i] = cpu_to_le32(x[i] + state[i]);
 
 	state[12]++;
 }
 EXPORT_SYMBOL(chacha20_block);
+
+/**
+ * hchacha20_block - abbreviated ChaCha20 core, for XChaCha20
+ * @in: input state matrix (16 32-bit words)
+ * @out: output (8 32-bit words)
+ *
+ * HChaCha20 is the ChaCha equivalent of HSalsa20 and is an intermediate step
+ * towards XChaCha20 (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf).
+ * HChaCha20 skips the final addition of the initial state, and outputs only
+ * certain words of the state. It should not be used for streaming directly.
+ */
+void hchacha20_block(const u32 *in, u32 *out)
+{
+	u32 x[16];
+
+	memcpy(x, in, 64);
+
+	chacha20_permute(x);
+
+	memcpy(&out[0], &x[0], 16);
+	memcpy(&out[4], &x[12], 16);
+}
+EXPORT_SYMBOL(hchacha20_block);

From patchwork Mon Aug 6 22:32:53 2018
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10558047
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers
Subject: [RFC PATCH 2/9] crypto: chacha20-generic - add XChaCha20 support
Date: Mon, 6 Aug 2018 15:32:53 -0700
Message-Id: <20180806223300.113891-3-ebiggers@kernel.org>
In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org>
References: <20180806223300.113891-1-ebiggers@kernel.org>

From: Eric Biggers

Add support for the XChaCha20 stream cipher. XChaCha20 is the
application of the XSalsa20 construction
(https://cr.yp.to/snuffle/xsalsa-20081128.pdf) to ChaCha20 rather than
to Salsa20. XChaCha20 extends ChaCha20's nonce length from 64 bits (or
96 bits, depending on convention) to 192 bits, while provably retaining
ChaCha20's security.
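The paragraphs that follow spell out the construction; as a compact
illustration, the whole flow can be sketched in C as below. This is an
outline under assumptions, not the kernel code: chacha20_init() and
chacha20_xor_stream() are hypothetical stand-ins for
crypto_chacha20_init() and the chacha20_docrypt() keystream-XOR loop in
the diff that follows, and hchacha20() is the sketch shown under patch 1.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hchacha20() as sketched under patch 1. */
void hchacha20(const uint32_t in[16], uint32_t out[8]);

/* Assumed helper: plain ChaCha20 keystream XOR over len bytes. */
void chacha20_xor_stream(uint32_t state[16], uint8_t *dst,
			 const uint8_t *src, size_t len);

static void chacha20_init(uint32_t state[16], const uint32_t key[8],
			  const uint8_t iv[16])
{
	/* "expand 32-byte k" */
	state[0] = 0x61707865;
	state[1] = 0x3320646e;
	state[2] = 0x79622d32;
	state[3] = 0x6b206574;
	memcpy(&state[4], key, 32);
	/* words 12..15; this raw copy assumes a little-endian host,
	 * whereas the kernel uses get_unaligned_le32() to stay portable */
	memcpy(&state[12], iv, 16);
}

/* iv[0..23] = 192-bit nonce, iv[24..31] = 64-bit stream position */
void xchacha20(const uint32_t key[8], const uint8_t iv[32],
	       uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t state[16], subkey[8];
	uint8_t real_iv[16];

	/* Step 1: HChaCha20 maps the key and the first 128 nonce bits
	 * to a 256-bit subkey. */
	chacha20_init(state, key, iv);		/* consumes iv[0..15] */
	hchacha20(state, subkey);

	/* Step 2: ordinary ChaCha20 under the subkey; its 16-byte IV is
	 * the stream position followed by the remaining 64 nonce bits. */
	memcpy(&real_iv[0], iv + 24, 8);
	memcpy(&real_iv[8], iv + 16, 8);
	chacha20_init(state, subkey, real_iv);
	chacha20_xor_stream(state, dst, src, len);
}

Note that all nonce-dependent state is local here; keeping the subkey per
request rather than in the tfm is what the patch below does with its
on-stack subctx.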
XChaCha20 uses the ChaCha20 permutation to map the key and first 128 nonce bits to a 256-bit subkey. Then, it does the ChaCha20 stream cipher with the subkey and remaining 64 bits of nonce. We need XChaCha20 support in order to add support for the HPolyC construction, which uses XChaCha. Note that to meet our performance requirements, we actually plan to primarily use the reduced-round variant XChaCha12. But we believe it's wise to first add XChaCha20 as a baseline with a higher security margin, in case there are any situations where it can be used. Supporting both variants is straightforward. Since XChaCha20's subkey differs for each request, XChaCha20 can't be a template that wraps ChaCha20; that would require re-keying the underlying ChaCha20 for every request, which wouldn't be thread-safe. Instead, we make XChaCha20 its own top-level algorithm which calls the ChaCha20 streaming implementation internally. Similar to the existing ChaCha20 implementation, we define the IV to be the nonce and stream position concatenated together. This allows users to seek to any position in the stream. I considered splitting the code into separate chacha20-common, chacha20, and xchacha20 modules, so that chacha20 and xchacha20 could be enabled/disabled independently. However, since nearly all the code is shared anyway, I ultimately decided there would have been little benefit to the added complexity of separate modules. Signed-off-by: Eric Biggers --- crypto/Kconfig | 14 +- crypto/chacha20_generic.c | 120 +++++--- crypto/testmgr.c | 6 + crypto/testmgr.h | 577 ++++++++++++++++++++++++++++++++++++++ include/crypto/chacha20.h | 14 +- 5 files changed, 689 insertions(+), 42 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index f3e40ac56d93..a58e4b1f7967 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1437,18 +1437,22 @@ config CRYPTO_SALSA20 Bernstein . See config CRYPTO_CHACHA20 - tristate "ChaCha20 cipher algorithm" + tristate "ChaCha20 stream cipher algorithms" select CRYPTO_BLKCIPHER help - ChaCha20 cipher algorithm, RFC7539. + The ChaCha20 and XChaCha20 stream cipher algorithms. ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J. Bernstein and further specified in RFC7539 for use in IETF protocols. - This is the portable C implementation of ChaCha20. - - See also: + This is the portable C implementation of ChaCha20. See also: + XChaCha20 is the application of the XSalsa20 construction to ChaCha20 + rather than to Salsa20. XChaCha20 extends ChaCha20's nonce length + from 64 bits (or 96 bits using the RFC7539 convention) to 192 bits, + while provably retaining ChaCha20's security. 
See also: + + config CRYPTO_CHACHA20_X86_64 tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)" depends on X86 && 64BIT diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c index e451c3cb6a56..ba97aac93912 100644 --- a/crypto/chacha20_generic.c +++ b/crypto/chacha20_generic.c @@ -1,7 +1,8 @@ /* - * ChaCha20 256-bit cipher algorithm, RFC7539 + * ChaCha20 (RFC7539) and XChaCha20 stream cipher algorithms * * Copyright (C) 2015 Martin Willi + * Copyright (C) 2018 Google LLC * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -35,6 +36,31 @@ static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src, } } +static int chacha20_stream_xor(struct skcipher_request *req, + struct chacha20_ctx *ctx, u8 *iv) +{ + struct skcipher_walk walk; + u32 state[16]; + int err; + + err = skcipher_walk_virt(&walk, req, true); + + crypto_chacha20_init(state, ctx, iv); + + while (walk.nbytes > 0) { + unsigned int nbytes = walk.nbytes; + + if (nbytes < walk.total) + nbytes = round_down(nbytes, walk.stride); + + chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, + nbytes); + err = skcipher_walk_done(&walk, walk.nbytes - nbytes); + } + + return err; +} + void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv) { state[0] = 0x61707865; /* "expa" */ @@ -76,54 +102,74 @@ int crypto_chacha20_crypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); - struct skcipher_walk walk; - u32 state[16]; - int err; - - err = skcipher_walk_virt(&walk, req, true); - crypto_chacha20_init(state, ctx, walk.iv); + return chacha20_stream_xor(req, ctx, req->iv); +} +EXPORT_SYMBOL_GPL(crypto_chacha20_crypt); - while (walk.nbytes > 0) { - unsigned int nbytes = walk.nbytes; +int crypto_xchacha20_crypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha20_ctx subctx; + u32 state[16]; + u8 real_iv[16]; - if (nbytes < walk.total) - nbytes = round_down(nbytes, walk.stride); + /* Compute the subkey given the original key and first 128 nonce bits */ + crypto_chacha20_init(state, ctx, req->iv); + hchacha20_block(state, subctx.key); - chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, - nbytes); - err = skcipher_walk_done(&walk, walk.nbytes - nbytes); - } + /* Build the real IV */ + memcpy(&real_iv[0], req->iv + 24, 8); /* stream position */ + memcpy(&real_iv[8], req->iv + 16, 8); /* remaining 64 nonce bits */ - return err; + /* Generate the stream and XOR it with the data */ + return chacha20_stream_xor(req, &subctx, real_iv); } -EXPORT_SYMBOL_GPL(crypto_chacha20_crypt); - -static struct skcipher_alg alg = { - .base.cra_name = "chacha20", - .base.cra_driver_name = "chacha20-generic", - .base.cra_priority = 100, - .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), - .base.cra_module = THIS_MODULE, - - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = CHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, - .setkey = crypto_chacha20_setkey, - .encrypt = crypto_chacha20_crypt, - .decrypt = crypto_chacha20_crypt, +EXPORT_SYMBOL_GPL(crypto_xchacha20_crypt); + +static struct skcipher_alg algs[] = { + { + .base.cra_name = "chacha20", + .base.cra_driver_name = "chacha20-generic", + .base.cra_priority = 100, + .base.cra_blocksize = 1, + 
.base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_module = THIS_MODULE, + + .min_keysize = CHACHA20_KEY_SIZE, + .max_keysize = CHACHA20_KEY_SIZE, + .ivsize = CHACHA20_IV_SIZE, + .chunksize = CHACHA20_BLOCK_SIZE, + .setkey = crypto_chacha20_setkey, + .encrypt = crypto_chacha20_crypt, + .decrypt = crypto_chacha20_crypt, + }, { + .base.cra_name = "xchacha20", + .base.cra_driver_name = "xchacha20-generic", + .base.cra_priority = 100, + .base.cra_blocksize = 1, + .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_module = THIS_MODULE, + + .min_keysize = CHACHA20_KEY_SIZE, + .max_keysize = CHACHA20_KEY_SIZE, + .ivsize = XCHACHA20_IV_SIZE, + .chunksize = CHACHA20_BLOCK_SIZE, + .setkey = crypto_chacha20_setkey, + .encrypt = crypto_xchacha20_crypt, + .decrypt = crypto_xchacha20_crypt, + } }; static int __init chacha20_generic_mod_init(void) { - return crypto_register_skcipher(&alg); + return crypto_register_skciphers(algs, ARRAY_SIZE(algs)); } static void __exit chacha20_generic_mod_fini(void) { - crypto_unregister_skcipher(&alg); + crypto_unregister_skciphers(algs, ARRAY_SIZE(algs)); } module_init(chacha20_generic_mod_init); @@ -131,6 +177,8 @@ module_exit(chacha20_generic_mod_fini); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Martin Willi "); -MODULE_DESCRIPTION("chacha20 cipher algorithm"); +MODULE_DESCRIPTION("ChaCha20 and XChaCha20 stream ciphers (generic)"); MODULE_ALIAS_CRYPTO("chacha20"); MODULE_ALIAS_CRYPTO("chacha20-generic"); +MODULE_ALIAS_CRYPTO("xchacha20"); +MODULE_ALIAS_CRYPTO("xchacha20-generic"); diff --git a/crypto/testmgr.c b/crypto/testmgr.c index a1d42245082a..bf250ca9a6c3 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -3544,6 +3544,12 @@ static const struct alg_test_desc alg_test_descs[] = { .suite = { .hash = __VECS(aes_xcbc128_tv_template) } + }, { + .alg = "xchacha20", + .test = alg_test_skcipher, + .suite = { + .cipher = __VECS(xchacha20_tv_template) + }, }, { .alg = "xts(aes)", .test = alg_test_skcipher, diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 173111c70746..3f992867b747 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -31413,6 +31413,583 @@ static const struct cipher_testvec chacha20_tv_template[] = { }, }; +static const struct cipher_testvec xchacha20_tv_template[] = { + { /* from libsodium test/default/xchacha20.c */ + .key = "\x79\xc9\x97\x98\xac\x67\x30\x0b" + "\xbb\x27\x04\xc9\x5c\x34\x1e\x32" + "\x45\xf3\xdc\xb2\x17\x61\xb9\x8e" + "\x52\xff\x45\xb2\x4f\x30\x4f\xc4", + .klen = 32, + .iv = "\xb3\x3f\xfd\x30\x96\x47\x9b\xcf" + "\xbc\x9a\xee\x49\x41\x76\x88\xa0" + "\xa2\x55\x4f\x8d\x95\x38\x94\x19" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00", + .ctext = "\xc6\xe9\x75\x81\x60\x08\x3a\xc6" + "\x04\xef\x90\xe7\x12\xce\x6e\x75" + "\xd7\x79\x75\x90\x74\x4e\x0c\xf0" + "\x60\xf0\x13\x73\x9c", + .len = 29, + }, { /* from libsodium test/default/xchacha20.c */ + .key = "\x9d\x23\xbd\x41\x49\xcb\x97\x9c" + "\xcf\x3c\x5c\x94\xdd\x21\x7e\x98" + "\x08\xcb\x0e\x50\xcd\x0f\x67\x81" + "\x22\x35\xea\xaf\x60\x1d\x62\x32", + .klen = 32, + .iv = "\xc0\x47\x54\x82\x66\xb7\xc3\x70" + "\xd3\x35\x66\xa2\x42\x5c\xbf\x30" + "\xd8\x2d\x1e\xaf\x52\x94\x10\x9e" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + 
"\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00", + .ctext = "\xa2\x12\x09\x09\x65\x94\xde\x8c" + "\x56\x67\xb1\xd1\x3a\xd9\x3f\x74" + "\x41\x06\xd0\x54\xdf\x21\x0e\x47" + "\x82\xcd\x39\x6f\xec\x69\x2d\x35" + "\x15\xa2\x0b\xf3\x51\xee\xc0\x11" + "\xa9\x2c\x36\x78\x88\xbc\x46\x4c" + "\x32\xf0\x80\x7a\xcd\x6c\x20\x3a" + "\x24\x7e\x0d\xb8\x54\x14\x84\x68" + "\xe9\xf9\x6b\xee\x4c\xf7\x18\xd6" + "\x8d\x5f\x63\x7c\xbd\x5a\x37\x64" + "\x57\x78\x8e\x6f\xae\x90\xfc\x31" + "\x09\x7c\xfc", + .len = 91, + }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes + to nonce, and recomputed the ciphertext with libsodium */ + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x67\xc6\x69\x73" + "\x51\xff\x4a\xec\x29\xcd\xba\xab" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ctext = "\x9c\x49\x2a\xe7\x8a\x2f\x93\xc7" + "\xb3\x33\x6f\x82\x17\xd8\xc4\x1e" + "\xad\x80\x11\x11\x1d\x4c\x16\x18" + "\x07\x73\x9b\x4f\xdb\x7c\xcb\x47" + "\xfd\xef\x59\x74\xfa\x3f\xe5\x4c" + "\x9b\xd0\xea\xbc\xba\x56\xad\x32" + "\x03\xdc\xf8\x2b\xc1\xe1\x75\x67" + "\x23\x7b\xe6\xfc\xd4\x03\x86\x54", + .len = 64, + }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes + to nonce, and recomputed the ciphertext with libsodium */ + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x01", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x02\xf2\xfb\xe3\x46" + "\x7c\xc2\x54\xf8\x1b\xe8\xe7\x8d" + "\x01\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x41\x6e\x79\x20\x73\x75\x62\x6d" + "\x69\x73\x73\x69\x6f\x6e\x20\x74" + "\x6f\x20\x74\x68\x65\x20\x49\x45" + "\x54\x46\x20\x69\x6e\x74\x65\x6e" + "\x64\x65\x64\x20\x62\x79\x20\x74" + "\x68\x65\x20\x43\x6f\x6e\x74\x72" + "\x69\x62\x75\x74\x6f\x72\x20\x66" + "\x6f\x72\x20\x70\x75\x62\x6c\x69" + "\x63\x61\x74\x69\x6f\x6e\x20\x61" + "\x73\x20\x61\x6c\x6c\x20\x6f\x72" + "\x20\x70\x61\x72\x74\x20\x6f\x66" + "\x20\x61\x6e\x20\x49\x45\x54\x46" + "\x20\x49\x6e\x74\x65\x72\x6e\x65" + "\x74\x2d\x44\x72\x61\x66\x74\x20" + "\x6f\x72\x20\x52\x46\x43\x20\x61" + "\x6e\x64\x20\x61\x6e\x79\x20\x73" + "\x74\x61\x74\x65\x6d\x65\x6e\x74" + "\x20\x6d\x61\x64\x65\x20\x77\x69" + "\x74\x68\x69\x6e\x20\x74\x68\x65" + "\x20\x63\x6f\x6e\x74\x65\x78\x74" + "\x20\x6f\x66\x20\x61\x6e\x20\x49" + "\x45\x54\x46\x20\x61\x63\x74\x69" + "\x76\x69\x74\x79\x20\x69\x73\x20" + "\x63\x6f\x6e\x73\x69\x64\x65\x72" + "\x65\x64\x20\x61\x6e\x20\x22\x49" + "\x45\x54\x46\x20\x43\x6f\x6e\x74" + "\x72\x69\x62\x75\x74\x69\x6f\x6e" + "\x22\x2e\x20\x53\x75\x63\x68\x20" + "\x73\x74\x61\x74\x65\x6d\x65\x6e" + "\x74\x73\x20\x69\x6e\x63\x6c\x75" + "\x64\x65\x20\x6f\x72\x61\x6c\x20" + "\x73\x74\x61\x74\x65\x6d\x65\x6e" + "\x74\x73\x20\x69\x6e\x20\x49\x45" + "\x54\x46\x20\x73\x65\x73\x73\x69" + "\x6f\x6e\x73\x2c\x20\x61\x73\x20" + 
"\x77\x65\x6c\x6c\x20\x61\x73\x20" + "\x77\x72\x69\x74\x74\x65\x6e\x20" + "\x61\x6e\x64\x20\x65\x6c\x65\x63" + "\x74\x72\x6f\x6e\x69\x63\x20\x63" + "\x6f\x6d\x6d\x75\x6e\x69\x63\x61" + "\x74\x69\x6f\x6e\x73\x20\x6d\x61" + "\x64\x65\x20\x61\x74\x20\x61\x6e" + "\x79\x20\x74\x69\x6d\x65\x20\x6f" + "\x72\x20\x70\x6c\x61\x63\x65\x2c" + "\x20\x77\x68\x69\x63\x68\x20\x61" + "\x72\x65\x20\x61\x64\x64\x72\x65" + "\x73\x73\x65\x64\x20\x74\x6f", + .ctext = "\xf9\xab\x7a\x4a\x60\xb8\x5f\xa0" + "\x50\xbb\x57\xce\xef\x8c\xc1\xd9" + "\x24\x15\xb3\x67\x5e\x7f\x01\xf6" + "\x1c\x22\xf6\xe5\x71\xb1\x43\x64" + "\x63\x05\xd5\xfc\x5c\x3d\xc0\x0e" + "\x23\xef\xd3\x3b\xd9\xdc\x7f\xa8" + "\x58\x26\xb3\xd0\xc2\xd5\x04\x3f" + "\x0a\x0e\x8f\x17\xe4\xcd\xf7\x2a" + "\xb4\x2c\x09\xe4\x47\xec\x8b\xfb" + "\x59\x37\x7a\xa1\xd0\x04\x7e\xaa" + "\xf1\x98\x5f\x24\x3d\x72\x9a\x43" + "\xa4\x36\x51\x92\x22\x87\xff\x26" + "\xce\x9d\xeb\x59\x78\x84\x5e\x74" + "\x97\x2e\x63\xc0\xef\x29\xf7\x8a" + "\xb9\xee\x35\x08\x77\x6a\x35\x9a" + "\x3e\xe6\x4f\x06\x03\x74\x1b\xc1" + "\x5b\xb3\x0b\x89\x11\x07\xd3\xb7" + "\x53\xd6\x25\x04\xd9\x35\xb4\x5d" + "\x4c\x33\x5a\xc2\x42\x4c\xe6\xa4" + "\x97\x6e\x0e\xd2\xb2\x8b\x2f\x7f" + "\x28\xe5\x9f\xac\x4b\x2e\x02\xab" + "\x85\xfa\xa9\x0d\x7c\x2d\x10\xe6" + "\x91\xab\x55\x63\xf0\xde\x3a\x94" + "\x25\x08\x10\x03\xc2\x68\xd1\xf4" + "\xaf\x7d\x9c\x99\xf7\x86\x96\x30" + "\x60\xfc\x0b\xe6\xa8\x80\x15\xb0" + "\x81\xb1\x0c\xbe\xb9\x12\x18\x25" + "\xe9\x0e\xb1\xe7\x23\xb2\xef\x4a" + "\x22\x8f\xc5\x61\x89\xd4\xe7\x0c" + "\x64\x36\x35\x61\xb6\x34\x60\xf7" + "\x7b\x61\x37\x37\x12\x10\xa2\xf6" + "\x7e\xdb\x7f\x39\x3f\xb6\x8e\x89" + "\x9e\xf3\xfe\x13\x98\xbb\x66\x5a" + "\xec\xea\xab\x3f\x9c\x87\xc4\x8c" + "\x8a\x04\x18\x49\xfc\x77\x11\x50" + "\x16\xe6\x71\x2b\xee\xc0\x9c\xb6" + "\x87\xfd\x80\xff\x0b\x1d\x73\x38" + "\xa4\x1d\x6f\xae\xe4\x12\xd7\x93" + "\x9d\xcd\x38\x26\x09\x40\x52\xcd" + "\x67\x01\x67\x26\xe0\x3e\x98\xa8" + "\xe8\x1a\x13\x41\xbb\x90\x4d\x87" + "\xbb\x42\x82\x39\xce\x3a\xd0\x18" + "\x6d\x7b\x71\x8f\xbb\x2c\x6a\xd1" + "\xbd\xf5\xc7\x8a\x7e\xe1\x1e\x0f" + "\x0d\x0d\x13\x7c\xd9\xd8\x3c\x91" + "\xab\xff\x1f\x12\xc3\xee\xe5\x65" + "\x12\x8d\x7b\x61\xe5\x1f\x98", + .len = 375, + .also_non_np = 1, + .np = 3, + .tap = { 375 - 20, 4, 16 }, + + }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes + to nonce, and recomputed the ciphertext with libsodium */ + .key = "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a" + "\xf3\x33\x88\x86\x04\xf6\xb5\xf0" + "\x47\x39\x17\xc1\x40\x2b\x80\x09" + "\x9d\xca\x5c\xbc\x20\x70\x75\xc0", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x02\x76\x5a\x2e\x63" + "\x33\x9f\xc9\x9a\x66\x32\x0d\xb7" + "\x2a\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x27\x54\x77\x61\x73\x20\x62\x72" + "\x69\x6c\x6c\x69\x67\x2c\x20\x61" + "\x6e\x64\x20\x74\x68\x65\x20\x73" + "\x6c\x69\x74\x68\x79\x20\x74\x6f" + "\x76\x65\x73\x0a\x44\x69\x64\x20" + "\x67\x79\x72\x65\x20\x61\x6e\x64" + "\x20\x67\x69\x6d\x62\x6c\x65\x20" + "\x69\x6e\x20\x74\x68\x65\x20\x77" + "\x61\x62\x65\x3a\x0a\x41\x6c\x6c" + "\x20\x6d\x69\x6d\x73\x79\x20\x77" + "\x65\x72\x65\x20\x74\x68\x65\x20" + "\x62\x6f\x72\x6f\x67\x6f\x76\x65" + "\x73\x2c\x0a\x41\x6e\x64\x20\x74" + "\x68\x65\x20\x6d\x6f\x6d\x65\x20" + "\x72\x61\x74\x68\x73\x20\x6f\x75" + "\x74\x67\x72\x61\x62\x65\x2e", + .ctext = "\x95\xb9\x51\xe7\x8f\xb4\xa4\x03" + "\xca\x37\xcc\xde\x60\x1d\x8c\xe2" + "\xf1\xbb\x8a\x13\x7f\x61\x85\xcc" + "\xad\xf4\xf0\xdc\x86\xa6\x1e\x10" + "\xbc\x8e\xcb\x38\x2b\xa5\xc8\x8f" + "\xaa\x03\x3d\x53\x4a\x42\xb1\x33" + 
"\xfc\xd3\xef\xf0\x8e\x7e\x10\x9c" + "\x6f\x12\x5e\xd4\x96\xfe\x5b\x08" + "\xb6\x48\xf0\x14\x74\x51\x18\x7c" + "\x07\x92\xfc\xac\x9d\xf1\x94\xc0" + "\xc1\x9d\xc5\x19\x43\x1f\x1d\xbb" + "\x07\xf0\x1b\x14\x25\x45\xbb\xcb" + "\x5c\xe2\x8b\x28\xf3\xcf\x47\x29" + "\x27\x79\x67\x24\xa6\x87\xc2\x11" + "\x65\x03\xfa\x45\xf7\x9e\x53\x7a" + "\x99\xf1\x82\x25\x4f\x8d\x07", + .len = 127, + }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes + to nonce, and recomputed the ciphertext with libsodium */ + .key = "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a" + "\xf3\x33\x88\x86\x04\xf6\xb5\xf0" + "\x47\x39\x17\xc1\x40\x2b\x80\x09" + "\x9d\xca\x5c\xbc\x20\x70\x75\xc0", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x01\x31\x58\xa3\x5a" + "\x25\x5d\x05\x17\x58\xe9\x5e\xd4" + "\x1c\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x49\xee\xe0\xdc\x24\x90\x40\xcd" + "\xc5\x40\x8f\x47\x05\xbc\xdd\x81" + "\x47\xc6\x8d\xe6\xb1\x8f\xd7\xcb" + "\x09\x0e\x6e\x22\x48\x1f\xbf\xb8" + "\x5c\xf7\x1e\x8a\xc1\x23\xf2\xd4" + "\x19\x4b\x01\x0f\x4e\xa4\x43\xce" + "\x01\xc6\x67\xda\x03\x91\x18\x90" + "\xa5\xa4\x8e\x45\x03\xb3\x2d\xac" + "\x74\x92\xd3\x53\x47\xc8\xdd\x25" + "\x53\x6c\x02\x03\x87\x0d\x11\x0c" + "\x58\xe3\x12\x18\xfd\x2a\x5b\x40" + "\x0c\x30\xf0\xb8\x3f\x43\xce\xae" + "\x65\x3a\x7d\x7c\xf4\x54\xaa\xcc" + "\x33\x97\xc3\x77\xba\xc5\x70\xde" + "\xd7\xd5\x13\xa5\x65\xc4\x5f\x0f" + "\x46\x1a\x0d\x97\xb5\xf3\xbb\x3c" + "\x84\x0f\x2b\xc5\xaa\xea\xf2\x6c" + "\xc9\xb5\x0c\xee\x15\xf3\x7d\xbe" + "\x9f\x7b\x5a\xa6\xae\x4f\x83\xb6" + "\x79\x49\x41\xf4\x58\x18\xcb\x86" + "\x7f\x30\x0e\xf8\x7d\x44\x36\xea" + "\x75\xeb\x88\x84\x40\x3c\xad\x4f" + "\x6f\x31\x6b\xaa\x5d\xe5\xa5\xc5" + "\x21\x66\xe9\xa7\xe3\xb2\x15\x88" + "\x78\xf6\x79\xa1\x59\x47\x12\x4e" + "\x9f\x9f\x64\x1a\xa0\x22\x5b\x08" + "\xbe\x7c\x36\xc2\x2b\x66\x33\x1b" + "\xdd\x60\x71\xf7\x47\x8c\x61\xc3" + "\xda\x8a\x78\x1e\x16\xfa\x1e\x86" + "\x81\xa6\x17\x2a\xa7\xb5\xc2\xe7" + "\xa4\xc7\x42\xf1\xcf\x6a\xca\xb4" + "\x45\xcf\xf3\x93\xf0\xe7\xea\xf6" + "\xf4\xe6\x33\x43\x84\x93\xa5\x67" + "\x9b\x16\x58\x58\x80\x0f\x2b\x5c" + "\x24\x74\x75\x7f\x95\x81\xb7\x30" + "\x7a\x33\xa7\xf7\x94\x87\x32\x27" + "\x10\x5d\x14\x4c\x43\x29\xdd\x26" + "\xbd\x3e\x3c\x0e\xfe\x0e\xa5\x10" + "\xea\x6b\x64\xfd\x73\xc6\xed\xec" + "\xa8\xc9\xbf\xb3\xba\x0b\x4d\x07" + "\x70\xfc\x16\xfd\x79\x1e\xd7\xc5" + "\x49\x4e\x1c\x8b\x8d\x79\x1b\xb1" + "\xec\xca\x60\x09\x4c\x6a\xd5\x09" + "\x49\x46\x00\x88\x22\x8d\xce\xea" + "\xb1\x17\x11\xde\x42\xd2\x23\xc1" + "\x72\x11\xf5\x50\x73\x04\x40\x47" + "\xf9\x5d\xe7\xa7\x26\xb1\x7e\xb0" + "\x3f\x58\xc1\x52\xab\x12\x67\x9d" + "\x3f\x43\x4b\x68\xd4\x9c\x68\x38" + "\x07\x8a\x2d\x3e\xf3\xaf\x6a\x4b" + "\xf9\xe5\x31\x69\x22\xf9\xa6\x69" + "\xc6\x9c\x96\x9a\x12\x35\x95\x1d" + "\x95\xd5\xdd\xbe\xbf\x93\x53\x24" + "\xfd\xeb\xc2\x0a\x64\xb0\x77\x00" + "\x6f\x88\xc4\x37\x18\x69\x7c\xd7" + "\x41\x92\x55\x4c\x03\xa1\x9a\x4b" + "\x15\xe5\xdf\x7f\x37\x33\x72\xc1" + "\x8b\x10\x67\xa3\x01\x57\x94\x25" + "\x7b\x38\x71\x7e\xdd\x1e\xcc\x73" + "\x55\xd2\x8e\xeb\x07\xdd\xf1\xda" + "\x58\xb1\x47\x90\xfe\x42\x21\x72" + "\xa3\x54\x7a\xa0\x40\xec\x9f\xdd" + "\xc6\x84\x6e\xca\xae\xe3\x68\xb4" + "\x9d\xe4\x78\xff\x57\xf2\xf8\x1b" + "\x03\xa1\x31\xd9\xde\x8d\xf5\x22" + "\x9c\xdd\x20\xa4\x1e\x27\xb1\x76" + "\x4f\x44\x55\xe2\x9b\xa1\x9c\xfe" + "\x54\xf7\x27\x1b\xf4\xde\x02\xf5" + "\x1b\x55\x48\x5c\xdc\x21\x4b\x9e" + "\x4b\x6e\xed\x46\x23\xdc\x65\xb2" + "\xcf\x79\x5f\x28\xe0\x9e\x8b\xe7" + "\x4c\x9d\x8a\xff\xc1\xa6\x28\xb8" + "\x65\x69\x8a\x45\x29\xef\x74\x85" + 
"\xde\x79\xc7\x08\xae\x30\xb0\xf4" + "\xa3\x1d\x51\x41\xab\xce\xcb\xf6" + "\xb5\xd8\x6d\xe0\x85\xe1\x98\xb3" + "\x43\xbb\x86\x83\x0a\xa0\xf5\xb7" + "\x04\x0b\xfa\x71\x1f\xb0\xf6\xd9" + "\x13\x00\x15\xf0\xc7\xeb\x0d\x5a" + "\x9f\xd7\xb9\x6c\x65\x14\x22\x45" + "\x6e\x45\x32\x3e\x7e\x60\x1a\x12" + "\x97\x82\x14\xfb\xaa\x04\x22\xfa" + "\xa0\xe5\x7e\x8c\x78\x02\x48\x5d" + "\x78\x33\x5a\x7c\xad\xdb\x29\xce" + "\xbb\x8b\x61\xa4\xb7\x42\xe2\xac" + "\x8b\x1a\xd9\x2f\x0b\x8b\x62\x21" + "\x83\x35\x7e\xad\x73\xc2\xb5\x6c" + "\x10\x26\x38\x07\xe5\xc7\x36\x80" + "\xe2\x23\x12\x61\xf5\x48\x4b\x2b" + "\xc5\xdf\x15\xd9\x87\x01\xaa\xac" + "\x1e\x7c\xad\x73\x78\x18\x63\xe0" + "\x8b\x9f\x81\xd8\x12\x6a\x28\x10" + "\xbe\x04\x68\x8a\x09\x7c\x1b\x1c" + "\x83\x66\x80\x47\x80\xe8\xfd\x35" + "\x1c\x97\x6f\xae\x49\x10\x66\xcc" + "\xc6\xd8\xcc\x3a\x84\x91\x20\x77" + "\x72\xe4\x24\xd2\x37\x9f\xc5\xc9" + "\x25\x94\x10\x5f\x40\x00\x64\x99" + "\xdc\xae\xd7\x21\x09\x78\x50\x15" + "\xac\x5f\xc6\x2c\xa2\x0b\xa9\x39" + "\x87\x6e\x6d\xab\xde\x08\x51\x16" + "\xc7\x13\xe9\xea\xed\x06\x8e\x2c" + "\xf8\x37\x8c\xf0\xa6\x96\x8d\x43" + "\xb6\x98\x37\xb2\x43\xed\xde\xdf" + "\x89\x1a\xe7\xeb\x9d\xa1\x7b\x0b" + "\x77\xb0\xe2\x75\xc0\xf1\x98\xd9" + "\x80\x55\xc9\x34\x91\xd1\x59\xe8" + "\x4b\x0f\xc1\xa9\x4b\x7a\x84\x06" + "\x20\xa8\x5d\xfa\xd1\xde\x70\x56" + "\x2f\x9e\x91\x9c\x20\xb3\x24\xd8" + "\x84\x3d\xe1\x8c\x7e\x62\x52\xe5" + "\x44\x4b\x9f\xc2\x93\x03\xea\x2b" + "\x59\xc5\xfa\x3f\x91\x2b\xbb\x23" + "\xf5\xb2\x7b\xf5\x38\xaf\xb3\xee" + "\x63\xdc\x7b\xd1\xff\xaa\x8b\xab" + "\x82\x6b\x37\x04\xeb\x74\xbe\x79" + "\xb9\x83\x90\xef\x20\x59\x46\xff" + "\xe9\x97\x3e\x2f\xee\xb6\x64\x18" + "\x38\x4c\x7a\x4a\xf9\x61\xe8\x9a" + "\xa1\xb5\x01\xa6\x47\xd3\x11\xd4" + "\xce\xd3\x91\x49\x88\xc7\xb8\x4d" + "\xb1\xb9\x07\x6d\x16\x72\xae\x46" + "\x5e\x03\xa1\x4b\xb6\x02\x30\xa8" + "\x3d\xa9\x07\x2a\x7c\x19\xe7\x62" + "\x87\xe3\x82\x2f\x6f\xe1\x09\xd9" + "\x94\x97\xea\xdd\x58\x9e\xae\x76" + "\x7e\x35\xe5\xb4\xda\x7e\xf4\xde" + "\xf7\x32\x87\xcd\x93\xbf\x11\x56" + "\x11\xbe\x08\x74\xe1\x69\xad\xe2" + "\xd7\xf8\x86\x75\x8a\x3c\xa4\xbe" + "\x70\xa7\x1b\xfc\x0b\x44\x2a\x76" + "\x35\xea\x5d\x85\x81\xaf\x85\xeb" + "\xa0\x1c\x61\xc2\xf7\x4f\xa5\xdc" + "\x02\x7f\xf6\x95\x40\x6e\x8a\x9a" + "\xf3\x5d\x25\x6e\x14\x3a\x22\xc9" + "\x37\x1c\xeb\x46\x54\x3f\xa5\x91" + "\xc2\xb5\x8c\xfe\x53\x08\x97\x32" + "\x1b\xb2\x30\x27\xfe\x25\x5d\xdc" + "\x08\x87\xd0\xe5\x94\x1a\xd4\xf1" + "\xfe\xd6\xb4\xa3\xe6\x74\x81\x3c" + "\x1b\xb7\x31\xa7\x22\xfd\xd4\xdd" + "\x20\x4e\x7c\x51\xb0\x60\x73\xb8" + "\x9c\xac\x91\x90\x7e\x01\xb0\xe1" + "\x8a\x2f\x75\x1c\x53\x2a\x98\x2a" + "\x06\x52\x95\x52\xb2\xe9\x25\x2e" + "\x4c\xe2\x5a\x00\xb2\x13\x81\x03" + "\x77\x66\x0d\xa5\x99\xda\x4e\x8c" + "\xac\xf3\x13\x53\x27\x45\xaf\x64" + "\x46\xdc\xea\x23\xda\x97\xd1\xab" + "\x7d\x6c\x30\x96\x1f\xbc\x06\x34" + "\x18\x0b\x5e\x21\x35\x11\x8d\x4c" + "\xe0\x2d\xe9\x50\x16\x74\x81\xa8" + "\xb4\x34\xb9\x72\x42\xa6\xcc\xbc" + "\xca\x34\x83\x27\x10\x5b\x68\x45" + "\x8f\x52\x22\x0c\x55\x3d\x29\x7c" + "\xe3\xc0\x66\x05\x42\x91\x5f\x58" + "\xfe\x4a\x62\xd9\x8c\xa9\x04\x19" + "\x04\xa9\x08\x4b\x57\xfc\x67\x53" + "\x08\x7c\xbc\x66\x8a\xb0\xb6\x9f" + "\x92\xd6\x41\x7c\x5b\x2a\x00\x79" + "\x72", + .ctext = "\x3a\x92\xee\x53\x31\xaf\x2b\x60" + "\x5f\x55\x8d\x00\x5d\xfc\x74\x97" + "\x28\x54\xf4\xa5\x75\xf1\x9b\x25" + "\x62\x1c\xc0\xe0\x13\xc8\x87\x53" + "\xd0\xf3\xa7\x97\x1f\x3b\x1e\xea" + "\xe0\xe5\x2a\xd1\xdd\xa4\x3b\x50" + "\x45\xa3\x0d\x7e\x1b\xc9\xa0\xad" + "\xb9\x2c\x54\xa6\xc7\x55\x16\xd0" + 
"\xc5\x2e\x02\x44\x35\xd0\x7e\x67" + "\xf2\xc4\x9b\xcd\x95\x10\xcc\x29" + "\x4b\xfa\x86\x87\xbe\x40\x36\xbe" + "\xe1\xa3\x52\x89\x55\x20\x9b\xc2" + "\xab\xf2\x31\x34\x16\xad\xc8\x17" + "\x65\x24\xc0\xff\x12\x37\xfe\x5a" + "\x62\x3b\x59\x47\x6c\x5f\x3a\x8e" + "\x3b\xd9\x30\xc8\x7f\x2f\x88\xda" + "\x80\xfd\x02\xda\x7f\x9a\x7a\x73" + "\x59\xc5\x34\x09\x9a\x11\xcb\xa7" + "\xfc\xf6\xa1\xa0\x60\xfb\x43\xbb" + "\xf1\xe9\xd7\xc6\x79\x27\x4e\xff" + "\x22\xb4\x24\xbf\x76\xee\x47\xb9" + "\x6d\x3f\x8b\xb0\x9c\x3c\x43\xdd" + "\xff\x25\x2e\x6d\xa4\x2b\xfb\x5d" + "\x1b\x97\x6c\x55\x0a\x82\x7a\x7b" + "\x94\x34\xc2\xdb\x2f\x1f\xc1\xea" + "\xd4\x4d\x17\x46\x3b\x51\x69\x09" + "\xe4\x99\x32\x25\xfd\x94\xaf\xfb" + "\x10\xf7\x4f\xdd\x0b\x3c\x8b\x41" + "\xb3\x6a\xb7\xd1\x33\xa8\x0c\x2f" + "\x62\x4c\x72\x11\xd7\x74\xe1\x3b" + "\x38\x43\x66\x7b\x6c\x36\x48\xe7" + "\xe3\xe7\x9d\xb9\x42\x73\x7a\x2a" + "\x89\x20\x1a\x41\x80\x03\xf7\x8f" + "\x61\x78\x13\xbf\xfe\x50\xf5\x04" + "\x52\xf9\xac\x47\xf8\x62\x4b\xb2" + "\x24\xa9\xbf\x64\xb0\x18\x69\xd2" + "\xf5\xe4\xce\xc8\xb1\x87\x75\xd6" + "\x2c\x24\x79\x00\x7d\x26\xfb\x44" + "\xe7\x45\x7a\xee\x58\xa5\x83\xc1" + "\xb4\x24\xab\x23\x2f\x4d\xd7\x4f" + "\x1c\xc7\xaa\xa9\x50\xf4\xa3\x07" + "\x12\x13\x89\x74\xdc\x31\x6a\xb2" + "\xf5\x0f\x13\x8b\xb9\xdb\x85\x1f" + "\xf5\xbc\x88\xd9\x95\xea\x31\x6c" + "\x36\x60\xb6\x49\xdc\xc4\xf7\x55" + "\x3f\x21\xc1\xb5\x92\x18\x5e\xbc" + "\x9f\x87\x7f\xe7\x79\x25\x40\x33" + "\xd6\xb9\x33\xd5\x50\xb3\xc7\x89" + "\x1b\x12\xa0\x46\xdd\xa7\xd8\x3e" + "\x71\xeb\x6f\x66\xa1\x26\x0c\x67" + "\xab\xb2\x38\x58\x17\xd8\x44\x3b" + "\x16\xf0\x8e\x62\x8d\x16\x10\x00" + "\x32\x8b\xef\xb9\x28\xd3\xc5\xad" + "\x0a\x19\xa2\xe4\x03\x27\x7d\x94" + "\x06\x18\xcd\xd6\x27\x00\xf9\x1f" + "\xb6\xb3\xfe\x96\x35\x5f\xc4\x1c" + "\x07\x62\x10\x79\x68\x50\xf1\x7e" + "\x29\xe7\xc4\xc4\xe7\xee\x54\xd6" + "\x58\x76\x84\x6d\x8d\xe4\x59\x31" + "\xe9\xf4\xdc\xa1\x1f\xe5\x1a\xd6" + "\xe6\x64\x46\xf5\x77\x9c\x60\x7a" + "\x5e\x62\xe3\x0a\xd4\x9f\x7a\x2d" + "\x7a\xa5\x0a\x7b\x29\x86\x7a\x74" + "\x74\x71\x6b\xca\x7d\x1d\xaa\xba" + "\x39\x84\x43\x76\x35\xfe\x4f\x9b" + "\xbb\xbb\xb5\x6a\x32\xb5\x5d\x41" + "\x51\xf0\x5b\x68\x03\x47\x4b\x8a" + "\xca\x88\xf6\x37\xbd\x73\x51\x70" + "\x66\xfe\x9e\x5f\x21\x9c\xf3\xdd" + "\xc3\xea\x27\xf9\x64\x94\xe1\x19" + "\xa0\xa9\xab\x60\xe0\x0e\xf7\x78" + "\x70\x86\xeb\xe0\xd1\x5c\x05\xd3" + "\xd7\xca\xe0\xc0\x47\x47\x34\xee" + "\x11\xa3\xa3\x54\x98\xb7\x49\x8e" + "\x84\x28\x70\x2c\x9e\xfb\x55\x54" + "\x4d\xf8\x86\xf7\x85\x7c\xbd\xf3" + "\x17\xd8\x47\xcb\xac\xf4\x20\x85" + "\x34\x66\xad\x37\x2d\x5e\x52\xda" + "\x8a\xfe\x98\x55\x30\xe7\x2d\x2b" + "\x19\x10\x8e\x7b\x66\x5e\xdc\xe0" + "\x45\x1f\x7b\xb4\x08\xfb\x8f\xf6" + "\x8c\x89\x21\x34\x55\x27\xb2\x76" + "\xb2\x07\xd9\xd6\x68\x9b\xea\x6b" + "\x2d\xb4\xc4\x35\xdd\xd2\x79\xae" + "\xc7\xd6\x26\x7f\x12\x01\x8c\xa7" + "\xe3\xdb\xa8\xf4\xf7\x2b\xec\x99" + "\x11\x00\xf1\x35\x8c\xcf\xd5\xc9" + "\xbd\x91\x36\x39\x70\xcf\x7d\x70" + "\x47\x1a\xfc\x6b\x56\xe0\x3f\x9c" + "\x60\x49\x01\x72\xa9\xaf\x2c\x9c" + "\xe8\xab\xda\x8c\x14\x19\xf3\x75" + "\x07\x17\x9d\x44\x67\x7a\x2e\xef" + "\xb7\x83\x35\x4a\xd1\x3d\x1c\x84" + "\x32\xdd\xaa\xea\xca\x1d\xdc\x72" + "\x2c\xcc\x43\xcd\x5d\xe3\x21\xa4" + "\xd0\x8a\x4b\x20\x12\xa3\xd5\x86" + "\x76\x96\xff\x5f\x04\x57\x0f\xe6" + "\xba\xe8\x76\x50\x0c\x64\x1d\x83" + "\x9c\x9b\x9a\x9a\x58\x97\x9c\x5c" + "\xb4\xa4\xa6\x3e\x19\xeb\x8f\x5a" + "\x61\xb2\x03\x7b\x35\x19\xbe\xa7" + "\x63\x0c\xfd\xdd\xf9\x90\x6c\x08" + "\x19\x11\xd3\x65\x4a\xf5\x96\x92" + "\x59\xaa\x9c\x61\x0c\x29\xa7\xf8" + 
"\x14\x39\x37\xbf\x3c\xf2\x16\x72" + "\x02\xfa\xa2\xf3\x18\x67\x5d\xcb" + "\xdc\x4d\xbb\x96\xff\x70\x08\x2d" + "\xc2\xa8\x52\xe1\x34\x5f\x72\xfe" + "\x64\xbf\xca\xa7\x74\x38\xfb\x74" + "\x55\x9c\xfa\x8a\xed\xfb\x98\xeb" + "\x58\x2e\x6c\xe1\x52\x76\x86\xd7" + "\xcf\xa1\xa4\xfc\xb2\x47\x41\x28" + "\xa3\xc1\xe5\xfd\x53\x19\x28\x2b" + "\x37\x04\x65\x96\x99\x7a\x28\x0f" + "\x07\x68\x4b\xc7\x52\x0a\x55\x35" + "\x40\x19\x95\x61\xe8\x59\x40\x1f" + "\x9d\xbf\x78\x7d\x8f\x84\xff\x6f" + "\xd0\xd5\x63\xd2\x22\xbd\xc8\x4e" + "\xfb\xe7\x9f\x06\xe6\xe7\x39\x6d" + "\x6a\x96\x9f\xf0\x74\x7e\xc9\x35" + "\xb7\x26\xb8\x1c\x0a\xa6\x27\x2c" + "\xa2\x2b\xfe\xbe\x0f\x07\x73\xae" + "\x7f\x7f\x54\xf5\x7c\x6a\x0a\x56" + "\x49\xd4\x81\xe5\x85\x53\x99\x1f" + "\x95\x05\x13\x58\x8d\x0e\x1b\x90" + "\xc3\x75\x48\x64\x58\x98\x67\x84" + "\xae\xe2\x21\xa2\x8a\x04\x0a\x0b" + "\x61\xaa\xb0\xd4\x28\x60\x7a\xf8" + "\xbc\x52\xfb\x24\x7f\xed\x0d\x2a" + "\x0a\xb2\xf9\xc6\x95\xb5\x11\xc9" + "\xf4\x0f\x26\x11\xcf\x2a\x57\x87" + "\x7a\xf3\xe7\x94\x65\xc2\xb5\xb3" + "\xab\x98\xe3\xc1\x2b\x59\x19\x7c" + "\xd6\xf3\xf9\xbf\xff\x6d\xc6\x82" + "\x13\x2f\x4a\x2e\xcd\x26\xfe\x2d" + "\x01\x70\xf4\xc2\x7f\x1f\x4c\xcb" + "\x47\x77\x0c\xa0\xa3\x03\xec\xda" + "\xa9\xbf\x0d\x2d\xae\xe4\xb8\x7b" + "\xa9\xbc\x08\xb4\x68\x2e\xc5\x60" + "\x8d\x87\x41\x2b\x0f\x69\xf0\xaf" + "\x5f\xba\x72\x20\x0f\x33\xcd\x6d" + "\x36\x7d\x7b\xd5\x05\xf1\x4b\x05" + "\xc4\xfc\x7f\x80\xb9\x4d\xbd\xf7" + "\x7c\x84\x07\x01\xc2\x40\x66\x5b" + "\x98\xc7\x2c\xe3\x97\xfa\xdf\x87" + "\xa0\x1f\xe9\x21\x42\x0f\x3b\xeb" + "\x89\x1c\x3b\xca\x83\x61\x77\x68" + "\x84\xbb\x60\x87\x38\x2e\x25\xd5" + "\x9e\x04\x41\x70\xac\xda\xc0\x9c" + "\x9c\x69\xea\x8d\x4e\x55\x2a\x29" + "\xed\x05\x4b\x7b\x73\x71\x90\x59" + "\x4d\xc8\xd8\x44\xf0\x4c\xe1\x5e" + "\x84\x47\x55\xcc\x32\x3f\xe7\x97" + "\x42\xc6\x32\xac\x40\xe5\xa5\xc7" + "\x8b\xed\xdb\xf7\x83\xd6\xb1\xc2" + "\x52\x5e\x34\xb7\xeb\x6e\xd9\xfc" + "\xe5\x93\x9a\x97\x3e\xb0\xdc\xd9" + "\xd7\x06\x10\xb6\x1d\x80\x59\xdd" + "\x0d\xfe\x64\x35\xcd\x5d\xec\xf0" + "\xba\xd0\x34\xc9\x2d\x91\xc5\x17" + "\x11", + .len = 1281, + .also_non_np = 1, + .np = 3, + .tap = { 1200, 1, 80 }, + }, +}; + /* * CTS (Cipher Text Stealing) mode tests */ diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h index f00052137942..f977e685925d 100644 --- a/include/crypto/chacha20.h +++ b/include/crypto/chacha20.h @@ -1,6 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* - * Common values for the ChaCha20 algorithm + * Common values and helper functions for the ChaCha20 and XChaCha20 algorithms. + * + * XChaCha20 extends ChaCha20's nonce to 192 bits, while provably retaining + * ChaCha20's security. Here they share the same key size, tfm context, and + * setkey function; only their IV size and encrypt/decrypt function differ. 
 */
 
 #ifndef _CRYPTO_CHACHA20_H
@@ -10,11 +14,16 @@
 #include
 #include
 
+/* 32-bit stream position, then 96-bit nonce (RFC7539 convention) */
 #define CHACHA20_IV_SIZE	16
+
 #define CHACHA20_KEY_SIZE	32
 #define CHACHA20_BLOCK_SIZE	64
 #define CHACHA20_BLOCK_WORDS	(CHACHA20_BLOCK_SIZE / sizeof(u32))
 
+/* 192-bit nonce, then 64-bit stream position */
+#define XCHACHA20_IV_SIZE	32
+
 struct chacha20_ctx {
 	u32 key[8];
 };
@@ -23,8 +32,11 @@ void chacha20_block(u32 *state, u32 *stream);
 void hchacha20_block(const u32 *in, u32 *out);
 
 void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
+
 int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
			   unsigned int keysize);
+
 int crypto_chacha20_crypt(struct skcipher_request *req);
+int crypto_xchacha20_crypt(struct skcipher_request *req);
 
 #endif

From patchwork Mon Aug 6 22:32:54 2018
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10558051
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A .
Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 3/9] crypto: chacha20-generic - refactor to allow varying number of rounds Date: Mon, 6 Aug 2018 15:32:54 -0700 Message-Id: <20180806223300.113891-4-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers In preparation for adding XChaCha12 support, rename/refactor chacha20-generic to support different numbers of rounds. The only difference between ChaCha{8,12,20} are the number of rounds itself; all other parts of the algorithm are the same. Therefore, remove the "20" from all definitions, structures, functions, files, etc. that will be shared by all ChaCha versions. Also make ->setkey() store the round count in the chacha_ctx (previously chacha20_ctx). The generic code then passes the round count through to chacha_block(). There will be a ->setkey() function for each explicitly allowed round count; the encrypt/decrypt functions will be the same. I decided not to do it the opposite way (same ->setkey() function for all round counts, with different encrypt/decrypt functions) because that would have required more boilerplate code in architecture-specific implementations of ChaCha and XChaCha. To be as careful as possible, we whitelist the allowed round counts in the low-level generic code. Currently only 20 is allowed, i.e. no actual use of reduced-round ChaCha is introduced by this patch. Signed-off-by: Eric Biggers --- arch/arm/crypto/chacha20-neon-glue.c | 44 +++---- arch/arm64/crypto/chacha20-neon-glue.c | 40 +++---- arch/x86/crypto/chacha20_glue.c | 52 ++++----- crypto/Makefile | 2 +- crypto/chacha20poly1305.c | 10 +- .../{chacha20_generic.c => chacha_generic.c} | 110 ++++++++++-------- drivers/char/random.c | 50 ++++---- include/crypto/chacha.h | 47 ++++++++ include/crypto/chacha20.h | 42 ------- lib/Makefile | 2 +- lib/{chacha20.c => chacha.c} | 43 ++++--- 11 files changed, 230 insertions(+), 212 deletions(-) rename crypto/{chacha20_generic.c => chacha_generic.c} (56%) create mode 100644 include/crypto/chacha.h delete mode 100644 include/crypto/chacha20.h rename lib/{chacha20.c => chacha.c} (67%) diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c index 59a7be08e80c..ed8dec0f1768 100644 --- a/arch/arm/crypto/chacha20-neon-glue.c +++ b/arch/arm/crypto/chacha20-neon-glue.c @@ -19,7 +19,7 @@ */ #include -#include +#include #include #include #include @@ -34,20 +34,20 @@ asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src); static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { - u8 buf[CHACHA20_BLOCK_SIZE]; + u8 buf[CHACHA_BLOCK_SIZE]; - while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + while (bytes >= CHACHA_BLOCK_SIZE * 4) { chacha20_4block_xor_neon(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE * 4; - src += CHACHA20_BLOCK_SIZE * 4; - dst += CHACHA20_BLOCK_SIZE * 4; + bytes -= CHACHA_BLOCK_SIZE * 4; + src += CHACHA_BLOCK_SIZE * 4; + dst += CHACHA_BLOCK_SIZE * 4; state[12] += 4; } - while (bytes >= CHACHA20_BLOCK_SIZE) { + while (bytes >= CHACHA_BLOCK_SIZE) { chacha20_block_xor_neon(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; + bytes -= 
CHACHA_BLOCK_SIZE; + src += CHACHA_BLOCK_SIZE; + dst += CHACHA_BLOCK_SIZE; state[12]++; } if (bytes) { @@ -60,17 +60,17 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, static int chacha20_neon(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; u32 state[16]; int err; - if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd()) - return crypto_chacha20_crypt(req); + if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd()) + return crypto_chacha_crypt(req); err = skcipher_walk_virt(&walk, req, true); - crypto_chacha20_init(state, ctx, walk.iv); + crypto_chacha_init(state, ctx, walk.iv); kernel_neon_begin(); while (walk.nbytes > 0) { @@ -93,17 +93,17 @@ static struct skcipher_alg alg = { .base.cra_driver_name = "chacha20-neon", .base.cra_priority = 300, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct chacha_ctx), .base.cra_module = THIS_MODULE, - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = CHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, - .walksize = 4 * CHACHA20_BLOCK_SIZE, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .walksize = 4 * CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, - .encrypt = chacha20_neon, - .decrypt = chacha20_neon, + .encrypt = chacha_neon, + .decrypt = chacha_neon, }; static int __init chacha20_simd_mod_init(void) diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c index 727579c93ded..96e0cfb8c3f5 100644 --- a/arch/arm64/crypto/chacha20-neon-glue.c +++ b/arch/arm64/crypto/chacha20-neon-glue.c @@ -19,7 +19,7 @@ */ #include -#include +#include #include #include #include @@ -34,15 +34,15 @@ asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src); static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { - u8 buf[CHACHA20_BLOCK_SIZE]; + u8 buf[CHACHA_BLOCK_SIZE]; - while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + while (bytes >= CHACHA_BLOCK_SIZE * 4) { kernel_neon_begin(); chacha20_4block_xor_neon(state, dst, src); kernel_neon_end(); - bytes -= CHACHA20_BLOCK_SIZE * 4; - src += CHACHA20_BLOCK_SIZE * 4; - dst += CHACHA20_BLOCK_SIZE * 4; + bytes -= CHACHA_BLOCK_SIZE * 4; + src += CHACHA_BLOCK_SIZE * 4; + dst += CHACHA_BLOCK_SIZE * 4; state[12] += 4; } @@ -50,11 +50,11 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, return; kernel_neon_begin(); - while (bytes >= CHACHA20_BLOCK_SIZE) { + while (bytes >= CHACHA_BLOCK_SIZE) { chacha20_block_xor_neon(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; + bytes -= CHACHA_BLOCK_SIZE; + src += CHACHA_BLOCK_SIZE; + dst += CHACHA_BLOCK_SIZE; state[12]++; } if (bytes) { @@ -68,17 +68,17 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, static int chacha20_neon(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; u32 state[16]; int err; - if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE) - return crypto_chacha20_crypt(req); + if (!may_use_simd() || req->cryptlen <= CHACHA_BLOCK_SIZE) + 
return crypto_chacha_crypt(req); err = skcipher_walk_virt(&walk, req, false); - crypto_chacha20_init(state, ctx, walk.iv); + crypto_chacha_init(state, ctx, walk.iv); while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes; @@ -99,14 +99,14 @@ static struct skcipher_alg alg = { .base.cra_driver_name = "chacha20-neon", .base.cra_priority = 300, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct chacha_ctx), .base.cra_module = THIS_MODULE, - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = CHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, - .walksize = 4 * CHACHA20_BLOCK_SIZE, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .walksize = 4 * CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, .encrypt = chacha20_neon, .decrypt = chacha20_neon, diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c index dce7c5d39c2f..bd249f0b29dc 100644 --- a/arch/x86/crypto/chacha20_glue.c +++ b/arch/x86/crypto/chacha20_glue.c @@ -10,7 +10,7 @@ */ #include -#include +#include #include #include #include @@ -29,31 +29,31 @@ static bool chacha20_use_avx2; static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { - u8 buf[CHACHA20_BLOCK_SIZE]; + u8 buf[CHACHA_BLOCK_SIZE]; #ifdef CONFIG_AS_AVX2 if (chacha20_use_avx2) { - while (bytes >= CHACHA20_BLOCK_SIZE * 8) { + while (bytes >= CHACHA_BLOCK_SIZE * 8) { chacha20_8block_xor_avx2(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE * 8; - src += CHACHA20_BLOCK_SIZE * 8; - dst += CHACHA20_BLOCK_SIZE * 8; + bytes -= CHACHA_BLOCK_SIZE * 8; + src += CHACHA_BLOCK_SIZE * 8; + dst += CHACHA_BLOCK_SIZE * 8; state[12] += 8; } } #endif - while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + while (bytes >= CHACHA_BLOCK_SIZE * 4) { chacha20_4block_xor_ssse3(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE * 4; - src += CHACHA20_BLOCK_SIZE * 4; - dst += CHACHA20_BLOCK_SIZE * 4; + bytes -= CHACHA_BLOCK_SIZE * 4; + src += CHACHA_BLOCK_SIZE * 4; + dst += CHACHA_BLOCK_SIZE * 4; state[12] += 4; } - while (bytes >= CHACHA20_BLOCK_SIZE) { + while (bytes >= CHACHA_BLOCK_SIZE) { chacha20_block_xor_ssse3(state, dst, src); - bytes -= CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; + bytes -= CHACHA_BLOCK_SIZE; + src += CHACHA_BLOCK_SIZE; + dst += CHACHA_BLOCK_SIZE; state[12]++; } if (bytes) { @@ -66,7 +66,7 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src, static int chacha20_simd(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); u32 *state, state_buf[16 + 2] __aligned(8); struct skcipher_walk walk; int err; @@ -74,20 +74,20 @@ static int chacha20_simd(struct skcipher_request *req) BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16); state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN); - if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd()) - return crypto_chacha20_crypt(req); + if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd()) + return crypto_chacha_crypt(req); err = skcipher_walk_virt(&walk, req, true); - crypto_chacha20_init(state, ctx, walk.iv); + crypto_chacha_init(state, ctx, walk.iv); kernel_fpu_begin(); - while (walk.nbytes >= CHACHA20_BLOCK_SIZE) { + while (walk.nbytes >= CHACHA_BLOCK_SIZE) { chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr, 
- rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE)); + rounddown(walk.nbytes, CHACHA_BLOCK_SIZE)); err = skcipher_walk_done(&walk, - walk.nbytes % CHACHA20_BLOCK_SIZE); + walk.nbytes % CHACHA_BLOCK_SIZE); } if (walk.nbytes) { @@ -106,13 +106,13 @@ static struct skcipher_alg alg = { .base.cra_driver_name = "chacha20-simd", .base.cra_priority = 300, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct chacha_ctx), .base.cra_module = THIS_MODULE, - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = CHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, .encrypt = chacha20_simd, .decrypt = chacha20_simd, diff --git a/crypto/Makefile b/crypto/Makefile index 6d1d40eeb964..0701c4577dc6 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -117,7 +117,7 @@ obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o obj-$(CONFIG_CRYPTO_SEED) += seed.o obj-$(CONFIG_CRYPTO_SPECK) += speck.o obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o -obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o +obj-$(CONFIG_CRYPTO_CHACHA20) += chacha_generic.o obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c index 600afa99941f..573c07e6f189 100644 --- a/crypto/chacha20poly1305.c +++ b/crypto/chacha20poly1305.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include @@ -51,7 +51,7 @@ struct poly_req { }; struct chacha_req { - u8 iv[CHACHA20_IV_SIZE]; + u8 iv[CHACHA_IV_SIZE]; struct scatterlist src[1]; struct skcipher_request req; /* must be last member */ }; @@ -91,7 +91,7 @@ static void chacha_iv(u8 *iv, struct aead_request *req, u32 icb) memcpy(iv, &leicb, sizeof(leicb)); memcpy(iv + sizeof(leicb), ctx->salt, ctx->saltlen); memcpy(iv + sizeof(leicb) + ctx->saltlen, req->iv, - CHACHA20_IV_SIZE - sizeof(leicb) - ctx->saltlen); + CHACHA_IV_SIZE - sizeof(leicb) - ctx->saltlen); } static int poly_verify_tag(struct aead_request *req) @@ -494,7 +494,7 @@ static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key, struct chachapoly_ctx *ctx = crypto_aead_ctx(aead); int err; - if (keylen != ctx->saltlen + CHACHA20_KEY_SIZE) + if (keylen != ctx->saltlen + CHACHA_KEY_SIZE) return -EINVAL; keylen -= ctx->saltlen; @@ -639,7 +639,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, err = -EINVAL; /* Need 16-byte IV size, including Initial Block Counter value */ - if (crypto_skcipher_alg_ivsize(chacha) != CHACHA20_IV_SIZE) + if (crypto_skcipher_alg_ivsize(chacha) != CHACHA_IV_SIZE) goto out_drop_chacha; /* Not a stream cipher? 
*/ if (chacha->base.cra_blocksize != 1) diff --git a/crypto/chacha20_generic.c b/crypto/chacha_generic.c similarity index 56% rename from crypto/chacha20_generic.c rename to crypto/chacha_generic.c index ba97aac93912..46496997847a 100644 --- a/crypto/chacha20_generic.c +++ b/crypto/chacha_generic.c @@ -12,32 +12,32 @@ #include #include -#include +#include #include #include -static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src, - unsigned int bytes) +static void chacha_docrypt(u32 *state, u8 *dst, const u8 *src, + unsigned int bytes, int nrounds) { - u32 stream[CHACHA20_BLOCK_WORDS]; + u32 stream[CHACHA_BLOCK_WORDS]; if (dst != src) memcpy(dst, src, bytes); - while (bytes >= CHACHA20_BLOCK_SIZE) { - chacha20_block(state, stream); - crypto_xor(dst, (const u8 *)stream, CHACHA20_BLOCK_SIZE); - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; + while (bytes >= CHACHA_BLOCK_SIZE) { + chacha_block(state, stream, nrounds); + crypto_xor(dst, (const u8 *)stream, CHACHA_BLOCK_SIZE); + bytes -= CHACHA_BLOCK_SIZE; + dst += CHACHA_BLOCK_SIZE; } if (bytes) { - chacha20_block(state, stream); + chacha_block(state, stream, nrounds); crypto_xor(dst, (const u8 *)stream, bytes); } } -static int chacha20_stream_xor(struct skcipher_request *req, - struct chacha20_ctx *ctx, u8 *iv) +static int chacha_stream_xor(struct skcipher_request *req, + struct chacha_ctx *ctx, u8 *iv) { struct skcipher_walk walk; u32 state[16]; @@ -45,7 +45,7 @@ static int chacha20_stream_xor(struct skcipher_request *req, err = skcipher_walk_virt(&walk, req, true); - crypto_chacha20_init(state, ctx, iv); + crypto_chacha_init(state, ctx, iv); while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes; @@ -53,15 +53,15 @@ static int chacha20_stream_xor(struct skcipher_request *req, if (nbytes < walk.total) nbytes = round_down(nbytes, walk.stride); - chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, - nbytes); + chacha_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, + nbytes, ctx->nrounds); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } return err; } -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv) +void crypto_chacha_init(u32 *state, struct chacha_ctx *ctx, u8 *iv) { state[0] = 0x61707865; /* "expa" */ state[1] = 0x3320646e; /* "nd 3" */ @@ -80,53 +80,61 @@ void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv) state[14] = get_unaligned_le32(iv + 8); state[15] = get_unaligned_le32(iv + 12); } -EXPORT_SYMBOL_GPL(crypto_chacha20_init); +EXPORT_SYMBOL_GPL(crypto_chacha_init); -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, - unsigned int keysize) +static int chacha_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize, int nrounds) { - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); int i; - if (keysize != CHACHA20_KEY_SIZE) + if (keysize != CHACHA_KEY_SIZE) return -EINVAL; for (i = 0; i < ARRAY_SIZE(ctx->key); i++) ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32)); + ctx->nrounds = nrounds; return 0; } + +int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize) +{ + return chacha_setkey(tfm, key, keysize, 20); +} EXPORT_SYMBOL_GPL(crypto_chacha20_setkey); -int crypto_chacha20_crypt(struct skcipher_request *req) +int crypto_chacha_crypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx *ctx = 
crypto_skcipher_ctx(tfm); - return chacha20_stream_xor(req, ctx, req->iv); + return chacha_stream_xor(req, ctx, req->iv); } -EXPORT_SYMBOL_GPL(crypto_chacha20_crypt); +EXPORT_SYMBOL_GPL(crypto_chacha_crypt); -int crypto_xchacha20_crypt(struct skcipher_request *req) +int crypto_xchacha_crypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm); - struct chacha20_ctx subctx; + struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx subctx; u32 state[16]; u8 real_iv[16]; /* Compute the subkey given the original key and first 128 nonce bits */ - crypto_chacha20_init(state, ctx, req->iv); - hchacha20_block(state, subctx.key); + crypto_chacha_init(state, ctx, req->iv); + hchacha_block(state, subctx.key, ctx->nrounds); + subctx.nrounds = ctx->nrounds; /* Build the real IV */ memcpy(&real_iv[0], req->iv + 24, 8); /* stream position */ memcpy(&real_iv[8], req->iv + 16, 8); /* remaining 64 nonce bits */ /* Generate the stream and XOR it with the data */ - return chacha20_stream_xor(req, &subctx, real_iv); + return chacha_stream_xor(req, &subctx, real_iv); } -EXPORT_SYMBOL_GPL(crypto_xchacha20_crypt); +EXPORT_SYMBOL_GPL(crypto_xchacha_crypt); static struct skcipher_alg algs[] = { { @@ -134,50 +142,50 @@ static struct skcipher_alg algs[] = { .base.cra_driver_name = "chacha20-generic", .base.cra_priority = 100, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct chacha_ctx), .base.cra_module = THIS_MODULE, - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = CHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, - .encrypt = crypto_chacha20_crypt, - .decrypt = crypto_chacha20_crypt, + .encrypt = crypto_chacha_crypt, + .decrypt = crypto_chacha_crypt, }, { .base.cra_name = "xchacha20", .base.cra_driver_name = "xchacha20-generic", .base.cra_priority = 100, .base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha20_ctx), + .base.cra_ctxsize = sizeof(struct chacha_ctx), .base.cra_module = THIS_MODULE, - .min_keysize = CHACHA20_KEY_SIZE, - .max_keysize = CHACHA20_KEY_SIZE, - .ivsize = XCHACHA20_IV_SIZE, - .chunksize = CHACHA20_BLOCK_SIZE, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = XCHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, - .encrypt = crypto_xchacha20_crypt, - .decrypt = crypto_xchacha20_crypt, + .encrypt = crypto_xchacha_crypt, + .decrypt = crypto_xchacha_crypt, } }; -static int __init chacha20_generic_mod_init(void) +static int __init chacha_generic_mod_init(void) { return crypto_register_skciphers(algs, ARRAY_SIZE(algs)); } -static void __exit chacha20_generic_mod_fini(void) +static void __exit chacha_generic_mod_fini(void) { crypto_unregister_skciphers(algs, ARRAY_SIZE(algs)); } -module_init(chacha20_generic_mod_init); -module_exit(chacha20_generic_mod_fini); +module_init(chacha_generic_mod_init); +module_exit(chacha_generic_mod_fini); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Martin Willi "); -MODULE_DESCRIPTION("ChaCha20 and XChaCha20 stream ciphers (generic)"); +MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (generic)"); MODULE_ALIAS_CRYPTO("chacha20"); MODULE_ALIAS_CRYPTO("chacha20-generic"); MODULE_ALIAS_CRYPTO("xchacha20"); diff --git 
a/drivers/char/random.c b/drivers/char/random.c index bd449ad52442..edf956084179 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -265,7 +265,7 @@ #include #include #include -#include +#include #include #include @@ -431,11 +431,11 @@ static int crng_init = 0; #define crng_ready() (likely(crng_init > 1)) static int crng_init_cnt = 0; static unsigned long crng_global_init_time = 0; -#define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE) +#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE) static void _extract_crng(struct crng_state *crng, - __u32 out[CHACHA20_BLOCK_WORDS]); + __u32 out[CHACHA_BLOCK_WORDS]); static void _crng_backtrack_protect(struct crng_state *crng, - __u32 tmp[CHACHA20_BLOCK_WORDS], int used); + __u32 tmp[CHACHA_BLOCK_WORDS], int used); static void process_random_ready_list(void); static void _get_random_bytes(void *buf, int nbytes); @@ -849,7 +849,7 @@ static int crng_fast_load(const char *cp, size_t len) } p = (unsigned char *) &primary_crng.state[4]; while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) { - p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp; + p[crng_init_cnt % CHACHA_KEY_SIZE] ^= *cp; cp++; crng_init_cnt++; len--; } spin_unlock_irqrestore(&primary_crng.lock, flags); @@ -881,7 +881,7 @@ static int crng_slow_load(const char *cp, size_t len) unsigned long flags; static unsigned char lfsr = 1; unsigned char tmp; - unsigned i, max = CHACHA20_KEY_SIZE; + unsigned i, max = CHACHA_KEY_SIZE; const char * src_buf = cp; char * dest_buf = (char *) &primary_crng.state[4]; @@ -899,8 +899,8 @@ static int crng_slow_load(const char *cp, size_t len) lfsr >>= 1; if (tmp & 1) lfsr ^= 0xE1; - tmp = dest_buf[i % CHACHA20_KEY_SIZE]; - dest_buf[i % CHACHA20_KEY_SIZE] ^= src_buf[i % len] ^ lfsr; + tmp = dest_buf[i % CHACHA_KEY_SIZE]; + dest_buf[i % CHACHA_KEY_SIZE] ^= src_buf[i % len] ^ lfsr; lfsr += (tmp << 3) | (tmp >> 5); } spin_unlock_irqrestore(&primary_crng.lock, flags); @@ -912,7 +912,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) unsigned long flags; int i, num; union { - __u32 block[CHACHA20_BLOCK_WORDS]; + __u32 block[CHACHA_BLOCK_WORDS]; __u32 key[8]; } buf; @@ -923,7 +923,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) } else { _extract_crng(&primary_crng, buf.block); _crng_backtrack_protect(&primary_crng, buf.block, - CHACHA20_KEY_SIZE); + CHACHA_KEY_SIZE); } spin_lock_irqsave(&crng->lock, flags); for (i = 0; i < 8; i++) { @@ -959,7 +959,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) } static void _extract_crng(struct crng_state *crng, - __u32 out[CHACHA20_BLOCK_WORDS]) + __u32 out[CHACHA_BLOCK_WORDS]) { unsigned long v, flags; @@ -976,7 +976,7 @@ static void _extract_crng(struct crng_state *crng, spin_unlock_irqrestore(&crng->lock, flags); } -static void extract_crng(__u32 out[CHACHA20_BLOCK_WORDS]) +static void extract_crng(__u32 out[CHACHA_BLOCK_WORDS]) { struct crng_state *crng = NULL; @@ -994,14 +994,14 @@ static void extract_crng(__u32 out[CHACHA20_BLOCK_WORDS]) * enough) to mutate the CRNG key to provide backtracking protection. 
*/ static void _crng_backtrack_protect(struct crng_state *crng, - __u32 tmp[CHACHA20_BLOCK_WORDS], int used) + __u32 tmp[CHACHA_BLOCK_WORDS], int used) { unsigned long flags; __u32 *s, *d; int i; used = round_up(used, sizeof(__u32)); - if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) { + if (used + CHACHA_KEY_SIZE > CHACHA_BLOCK_SIZE) { extract_crng(tmp); used = 0; } @@ -1013,7 +1013,7 @@ static void _crng_backtrack_protect(struct crng_state *crng, spin_unlock_irqrestore(&crng->lock, flags); } -static void crng_backtrack_protect(__u32 tmp[CHACHA20_BLOCK_WORDS], int used) +static void crng_backtrack_protect(__u32 tmp[CHACHA_BLOCK_WORDS], int used) { struct crng_state *crng = NULL; @@ -1028,8 +1028,8 @@ static void crng_backtrack_protect(__u32 tmp[CHACHA20_BLOCK_WORDS], int used) static ssize_t extract_crng_user(void __user *buf, size_t nbytes) { - ssize_t ret = 0, i = CHACHA20_BLOCK_SIZE; - __u32 tmp[CHACHA20_BLOCK_WORDS]; + ssize_t ret = 0, i = CHACHA_BLOCK_SIZE; + __u32 tmp[CHACHA_BLOCK_WORDS]; int large_request = (nbytes > 256); while (nbytes) { @@ -1043,7 +1043,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t nbytes) } extract_crng(tmp); - i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE); + i = min_t(int, nbytes, CHACHA_BLOCK_SIZE); if (copy_to_user(buf, tmp, i)) { ret = -EFAULT; break; @@ -1612,14 +1612,14 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller, */ static void _get_random_bytes(void *buf, int nbytes) { - __u32 tmp[CHACHA20_BLOCK_WORDS]; + __u32 tmp[CHACHA_BLOCK_WORDS]; trace_get_random_bytes(nbytes, _RET_IP_); - while (nbytes >= CHACHA20_BLOCK_SIZE) { + while (nbytes >= CHACHA_BLOCK_SIZE) { extract_crng(buf); - buf += CHACHA20_BLOCK_SIZE; - nbytes -= CHACHA20_BLOCK_SIZE; + buf += CHACHA_BLOCK_SIZE; + nbytes -= CHACHA_BLOCK_SIZE; } if (nbytes > 0) { @@ -1627,7 +1627,7 @@ static void _get_random_bytes(void *buf, int nbytes) memcpy(buf, tmp, nbytes); crng_backtrack_protect(tmp, nbytes); } else - crng_backtrack_protect(tmp, CHACHA20_BLOCK_SIZE); + crng_backtrack_protect(tmp, CHACHA_BLOCK_SIZE); memzero_explicit(tmp, sizeof(tmp)); } @@ -2182,8 +2182,8 @@ struct ctl_table random_table[] = { struct batched_entropy { union { - u64 entropy_u64[CHACHA20_BLOCK_SIZE / sizeof(u64)]; - u32 entropy_u32[CHACHA20_BLOCK_SIZE / sizeof(u32)]; + u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)]; + u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)]; }; unsigned int position; }; diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h new file mode 100644 index 000000000000..a504350f54df --- /dev/null +++ b/include/crypto/chacha.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Common values and helper functions for the ChaCha and XChaCha algorithms. + * + * XChaCha extends ChaCha's nonce to 192 bits, while provably retaining ChaCha's + * security. Here they share the same key size, tfm context, and setkey + * function; only their IV size and encrypt/decrypt function differ. 
+ */ + +#ifndef _CRYPTO_CHACHA_H +#define _CRYPTO_CHACHA_H + +#include +#include +#include + +/* 32-bit stream position, then 96-bit nonce (RFC7539 convention) */ +#define CHACHA_IV_SIZE 16 + +#define CHACHA_KEY_SIZE 32 +#define CHACHA_BLOCK_SIZE 64 +#define CHACHA_BLOCK_WORDS (CHACHA_BLOCK_SIZE / sizeof(u32)) + +/* 192-bit nonce, then 64-bit stream position */ +#define XCHACHA_IV_SIZE 32 + +struct chacha_ctx { + u32 key[8]; + int nrounds; +}; + +void chacha_block(u32 *state, u32 *stream, int nrounds); +static inline void chacha20_block(u32 *state, u32 *stream) +{ + chacha_block(state, stream, 20); +} +void hchacha_block(const u32 *in, u32 *out, int nrounds); + +void crypto_chacha_init(u32 *state, struct chacha_ctx *ctx, u8 *iv); + +int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize); + +int crypto_chacha_crypt(struct skcipher_request *req); +int crypto_xchacha_crypt(struct skcipher_request *req); + +#endif /* _CRYPTO_CHACHA_H */ diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h deleted file mode 100644 index f977e685925d..000000000000 --- a/include/crypto/chacha20.h +++ /dev/null @@ -1,42 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Common values and helper functions for the ChaCha20 and XChaCha20 algorithms. - * - * XChaCha20 extends ChaCha20's nonce to 192 bits, while provably retaining - * ChaCha20's security. Here they share the same key size, tfm context, and - * setkey function; only their IV size and encrypt/decrypt function differ. - */ - -#ifndef _CRYPTO_CHACHA20_H -#define _CRYPTO_CHACHA20_H - -#include -#include -#include - -/* 32-bit stream position, then 96-bit nonce (RFC7539 convention) */ -#define CHACHA20_IV_SIZE 16 - -#define CHACHA20_KEY_SIZE 32 -#define CHACHA20_BLOCK_SIZE 64 -#define CHACHA20_BLOCK_WORDS (CHACHA20_BLOCK_SIZE / sizeof(u32)) - -/* 192-bit nonce, then 64-bit stream position */ -#define XCHACHA20_IV_SIZE 32 - -struct chacha20_ctx { - u32 key[8]; -}; - -void chacha20_block(u32 *state, u32 *stream); -void hchacha20_block(const u32 *in, u32 *out); - -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv); - -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, - unsigned int keysize); - -int crypto_chacha20_crypt(struct skcipher_request *req); -int crypto_xchacha20_crypt(struct skcipher_request *req); - -#endif diff --git a/lib/Makefile b/lib/Makefile index 90dc5520b784..e37f0f922185 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -20,7 +20,7 @@ KCOV_INSTRUMENT_dynamic_debug.o := n lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o timerqueue.o\ idr.o int_sqrt.o extable.o \ - sha1.o chacha20.o irq_regs.o argv_split.o \ + sha1.o chacha.o irq_regs.o argv_split.o \ flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ earlycpio.o seq_buf.o siphash.o dec_and_lock.o \ diff --git a/lib/chacha20.c b/lib/chacha.c similarity index 67% rename from lib/chacha20.c rename to lib/chacha.c index 13a0bdcb1604..b0fd1e0882e3 100644 --- a/lib/chacha20.c +++ b/lib/chacha.c @@ -1,5 +1,5 @@ /* - * The "hash function" used as the core of the ChaCha20 stream cipher (RFC7539) + * The "hash function" used as the core of the ChaCha stream cipher (RFC7539) * * Copyright (C) 2015 Martin Willi * @@ -14,13 +14,16 @@ #include #include #include -#include +#include -static void chacha20_permute(u32 *x) +static void chacha_permute(u32 *x, int nrounds) { int i; - for (i = 0; i < 20; i += 2) { + /* whitelist the allowed 
round counts */ + BUG_ON(nrounds != 20); + + for (i = 0; i < nrounds; i += 2) { x[0] += x[4]; x[12] = rol32(x[12] ^ x[0], 16); x[1] += x[5]; x[13] = rol32(x[13] ^ x[1], 16); x[2] += x[6]; x[14] = rol32(x[14] ^ x[2], 16); @@ -64,49 +67,51 @@ static void chacha20_permute(u32 *x) } /** - * chacha20_block - generate one keystream block and increment block counter + * chacha_block - generate one keystream block and increment block counter * @state: input state matrix (16 32-bit words) * @stream: output keystream block (64 bytes) + * @nrounds: number of rounds (currently must be 20) * - * This is the ChaCha20 core, a function from 64-byte strings to 64-byte - * strings. The caller has already converted the endianness of the input. This - * function also handles incrementing the block counter in the input matrix. + * This is the ChaCha core, a function from 64-byte strings to 64-byte strings. + * The caller has already converted the endianness of the input. This function + * also handles incrementing the block counter in the input matrix. */ -void chacha20_block(u32 *state, u32 *stream) +void chacha_block(u32 *state, u32 *stream, int nrounds) { u32 x[16]; int i; memcpy(x, state, 64); - chacha20_permute(x); + chacha_permute(x, nrounds); for (i = 0; i < ARRAY_SIZE(x); i++) stream[i] = cpu_to_le32(x[i] + state[i]); state[12]++; } -EXPORT_SYMBOL(chacha20_block); +EXPORT_SYMBOL(chacha_block); /** - * hchacha20_block - abbreviated ChaCha20 core, for XChaCha20 + * hchacha_block - abbreviated ChaCha core, for XChaCha * @in: input state matrix (16 32-bit words) * @out: output (8 32-bit words) + * @nrounds: number of rounds (currently must be 20) * - * HChaCha20 is the ChaCha equivalent of HSalsa20 and is an intermediate step - * towards XChaCha20 (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf). - * HChaCha20 skips the final addition of the initial state, and outputs only - * certain words of the state. It should not be used for streaming directly. + * HChaCha is the ChaCha equivalent of HSalsa and is an intermediate step + * towards XChaCha (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha + * skips the final addition of the initial state, and outputs only certain words + * of the state. It should not be used for streaming directly. 
*/ -void hchacha20_block(const u32 *in, u32 *out) +void hchacha_block(const u32 *in, u32 *out, int nrounds) { u32 x[16]; memcpy(x, in, 64); - chacha20_permute(x); + chacha_permute(x, nrounds); memcpy(&out[0], &x[0], 16); memcpy(&out[4], &x[12], 16); } -EXPORT_SYMBOL(hchacha20_block); +EXPORT_SYMBOL(hchacha_block); From patchwork Mon Aug 6 22:32:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558025 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8472D1390 for ; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71774297BF for ; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 656E2297D6; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C1DA297BF for ; Mon, 6 Aug 2018 22:35:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732952AbeHGAqQ (ORCPT ); Mon, 6 Aug 2018 20:46:16 -0400 Received: from mail.kernel.org ([198.145.29.99]:45120 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731308AbeHGAqO (ORCPT ); Mon, 6 Aug 2018 20:46:14 -0400 Received: from ebiggers-linuxstation.kir.corp.google.com (unknown [104.132.51.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7876E21A6D; Mon, 6 Aug 2018 22:34:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533594899; bh=WJl7TGMlAeWg1QQLypzGRBHLtgES2bK0Ti3JFDnKgn4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Wj/EGFOvor6ggkMy5mwULJxj/2yyIh5/TE80DDSZlEJh90kKMIvsv2oxVvyjDxknH wzfgW+EZEj0/w3My3rz1kP6Uacefn114n97p5FojjvxYPKEDtktLxy7RoAJzejctK/ ExNzvke28yptl2eNa0ugIpuy+70pmRskFrYVAIVE= From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 4/9] crypto: chacha - add XChaCha12 support Date: Mon, 6 Aug 2018 15:32:55 -0700 Message-Id: <20180806223300.113891-5-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers Now that the generic implementation of ChaCha20 has been refactored to allow varying the number of rounds, add support for XChaCha12, which is the XSalsa construction applied to ChaCha12. 
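To make the construction concrete: XChaCha reuses the generic code path added
for XChaCha20, and only the round count changes. As a rough sketch (for
illustration only, not part of the patch; chacha_stream_xor() is the static
helper in crypto/chacha_generic.c):

	/* XChaCha with a 256-bit key and 192-bit nonce; nrounds is 20 or 12 */
	crypto_chacha_init(state, ctx, req->iv);        /* key + first 128 nonce bits */
	hchacha_block(state, subctx.key, ctx->nrounds); /* derive the 256-bit subkey */
	subctx.nrounds = ctx->nrounds;

	memcpy(&real_iv[0], req->iv + 24, 8);           /* 64-bit stream position */
	memcpy(&real_iv[8], req->iv + 16, 8);           /* remaining 64 nonce bits */

	return chacha_stream_xor(req, &subctx, real_iv); /* plain ChaCha from here on */
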
ChaCha12 is one of the three ciphers specified by the original ChaCha paper
(https://cr.yp.to/chacha/chacha-20080128.pdf: "ChaCha, a variant of Salsa20"),
alongside ChaCha8 and ChaCha20. ChaCha12 is faster than ChaCha20, but has a
lower security margin. Still, it hasn't been broken yet, as the best known
attack on ChaCha makes it through only 7 rounds. We need XChaCha12 support so
that it can be used in the construction HPolyC-XChaCha12-AES, which can provide
reasonably fast disk/file encryption on low-end devices where AES-XTS is too
slow as the CPUs lack AES instructions. This is an alternative to using a
lightweight block cipher such as Speck128 in XTS mode. We'd prefer XChaCha20,
but it's too slow on some of our target devices, so at least in some cases we
do need the XChaCha12-based version. Note that it would be trivial to add
vanilla ChaCha12 in addition to XChaCha12. However, it is omitted for now as we
don't currently need it, and users should generally prefer the existing
20-round variant. As discussed in the patch that introduced XChaCha20 support,
I considered splitting the code into separate chacha-common, chacha20,
xchacha20, and xchacha12 modules, so that these algorithms could be
enabled/disabled independently. However, since nearly all the code is shared
anyway, I ultimately decided there would have been little benefit to the added
complexity. Signed-off-by: Eric Biggers --- crypto/Kconfig | 10 +-
crypto/chacha_generic.c | 26 +- crypto/testmgr.c | 6 + crypto/testmgr.h | 578
++++++++++++++++++++++++++++++++++++++++ include/crypto/chacha.h | 9 +
lib/chacha.c | 6 +- 6 files changed, 629 insertions(+), 6 deletions(-) diff
--git a/crypto/Kconfig b/crypto/Kconfig index a58e4b1f7967..d35d423bb4d1 100644
--- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1437,10 +1437,10 @@ config
CRYPTO_SALSA20 Bernstein . See config CRYPTO_CHACHA20 - tristate "ChaCha20
stream cipher algorithms" + tristate "ChaCha stream cipher algorithms" select
CRYPTO_BLKCIPHER help - The ChaCha20 and XChaCha20 stream cipher algorithms. +
The ChaCha20, XChaCha20, and XChaCha12 stream cipher algorithms. ChaCha20 is a
256-bit high-speed stream cipher designed by Daniel J. Bernstein and further
specified in RFC7539 for use in IETF protocols. @@ -1453,6 +1453,12 @@ config
CRYPTO_CHACHA20 while provably retaining ChaCha20's security. See also: +
XChaCha12 is XChaCha20 reduced to 12 rounds, with correspondingly + reduced
security margin but increased performance. It may be needed + in some very
performance-sensitive scenarios. There is no known + attack on XChaCha12 (or
ChaCha12) yet, but it has a higher risk of + being broken than the 20-round
variant.
+ config CRYPTO_CHACHA20_X86_64 tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)" depends on X86 && 64BIT diff --git a/crypto/chacha_generic.c b/crypto/chacha_generic.c index 46496997847a..0a0b66c32d03 100644 --- a/crypto/chacha_generic.c +++ b/crypto/chacha_generic.c @@ -1,5 +1,5 @@ /* - * ChaCha20 (RFC7539) and XChaCha20 stream cipher algorithms + * ChaCha and XChaCha stream ciphers, including ChaCha20 (RFC7539) * * Copyright (C) 2015 Martin Willi * Copyright (C) 2018 Google LLC @@ -105,6 +105,13 @@ int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, } EXPORT_SYMBOL_GPL(crypto_chacha20_setkey); +int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keysize) +{ + return chacha_setkey(tfm, key, keysize, 12); +} +EXPORT_SYMBOL_GPL(crypto_chacha12_setkey); + int crypto_chacha_crypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); @@ -167,6 +174,21 @@ static struct skcipher_alg algs[] = { .setkey = crypto_chacha20_setkey, .encrypt = crypto_xchacha_crypt, .decrypt = crypto_xchacha_crypt, + }, { + .base.cra_name = "xchacha12", + .base.cra_driver_name = "xchacha12-generic", + .base.cra_priority = 100, + .base.cra_blocksize = 1, + .base.cra_ctxsize = sizeof(struct chacha_ctx), + .base.cra_module = THIS_MODULE, + + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = XCHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .setkey = crypto_chacha12_setkey, + .encrypt = crypto_xchacha_crypt, + .decrypt = crypto_xchacha_crypt, } }; @@ -190,3 +212,5 @@ MODULE_ALIAS_CRYPTO("chacha20"); MODULE_ALIAS_CRYPTO("chacha20-generic"); MODULE_ALIAS_CRYPTO("xchacha20"); MODULE_ALIAS_CRYPTO("xchacha20-generic"); +MODULE_ALIAS_CRYPTO("xchacha12"); +MODULE_ALIAS_CRYPTO("xchacha12-generic"); diff --git a/crypto/testmgr.c b/crypto/testmgr.c index bf250ca9a6c3..c06aeb1f01bc 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -3544,6 +3544,12 @@ static const struct alg_test_desc alg_test_descs[] = { .suite = { .hash = __VECS(aes_xcbc128_tv_template) } + }, { + .alg = "xchacha12", + .test = alg_test_skcipher, + .suite = { + .cipher = __VECS(xchacha12_tv_template) + }, }, { .alg = "xchacha20", .test = alg_test_skcipher, diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 3f992867b747..ba5c31ada273 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -31990,6 +31990,584 @@ static const struct cipher_testvec xchacha20_tv_template[] = { }, }; +/* + * Same as XChaCha20 test vectors above, but recomputed the ciphertext with + * XChaCha12, using a modified libsodium. 
+ */ +static const struct cipher_testvec xchacha12_tv_template[] = { + { + .key = "\x79\xc9\x97\x98\xac\x67\x30\x0b" + "\xbb\x27\x04\xc9\x5c\x34\x1e\x32" + "\x45\xf3\xdc\xb2\x17\x61\xb9\x8e" + "\x52\xff\x45\xb2\x4f\x30\x4f\xc4", + .klen = 32, + .iv = "\xb3\x3f\xfd\x30\x96\x47\x9b\xcf" + "\xbc\x9a\xee\x49\x41\x76\x88\xa0" + "\xa2\x55\x4f\x8d\x95\x38\x94\x19" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00", + .ctext = "\x1b\x78\x7f\xd7\xa1\x41\x68\xab" + "\x3d\x3f\xd1\x7b\x69\x56\xb2\xd5" + "\x43\xce\xeb\xaf\x36\xf0\x29\x9d" + "\x3a\xfb\x18\xae\x1b", + .len = 29, + }, { + .key = "\x9d\x23\xbd\x41\x49\xcb\x97\x9c" + "\xcf\x3c\x5c\x94\xdd\x21\x7e\x98" + "\x08\xcb\x0e\x50\xcd\x0f\x67\x81" + "\x22\x35\xea\xaf\x60\x1d\x62\x32", + .klen = 32, + .iv = "\xc0\x47\x54\x82\x66\xb7\xc3\x70" + "\xd3\x35\x66\xa2\x42\x5c\xbf\x30" + "\xd8\x2d\x1e\xaf\x52\x94\x10\x9e" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00", + .ctext = "\xfb\x32\x09\x1d\x83\x05\xae\x4c" + "\x13\x1f\x12\x71\xf2\xca\xb2\xeb" + "\x5b\x83\x14\x7d\x83\xf6\x57\x77" + "\x2e\x40\x1f\x92\x2c\xf9\xec\x35" + "\x34\x1f\x93\xdf\xfb\x30\xd7\x35" + "\x03\x05\x78\xc1\x20\x3b\x7a\xe3" + "\x62\xa3\x89\xdc\x11\x11\x45\xa8" + "\x82\x89\xa0\xf1\x4e\xc7\x0f\x11" + "\x69\xdd\x0c\x84\x2b\x89\x5c\xdc" + "\xf0\xde\x01\xef\xc5\x65\x79\x23" + "\x87\x67\xd6\x50\xd9\x8d\xd9\x92" + "\x54\x5b\x0e", + .len = 91, + }, { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x67\xc6\x69\x73" + "\x51\xff\x4a\xec\x29\xcd\xba\xab" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ctext = "\xdf\x2d\xc6\x21\x2a\x9d\xa1\xbb" + "\xc2\x77\x66\x0c\x5c\x46\xef\xa7" + "\x79\x1b\xb9\xdf\x55\xe2\xf9\x61" + "\x4c\x7b\xa4\x52\x24\xaf\xa2\xda" + "\xd1\x8f\x8f\xa2\x9e\x53\x4d\xc4" + "\xb8\x55\x98\x08\x7c\x08\xd4\x18" + "\x67\x8f\xef\x50\xb1\x5f\xa5\x77" + "\x4c\x25\xe7\x86\x26\x42\xca\x44", + .len = 64, + }, { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x01", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x02\xf2\xfb\xe3\x46" + "\x7c\xc2\x54\xf8\x1b\xe8\xe7\x8d" + "\x01\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x41\x6e\x79\x20\x73\x75\x62\x6d" + "\x69\x73\x73\x69\x6f\x6e\x20\x74" + "\x6f\x20\x74\x68\x65\x20\x49\x45" + "\x54\x46\x20\x69\x6e\x74\x65\x6e" + "\x64\x65\x64\x20\x62\x79\x20\x74" + "\x68\x65\x20\x43\x6f\x6e\x74\x72" + "\x69\x62\x75\x74\x6f\x72\x20\x66" + "\x6f\x72\x20\x70\x75\x62\x6c\x69" + "\x63\x61\x74\x69\x6f\x6e\x20\x61" + 
"\x73\x20\x61\x6c\x6c\x20\x6f\x72" + "\x20\x70\x61\x72\x74\x20\x6f\x66" + "\x20\x61\x6e\x20\x49\x45\x54\x46" + "\x20\x49\x6e\x74\x65\x72\x6e\x65" + "\x74\x2d\x44\x72\x61\x66\x74\x20" + "\x6f\x72\x20\x52\x46\x43\x20\x61" + "\x6e\x64\x20\x61\x6e\x79\x20\x73" + "\x74\x61\x74\x65\x6d\x65\x6e\x74" + "\x20\x6d\x61\x64\x65\x20\x77\x69" + "\x74\x68\x69\x6e\x20\x74\x68\x65" + "\x20\x63\x6f\x6e\x74\x65\x78\x74" + "\x20\x6f\x66\x20\x61\x6e\x20\x49" + "\x45\x54\x46\x20\x61\x63\x74\x69" + "\x76\x69\x74\x79\x20\x69\x73\x20" + "\x63\x6f\x6e\x73\x69\x64\x65\x72" + "\x65\x64\x20\x61\x6e\x20\x22\x49" + "\x45\x54\x46\x20\x43\x6f\x6e\x74" + "\x72\x69\x62\x75\x74\x69\x6f\x6e" + "\x22\x2e\x20\x53\x75\x63\x68\x20" + "\x73\x74\x61\x74\x65\x6d\x65\x6e" + "\x74\x73\x20\x69\x6e\x63\x6c\x75" + "\x64\x65\x20\x6f\x72\x61\x6c\x20" + "\x73\x74\x61\x74\x65\x6d\x65\x6e" + "\x74\x73\x20\x69\x6e\x20\x49\x45" + "\x54\x46\x20\x73\x65\x73\x73\x69" + "\x6f\x6e\x73\x2c\x20\x61\x73\x20" + "\x77\x65\x6c\x6c\x20\x61\x73\x20" + "\x77\x72\x69\x74\x74\x65\x6e\x20" + "\x61\x6e\x64\x20\x65\x6c\x65\x63" + "\x74\x72\x6f\x6e\x69\x63\x20\x63" + "\x6f\x6d\x6d\x75\x6e\x69\x63\x61" + "\x74\x69\x6f\x6e\x73\x20\x6d\x61" + "\x64\x65\x20\x61\x74\x20\x61\x6e" + "\x79\x20\x74\x69\x6d\x65\x20\x6f" + "\x72\x20\x70\x6c\x61\x63\x65\x2c" + "\x20\x77\x68\x69\x63\x68\x20\x61" + "\x72\x65\x20\x61\x64\x64\x72\x65" + "\x73\x73\x65\x64\x20\x74\x6f", + .ctext = "\xe4\xa6\xc8\x30\xc4\x23\x13\xd6" + "\x08\x4d\xc9\xb7\xa5\x64\x7c\xb9" + "\x71\xe2\xab\x3e\xa8\x30\x8a\x1c" + "\x4a\x94\x6d\x9b\xe0\xb3\x6f\xf1" + "\xdc\xe3\x1b\xb3\xa9\x6d\x0d\xd6" + "\xd0\xca\x12\xef\xe7\x5f\xd8\x61" + "\x3c\x82\xd3\x99\x86\x3c\x6f\x66" + "\x02\x06\xdc\x55\xf9\xed\xdf\x38" + "\xb4\xa6\x17\x00\x7f\xef\xbf\x4f" + "\xf8\x36\xf1\x60\x7e\x47\xaf\xdb" + "\x55\x9b\x12\xcb\x56\x44\xa7\x1f" + "\xd3\x1a\x07\x3b\x00\xec\xe6\x4c" + "\xa2\x43\x27\xdf\x86\x19\x4f\x16" + "\xed\xf9\x4a\xf3\x63\x6f\xfa\x7f" + "\x78\x11\xf6\x7d\x97\x6f\xec\x6f" + "\x85\x0f\x5c\x36\x13\x8d\x87\xe0" + "\x80\xb1\x69\x0b\x98\x89\x9c\x4e" + "\xf8\xdd\xee\x5c\x0a\x85\xce\xd4" + "\xea\x1b\x48\xbe\x08\xf8\xe2\xa8" + "\xa5\xb0\x3c\x79\xb1\x15\xb4\xb9" + "\x75\x10\x95\x35\x81\x7e\x26\xe6" + "\x78\xa4\x88\xcf\xdb\x91\x34\x18" + "\xad\xd7\x8e\x07\x7d\xab\x39\xf9" + "\xa3\x9e\xa5\x1d\xbb\xed\x61\xfd" + "\xdc\xb7\x5a\x27\xfc\xb5\xc9\x10" + "\xa8\xcc\x52\x7f\x14\x76\x90\xe7" + "\x1b\x29\x60\x74\xc0\x98\x77\xbb" + "\xe0\x54\xbb\x27\x49\x59\x1e\x62" + "\x3d\xaf\x74\x06\xa4\x42\x6f\xc6" + "\x52\x97\xc4\x1d\xc4\x9f\xe2\xe5" + "\x38\x57\x91\xd1\xa2\x28\xcc\x40" + "\xcc\x70\x59\x37\xfc\x9f\x4b\xda" + "\xa0\xeb\x97\x9a\x7d\xed\x14\x5c" + "\x9c\xb7\x93\x26\x41\xa8\x66\xdd" + "\x87\x6a\xc0\xd3\xc2\xa9\x3e\xae" + "\xe9\x72\xfe\xd1\xb3\xac\x38\xea" + "\x4d\x15\xa9\xd5\x36\x61\xe9\x96" + "\x6c\x23\xf8\x43\xe4\x92\x29\xd9" + "\x8b\x78\xf7\x0a\x52\xe0\x19\x5b" + "\x59\x69\x5b\x5d\xa1\x53\xc4\x68" + "\xe1\xbb\xac\x89\x14\xe2\xe2\x85" + "\x41\x18\xf5\xb3\xd1\xfa\x68\x19" + "\x44\x78\xdc\xcf\xe7\x88\x2d\x52" + "\x5f\x40\xb5\x7e\xf8\x88\xa2\xae" + "\x4a\xb2\x07\x35\x9d\x9b\x07\x88" + "\xb7\x00\xd0\x0c\xb6\xa0\x47\x59" + "\xda\x4e\xc9\xab\x9b\x8a\x7b", + + .len = 375, + .also_non_np = 1, + .np = 3, + .tap = { 375 - 20, 4, 16 }, + + }, { + .key = "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a" + "\xf3\x33\x88\x86\x04\xf6\xb5\xf0" + "\x47\x39\x17\xc1\x40\x2b\x80\x09" + "\x9d\xca\x5c\xbc\x20\x70\x75\xc0", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x02\x76\x5a\x2e\x63" + "\x33\x9f\xc9\x9a\x66\x32\x0d\xb7" + 
"\x2a\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x27\x54\x77\x61\x73\x20\x62\x72" + "\x69\x6c\x6c\x69\x67\x2c\x20\x61" + "\x6e\x64\x20\x74\x68\x65\x20\x73" + "\x6c\x69\x74\x68\x79\x20\x74\x6f" + "\x76\x65\x73\x0a\x44\x69\x64\x20" + "\x67\x79\x72\x65\x20\x61\x6e\x64" + "\x20\x67\x69\x6d\x62\x6c\x65\x20" + "\x69\x6e\x20\x74\x68\x65\x20\x77" + "\x61\x62\x65\x3a\x0a\x41\x6c\x6c" + "\x20\x6d\x69\x6d\x73\x79\x20\x77" + "\x65\x72\x65\x20\x74\x68\x65\x20" + "\x62\x6f\x72\x6f\x67\x6f\x76\x65" + "\x73\x2c\x0a\x41\x6e\x64\x20\x74" + "\x68\x65\x20\x6d\x6f\x6d\x65\x20" + "\x72\x61\x74\x68\x73\x20\x6f\x75" + "\x74\x67\x72\x61\x62\x65\x2e", + .ctext = "\xb9\x68\xbc\x6a\x24\xbc\xcc\xd8" + "\x9b\x2a\x8d\x5b\x96\xaf\x56\xe3" + "\x11\x61\xe7\xa7\x9b\xce\x4e\x7d" + "\x60\x02\x48\xac\xeb\xd5\x3a\x26" + "\x9d\x77\x3b\xb5\x32\x13\x86\x8e" + "\x20\x82\x26\x72\xae\x64\x1b\x7e" + "\x2e\x01\x68\xb4\x87\x45\xa1\x24" + "\xe4\x48\x40\xf0\xaa\xac\xee\xa9" + "\xfc\x31\xad\x9d\x89\xa3\xbb\xd2" + "\xe4\x25\x13\xad\x0f\x5e\xdf\x3c" + "\x27\xab\xb8\x62\x46\x22\x30\x48" + "\x55\x2c\x4e\x84\x78\x1d\x0d\x34" + "\x8d\x3c\x91\x0a\x7f\x5b\x19\x9f" + "\x97\x05\x4c\xa7\x62\x47\x8b\xc5" + "\x44\x2e\x20\x33\xdd\xa0\x82\xa9" + "\x25\x76\x37\xe6\x3c\x67\x5b", + .len = 127, + }, { + .key = "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a" + "\xf3\x33\x88\x86\x04\xf6\xb5\xf0" + "\x47\x39\x17\xc1\x40\x2b\x80\x09" + "\x9d\xca\x5c\xbc\x20\x70\x75\xc0", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x01\x31\x58\xa3\x5a" + "\x25\x5d\x05\x17\x58\xe9\x5e\xd4" + "\x1c\x00\x00\x00\x00\x00\x00\x00", + .ptext = "\x49\xee\xe0\xdc\x24\x90\x40\xcd" + "\xc5\x40\x8f\x47\x05\xbc\xdd\x81" + "\x47\xc6\x8d\xe6\xb1\x8f\xd7\xcb" + "\x09\x0e\x6e\x22\x48\x1f\xbf\xb8" + "\x5c\xf7\x1e\x8a\xc1\x23\xf2\xd4" + "\x19\x4b\x01\x0f\x4e\xa4\x43\xce" + "\x01\xc6\x67\xda\x03\x91\x18\x90" + "\xa5\xa4\x8e\x45\x03\xb3\x2d\xac" + "\x74\x92\xd3\x53\x47\xc8\xdd\x25" + "\x53\x6c\x02\x03\x87\x0d\x11\x0c" + "\x58\xe3\x12\x18\xfd\x2a\x5b\x40" + "\x0c\x30\xf0\xb8\x3f\x43\xce\xae" + "\x65\x3a\x7d\x7c\xf4\x54\xaa\xcc" + "\x33\x97\xc3\x77\xba\xc5\x70\xde" + "\xd7\xd5\x13\xa5\x65\xc4\x5f\x0f" + "\x46\x1a\x0d\x97\xb5\xf3\xbb\x3c" + "\x84\x0f\x2b\xc5\xaa\xea\xf2\x6c" + "\xc9\xb5\x0c\xee\x15\xf3\x7d\xbe" + "\x9f\x7b\x5a\xa6\xae\x4f\x83\xb6" + "\x79\x49\x41\xf4\x58\x18\xcb\x86" + "\x7f\x30\x0e\xf8\x7d\x44\x36\xea" + "\x75\xeb\x88\x84\x40\x3c\xad\x4f" + "\x6f\x31\x6b\xaa\x5d\xe5\xa5\xc5" + "\x21\x66\xe9\xa7\xe3\xb2\x15\x88" + "\x78\xf6\x79\xa1\x59\x47\x12\x4e" + "\x9f\x9f\x64\x1a\xa0\x22\x5b\x08" + "\xbe\x7c\x36\xc2\x2b\x66\x33\x1b" + "\xdd\x60\x71\xf7\x47\x8c\x61\xc3" + "\xda\x8a\x78\x1e\x16\xfa\x1e\x86" + "\x81\xa6\x17\x2a\xa7\xb5\xc2\xe7" + "\xa4\xc7\x42\xf1\xcf\x6a\xca\xb4" + "\x45\xcf\xf3\x93\xf0\xe7\xea\xf6" + "\xf4\xe6\x33\x43\x84\x93\xa5\x67" + "\x9b\x16\x58\x58\x80\x0f\x2b\x5c" + "\x24\x74\x75\x7f\x95\x81\xb7\x30" + "\x7a\x33\xa7\xf7\x94\x87\x32\x27" + "\x10\x5d\x14\x4c\x43\x29\xdd\x26" + "\xbd\x3e\x3c\x0e\xfe\x0e\xa5\x10" + "\xea\x6b\x64\xfd\x73\xc6\xed\xec" + "\xa8\xc9\xbf\xb3\xba\x0b\x4d\x07" + "\x70\xfc\x16\xfd\x79\x1e\xd7\xc5" + "\x49\x4e\x1c\x8b\x8d\x79\x1b\xb1" + "\xec\xca\x60\x09\x4c\x6a\xd5\x09" + "\x49\x46\x00\x88\x22\x8d\xce\xea" + "\xb1\x17\x11\xde\x42\xd2\x23\xc1" + "\x72\x11\xf5\x50\x73\x04\x40\x47" + "\xf9\x5d\xe7\xa7\x26\xb1\x7e\xb0" + "\x3f\x58\xc1\x52\xab\x12\x67\x9d" + "\x3f\x43\x4b\x68\xd4\x9c\x68\x38" + "\x07\x8a\x2d\x3e\xf3\xaf\x6a\x4b" + "\xf9\xe5\x31\x69\x22\xf9\xa6\x69" + "\xc6\x9c\x96\x9a\x12\x35\x95\x1d" + "\x95\xd5\xdd\xbe\xbf\x93\x53\x24" + 
"\xfd\xeb\xc2\x0a\x64\xb0\x77\x00" + "\x6f\x88\xc4\x37\x18\x69\x7c\xd7" + "\x41\x92\x55\x4c\x03\xa1\x9a\x4b" + "\x15\xe5\xdf\x7f\x37\x33\x72\xc1" + "\x8b\x10\x67\xa3\x01\x57\x94\x25" + "\x7b\x38\x71\x7e\xdd\x1e\xcc\x73" + "\x55\xd2\x8e\xeb\x07\xdd\xf1\xda" + "\x58\xb1\x47\x90\xfe\x42\x21\x72" + "\xa3\x54\x7a\xa0\x40\xec\x9f\xdd" + "\xc6\x84\x6e\xca\xae\xe3\x68\xb4" + "\x9d\xe4\x78\xff\x57\xf2\xf8\x1b" + "\x03\xa1\x31\xd9\xde\x8d\xf5\x22" + "\x9c\xdd\x20\xa4\x1e\x27\xb1\x76" + "\x4f\x44\x55\xe2\x9b\xa1\x9c\xfe" + "\x54\xf7\x27\x1b\xf4\xde\x02\xf5" + "\x1b\x55\x48\x5c\xdc\x21\x4b\x9e" + "\x4b\x6e\xed\x46\x23\xdc\x65\xb2" + "\xcf\x79\x5f\x28\xe0\x9e\x8b\xe7" + "\x4c\x9d\x8a\xff\xc1\xa6\x28\xb8" + "\x65\x69\x8a\x45\x29\xef\x74\x85" + "\xde\x79\xc7\x08\xae\x30\xb0\xf4" + "\xa3\x1d\x51\x41\xab\xce\xcb\xf6" + "\xb5\xd8\x6d\xe0\x85\xe1\x98\xb3" + "\x43\xbb\x86\x83\x0a\xa0\xf5\xb7" + "\x04\x0b\xfa\x71\x1f\xb0\xf6\xd9" + "\x13\x00\x15\xf0\xc7\xeb\x0d\x5a" + "\x9f\xd7\xb9\x6c\x65\x14\x22\x45" + "\x6e\x45\x32\x3e\x7e\x60\x1a\x12" + "\x97\x82\x14\xfb\xaa\x04\x22\xfa" + "\xa0\xe5\x7e\x8c\x78\x02\x48\x5d" + "\x78\x33\x5a\x7c\xad\xdb\x29\xce" + "\xbb\x8b\x61\xa4\xb7\x42\xe2\xac" + "\x8b\x1a\xd9\x2f\x0b\x8b\x62\x21" + "\x83\x35\x7e\xad\x73\xc2\xb5\x6c" + "\x10\x26\x38\x07\xe5\xc7\x36\x80" + "\xe2\x23\x12\x61\xf5\x48\x4b\x2b" + "\xc5\xdf\x15\xd9\x87\x01\xaa\xac" + "\x1e\x7c\xad\x73\x78\x18\x63\xe0" + "\x8b\x9f\x81\xd8\x12\x6a\x28\x10" + "\xbe\x04\x68\x8a\x09\x7c\x1b\x1c" + "\x83\x66\x80\x47\x80\xe8\xfd\x35" + "\x1c\x97\x6f\xae\x49\x10\x66\xcc" + "\xc6\xd8\xcc\x3a\x84\x91\x20\x77" + "\x72\xe4\x24\xd2\x37\x9f\xc5\xc9" + "\x25\x94\x10\x5f\x40\x00\x64\x99" + "\xdc\xae\xd7\x21\x09\x78\x50\x15" + "\xac\x5f\xc6\x2c\xa2\x0b\xa9\x39" + "\x87\x6e\x6d\xab\xde\x08\x51\x16" + "\xc7\x13\xe9\xea\xed\x06\x8e\x2c" + "\xf8\x37\x8c\xf0\xa6\x96\x8d\x43" + "\xb6\x98\x37\xb2\x43\xed\xde\xdf" + "\x89\x1a\xe7\xeb\x9d\xa1\x7b\x0b" + "\x77\xb0\xe2\x75\xc0\xf1\x98\xd9" + "\x80\x55\xc9\x34\x91\xd1\x59\xe8" + "\x4b\x0f\xc1\xa9\x4b\x7a\x84\x06" + "\x20\xa8\x5d\xfa\xd1\xde\x70\x56" + "\x2f\x9e\x91\x9c\x20\xb3\x24\xd8" + "\x84\x3d\xe1\x8c\x7e\x62\x52\xe5" + "\x44\x4b\x9f\xc2\x93\x03\xea\x2b" + "\x59\xc5\xfa\x3f\x91\x2b\xbb\x23" + "\xf5\xb2\x7b\xf5\x38\xaf\xb3\xee" + "\x63\xdc\x7b\xd1\xff\xaa\x8b\xab" + "\x82\x6b\x37\x04\xeb\x74\xbe\x79" + "\xb9\x83\x90\xef\x20\x59\x46\xff" + "\xe9\x97\x3e\x2f\xee\xb6\x64\x18" + "\x38\x4c\x7a\x4a\xf9\x61\xe8\x9a" + "\xa1\xb5\x01\xa6\x47\xd3\x11\xd4" + "\xce\xd3\x91\x49\x88\xc7\xb8\x4d" + "\xb1\xb9\x07\x6d\x16\x72\xae\x46" + "\x5e\x03\xa1\x4b\xb6\x02\x30\xa8" + "\x3d\xa9\x07\x2a\x7c\x19\xe7\x62" + "\x87\xe3\x82\x2f\x6f\xe1\x09\xd9" + "\x94\x97\xea\xdd\x58\x9e\xae\x76" + "\x7e\x35\xe5\xb4\xda\x7e\xf4\xde" + "\xf7\x32\x87\xcd\x93\xbf\x11\x56" + "\x11\xbe\x08\x74\xe1\x69\xad\xe2" + "\xd7\xf8\x86\x75\x8a\x3c\xa4\xbe" + "\x70\xa7\x1b\xfc\x0b\x44\x2a\x76" + "\x35\xea\x5d\x85\x81\xaf\x85\xeb" + "\xa0\x1c\x61\xc2\xf7\x4f\xa5\xdc" + "\x02\x7f\xf6\x95\x40\x6e\x8a\x9a" + "\xf3\x5d\x25\x6e\x14\x3a\x22\xc9" + "\x37\x1c\xeb\x46\x54\x3f\xa5\x91" + "\xc2\xb5\x8c\xfe\x53\x08\x97\x32" + "\x1b\xb2\x30\x27\xfe\x25\x5d\xdc" + "\x08\x87\xd0\xe5\x94\x1a\xd4\xf1" + "\xfe\xd6\xb4\xa3\xe6\x74\x81\x3c" + "\x1b\xb7\x31\xa7\x22\xfd\xd4\xdd" + "\x20\x4e\x7c\x51\xb0\x60\x73\xb8" + "\x9c\xac\x91\x90\x7e\x01\xb0\xe1" + "\x8a\x2f\x75\x1c\x53\x2a\x98\x2a" + "\x06\x52\x95\x52\xb2\xe9\x25\x2e" + "\x4c\xe2\x5a\x00\xb2\x13\x81\x03" + "\x77\x66\x0d\xa5\x99\xda\x4e\x8c" + "\xac\xf3\x13\x53\x27\x45\xaf\x64" + "\x46\xdc\xea\x23\xda\x97\xd1\xab" + 
"\x7d\x6c\x30\x96\x1f\xbc\x06\x34" + "\x18\x0b\x5e\x21\x35\x11\x8d\x4c" + "\xe0\x2d\xe9\x50\x16\x74\x81\xa8" + "\xb4\x34\xb9\x72\x42\xa6\xcc\xbc" + "\xca\x34\x83\x27\x10\x5b\x68\x45" + "\x8f\x52\x22\x0c\x55\x3d\x29\x7c" + "\xe3\xc0\x66\x05\x42\x91\x5f\x58" + "\xfe\x4a\x62\xd9\x8c\xa9\x04\x19" + "\x04\xa9\x08\x4b\x57\xfc\x67\x53" + "\x08\x7c\xbc\x66\x8a\xb0\xb6\x9f" + "\x92\xd6\x41\x7c\x5b\x2a\x00\x79" + "\x72", + .ctext = "\xe1\xb6\x8b\x5c\x80\xb8\xcc\x08" + "\x1b\x84\xb2\xd1\xad\xa4\x70\xac" + "\x67\xa9\x39\x27\xac\xb4\x5b\xb7" + "\x4c\x26\x77\x23\x1d\xce\x0a\xbe" + "\x18\x9e\x42\x8b\xbd\x7f\xd6\xf1" + "\xf1\x6b\xe2\x6d\x7f\x92\x0e\xcb" + "\xb8\x79\xba\xb4\xac\x7e\x2d\xc0" + "\x9e\x83\x81\x91\xd5\xea\xc3\x12" + "\x8d\xa4\x26\x70\xa4\xf9\x71\x0b" + "\xbd\x2e\xe1\xb3\x80\x42\x25\xb3" + "\x0b\x31\x99\xe1\x0d\xde\xa6\x90" + "\xf2\xa3\x10\xf7\xe5\xf3\x83\x1e" + "\x2c\xfb\x4d\xf0\x45\x3d\x28\x3c" + "\xb8\xf1\xcb\xbf\x67\xd8\x43\x5a" + "\x9d\x7b\x73\x29\x88\x0f\x13\x06" + "\x37\x50\x0d\x7c\xe6\x9b\x07\xdd" + "\x7e\x01\x1f\x81\x90\x10\x69\xdb" + "\xa4\xad\x8a\x5e\xac\x30\x72\xf2" + "\x36\xcd\xe3\x23\x49\x02\x93\xfa" + "\x3d\xbb\xe2\x98\x83\xeb\xe9\x8d" + "\xb3\x8f\x11\xaa\x53\xdb\xaf\x2e" + "\x95\x13\x99\x3d\x71\xbd\x32\x92" + "\xdd\xfc\x9d\x5e\x6f\x63\x2c\xee" + "\x91\x1f\x4c\x64\x3d\x87\x55\x0f" + "\xcc\x3d\x89\x61\x53\x02\x57\x8f" + "\xe4\x77\x29\x32\xaf\xa6\x2f\x0a" + "\xae\x3c\x3f\x3f\xf4\xfb\x65\x52" + "\xc5\xc1\x78\x78\x53\x28\xad\xed" + "\xd1\x67\x37\xc7\x59\x70\xcd\x0a" + "\xb8\x0f\x80\x51\x9f\xc0\x12\x5e" + "\x06\x0a\x7e\xec\x24\x5f\x73\x00" + "\xb1\x0b\x31\x47\x4f\x73\x8d\xb4" + "\xce\xf3\x55\x45\x6c\x84\x27\xba" + "\xb9\x6f\x03\x4a\xeb\x98\x88\x6e" + "\x53\xed\x25\x19\x0d\x8f\xfe\xca" + "\x60\xe5\x00\x93\x6e\x3c\xff\x19" + "\xae\x08\x3b\x8a\xa6\x84\x05\xfe" + "\x9b\x59\xa0\x8c\xc8\x05\x45\xf5" + "\x05\x37\xdc\x45\x6f\x8b\x95\x8c" + "\x4e\x11\x45\x7a\xce\x21\xa5\xf7" + "\x71\x67\xb9\xce\xd7\xf9\xe9\x5e" + "\x60\xf5\x53\x7a\xa8\x85\x14\x03" + "\xa0\x92\xec\xf3\x51\x80\x84\xc4" + "\xdc\x11\x9e\x57\xce\x4b\x45\xcf" + "\x90\x95\x85\x0b\x96\xe9\xee\x35" + "\x10\xb8\x9b\xf2\x59\x4a\xc6\x7e" + "\x85\xe5\x6f\x38\x51\x93\x40\x0c" + "\x99\xd7\x7f\x32\xa8\x06\x27\xd1" + "\x2b\xd5\xb5\x3a\x1a\xe1\x5e\xda" + "\xcd\x5a\x50\x30\x3c\xc7\xe7\x65" + "\xa6\x07\x0b\x98\x91\xc6\x20\x27" + "\x2a\x03\x63\x1b\x1e\x3d\xaf\xc8" + "\x71\x48\x46\x6a\x64\x28\xf9\x3d" + "\xd1\x1d\xab\xc8\x40\x76\xc2\x39" + "\x4e\x00\x75\xd2\x0e\x82\x58\x8c" + "\xd3\x73\x5a\xea\x46\x89\xbe\xfd" + "\x4e\x2c\x0d\x94\xaa\x9b\x68\xac" + "\x86\x87\x30\x7e\xa9\x16\xcd\x59" + "\xd2\xa6\xbe\x0a\xd8\xf5\xfd\x2d" + "\x49\x69\xd2\x1a\x90\xd2\x1b\xed" + "\xff\x71\x04\x87\x87\x21\xc4\xb8" + "\x1f\x5b\x51\x33\xd0\xd6\x59\x9a" + "\x03\x0e\xd3\x8b\xfb\x57\x73\xfd" + "\x5a\x52\x63\x82\xc8\x85\x2f\xcb" + "\x74\x6d\x4e\xd9\x68\x37\x85\x6a" + "\xd4\xfb\x94\xed\x8d\xd1\x1a\xaf" + "\x76\xa7\xb7\x88\xd0\x2b\x4e\xda" + "\xec\x99\x94\x27\x6f\x87\x8c\xdf" + "\x4b\x5e\xa6\x66\xdd\xcb\x33\x7b" + "\x64\x94\x31\xa8\x37\xa6\x1d\xdb" + "\x0d\x5c\x93\xa4\x40\xf9\x30\x53" + "\x4b\x74\x8d\xdd\xf6\xde\x3c\xac" + "\x5c\x80\x01\x3a\xef\xb1\x9a\x02" + "\x0c\x22\x8e\xe7\x44\x09\x74\x4c" + "\xf2\x9a\x27\x69\x7f\x12\x32\x36" + "\xde\x92\xdf\xde\x8f\x5b\x31\xab" + "\x4a\x01\x26\xe0\xb1\xda\xe8\x37" + "\x21\x64\xe8\xff\x69\xfc\x9e\x41" + "\xd2\x96\x2d\x18\x64\x98\x33\x78" + "\x24\x61\x73\x9b\x47\x29\xf1\xa7" + "\xcb\x27\x0f\xf0\x85\x6d\x8c\x9d" + "\x2c\x95\x9e\xe5\xb2\x8e\x30\x29" + "\x78\x8a\x9d\x65\xb4\x8e\xde\x7b" + "\xd9\x00\x50\xf5\x7f\x81\xc3\x1b" + 
"\x25\x85\xeb\xc2\x8c\x33\x22\x1e" + "\x68\x38\x22\x30\xd8\x2e\x00\x98" + "\x85\x16\x06\x56\xb4\x81\x74\x20" + "\x95\xdb\x1c\x05\x19\xe8\x23\x4d" + "\x65\x5d\xcc\xd8\x7f\xc4\x2d\x0f" + "\x57\x26\x71\x07\xad\xaa\x71\x9f" + "\x19\x76\x2f\x25\x51\x88\xe4\xc0" + "\x82\x6e\x08\x05\x37\x04\xee\x25" + "\x23\x90\xe9\x4e\xce\x9b\x16\xc1" + "\x31\xe7\x6e\x2c\x1b\xe1\x85\x9a" + "\x0c\x8c\xbb\x12\x1e\x68\x7b\x93" + "\xa9\x3c\x39\x56\x23\x3e\x6e\xc7" + "\x77\x84\xd3\xe0\x86\x59\xaa\xb9" + "\xd5\x53\x58\xc9\x0a\x83\x5f\x85" + "\xd8\x47\x14\x67\x8a\x3c\x17\xe0" + "\xab\x02\x51\xea\xf1\xf0\x4f\x30" + "\x7d\xe0\x92\xc2\x5f\xfb\x19\x5a" + "\x3f\xbd\xf4\x39\xa4\x31\x0c\x39" + "\xd1\xae\x4e\xf7\x65\x7f\x1f\xce" + "\xc2\x39\xd1\x84\xd4\xe5\x02\xe0" + "\x58\xaa\xf1\x5e\x81\xaf\x7f\x72" + "\x0f\x08\x99\x43\xb9\xd8\xac\x41" + "\x35\x55\xf2\xb2\xd4\x98\xb8\x3b" + "\x2b\x3c\x3e\x16\x06\x31\xfc\x79" + "\x47\x38\x63\x51\xc5\xd0\x26\xd7" + "\x43\xb4\x2b\xd9\xc5\x05\xf2\x9d" + "\x18\xc9\x26\x82\x56\xd2\x11\x05" + "\xb6\x89\xb4\x43\x9c\xb5\x9d\x11" + "\x6c\x83\x37\x71\x27\x1c\xae\xbf" + "\xcd\x57\xd2\xee\x0d\x5a\x15\x26" + "\x67\x88\x80\x80\x1b\xdc\xc1\x62" + "\xdd\x4c\xff\x92\x5c\x6c\xe1\xa0" + "\xe3\x79\xa9\x65\x8c\x8c\x14\x42" + "\xe5\x11\xd2\x1a\xad\xa9\x56\x6f" + "\x98\xfc\x8a\x7b\x56\x1f\xc6\xc1" + "\x52\x12\x92\x9b\x41\x0f\x4b\xae" + "\x1b\x4a\xbc\xfe\x23\xb6\x94\x70" + "\x04\x30\x9e\x69\x47\xbe\xb8\x8f" + "\xca\x45\xd7\x8a\xf4\x78\x3e\xaa" + "\x71\x17\xd8\x1e\xb8\x11\x8f\xbc" + "\xc8\x1a\x65\x7b\x41\x89\x72\xc7" + "\x5f\xbe\xc5\x2a\xdb\x5c\x54\xf9" + "\x25\xa3\x7a\x80\x56\x9c\x8c\xab" + "\x26\x19\x10\x36\xa6\xf3\x14\x79" + "\x40\x98\x70\x68\xb7\x35\xd9\xb9" + "\x27\xd4\xe7\x74\x5b\x3d\x97\xb4" + "\xd9\xaa\xd9\xf2\xb5\x14\x84\x1f" + "\xa9\xde\x12\x44\x5b\x00\xc0\xbc" + "\xc8\x11\x25\x1b\x67\x7a\x15\x72" + "\xa6\x31\x6f\xf4\x68\x7a\x86\x9d" + "\x43\x1c\x5f\x16\xd3\xad\x2e\x52" + "\xf3\xb4\xc3\xfa\x27\x2e\x68\x6c" + "\x06\xe7\x4c\x4f\xa2\xe0\xe4\x21" + "\x5d\x9e\x33\x58\x8d\xbf\xd5\x70" + "\xf8\x80\xa5\xdd\xe7\x18\x79\xfa" + "\x7b\xfd\x09\x69\x2c\x37\x32\xa8" + "\x65\xfa\x8d\x8b\x5c\xcc\xe8\xf3" + "\x37\xf6\xa6\xc6\x5c\xa2\x66\x79" + "\xfa\x8a\xa7\xd1\x0b\x2e\x1b\x5e" + "\x95\x35\x00\x76\xae\x42\xf7\x50" + "\x51\x78\xfb\xb4\x28\x24\xde\x1a" + "\x70\x8b\xed\xca\x3c\x5e\xe4\xbd" + "\x28\xb5\xf3\x76\x4f\x67\x5d\x81" + "\xb2\x60\x87\xd9\x7b\x19\x1a\xa7" + "\x79\xa2\xfa\x3f\x9e\xa9\xd7\x25" + "\x61\xe1\x74\x31\xa2\x77\xa0\x1b" + "\xf6\xf7\xcb\xc5\xaa\x9e\xce\xf9" + "\x9b\x96\xef\x51\xc3\x1a\x44\x96" + "\xae\x17\x50\xab\x29\x08\xda\xcc" + "\x1a\xb3\x12\xd0\x24\xe4\xe2\xe0" + "\xc6\xe3\xcc\x82\xd0\xba\x47\x4c" + "\x3f\x49\xd7\xe8\xb6\x61\xaa\x65" + "\x25\x18\x40\x2d\x62\x25\x02\x71" + "\x61\xa2\xc1\xb2\x13\xd2\x71\x3f" + "\x43\x1a\xc9\x09\x92\xff\xd5\x57" + "\xf0\xfc\x5e\x1c\xf1\xf5\xf9\xf3" + "\x5b", + .len = 1281, + .also_non_np = 1, + .np = 3, + .tap = { 1200, 1, 80 }, + }, +}; + /* * CTS (Cipher Text Stealing) mode tests */ diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h index a504350f54df..3cd9bfc71ed9 100644 --- a/include/crypto/chacha.h +++ b/include/crypto/chacha.h @@ -5,6 +5,13 @@ * XChaCha extends ChaCha's nonce to 192 bits, while provably retaining ChaCha's * security. Here they share the same key size, tfm context, and setkey * function; only their IV size and encrypt/decrypt function differ. + * + * The ChaCha paper specifies 20, 12, and 8-round variants. In general, it is + * strongly recommended to use the 20-round variant ChaCha20. 
However, the + reduced-round variants may be needed in some very
performance-sensitive + scenarios where a reduced security margin is
acceptable. + * + * The generic ChaCha code currently allows only the 20- and
12-round variants. */ #ifndef _CRYPTO_CHACHA_H @@ -40,6 +47,8 @@ void
crypto_chacha_init(u32 *state, struct chacha_ctx *ctx, u8 *iv); int
crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key, unsigned int
keysize); +int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8
*key, + unsigned int keysize); int crypto_chacha_crypt(struct skcipher_request
*req); int crypto_xchacha_crypt(struct skcipher_request *req); diff --git
a/lib/chacha.c b/lib/chacha.c index b0fd1e0882e3..d29965a5cf5e 100644 ---
a/lib/chacha.c +++ b/lib/chacha.c @@ -21,7 +21,7 @@ static void
chacha_permute(u32 *x, int nrounds) int i; /* whitelist the allowed round
counts */ - BUG_ON(nrounds != 20); + BUG_ON(nrounds != 20 && nrounds != 12);
for (i = 0; i < nrounds; i += 2) { x[0] += x[4]; x[12] = rol32(x[12] ^ x[0],
16); @@ -70,7 +70,7 @@ static void chacha_permute(u32 *x, int nrounds) *
chacha_block - generate one keystream block and increment block counter *
@state: input state matrix (16 32-bit words) * @stream: output keystream block
(64 bytes) - * @nrounds: number of rounds (currently must be 20) + * @nrounds:
number of rounds (20 or 12; 20 is recommended) * * This is the ChaCha core, a
function from 64-byte strings to 64-byte strings. * The caller has already
converted the endianness of the input. This function @@ -96,7 +96,7 @@
EXPORT_SYMBOL(chacha_block); * hchacha_block - abbreviated ChaCha core, for
XChaCha * @in: input state matrix (16 32-bit words) * @out: output (8 32-bit
words) - * @nrounds: number of rounds (currently must be 20) + * @nrounds:
number of rounds (20 or 12; 20 is recommended) * * HChaCha is the ChaCha
equivalent of HSalsa and is an intermediate step * towards XChaCha (see
https://cr.yp.to/snuffle/xsalsa-20081128.pdf).
HChaCha From patchwork Mon Aug 6 22:32:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558037 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 181AE1390 for ; Mon, 6 Aug 2018 22:35:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0714B29570 for ; Mon, 6 Aug 2018 22:35:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EF74F297C0; Mon, 6 Aug 2018 22:35:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5553729570 for ; Mon, 6 Aug 2018 22:35:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732602AbeHGAqO (ORCPT ); Mon, 6 Aug 2018 20:46:14 -0400 Received: from mail.kernel.org ([198.145.29.99]:45134 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732295AbeHGAqN (ORCPT ); Mon, 6 Aug 2018 20:46:13 -0400 Received: from ebiggers-linuxstation.kir.corp.google.com (unknown [104.132.51.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A798E21A6E; Mon, 6 Aug 2018 22:34:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533594899; bh=Fl+z/fw7th53OB8OCy3uOHbhqBvoGVHD5NhmQGOkfJY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LtFW08u47cAJ8ITgsrkQ8vXB0He8SJXGkv38O2BXqNr4D1tLPFeEGybNPvY9J2Oct Qx+PQoH0KRkvWgnjWjmUvzk7A/kX05vOVRA36AyM7cW4zDCAshghi8PcnRnpJy4efS 83fr0ad8BAiyHE35htmtqs5y7YNam6p17Z3pvEBI= From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 5/9] crypto: arm/chacha20 - add XChaCha20 support Date: Mon, 6 Aug 2018 15:32:56 -0700 Message-Id: <20180806223300.113891-6-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers Add an XChaCha20 implementation that is hooked up to the ARM NEON implementation of ChaCha20. This is needed for use in the HPolyC construction for disk/file encryption; see the generic code patch, "crypto: chacha20-generic - add XChaCha20 support", for more details. We also update the NEON code to support HChaCha20 on one block, so we can use that in XChaCha20 rather than calling the generic HChaCha20. This required factoring the permutation out into its own macro. 
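For reference, hchacha20_block_neon() must compute exactly what the generic
helper computes; in C terms that is roughly the following sketch (illustration
only; chacha_permute() is the static helper in lib/chacha.c):

	/* Reference semantics for hchacha20_block_neon(state, out): */
	u32 x[16];

	memcpy(x, state, 64);        /* q0-q3 <- the state matrix */
	chacha_permute(x, 20);       /* ten double-rounds, no final addition */
	memcpy(&out[0], &x[0], 16);  /* words 0..3   (vst1.8 {q0}) */
	memcpy(&out[4], &x[12], 16); /* words 12..15 (vst1.8 {q3}) */
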
Signed-off-by: Eric Biggers --- arch/arm/crypto/Kconfig | 2 +-
arch/arm/crypto/chacha20-neon-core.S | 68 ++++++++++------
arch/arm/crypto/chacha20-neon-glue.c | 111 ++++++++++++++++++++------- 3 files
changed, 130 insertions(+), 51 deletions(-) diff --git a/arch/arm/crypto/Kconfig
b/arch/arm/crypto/Kconfig index 925d1364727a..896dcf142719 100644 ---
a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -116,7 +116,7 @@
config CRYPTO_CRC32_ARM_CE select CRYPTO_HASH config CRYPTO_CHACHA20_NEON -
tristate "NEON accelerated ChaCha20 symmetric cipher" + tristate "NEON
accelerated ChaCha20 stream cipher algorithms" depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER select CRYPTO_CHACHA20 diff --git
a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
index 451a849ad518..8e63208cc025 100644 ---
a/arch/arm/crypto/chacha20-neon-core.S +++
b/arch/arm/crypto/chacha20-neon-core.S @@ -24,31 +24,20 @@ .fpu neon .align 5
-ENTRY(chacha20_block_xor_neon) - // r0: Input state matrix, s - // r1: 1 data
block output, o - // r2: 1 data block input, i - - // - // This function
encrypts one ChaCha20 block by loading the state matrix - // in four NEON
registers. It performs matrix operation on four words in - // parallel, but
requireds shuffling to rearrange the words after each - // round. - // - - //
x0..3 = s0..3 - add ip, r0, #0x20 - vld1.32 {q0-q1}, [r0] - vld1.32 {q2-q3},
[ip] - - vmov q8, q0 - vmov q9, q1 - vmov q10, q2 - vmov q11, q3 +/* + *
_chacha20_permute - permute one block + * + * Permute one 64-byte block where
the state matrix is stored in the four NEON + * registers q0-q3. It performs
matrix operation on four words in parallel, but + * requires shuffling to
rearrange the words after each round. + * + * Clobbers: r3, q4 + */ +.macro
_chacha20_permute mov r3, #10 -.Ldoubleround: +.Ldoubleround_\@: // x0 += x1,
x3 = rotl32(x3 ^ x0, 16) vadd.i32 q0, q0, q1 veor q3, q3, q0 @@ -110,7 +99,25
@@ ENTRY(chacha20_block_xor_neon) vext.8 q3, q3, q3, #4 subs r3, r3, #1 - bne
.Ldoubleround + bne .Ldoubleround_\@ +.endm + +ENTRY(chacha20_block_xor_neon) +
// r0: Input state matrix, s + // r1: 1 data block output, o + // r2: 1 data
block input, i + + // x0..3 = s0..3 + add ip, r0, #0x20 + vld1.32 {q0-q1}, [r0]
+ vld1.32 {q2-q3}, [ip] + + vmov q8, q0 + vmov q9, q1 + vmov q10, q2 + vmov
q11, q3 + + _chacha20_permute add ip, r2, #0x20 vld1.8 {q4-q5}, [r2] @@ -139,6
+146,21 @@ ENTRY(chacha20_block_xor_neon) bx lr
ENDPROC(chacha20_block_xor_neon) +ENTRY(hchacha20_block_neon) + // r0: Input
state matrix, s + // r1: output (8 32-bit words) + + vld1.32 {q0-q1}, [r0]! +
vld1.32 {q2-q3}, [r0] + + _chacha20_permute + + vst1.8 {q0}, [r1]! + vst1.8
{q3}, [r1] + + bx lr +ENDPROC(hchacha20_block_neon) + .align 5
ENTRY(chacha20_4block_xor_neon) push {r4-r6, lr} diff --git
a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
index ed8dec0f1768..becc7990b1d3 100644 ---
a/arch/arm/crypto/chacha20-neon-glue.c +++
b/arch/arm/crypto/chacha20-neon-glue.c @@ -1,5 +1,5 @@ /* - * ChaCha20 256-bit
cipher algorithm, RFC7539, ARM NEON functions + * ChaCha20 (RFC7539) and
XChaCha20 stream ciphers, NEON accelerated * * Copyright (C) 2016 Linaro, Ltd.
* @@ -30,6 +30,7 @@ asmlinkage void chacha20_block_xor_neon(u32 *state, u8
*dst, const u8 *src); asmlinkage void chacha20_4block_xor_neon(u32 *state, u8
*dst, const u8 *src); +asmlinkage void hchacha20_block_neon(const u32 *state,
u32 *out); static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes) @@ -57,22 +58,17 @@ static void chacha20_doneon(u32 *state,
u8 *dst, const u8 *src, } } -static int chacha20_neon(struct skcipher_request
*req) +static int chacha20_neon_stream_xor(struct skcipher_request *req, +
struct chacha_ctx *ctx, u8 *iv) { - struct crypto_skcipher *tfm =
crypto_skcipher_reqtfm(req); - struct chacha_ctx *ctx =
crypto_skcipher_ctx(tfm); struct skcipher_walk walk; u32 state[16]; int err; -
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd()) - return
crypto_chacha_crypt(req); - err = skcipher_walk_virt(&walk, req, true); -
crypto_chacha_init(state, ctx, walk.iv); + crypto_chacha_init(state, ctx, iv);
- kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int nbytes =
walk.nbytes; @@ -83,27 +79,85 @@ static int chacha20_neon(struct
skcipher_request *req) nbytes); err = skcipher_walk_done(&walk, walk.nbytes -
nbytes); } + + return err; +} + +static int chacha20_neon(struct
skcipher_request *req) +{ + struct crypto_skcipher *tfm =
crypto_skcipher_reqtfm(req); + struct chacha_ctx *ctx =
crypto_skcipher_ctx(tfm); + int err; + + if (req->cryptlen <= CHACHA_BLOCK_SIZE
|| !may_use_simd()) + return crypto_chacha_crypt(req); + + kernel_neon_begin();
+ err = chacha20_neon_stream_xor(req, ctx, req->iv); + kernel_neon_end(); +
return err; +} + +static int xchacha20_neon(struct skcipher_request *req) +{ +
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct chacha_ctx
*ctx = crypto_skcipher_ctx(tfm); + struct chacha_ctx subctx; + u32 state[16]; +
u8 real_iv[16]; + int err; + + if (req->cryptlen <= CHACHA_BLOCK_SIZE ||
!may_use_simd()) + return crypto_xchacha_crypt(req); + +
crypto_chacha_init(state, ctx, req->iv); + + kernel_neon_begin(); + +
hchacha20_block_neon(state, subctx.key); + memcpy(&real_iv[0], req->iv + 24,
8); + memcpy(&real_iv[8], req->iv + 16, 8); + err =
chacha20_neon_stream_xor(req, &subctx, real_iv); + kernel_neon_end(); return
err; } -static struct skcipher_alg alg = { - .base.cra_name = "chacha20", -
.base.cra_driver_name = "chacha20-neon", - .base.cra_priority = 300, -
.base.cra_blocksize = 1, - .base.cra_ctxsize = sizeof(struct chacha_ctx), -
.base.cra_module = THIS_MODULE, - - .min_keysize = CHACHA_KEY_SIZE, -
.max_keysize = CHACHA_KEY_SIZE, - .ivsize = CHACHA_IV_SIZE, - .chunksize =
CHACHA_BLOCK_SIZE, - .walksize = 4 * CHACHA_BLOCK_SIZE, - .setkey =
crypto_chacha20_setkey, - .encrypt = chacha20_neon, - .decrypt = chacha20_neon,
+static struct skcipher_alg algs[] = { + { + .base.cra_name = "chacha20", +
.base.cra_driver_name = "chacha20-neon", + .base.cra_priority = 300, +
.base.cra_blocksize = 1, + .base.cra_ctxsize = sizeof(struct chacha_ctx), +
.base.cra_module = THIS_MODULE, + + .min_keysize = CHACHA_KEY_SIZE, +
.max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize =
CHACHA_BLOCK_SIZE, + .walksize = 4 * CHACHA_BLOCK_SIZE, + .setkey =
crypto_chacha20_setkey, + .encrypt = chacha20_neon, + .decrypt = chacha20_neon,
+ }, { + .base.cra_name = "xchacha20", + .base.cra_driver_name =
"xchacha20-neon", + .base.cra_priority = 300, + .base.cra_blocksize = 1, +
.base.cra_ctxsize = sizeof(struct chacha_ctx), + .base.cra_module =
THIS_MODULE, + + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize =
CHACHA_KEY_SIZE, +
.ivsize = XCHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .walksize = 4 * CHACHA_BLOCK_SIZE, + .setkey = crypto_chacha20_setkey, + .encrypt = xchacha20_neon, + .decrypt = xchacha20_neon, + } }; static int __init chacha20_simd_mod_init(void) @@ -111,12 +165,12 @@ static int __init chacha20_simd_mod_init(void) if (!(elf_hwcap & HWCAP_NEON)) return -ENODEV; - return crypto_register_skcipher(&alg); + return crypto_register_skciphers(algs, ARRAY_SIZE(algs)); } static void __exit chacha20_simd_mod_fini(void) { - crypto_unregister_skcipher(&alg); + crypto_unregister_skciphers(algs, ARRAY_SIZE(algs)); } module_init(chacha20_simd_mod_init); @@ -125,3 +179,6 @@ module_exit(chacha20_simd_mod_fini); MODULE_AUTHOR("Ard Biesheuvel "); MODULE_LICENSE("GPL v2"); MODULE_ALIAS_CRYPTO("chacha20"); +MODULE_ALIAS_CRYPTO("chacha20-neon"); +MODULE_ALIAS_CRYPTO("xchacha20"); +MODULE_ALIAS_CRYPTO("xchacha20-neon"); From patchwork Mon Aug 6 22:32:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558041 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 237ED13AC for ; Mon, 6 Aug 2018 22:35:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10D7629570 for ; Mon, 6 Aug 2018 22:35:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 04EBF297CE; Mon, 6 Aug 2018 22:35:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25D7829570 for ; Mon, 6 Aug 2018 22:35:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732515AbeHGAqh (ORCPT ); Mon, 6 Aug 2018 20:46:37 -0400 Received: from mail.kernel.org ([198.145.29.99]:45138 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730888AbeHGAqN (ORCPT ); Mon, 6 Aug 2018 20:46:13 -0400 Received: from ebiggers-linuxstation.kir.corp.google.com (unknown [104.132.51.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id D45B721A6F; Mon, 6 Aug 2018 22:34:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533594899; bh=2IY6N9Kr67jtDVWF8cwQRVhHzFPh8kFDNO+xF04FTaU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sCdpj5yC8WRVWS1nctXXAU+IwAZLtTWfCUg5EkjcNEboxE32mPItKIivogILPoC2P AlHMTDava1TdmPOXqRVlN3K4StpPUowUc1jS8krxhnFsuHnSOUu6KEyiAFOIsU3ipu shMweNYnOoHAo2zxbC6meSy9+0TwJSqVw93oFNLU= From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . 
Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 6/9] crypto: arm/chacha20 - refactor to allow varying number of rounds Date: Mon, 6 Aug 2018 15:32:57 -0700 Message-Id: <20180806223300.113891-7-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers In preparation for adding XChaCha12 support, rename/refactor the NEON implementation of ChaCha20 to support different numbers of rounds. Signed-off-by: Eric Biggers --- arch/arm/crypto/Makefile | 4 +- ...hacha20-neon-core.S => chacha-neon-core.S} | 51 +++++++++-------- ...hacha20-neon-glue.c => chacha-neon-glue.c} | 56 ++++++++++--------- 3 files changed, 59 insertions(+), 52 deletions(-) rename arch/arm/crypto/{chacha20-neon-core.S => chacha-neon-core.S} (93%) rename arch/arm/crypto/{chacha20-neon-glue.c => chacha-neon-glue.c} (73%) diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 8de542c48ade..6f58a24faa4a 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -9,7 +9,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o -obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o +obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o @@ -53,7 +53,7 @@ aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o -chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o +chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o speck-neon-y := speck-neon-core.o speck-neon-glue.o ifdef REGENERATE_ARM_CRYPTO diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha-neon-core.S similarity index 93% rename from arch/arm/crypto/chacha20-neon-core.S rename to arch/arm/crypto/chacha-neon-core.S index 8e63208cc025..3b38d3cac522 100644 --- a/arch/arm/crypto/chacha20-neon-core.S +++ b/arch/arm/crypto/chacha-neon-core.S @@ -1,5 +1,5 @@ /* - * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions + * ChaCha/XChaCha NEON helper functions * * Copyright (C) 2016 Linaro, Ltd. * @@ -25,18 +25,18 @@ .align 5 /* - * _chacha20_permute - permute one block + * _chacha_permute - permute one block * * Permute one 64-byte block where the state matrix is stored in the four NEON * registers q0-q3. It performs matrix operation on four words in parallel, but * requires shuffling to rearrange the words after each round. * + * The round count is given in r3. 
+ * * Clobbers: r3, q4 */ -.macro _chacha20_permute +.macro _chacha_permute - mov r3, #10 - .Ldoubleround_\@: // x0 += x1, x3 = rotl32(x3 ^ x0, 16) vadd.i32 q0, q0, q1 @@ -98,14 +98,15 @@ // x3 = shuffle32(x3, MASK(0, 3, 2, 1)) vext.8 q3, q3, q3, #4 - subs r3, r3, #1 + subs r3, r3, #2 bne .Ldoubleround_\@ .endm -ENTRY(chacha20_block_xor_neon) +ENTRY(chacha_block_xor_neon) // r0: Input state matrix, s // r1: 1 data block output, o // r2: 1 data block input, i + // r3: nrounds // x0..3 = s0..3 add ip, r0, #0x20 @@ -117,7 +118,7 @@ ENTRY(chacha20_block_xor_neon) vmov q10, q2 vmov q11, q3 - _chacha20_permute + _chacha_permute add ip, r2, #0x20 vld1.8 {q4-q5}, [r2] @@ -144,37 +145,41 @@ ENTRY(chacha20_block_xor_neon) vst1.8 {q2-q3}, [ip] bx lr -ENDPROC(chacha20_block_xor_neon) +ENDPROC(chacha_block_xor_neon) -ENTRY(hchacha20_block_neon) +ENTRY(hchacha_block_neon) // r0: Input state matrix, s // r1: output (8 32-bit words) + // r2: nrounds vld1.32 {q0-q1}, [r0]! vld1.32 {q2-q3}, [r0] - _chacha20_permute + mov r3, r2 + _chacha_permute vst1.8 {q0}, [r1]! vst1.8 {q3}, [r1] bx lr -ENDPROC(hchacha20_block_neon) +ENDPROC(hchacha_block_neon) .align 5 -ENTRY(chacha20_4block_xor_neon) +ENTRY(chacha_4block_xor_neon) push {r4-r6, lr} mov ip, sp // preserve the stack pointer - sub r3, sp, #0x20 // allocate a 32 byte buffer - bic r3, r3, #0x1f // aligned to 32 bytes - mov sp, r3 + sub r4, sp, #0x20 // allocate a 32 byte buffer + bic r4, r4, #0x1f // aligned to 32 bytes + mov sp, r4 + // r0: Input state matrix, s // r1: 4 data blocks output, o // r2: 4 data blocks input, i + // r3: nrounds // - // This function encrypts four consecutive ChaCha20 blocks by loading + // This function encrypts four consecutive ChaCha blocks by loading // the state matrix in NEON registers four times. The algorithm performs // each operation on the corresponding word of each state matrix, hence // requires no word shuffling.
For final XORing step we transpose the @@ -183,14 +188,14 @@ ENTRY(chacha20_4block_xor_neon) // // x0..15[0-3] = s0..3[0..3] - add r3, r0, #0x20 + add r4, r0, #0x20 vld1.32 {q0-q1}, [r0] - vld1.32 {q2-q3}, [r3] + vld1.32 {q2-q3}, [r4] - adr r3, CTRINC + adr r4, CTRINC vdup.32 q15, d7[1] vdup.32 q14, d7[0] - vld1.32 {q11}, [r3, :128] + vld1.32 {q11}, [r4, :128] vdup.32 q13, d6[1] vdup.32 q12, d6[0] vadd.i32 q12, q12, q11 // x12 += counter values 0-3 @@ -207,8 +212,6 @@ ENTRY(chacha20_4block_xor_neon) vdup.32 q1, d0[1] vdup.32 q0, d0[0] - mov r3, #10 - .Ldoubleround4: // x0 += x4, x12 = rotl32(x12 ^ x0, 16) // x1 += x5, x13 = rotl32(x13 ^ x1, 16) @@ -400,7 +403,7 @@ ENTRY(chacha20_4block_xor_neon) vsri.u32 q5, q8, #25 vsri.u32 q6, q9, #25 - subs r3, r3, #1 + subs r3, r3, #2 beq 0f vld1.32 {q8-q9}, [sp, :256] @@ -537,7 +540,7 @@ ENTRY(chacha20_4block_xor_neon) mov sp, ip pop {r4-r6, pc} -ENDPROC(chacha20_4block_xor_neon) +ENDPROC(chacha_4block_xor_neon) .align 4 CTRINC: .word 0, 1, 2, 3 diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha-neon-glue.c similarity index 73% rename from arch/arm/crypto/chacha20-neon-glue.c rename to arch/arm/crypto/chacha-neon-glue.c index becc7990b1d3..b236af4889c6 100644 --- a/arch/arm/crypto/chacha20-neon-glue.c +++ b/arch/arm/crypto/chacha-neon-glue.c @@ -28,24 +28,26 @@ #include #include -asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src); -asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src); -asmlinkage void hchacha20_block_neon(const u32 *state, u32 *out); - -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, - unsigned int bytes) +asmlinkage void chacha_block_xor_neon(const u32 *state, u8 *dst, const u8 *src, + int nrounds); +asmlinkage void chacha_4block_xor_neon(const u32 *state, u8 *dst, const u8 *src, + int nrounds); +asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds); + +static void chacha_doneon(u32 *state, u8 *dst, const u8 *src, + unsigned int bytes, int nrounds) { u8 buf[CHACHA_BLOCK_SIZE]; while (bytes >= CHACHA_BLOCK_SIZE * 4) { - chacha20_4block_xor_neon(state, dst, src); + chacha_4block_xor_neon(state, dst, src, nrounds); bytes -= CHACHA_BLOCK_SIZE * 4; src += CHACHA_BLOCK_SIZE * 4; dst += CHACHA_BLOCK_SIZE * 4; state[12] += 4; } while (bytes >= CHACHA_BLOCK_SIZE) { - chacha20_block_xor_neon(state, dst, src); + chacha_block_xor_neon(state, dst, src, nrounds); bytes -= CHACHA_BLOCK_SIZE; src += CHACHA_BLOCK_SIZE; dst += CHACHA_BLOCK_SIZE; @@ -53,13 +55,13 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, } if (bytes) { memcpy(buf, src, bytes); - chacha20_block_xor_neon(state, buf, buf); + chacha_block_xor_neon(state, buf, buf, nrounds); memcpy(dst, buf, bytes); } } -static int chacha20_neon_stream_xor(struct skcipher_request *req, - struct chacha_ctx *ctx, u8 *iv) +static int chacha_neon_stream_xor(struct skcipher_request *req, + struct chacha_ctx *ctx, u8 *iv) { struct skcipher_walk walk; u32 state[16]; @@ -75,15 +77,15 @@ static int chacha20_neon_stream_xor(struct skcipher_request *req, if (nbytes < walk.total) nbytes = round_down(nbytes, walk.stride); - chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr, - nbytes); + chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr, + nbytes, ctx->nrounds); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } return err; } -static int chacha20_neon(struct skcipher_request *req) +static int chacha_neon(struct skcipher_request *req) { struct crypto_skcipher 
*tfm = crypto_skcipher_reqtfm(req); struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); @@ -93,12 +95,12 @@ static int chacha20_neon(struct skcipher_request *req) return crypto_chacha_crypt(req); kernel_neon_begin(); - err = chacha20_neon_stream_xor(req, ctx, req->iv); + err = chacha_neon_stream_xor(req, ctx, req->iv); kernel_neon_end(); return err; } -static int xchacha20_neon(struct skcipher_request *req) +static int xchacha_neon(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); @@ -114,10 +116,11 @@ static int xchacha20_neon(struct skcipher_request *req) kernel_neon_begin(); - hchacha20_block_neon(state, subctx.key); + hchacha_block_neon(state, subctx.key, ctx->nrounds); + subctx.nrounds = ctx->nrounds; memcpy(&real_iv[0], req->iv + 24, 8); memcpy(&real_iv[8], req->iv + 16, 8); - err = chacha20_neon_stream_xor(req, &subctx, real_iv); + err = chacha_neon_stream_xor(req, &subctx, real_iv); kernel_neon_end(); @@ -139,8 +142,8 @@ static struct skcipher_alg algs[] = { .chunksize = CHACHA_BLOCK_SIZE, .walksize = 4 * CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, - .encrypt = chacha20_neon, - .decrypt = chacha20_neon, + .encrypt = chacha_neon, + .decrypt = chacha_neon, }, { .base.cra_name = "xchacha20", .base.cra_driver_name = "xchacha20-neon", @@ -155,12 +158,12 @@ static struct skcipher_alg algs[] = { .chunksize = CHACHA_BLOCK_SIZE, .walksize = 4 * CHACHA_BLOCK_SIZE, .setkey = crypto_chacha20_setkey, - .encrypt = xchacha20_neon, - .decrypt = xchacha20_neon, + .encrypt = xchacha_neon, + .decrypt = xchacha_neon, } }; -static int __init chacha20_simd_mod_init(void) +static int __init chacha_simd_mod_init(void) { if (!(elf_hwcap & HWCAP_NEON)) return -ENODEV; @@ -168,14 +171,15 @@ static int __init chacha20_simd_mod_init(void) return crypto_register_skciphers(algs, ARRAY_SIZE(algs)); } -static void __exit chacha20_simd_mod_fini(void) +static void __exit chacha_simd_mod_fini(void) { crypto_unregister_skciphers(algs, ARRAY_SIZE(algs)); } -module_init(chacha20_simd_mod_init); -module_exit(chacha20_simd_mod_fini); +module_init(chacha_simd_mod_init); +module_exit(chacha_simd_mod_fini); +MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)"); MODULE_AUTHOR("Ard Biesheuvel "); MODULE_LICENSE("GPL v2"); MODULE_ALIAS_CRYPTO("chacha20"); From patchwork Mon Aug 6 22:32:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558055 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CD5513AC for ; Mon, 6 Aug 2018 22:35:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B2BE29570 for ; Mon, 6 Aug 2018 22:35:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F81F297C0; Mon, 6 Aug 2018 22:35:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CAA529570 for ; Mon, 6 Aug 2018 22:35:42 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733300AbeHGAqs (ORCPT ); Mon, 6 Aug 2018 20:46:48 -0400 Received: from mail.kernel.org ([198.145.29.99]:45146 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732382AbeHGAqN (ORCPT ); Mon, 6 Aug 2018 20:46:13 -0400 Received: from ebiggers-linuxstation.kir.corp.google.com (unknown [104.132.51.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0C84321A70; Mon, 6 Aug 2018 22:35:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533594900; bh=69ARqbIkf3tYzc5fBb5j9RP57lLwH6Z2F2ov684TlD4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qYduDkWnyEZ+S6VEntCjnH80y/06iyBMxl6Ic8kdPFriGUyEGHdGpBmcMAhiXSApm YQv68HgHjfZJhxFnN9EQ4aXYoV1+XDNKfpm6qJhbukN5WPlYhtEJQc54SmPc+6AkeO VjdH9SGATxuiGybI9muumZomviP//yDpkJR2okzE= From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 7/9] crypto: arm/chacha - add XChaCha12 support Date: Mon, 6 Aug 2018 15:32:58 -0700 Message-Id: <20180806223300.113891-8-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers Now that the 32-bit ARM NEON implementation of ChaCha20 and XChaCha20 has been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is reduced from 20 to 12. As I explained in more detail in the patch which added XChaCha12 to the generic code, "crypto: chacha - add XChaCha12 support", we'd prefer to use XChaCha20, but unfortunately it is not fast enough for our use case. Thus, we must settle for a reduced-round variant. See that patch for a more detailed explanation. Signed-off-by: Eric Biggers --- arch/arm/crypto/Kconfig | 2 +- arch/arm/crypto/chacha-neon-glue.c | 21 ++++++++++++++++++++- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 896dcf142719..75c613413e31 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -116,7 +116,7 @@ config CRYPTO_CRC32_ARM_CE select CRYPTO_HASH config CRYPTO_CHACHA20_NEON - tristate "NEON accelerated ChaCha20 stream cipher algorithms" + tristate "NEON accelerated ChaCha stream cipher algorithms" depends on KERNEL_MODE_NEON select CRYPTO_BLKCIPHER select CRYPTO_CHACHA20 diff --git a/arch/arm/crypto/chacha-neon-glue.c b/arch/arm/crypto/chacha-neon-glue.c index b236af4889c6..0b1b23822770 100644 --- a/arch/arm/crypto/chacha-neon-glue.c +++ b/arch/arm/crypto/chacha-neon-glue.c @@ -1,5 +1,6 @@ /* - * ChaCha20 (RFC7539) and XChaCha20 stream ciphers, NEON accelerated + * ARM NEON accelerated ChaCha and XChaCha stream ciphers, + * including ChaCha20 (RFC7539) * * Copyright (C) 2016 Linaro, Ltd. 
* @@ -160,6 +161,22 @@ static struct skcipher_alg algs[] = { .setkey = crypto_chacha20_setkey, .encrypt = xchacha_neon, .decrypt = xchacha_neon, + }, { + .base.cra_name = "xchacha12", + .base.cra_driver_name = "xchacha12-neon", + .base.cra_priority = 300, + .base.cra_blocksize = 1, + .base.cra_ctxsize = sizeof(struct chacha_ctx), + .base.cra_module = THIS_MODULE, + + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = XCHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .walksize = 4 * CHACHA_BLOCK_SIZE, + .setkey = crypto_chacha12_setkey, + .encrypt = xchacha_neon, + .decrypt = xchacha_neon, } }; @@ -186,3 +203,5 @@ MODULE_ALIAS_CRYPTO("chacha20"); MODULE_ALIAS_CRYPTO("chacha20-neon"); MODULE_ALIAS_CRYPTO("xchacha20"); MODULE_ALIAS_CRYPTO("xchacha20-neon"); +MODULE_ALIAS_CRYPTO("xchacha12"); +MODULE_ALIAS_CRYPTO("xchacha12-neon"); From patchwork Mon Aug 6 22:32:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558027 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A2B8C13AC for ; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D8CB29570 for ; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 815D029808; Mon, 6 Aug 2018 22:35:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5011A29570 for ; Mon, 6 Aug 2018 22:35:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732960AbeHGAqQ (ORCPT ); Mon, 6 Aug 2018 20:46:16 -0400 Received: from mail.kernel.org ([198.145.29.99]:45154 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732383AbeHGAqO (ORCPT ); Mon, 6 Aug 2018 20:46:14 -0400 Received: from ebiggers-linuxstation.kir.corp.google.com (unknown [104.132.51.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 3A6D421A71; Mon, 6 Aug 2018 22:35:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1533594900; bh=Fu9EZ7j9F5XIRhoXaF7Vw7tS9+mrW0u4jU/bOqxWYzk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PXaPgFD0e6vaJQgHQpLyYo8R5ozcEsTFWKDPvu1OnSfrBVIfb1HeWlN23vTtkvFiR zsEimRbZqii4bThmqzR2jKcp56DWSbqOHx57MjBsuvnmTXCxg3HlGJ5yUzqI6B9ZV1 tC9z7nFm7lO5oPaX+peeiPjhfvSmwIF5myffoBbc= From: Eric Biggers To: linux-crypto@vger.kernel.org Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu , Paul Crowley , Greg Kaiser , Michael Halcrow , "Jason A . 
Donenfeld" , Samuel Neves , Tomer Ashur , Eric Biggers Subject: [RFC PATCH 8/9] crypto: arm/poly1305 - add NEON accelerated Poly1305 implementation Date: Mon, 6 Aug 2018 15:32:59 -0700 Message-Id: <20180806223300.113891-9-ebiggers@kernel.org> X-Mailer: git-send-email 2.18.0.597.ga71716f1ad-goog In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org> References: <20180806223300.113891-1-ebiggers@kernel.org> MIME-Version: 1.0 Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers Add the Poly1305 code from OpenSSL, which was written by Andy Polyakov. I took the .S file from WireGuard, whose author has made the needed tweaks for Linux kernel integration and verified that Andy had given permission for GPLv2 distribution. I didn't make any additional changes to the .S file. Note, for HPolyC I'd eventually like a Poly1305 implementation that allows precomputing powers of the key. But for now this implementation just provides the existing semantics where the key and nonce are treated as a "one-time key" that must be provided for every message. Signed-off-by: Eric Biggers --- arch/arm/crypto/Kconfig | 5 + arch/arm/crypto/Makefile | 2 + arch/arm/crypto/poly1305-neon-core.S | 1115 ++++++++++++++++++++++++++ arch/arm/crypto/poly1305-neon-glue.c | 325 ++++++++ 4 files changed, 1447 insertions(+) create mode 100644 arch/arm/crypto/poly1305-neon-core.S create mode 100644 arch/arm/crypto/poly1305-neon-glue.c diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 75c613413e31..8c205994120e 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -121,6 +121,11 @@ config CRYPTO_CHACHA20_NEON select CRYPTO_BLKCIPHER select CRYPTO_CHACHA20 +config CRYPTO_POLY1305_NEON + tristate "NEON accelerated Poly1305 authenticator algorithm" + depends on KERNEL_MODE_NEON + select CRYPTO_POLY1305 + config CRYPTO_SPECK_NEON tristate "NEON accelerated Speck cipher algorithms" depends on KERNEL_MODE_NEON diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 6f58a24faa4a..6423442c86fe 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o +obj-$(CONFIG_CRYPTO_POLY1305_NEON) += poly1305-neon.o obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o @@ -54,6 +55,7 @@ ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o +poly1305-neon-y := poly1305-neon-core.o poly1305-neon-glue.o speck-neon-y := speck-neon-core.o speck-neon-glue.o ifdef REGENERATE_ARM_CRYPTO diff --git a/arch/arm/crypto/poly1305-neon-core.S b/arch/arm/crypto/poly1305-neon-core.S new file mode 100644 index 000000000000..d6b9a80dcc8b --- /dev/null +++ b/arch/arm/crypto/poly1305-neon-core.S @@ -0,0 +1,1115 @@ +/* SPDX-License-Identifier: OpenSSL OR (BSD-3-Clause OR GPL-2.0) + * + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * Copyright 2016 The OpenSSL Project Authors. All Rights Reserved. 
+ */ + +#include + +.text +#if defined(__thumb2__) +.syntax unified +.thumb +#else +.code 32 +#endif + +.align 5 +ENTRY(poly1305_init_arm) + stmdb sp!,{r4-r11} + + eor r3,r3,r3 + cmp r1,#0 + str r3,[r0,#0] @ zero hash value + str r3,[r0,#4] + str r3,[r0,#8] + str r3,[r0,#12] + str r3,[r0,#16] + str r3,[r0,#36] @ is_base2_26 + add r0,r0,#20 + +#ifdef __thumb2__ + it eq +#endif + moveq r0,#0 + beq .Lno_key + + ldrb r4,[r1,#0] + mov r10,#0x0fffffff + ldrb r5,[r1,#1] + and r3,r10,#-4 @ 0x0ffffffc + ldrb r6,[r1,#2] + ldrb r7,[r1,#3] + orr r4,r4,r5,lsl#8 + ldrb r5,[r1,#4] + orr r4,r4,r6,lsl#16 + ldrb r6,[r1,#5] + orr r4,r4,r7,lsl#24 + ldrb r7,[r1,#6] + and r4,r4,r10 + + ldrb r8,[r1,#7] + orr r5,r5,r6,lsl#8 + ldrb r6,[r1,#8] + orr r5,r5,r7,lsl#16 + ldrb r7,[r1,#9] + orr r5,r5,r8,lsl#24 + ldrb r8,[r1,#10] + and r5,r5,r3 + + ldrb r9,[r1,#11] + orr r6,r6,r7,lsl#8 + ldrb r7,[r1,#12] + orr r6,r6,r8,lsl#16 + ldrb r8,[r1,#13] + orr r6,r6,r9,lsl#24 + ldrb r9,[r1,#14] + and r6,r6,r3 + + ldrb r10,[r1,#15] + orr r7,r7,r8,lsl#8 + str r4,[r0,#0] + orr r7,r7,r9,lsl#16 + str r5,[r0,#4] + orr r7,r7,r10,lsl#24 + str r6,[r0,#8] + and r7,r7,r3 + str r7,[r0,#12] +.Lno_key: + ldmia sp!,{r4-r11} +#if __LINUX_ARM_ARCH__ >= 5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +ENDPROC(poly1305_init_arm) + +.align 5 +ENTRY(poly1305_blocks_arm) +.Lpoly1305_blocks_arm: + stmdb sp!,{r3-r11,lr} + + ands r2,r2,#-16 + beq .Lno_data + + cmp r3,#0 + add r2,r2,r1 @ end pointer + sub sp,sp,#32 + + ldmia r0,{r4-r12} @ load context + + str r0,[sp,#12] @ offload stuff + mov lr,r1 + str r2,[sp,#16] + str r10,[sp,#20] + str r11,[sp,#24] + str r12,[sp,#28] + b .Loop + +.Loop: +#if __LINUX_ARM_ARCH__ < 7 + ldrb r0,[lr],#16 @ load input +#ifdef __thumb2__ + it hi +#endif + addhi r8,r8,#1 @ 1<<128 + ldrb r1,[lr,#-15] + ldrb r2,[lr,#-14] + ldrb r3,[lr,#-13] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-12] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-11] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-10] + adds r4,r4,r3 @ accumulate input + + ldrb r3,[lr,#-9] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-8] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-7] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-6] + adcs r5,r5,r3 + + ldrb r3,[lr,#-5] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-4] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-3] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-2] + adcs r6,r6,r3 + + ldrb r3,[lr,#-1] + orr r1,r0,r1,lsl#8 + str lr,[sp,#8] @ offload input pointer + orr r2,r1,r2,lsl#16 + add r10,r10,r10,lsr#2 + orr r3,r2,r3,lsl#24 +#else + ldr r0,[lr],#16 @ load input +#ifdef __thumb2__ + it hi +#endif + addhi r8,r8,#1 @ padbit + ldr r1,[lr,#-12] + ldr r2,[lr,#-8] + ldr r3,[lr,#-4] +#ifdef __ARMEB__ + rev r0,r0 + rev r1,r1 + rev r2,r2 + rev r3,r3 +#endif + adds r4,r4,r0 @ accumulate input + str lr,[sp,#8] @ offload input pointer + adcs r5,r5,r1 + add r10,r10,r10,lsr#2 + adcs r6,r6,r2 +#endif + add r11,r11,r11,lsr#2 + adcs r7,r7,r3 + add r12,r12,r12,lsr#2 + + umull r2,r3,r5,r9 + adc r8,r8,#0 + umull r0,r1,r4,r9 + umlal r2,r3,r8,r10 + umlal r0,r1,r7,r10 + ldr r10,[sp,#20] @ reload r10 + umlal r2,r3,r6,r12 + umlal r0,r1,r5,r12 + umlal r2,r3,r7,r11 + umlal r0,r1,r6,r11 + umlal r2,r3,r4,r10 + str r0,[sp,#0] @ future r4 + mul r0,r11,r8 + ldr r11,[sp,#24] @ reload r11 + adds r2,r2,r1 @ d1+=d0>>32 + eor r1,r1,r1 + adc lr,r3,#0 @ future r6 + str r2,[sp,#4] @ future r5 + + mul r2,r12,r8 + eor r3,r3,r3 + umlal r0,r1,r7,r12 + ldr r12,[sp,#28] @ reload r12 + umlal r2,r3,r7,r9 + umlal r0,r1,r6,r9 + umlal r2,r3,r6,r10 
+ umlal r0,r1,r5,r10 + umlal r2,r3,r5,r11 + umlal r0,r1,r4,r11 + umlal r2,r3,r4,r12 + ldr r4,[sp,#0] + mul r8,r9,r8 + ldr r5,[sp,#4] + + adds r6,lr,r0 @ d2+=d1>>32 + ldr lr,[sp,#8] @ reload input pointer + adc r1,r1,#0 + adds r7,r2,r1 @ d3+=d2>>32 + ldr r0,[sp,#16] @ reload end pointer + adc r3,r3,#0 + add r8,r8,r3 @ h4+=d3>>32 + + and r1,r8,#-4 + and r8,r8,#3 + add r1,r1,r1,lsr#2 @ *=5 + adds r4,r4,r1 + adcs r5,r5,#0 + adcs r6,r6,#0 + adcs r7,r7,#0 + adc r8,r8,#0 + + cmp r0,lr @ done yet? + bhi .Loop + + ldr r0,[sp,#12] + add sp,sp,#32 + stmia r0,{r4-r8} @ store the result + +.Lno_data: +#if __LINUX_ARM_ARCH__ >= 5 + ldmia sp!,{r3-r11,pc} +#else + ldmia sp!,{r3-r11,lr} + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +ENDPROC(poly1305_blocks_arm) + +.align 5 +ENTRY(poly1305_emit_arm) + stmdb sp!,{r4-r11} +.Lpoly1305_emit_enter: + ldmia r0,{r3-r7} + adds r8,r3,#5 @ compare to modulus + adcs r9,r4,#0 + adcs r10,r5,#0 + adcs r11,r6,#0 + adc r7,r7,#0 + tst r7,#4 @ did it carry/borrow? + +#ifdef __thumb2__ + it ne +#endif + movne r3,r8 + ldr r8,[r2,#0] +#ifdef __thumb2__ + it ne +#endif + movne r4,r9 + ldr r9,[r2,#4] +#ifdef __thumb2__ + it ne +#endif + movne r5,r10 + ldr r10,[r2,#8] +#ifdef __thumb2__ + it ne +#endif + movne r6,r11 + ldr r11,[r2,#12] + + adds r3,r3,r8 + adcs r4,r4,r9 + adcs r5,r5,r10 + adc r6,r6,r11 + +#if __LINUX_ARM_ARCH__ >= 7 +#ifdef __ARMEB__ + rev r3,r3 + rev r4,r4 + rev r5,r5 + rev r6,r6 +#endif + str r3,[r1,#0] + str r4,[r1,#4] + str r5,[r1,#8] + str r6,[r1,#12] +#else + strb r3,[r1,#0] + mov r3,r3,lsr#8 + strb r4,[r1,#4] + mov r4,r4,lsr#8 + strb r5,[r1,#8] + mov r5,r5,lsr#8 + strb r6,[r1,#12] + mov r6,r6,lsr#8 + + strb r3,[r1,#1] + mov r3,r3,lsr#8 + strb r4,[r1,#5] + mov r4,r4,lsr#8 + strb r5,[r1,#9] + mov r5,r5,lsr#8 + strb r6,[r1,#13] + mov r6,r6,lsr#8 + + strb r3,[r1,#2] + mov r3,r3,lsr#8 + strb r4,[r1,#6] + mov r4,r4,lsr#8 + strb r5,[r1,#10] + mov r5,r5,lsr#8 + strb r6,[r1,#14] + mov r6,r6,lsr#8 + + strb r3,[r1,#3] + strb r4,[r1,#7] + strb r5,[r1,#11] + strb r6,[r1,#15] +#endif + ldmia sp!,{r4-r11} +#if __LINUX_ARM_ARCH__ >= 5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +ENDPROC(poly1305_emit_arm) + + +#if __LINUX_ARM_ARCH__ >= 7 +.fpu neon + +.align 5 +ENTRY(poly1305_init_neon) +.Lpoly1305_init_neon: + ldr r4,[r0,#20] @ load key base 2^32 + ldr r5,[r0,#24] + ldr r6,[r0,#28] + ldr r7,[r0,#32] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + and r3,r3,#0x03ffffff + and r4,r4,#0x03ffffff + and r5,r5,#0x03ffffff + + vdup.32 d0,r2 @ r^1 in both lanes + add r2,r3,r3,lsl#2 @ *5 + vdup.32 d1,r3 + add r3,r4,r4,lsl#2 + vdup.32 d2,r2 + vdup.32 d3,r4 + add r4,r5,r5,lsl#2 + vdup.32 d4,r3 + vdup.32 d5,r5 + add r5,r6,r6,lsl#2 + vdup.32 d6,r4 + vdup.32 d7,r6 + vdup.32 d8,r5 + + mov r5,#2 @ counter + +.Lsquare_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + + vmull.u32 q5,d0,d0[1] + vmull.u32 q6,d1,d0[1] + vmull.u32 q7,d3,d0[1] + vmull.u32 q8,d5,d0[1] + vmull.u32 q9,d7,d0[1] + + vmlal.u32 
q5,d7,d2[1] + vmlal.u32 q6,d0,d1[1] + vmlal.u32 q7,d1,d1[1] + vmlal.u32 q8,d3,d1[1] + vmlal.u32 q9,d5,d1[1] + + vmlal.u32 q5,d5,d4[1] + vmlal.u32 q6,d7,d4[1] + vmlal.u32 q8,d1,d3[1] + vmlal.u32 q7,d0,d3[1] + vmlal.u32 q9,d3,d3[1] + + vmlal.u32 q5,d3,d6[1] + vmlal.u32 q8,d0,d5[1] + vmlal.u32 q6,d5,d6[1] + vmlal.u32 q7,d7,d6[1] + vmlal.u32 q9,d1,d5[1] + + vmlal.u32 q8,d7,d8[1] + vmlal.u32 q5,d1,d8[1] + vmlal.u32 q6,d3,d8[1] + vmlal.u32 q7,d5,d8[1] + vmlal.u32 q9,d0,d7[1] + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + @ and P. Schwabe + @ + @ H0>>+H1>>+H2>>+H3>>+H4 + @ H3>>+H4>>*5+H0>>+H1 + @ + @ Trivia. + @ + @ Result of multiplication of n-bit number by m-bit number is + @ n+m bits wide. However! Even though 2^n is a n+1-bit number, + @ m-bit number multiplied by 2^n is still n+m bits wide. + @ + @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2, + @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit + @ one is n+1 bits wide. + @ + @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that + @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4 + @ can be 27. However! In cases when their width exceeds 26 bits + @ they are limited by 2^26+2^6. This in turn means that *sum* + @ of the products with these values can still be viewed as sum + @ of 52-bit numbers as long as the amount of addends is not a + @ power of 2. For example, + @ + @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4, + @ + @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or + @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than + @ 8 * (2^52) or 2^55. However, the value is then multiplied by + @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12), + @ which is less than 32 * (2^52) or 2^57. And when processing + @ data we are looking at triple as many addends... + @ + @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and + @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the + @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while + @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32 + @ instruction accepts 2x32-bit input and writes 2x64-bit result. + @ This means that result of reduction have to be compressed upon + @ loop wrap-around. This can be done in the process of reduction + @ to minimize amount of instructions [as well as amount of + @ 128-bit instructions, which benefits low-end processors], but + @ one has to watch for H2 (which is narrower than H0) and 5*H4 + @ not being wider than 58 bits, so that result of right shift + @ by 26 bits fits in 32 bits. This is also useful on x86, + @ because it allows to use paddd in place for paddq, which + @ benefits Atom, where paddq is ridiculously slow. 
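+	@
+	@ In short: the accumulator is h = h0 + h1*2^26 + h2*2^52 +
+	@ h3*2^78 + h4*2^104 and the prime is p = 2^130 - 5, so
+	@ 2^130 == 5 (mod p). A carry c out of the top limb therefore
+	@ re-enters at the bottom as 5*c, which the "h4 -> h0" step
+	@ below computes as h0 += c + (c<<2).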
+ + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 @ &=0x03ffffff + vadd.i64 q6,q6,q4 @ h0 -> h1 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vbic.i32 d18,#0xfc000000 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vadd.i32 d10,d10,d30 @ h4 -> h0 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vbic.i32 d14,#0xfc000000 + + vshr.u32 d30,d10,#26 + vbic.i32 d10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + + subs r5,r5,#1 + beq .Lsquare_break_neon + + add r6,r0,#(48+0*9*4) + add r7,r0,#(48+1*9*4) + + vtrn.32 d0,d10 @ r^2:r^1 + vtrn.32 d3,d14 + vtrn.32 d5,d16 + vtrn.32 d1,d12 + vtrn.32 d7,d18 + + vshl.u32 d4,d3,#2 @ *5 + vshl.u32 d6,d5,#2 + vshl.u32 d2,d1,#2 + vshl.u32 d8,d7,#2 + vadd.i32 d4,d4,d3 + vadd.i32 d2,d2,d1 + vadd.i32 d6,d6,d5 + vadd.i32 d8,d8,d7 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6,:32] + vst1.32 {d8[1]},[r7,:32] + + b .Lsquare_neon + +.align 4 +.Lsquare_break_neon: + add r6,r0,#(48+2*4*9) + add r7,r0,#(48+3*4*9) + + vmov d0,d10 @ r^4:r^3 + vshl.u32 d2,d12,#2 @ *5 + vmov d1,d12 + vshl.u32 d4,d14,#2 + vmov d3,d14 + vshl.u32 d6,d16,#2 + vmov d5,d16 + vshl.u32 d8,d18,#2 + vmov d7,d18 + vadd.i32 d2,d2,d12 + vadd.i32 d4,d4,d14 + vadd.i32 d6,d6,d16 + vadd.i32 d8,d8,d18 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6] + vst1.32 {d8[1]},[r7] + + bx lr @ bx lr +ENDPROC(poly1305_init_neon) + +.align 5 +ENTRY(poly1305_blocks_neon) + ldr ip,[r0,#36] @ is_base2_26 + ands r2,r2,#-16 + beq .Lno_data_neon + + cmp r2,#64 + bhs .Lenter_neon + tst ip,ip @ is_base2_26? + beq .Lpoly1305_blocks_arm + +.Lenter_neon: + stmdb sp!,{r4-r7} + vstmdb sp!,{d8-d15} @ ABI specification says so + + tst ip,ip @ is_base2_26? + bne .Lbase2_26_neon + + stmdb sp!,{r1-r3,lr} + bl .Lpoly1305_init_neon + + ldr r4,[r0,#0] @ load hash value base 2^32 + ldr r5,[r0,#4] + ldr r6,[r0,#8] + ldr r7,[r0,#12] + ldr ip,[r0,#16] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + veor d10,d10,d10 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + veor d12,d12,d12 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + veor d14,d14,d14 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + veor d16,d16,d16 + and r3,r3,#0x03ffffff + orr r6,r6,ip,lsl#24 + veor d18,d18,d18 + and r4,r4,#0x03ffffff + mov r1,#1 + and r5,r5,#0x03ffffff + str r1,[r0,#36] @ is_base2_26 + + vmov.32 d10[0],r2 + vmov.32 d12[0],r3 + vmov.32 d14[0],r4 + vmov.32 d16[0],r5 + vmov.32 d18[0],r6 + adr r5,.Lzeros + + ldmia sp!,{r1-r3,lr} + b .Lbase2_32_neon + +.align 4 +.Lbase2_26_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ load hash value + + veor d10,d10,d10 + veor d12,d12,d12 + veor d14,d14,d14 + veor d16,d16,d16 + veor d18,d18,d18 + vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + adr r5,.Lzeros + vld1.32 {d18[0]},[r0] + sub r0,r0,#16 @ rewind + +.Lbase2_32_neon: + add r4,r1,#32 + mov r3,r3,lsl#24 + tst r2,#31 + beq .Leven + + vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]! 
+ vmov.32 d28[0],r3 + sub r2,r2,#16 + add r4,r1,#32 + +#ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +#endif + vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26 + vshl.u32 d26,d26,#18 + + vsri.u32 d26,d24,#14 + vshl.u32 d24,d24,#12 + vadd.i32 d29,d28,d18 @ add hash value and move to #hi + + vbic.i32 d26,#0xfc000000 + vsri.u32 d24,d22,#20 + vshl.u32 d22,d22,#6 + + vbic.i32 d24,#0xfc000000 + vsri.u32 d22,d20,#26 + vadd.i32 d27,d26,d16 + + vbic.i32 d20,#0xfc000000 + vbic.i32 d22,#0xfc000000 + vadd.i32 d25,d24,d14 + + vadd.i32 d21,d20,d10 + vadd.i32 d23,d22,d12 + + mov r7,r5 + add r6,r0,#48 + + cmp r2,r2 + b .Long_tail + +.align 4 +.Leven: + subs r2,r2,#64 + it lo + movlo r4,r5 + + vmov.i32 q14,#1<<24 @ padbit, yes, always + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + itt hi + addhi r7,r0,#(48+1*9*4) + addhi r6,r0,#(48+3*9*4) + +#ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +#endif + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vshl.u32 q13,q13,#18 + + vsri.u32 q13,q12,#14 + vshl.u32 q12,q12,#12 + + vbic.i32 q13,#0xfc000000 + vsri.u32 q12,q11,#20 + vshl.u32 q11,q11,#6 + + vbic.i32 q12,#0xfc000000 + vsri.u32 q11,q10,#26 + + vbic.i32 q10,#0xfc000000 + vbic.i32 q11,#0xfc000000 + + bls .Lskip_loop + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + b .Loop_neon + +.align 5 +.Loop_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + @ ___________________/ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + @ ___________________/ ____________________/ + @ + @ Note that we start with inp[2:3]*r^2. This is because it + @ doesn't depend on reduction in previous iteration. 
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ inp[2:3]*r^2 + + vadd.i32 d24,d24,d14 @ accumulate inp[0:1] + vmull.u32 q7,d25,d0[1] + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0[1] + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0[1] + vmlal.u32 q7,d23,d1[1] + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0[1] + + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0[1] + subs r2,r2,#64 + vmlal.u32 q5,d29,d2[1] + it lo + movlo r4,r5 + vmlal.u32 q8,d25,d1[1] + vld1.32 d8[1],[r7,:32] + vmlal.u32 q6,d21,d1[1] + vmlal.u32 q9,d27,d1[1] + + vmlal.u32 q5,d27,d4[1] + vmlal.u32 q8,d23,d3[1] + vmlal.u32 q9,d25,d3[1] + vmlal.u32 q6,d29,d4[1] + vmlal.u32 q7,d21,d3[1] + + vmlal.u32 q8,d21,d5[1] + vmlal.u32 q5,d25,d6[1] + vmlal.u32 q9,d23,d5[1] + vmlal.u32 q6,d27,d6[1] + vmlal.u32 q7,d29,d6[1] + + vmlal.u32 q8,d29,d8[1] + vmlal.u32 q5,d23,d8[1] + vmlal.u32 q9,d21,d7[1] + vmlal.u32 q6,d25,d8[1] + vmlal.u32 q7,d27,d8[1] + + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4 and accumulate + + vmlal.u32 q8,d26,d0[0] + vmlal.u32 q5,d20,d0[0] + vmlal.u32 q9,d28,d0[0] + vmlal.u32 q6,d22,d0[0] + vmlal.u32 q7,d24,d0[0] + vld1.32 d8[0],[r6,:32] + + vmlal.u32 q8,d24,d1[0] + vmlal.u32 q5,d28,d2[0] + vmlal.u32 q9,d26,d1[0] + vmlal.u32 q6,d20,d1[0] + vmlal.u32 q7,d22,d1[0] + + vmlal.u32 q8,d22,d3[0] + vmlal.u32 q5,d26,d4[0] + vmlal.u32 q9,d24,d3[0] + vmlal.u32 q6,d28,d4[0] + vmlal.u32 q7,d20,d3[0] + + vmlal.u32 q8,d20,d5[0] + vmlal.u32 q5,d24,d6[0] + vmlal.u32 q9,d22,d5[0] + vmlal.u32 q6,d26,d6[0] + vmlal.u32 q8,d28,d8[0] + + vmlal.u32 q7,d28,d6[0] + vmlal.u32 q5,d22,d8[0] + vmlal.u32 q9,d20,d7[0] + vmov.i32 q14,#1<<24 @ padbit, yes, always + vmlal.u32 q6,d24,d8[0] + vmlal.u32 q7,d26,d8[0] + + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 +#ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q11,q11 + vrev32.8 q12,q12 + vrev32.8 q13,q13 +#endif + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction interleaved with base 2^32 -> base 2^26 of + @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14. 
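+	@ Interleaving the conversion with the reduction keeps the NEON
+	@ pipeline busy on in-order cores such as Cortex-A7, where the
+	@ reduction alone would be a chain of dependent shifts and adds.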
+ + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vadd.i64 q6,q6,q4 @ h0 -> h1 + vshl.u32 q13,q13,#18 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vsri.u32 q13,q12,#14 + vbic.i32 d18,#0xfc000000 + vshl.u32 q12,q12,#12 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vbic.i32 q13,#0xfc000000 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec] + vsri.u32 q12,q11,#20 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vshl.u32 q11,q11,#6 + vbic.i32 d14,#0xfc000000 + vbic.i32 q12,#0xfc000000 + + vshrn.u64 d30,q5,#26 @ re-narrow + vmovn.i64 d10,q5 + vsri.u32 q11,q10,#26 + vbic.i32 q10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vbic.i32 d10,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + vbic.i32 q11,#0xfc000000 + + bhi .Loop_neon + +.Lskip_loop: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + add r7,r0,#(48+0*9*4) + add r6,r0,#(48+1*9*4) + adds r2,r2,#32 + it ne + movne r2,#0 + bne .Long_tail + + vadd.i32 d25,d24,d14 @ add hash value and move to #hi + vadd.i32 d21,d20,d10 + vadd.i32 d27,d26,d16 + vadd.i32 d23,d22,d12 + vadd.i32 d29,d28,d18 + +.Long_tail: + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2 + + vadd.i32 d24,d24,d14 @ can be redundant + vmull.u32 q7,d25,d0 + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0 + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0 + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0 + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0 + + vmlal.u32 q5,d29,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d25,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vmlal.u32 q6,d21,d1 + vmlal.u32 q9,d27,d1 + vmlal.u32 q7,d23,d1 + + vmlal.u32 q8,d23,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d27,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d25,d3 + vmlal.u32 q6,d29,d4 + vmlal.u32 q7,d21,d3 + + vmlal.u32 q8,d21,d5 + it ne + addne r7,r0,#(48+2*9*4) + vmlal.u32 q5,d25,d6 + it ne + addne r6,r0,#(48+3*9*4) + vmlal.u32 q9,d23,d5 + vmlal.u32 q6,d27,d6 + vmlal.u32 q7,d29,d6 + + vmlal.u32 q8,d29,d8 + vorn q0,q0,q0 @ all-ones, can be redundant + vmlal.u32 q5,d23,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d21,d7 + vmlal.u32 q6,d25,d8 + vmlal.u32 q7,d27,d8 + + beq .Lshort_tail + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4:r^3 and accumulate + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + + vmlal.u32 q7,d24,d0 + vmlal.u32 q5,d20,d0 + vmlal.u32 q8,d26,d0 + vmlal.u32 q6,d22,d0 + vmlal.u32 q9,d28,d0 + + vmlal.u32 q5,d28,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d24,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! 
+ vmlal.u32 q6,d20,d1 + vmlal.u32 q9,d26,d1 + vmlal.u32 q7,d22,d1 + + vmlal.u32 q8,d22,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d26,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d24,d3 + vmlal.u32 q6,d28,d4 + vmlal.u32 q7,d20,d3 + + vmlal.u32 q8,d20,d5 + vmlal.u32 q5,d24,d6 + vmlal.u32 q9,d22,d5 + vmlal.u32 q6,d26,d6 + vmlal.u32 q7,d28,d6 + + vmlal.u32 q8,d28,d8 + vorn q0,q0,q0 @ all-ones + vmlal.u32 q5,d22,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d20,d7 + vmlal.u32 q6,d24,d8 + vmlal.u32 q7,d26,d8 + +.Lshort_tail: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ horizontal addition + + vadd.i64 d16,d16,d17 + vadd.i64 d10,d10,d11 + vadd.i64 d18,d18,d19 + vadd.i64 d12,d12,d13 + vadd.i64 d14,d14,d15 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction, but without narrowing + + vshr.u64 q15,q8,#26 + vand.i64 q8,q8,q0 + vshr.u64 q4,q5,#26 + vand.i64 q5,q5,q0 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vadd.i64 q6,q6,q4 @ h0 -> h1 + + vshr.u64 q15,q9,#26 + vand.i64 q9,q9,q0 + vshr.u64 q4,q6,#26 + vand.i64 q6,q6,q0 + vadd.i64 q7,q7,q4 @ h1 -> h2 + + vadd.i64 q5,q5,q15 + vshl.u64 q15,q15,#2 + vshr.u64 q4,q7,#26 + vand.i64 q7,q7,q0 + vadd.i64 q5,q5,q15 @ h4 -> h0 + vadd.i64 q8,q8,q4 @ h2 -> h3 + + vshr.u64 q15,q5,#26 + vand.i64 q5,q5,q0 + vshr.u64 q4,q8,#26 + vand.i64 q8,q8,q0 + vadd.i64 q6,q6,q15 @ h0 -> h1 + vadd.i64 q9,q9,q4 @ h3 -> h4 + + cmp r2,#0 + bne .Leven + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ store hash value + + vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + vst1.32 {d18[0]},[r0] + + vldmia sp!,{d8-d15} @ epilogue + ldmia sp!,{r4-r7} +.Lno_data_neon: + bx lr @ bx lr +ENDPROC(poly1305_blocks_neon) + +.align 5 +ENTRY(poly1305_emit_neon) + ldr ip,[r0,#36] @ is_base2_26 + + stmdb sp!,{r4-r11} + + tst ip,ip + beq .Lpoly1305_emit_enter + + ldmia r0,{r3-r7} + eor r8,r8,r8 + + adds r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32 + mov r4,r4,lsr#6 + adcs r4,r4,r5,lsl#20 + mov r5,r5,lsr#12 + adcs r5,r5,r6,lsl#14 + mov r6,r6,lsr#18 + adcs r6,r6,r7,lsl#8 + adc r7,r8,r7,lsr#24 @ can be partially reduced ... + + and r8,r7,#-4 @ ... so reduce + and r7,r6,#3 + add r8,r8,r8,lsr#2 @ *= 5 + adds r3,r3,r8 + adcs r4,r4,#0 + adcs r5,r5,#0 + adcs r6,r6,#0 + adc r7,r7,#0 + + adds r8,r3,#5 @ compare to modulus + adcs r9,r4,#0 + adcs r10,r5,#0 + adcs r11,r6,#0 + adc r7,r7,#0 + tst r7,#4 @ did it carry/borrow? 
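+	@ (If h >= p = 2^130 - 5, then h + 5 >= 2^130, so bit 130 -- which
+	@ is bit 2 of the top word r7 -- is set, and the reduced value
+	@ h - p, i.e. the low 128 bits of h + 5, is selected below;
+	@ otherwise h itself is already fully reduced and is kept.)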
+ + it ne + movne r3,r8 + ldr r8,[r2,#0] + it ne + movne r4,r9 + ldr r9,[r2,#4] + it ne + movne r5,r10 + ldr r10,[r2,#8] + it ne + movne r6,r11 + ldr r11,[r2,#12] + + adds r3,r3,r8 @ accumulate nonce + adcs r4,r4,r9 + adcs r5,r5,r10 + adc r6,r6,r11 + +#ifdef __ARMEB__ + rev r3,r3 + rev r4,r4 + rev r5,r5 + rev r6,r6 +#endif + str r3,[r1,#0] @ store the result + str r4,[r1,#4] + str r5,[r1,#8] + str r6,[r1,#12] + + ldmia sp!,{r4-r11} + bx lr @ bx lr +ENDPROC(poly1305_emit_neon) + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +#endif diff --git a/arch/arm/crypto/poly1305-neon-glue.c b/arch/arm/crypto/poly1305-neon-glue.c new file mode 100644 index 000000000000..633ac0d157db --- /dev/null +++ b/arch/arm/crypto/poly1305-neon-glue.c @@ -0,0 +1,325 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Poly1305 authenticator, NEON-accelerated - + * glue code for OpenSSL implementation + * + * Copyright (c) 2018 Google LLC + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void poly1305_init_arm(void *ctx, const u8 key[16]); +asmlinkage void poly1305_blocks_neon(void *ctx, const u8 *inp, size_t len, + u32 padbit); +asmlinkage void poly1305_emit_neon(void *ctx, u8 mac[16], const u32 nonce[4]); + +struct poly1305_neon_desc_ctx { + u8 buf[POLY1305_BLOCK_SIZE]; + unsigned int buflen; + bool key_set; + bool nonce_set; + u32 nonce[4]; + u8 neon_ctx[192] __aligned(16); +} __aligned(16); + +static int poly1305_neon_init(struct shash_desc *desc) +{ + struct poly1305_neon_desc_ctx *dctx = shash_desc_ctx(desc); + + dctx->buflen = 0; + dctx->key_set = false; + dctx->nonce_set = false; + return 0; +} + +static void poly1305_neon_blocks(struct poly1305_neon_desc_ctx *dctx, + const u8 *src, unsigned int srclen, u32 padbit) +{ + if (!dctx->key_set) { + poly1305_init_arm(dctx->neon_ctx, src); + dctx->key_set = true; + src += POLY1305_BLOCK_SIZE; + srclen -= POLY1305_BLOCK_SIZE; + if (!srclen) + return; + } + + if (!dctx->nonce_set) { + dctx->nonce[0] = get_unaligned_le32(src + 0); + dctx->nonce[1] = get_unaligned_le32(src + 4); + dctx->nonce[2] = get_unaligned_le32(src + 8); + dctx->nonce[3] = get_unaligned_le32(src + 12); + dctx->nonce_set = true; + src += POLY1305_BLOCK_SIZE; + srclen -= POLY1305_BLOCK_SIZE; + if (!srclen) + return; + } + + kernel_neon_begin(); + poly1305_blocks_neon(dctx->neon_ctx, src, srclen, padbit); + kernel_neon_end(); +} + +static int poly1305_neon_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + struct poly1305_neon_desc_ctx *dctx = shash_desc_ctx(desc); + unsigned int bytes; + + if (dctx->buflen) { + bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen); + memcpy(&dctx->buf[dctx->buflen], src, bytes); + dctx->buflen += bytes; + src += bytes; + srclen -= bytes; + + if (dctx->buflen == POLY1305_BLOCK_SIZE) { + poly1305_neon_blocks(dctx, dctx->buf, + POLY1305_BLOCK_SIZE, 1); + dctx->buflen = 0; + } + } + + if (srclen >= POLY1305_BLOCK_SIZE) { + bytes = round_down(srclen, POLY1305_BLOCK_SIZE); + poly1305_neon_blocks(dctx, src, bytes, 1); + src += bytes; + srclen -= bytes; + } + + if (srclen) { + memcpy(dctx->buf, src, srclen); + dctx->buflen = srclen; + } + + return 0; +} + +static int poly1305_neon_final(struct shash_desc *desc, u8 *dst) +{ + struct poly1305_neon_desc_ctx *dctx = shash_desc_ctx(desc); + + if (!dctx->nonce_set) + return -ENOKEY; + + if (dctx->buflen) { + dctx->buf[dctx->buflen++] = 1; + memset(&dctx->buf[dctx->buflen], 0, + POLY1305_BLOCK_SIZE - dctx->buflen); + 
poly1305_neon_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0); + } + + /* + * This part doesn't actually use NEON instructions, so no need for + * kernel_neon_begin(). + */ + poly1305_emit_neon(dctx->neon_ctx, dst, dctx->nonce); + return 0; +} + +static struct shash_alg poly1305_alg = { + .digestsize = POLY1305_DIGEST_SIZE, + .init = poly1305_neon_init, + .update = poly1305_neon_update, + .final = poly1305_neon_final, + .descsize = sizeof(struct poly1305_neon_desc_ctx), + .base = { + .cra_name = "__poly1305", + .cra_driver_name = "__driver-poly1305-neon", + .cra_priority = 0, + .cra_flags = CRYPTO_ALG_INTERNAL, + .cra_blocksize = POLY1305_BLOCK_SIZE, + .cra_module = THIS_MODULE, + }, +}; + +/* Boilerplate to wrap the use of kernel_neon_begin() */ + +struct poly1305_async_ctx { + struct cryptd_ahash *cryptd_tfm; +}; + +static int poly1305_async_init(struct ahash_request *req) +{ + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct poly1305_async_ctx *ctx = crypto_ahash_ctx(tfm); + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm; + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + struct crypto_shash *child = cryptd_ahash_child(cryptd_tfm); + + desc->tfm = child; + desc->flags = req->base.flags; + return crypto_shash_init(desc); +} + +static int poly1305_async_update(struct ahash_request *req) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct poly1305_async_ctx *ctx = crypto_ahash_ctx(tfm); + struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm; + + if (!may_use_simd() || + (in_atomic() && cryptd_ahash_queued(cryptd_tfm))) { + memcpy(cryptd_req, req, sizeof(*req)); + ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base); + return crypto_ahash_update(cryptd_req); + } else { + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + return shash_ahash_update(req, desc); + } +} + +static int poly1305_async_final(struct ahash_request *req) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct poly1305_async_ctx *ctx = crypto_ahash_ctx(tfm); + struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm; + + if (!may_use_simd() || + (in_atomic() && cryptd_ahash_queued(cryptd_tfm))) { + memcpy(cryptd_req, req, sizeof(*req)); + ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base); + return crypto_ahash_final(cryptd_req); + } else { + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + return crypto_shash_final(desc, req->result); + } +} + +static int poly1305_async_digest(struct ahash_request *req) +{ + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct poly1305_async_ctx *ctx = crypto_ahash_ctx(tfm); + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm; + + if (!may_use_simd() || + (in_atomic() && cryptd_ahash_queued(cryptd_tfm))) { + memcpy(cryptd_req, req, sizeof(*req)); + ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base); + return crypto_ahash_digest(cryptd_req); + } else { + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + struct crypto_shash *child = cryptd_ahash_child(cryptd_tfm); + + desc->tfm = child; + desc->flags = req->base.flags; + return shash_ahash_digest(req, desc); + } +} + +static int poly1305_async_import(struct ahash_request *req, const void *in) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct 
poly1305_async_ctx *ctx = crypto_ahash_ctx(tfm); + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + + desc->tfm = cryptd_ahash_child(ctx->cryptd_tfm); + desc->flags = req->base.flags; + + return crypto_shash_import(desc, in); +} + +static int poly1305_async_export(struct ahash_request *req, void *out) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + + return crypto_shash_export(desc, out); +} + +static int poly1305_async_init_tfm(struct crypto_tfm *tfm) +{ + struct cryptd_ahash *cryptd_tfm; + struct poly1305_async_ctx *ctx = crypto_tfm_ctx(tfm); + + cryptd_tfm = cryptd_alloc_ahash("__driver-poly1305-neon", + CRYPTO_ALG_INTERNAL, + CRYPTO_ALG_INTERNAL); + if (IS_ERR(cryptd_tfm)) + return PTR_ERR(cryptd_tfm); + ctx->cryptd_tfm = cryptd_tfm; + crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm), + sizeof(struct ahash_request) + + crypto_ahash_reqsize(&cryptd_tfm->base)); + + return 0; +} + +static void poly1305_async_exit_tfm(struct crypto_tfm *tfm) +{ + struct poly1305_async_ctx *ctx = crypto_tfm_ctx(tfm); + + cryptd_free_ahash(ctx->cryptd_tfm); +} + +static struct ahash_alg poly1305_async_alg = { + .init = poly1305_async_init, + .update = poly1305_async_update, + .final = poly1305_async_final, + .digest = poly1305_async_digest, + .import = poly1305_async_import, + .export = poly1305_async_export, + .halg.digestsize = POLY1305_DIGEST_SIZE, + .halg.statesize = sizeof(struct poly1305_neon_desc_ctx), + .halg.base = { + .cra_name = "poly1305", + .cra_driver_name = "poly1305-neon", + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_ASYNC, + .cra_blocksize = POLY1305_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct poly1305_async_ctx), + .cra_module = THIS_MODULE, + .cra_init = poly1305_async_init_tfm, + .cra_exit = poly1305_async_exit_tfm, + }, +}; + +static int __init poly1305_neon_module_init(void) +{ + int err; + + if (!(elf_hwcap & HWCAP_NEON)) + return -ENODEV; + + err = crypto_register_shash(&poly1305_alg); + if (err) + return err; + err = crypto_register_ahash(&poly1305_async_alg); + if (err) + goto err_shash; + + return 0; + +err_shash: + crypto_unregister_shash(&poly1305_alg); + return err; +} + +static void __exit poly1305_neon_module_exit(void) +{ + crypto_unregister_shash(&poly1305_alg); + crypto_unregister_ahash(&poly1305_async_alg); +} + +module_init(poly1305_neon_module_init); +module_exit(poly1305_neon_module_exit); + +MODULE_DESCRIPTION("Poly1305 authenticator (NEON-accelerated)"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("poly1305"); +MODULE_ALIAS_CRYPTO("poly1305-neon"); From patchwork Mon Aug 6 22:33:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10558033 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2984213AC for ; Mon, 6 Aug 2018 22:35:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15D0B29570 for ; Mon, 6 Aug 2018 22:35:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0889D297CE; Mon, 6 Aug 2018 22:35:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID 
From patchwork Mon Aug 6 22:33:00 2018
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10558033
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Herbert Xu, Paul Crowley, Greg Kaiser, Michael Halcrow, "Jason A . Donenfeld", Samuel Neves, Tomer Ashur, Eric Biggers
Subject: [RFC PATCH 9/9] crypto: hpolyc - add support for the HPolyC encryption mode
Date: Mon, 6 Aug 2018 15:33:00 -0700
Message-Id: <20180806223300.113891-10-ebiggers@kernel.org>
In-Reply-To: <20180806223300.113891-1-ebiggers@kernel.org>
References: <20180806223300.113891-1-ebiggers@kernel.org>

From: Eric Biggers

Add support for HPolyC, which is a tweakable, length-preserving encryption mode that encrypts each message using XChaCha sandwiched between two passes of Poly1305, and one invocation of a block cipher such as AES. HPolyC was designed by Paul Crowley and is fully specified by our paper https://eprint.iacr.org/2018/720.pdf ("HPolyC: length-preserving encryption for entry-level processors"). HPolyC is similar to some existing modes such as XCB, HCTR, HCH, and HMC, but by necessity it has some novelties. See the paper for details; this patch only provides a brief overview and an explanation of why HPolyC is needed in the crypto API.

HPolyC is suitable for disk/file encryption with dm-crypt and fscrypt, where currently the only other suitable options in the kernel are block cipher modes such as XTS. Moreover, on low-end devices whose processors lack AES instructions (e.g. ARM Cortex-A7), AES-XTS is much too slow to provide an acceptable user experience, leaving "lightweight" block ciphers such as Speck as the only viable option. However, Speck is considered controversial in some circles, and other published lightweight block ciphers are too slow in software, haven't received sufficient cryptanalysis, or have other issues. Stream ciphers such as ChaCha perform much better, but are insecure if used naively in dm-crypt or fscrypt, due to IV reuse when data is overwritten, as the toy sketch below illustrates.
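[To make that failure mode concrete, here is a toy userspace sketch; the fixed byte array merely stands in for the keystream ChaCha would produce under one unchanging (key, IV) pair, and nothing here is kernel code:

#include <stdio.h>

int main(void)
{
	/* Keystream for one (key, IV) pair; reused on every "overwrite". */
	const unsigned char ks[4] = { 0xde, 0xad, 0xbe, 0xef };
	const unsigned char p1[4] = { 'A', 'A', 'A', 'A' };	/* original sector contents */
	const unsigned char p2[4] = { 'B', 'B', 'B', 'B' };	/* contents after overwrite */
	unsigned char c1[4], c2[4];
	int i;

	for (i = 0; i < 4; i++) {
		c1[i] = p1[i] ^ ks[i];	/* first encryption of the sector */
		c2[i] = p2[i] ^ ks[i];	/* overwrite: same IV => same keystream */
	}

	/* An attacker with both ciphertext snapshots cancels the keystream: */
	for (i = 0; i < 4; i++)
		printf("%02x ", c1[i] ^ c2[i]);	/* prints "03 03 03 03" = 'A' ^ 'B' */
	printf("\n");
	return 0;
}

No key recovery is needed: XORing two ciphertext versions of the same sector cancels the keystream and leaks the XOR of the two plaintexts outright.]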
Even restricting the threat model to offline attacks only isn't enough, since modern flash storage devices make no guarantee that "overwrites" are really overwrites, due to wear-leveling.

Of course, the ideal solution is to store unique nonces on-disk, ideally alongside authentication tags. Unfortunately, this is usually impractical. Hardware support for per-sector metadata is extremely rare, especially in consumer-grade devices. Software workarounds for this limitation struggle with the crash consistency problem (often ignored by academic cryptographers): the crypto metadata MUST be written atomically with regard to the data. This can be solved with data journaling, e.g. as dm-integrity does, but that has a severe performance penalty. Or, for file-level encryption only, per-block metadata is possible on copy-on-write and log-structured filesystems. However, the most common Linux filesystems, ext4 and xfs, are neither; and even f2fs is not fully log-structured, as it sometimes overwrites data in-place. A solution that works for more than just btrfs and zfs is needed.

So, we're mostly stuck with length-preserving encryption for now. HPolyC therefore provides a way to securely use ChaCha in this context. Essentially, it uses a hash-XOR-hash construction where a Poly1305 hash of the tweak and message is used as the nonce for XChaCha, resulting in a different nonce whenever either the message or tweak is changed. A block cipher invocation is also needed, but only once per message. Note that Poly1305 is much faster than ChaCha20, making HPolyC faster than one might first assume; still, due to the overhead of the two Poly1305 passes, some users will need ChaCha12 to get acceptable performance. See the Performance section of the paper. (Currently, ChaCha12 is still secure, though it has a lower security margin.)

HPolyC has a proof (section 5 of the paper) that shows it is secure if the underlying primitives are secure, subject to a security bound. Unless there is a mistake in this proof, one therefore does not need to "trust" HPolyC; one need only trust XChaCha (which itself has a security reduction to ChaCha) and AES. Unlike XTS, HPolyC is also a true wide-block mode, or tweakable super-pseudorandom permutation: changing one plaintext bit affects all ciphertext bits, and vice versa.

HPolyC supports any message length >= 16 bytes without any need for "ciphertext stealing". Thus, it will also be useful for fscrypt filename encryption, where CBC-CTS is currently used.

We implement HPolyC as a template that wraps existing Poly1305, XChaCha, and block cipher implementations. So, it can be used with either XChaCha20 or XChaCha12, and with any block cipher with a 256-bit key and 128-bit block size -- though we recommend and plan to use AES-256, even on processors without AES instructions, as the block cipher performance is not critical when it's invoked only once per message.

We include test vectors for HPolyC-XChaCha20-AES and HPolyC-XChaCha12-AES.

Signed-off-by: Eric Biggers --- crypto/Kconfig | 28 +++ crypto/Makefile | 1 + crypto/hpolyc.c | 577 +++++++++++++++++++++++++++++++++++++++++++++++ crypto/testmgr.c | 12 + crypto/testmgr.h | 158 +++++++++++++ 5 files changed, 776 insertions(+) create mode 100644 crypto/hpolyc.c diff --git a/crypto/Kconfig b/crypto/Kconfig index d35d423bb4d1..4b412116b888 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -495,6 +495,34 @@ config CRYPTO_KEYWRAP Support for key wrapping (NIST SP800-38F / RFC3394) without padding.
+config CRYPTO_HPOLYC + tristate "HPolyC support" + select CRYPTO_CHACHA20 + select CRYPTO_POLY1305 + help + HPolyC is a tweakable, length-preserving encryption + construction that encrypts each message using XChaCha + sandwiched between two passes of Poly1305, and one invocation + of a block cipher such as AES on a single 128-bit block. + + Unlike bare stream ciphers such as ChaCha and + ciphertext-expanding modes (e.g. AEADs), HPolyC is suitable for + disk and file contents encryption, e.g. with dm-crypt or + fscrypt, where normally only block ciphers in the XTS, LRW, or + CBC-ESSIV modes of operation are suitable. HPolyC was designed + primarily for this use case on low-end processors that lack AES + instructions, where traditional modes are too slow unless + paired with certain "lightweight" block ciphers such as Speck, + but where XChaCha and Poly1305 are reasonably fast. + + HPolyC's security is proven reducible to that of the underlying + XChaCha variant and block cipher, subject to a security bound. + Unlike XTS, HPolyC is a true wide-block encryption mode, so it + actually provides a stronger notion of security, subject to + that bound. + + If unsure, say N. + comment "Hash modes" config CRYPTO_CMAC diff --git a/crypto/Makefile b/crypto/Makefile index 0701c4577dc6..e0af9efd6a08 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -83,6 +83,7 @@ obj-$(CONFIG_CRYPTO_LRW) += lrw.o obj-$(CONFIG_CRYPTO_XTS) += xts.o obj-$(CONFIG_CRYPTO_CTR) += ctr.o obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o +obj-$(CONFIG_CRYPTO_HPOLYC) += hpolyc.o obj-$(CONFIG_CRYPTO_GCM) += gcm.o obj-$(CONFIG_CRYPTO_CCM) += ccm.o obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o diff --git a/crypto/hpolyc.c b/crypto/hpolyc.c new file mode 100644 index 000000000000..d62f289e2705 --- /dev/null +++ b/crypto/hpolyc.c @@ -0,0 +1,577 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * HPolyC: length-preserving encryption for entry-level processors + * + * Reference: https://eprint.iacr.org/2018/720.pdf + * + * Copyright (c) 2018 Google LLC + */ + +#include <crypto/algapi.h> +#include <crypto/chacha20.h> +#include <crypto/internal/hash.h> +#include <crypto/internal/skcipher.h> +#include <crypto/scatterwalk.h> +#include <linux/module.h> + +#include "internal.h" + +/* Poly1305 and block cipher block size */ +#define HPOLYC_BLOCK_SIZE 16 + +/* Key sizes in bytes */ +#define HPOLYC_STREAM_KEY_SIZE 32 /* XChaCha stream key (K_S) */ +#define HPOLYC_HASH_KEY_SIZE 16 /* Poly1305 hash key (K_H) */ +#define HPOLYC_BLKCIPHER_KEY_SIZE 32 /* Block cipher key (K_E) */ + +/* + * The HPolyC specification allows any tweak (IV) length <= UINT32_MAX bits, but + * Linux's crypto API currently only allows algorithms to support a single IV + * length. We choose 12 bytes, which is the longest tweak that fits into a + * single 16-byte Poly1305 block (as HPolyC reserves 4 bytes for the tweak + * length), for the fastest performance. And it's good enough for disk + * encryption which really only needs an 8-byte tweak anyway.
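+ * (With a 12-byte tweak, the 4-byte length field plus the tweak exactly fill + * one 16-byte Poly1305 block: 4 + 12 = 16; cf. the BUILD_BUG_ON() checks in + * hpolyc_crypt().)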
+ */ +#define HPOLYC_IV_SIZE 12 + +struct hpolyc_instance_ctx { + struct crypto_ahash_spawn poly1305_spawn; + struct crypto_skcipher_spawn xchacha_spawn; + struct crypto_spawn blkcipher_spawn; +}; + +struct hpolyc_tfm_ctx { + struct crypto_ahash *poly1305; + struct crypto_skcipher *xchacha; + struct crypto_cipher *blkcipher; + u8 poly1305_key[HPOLYC_HASH_KEY_SIZE]; /* K_H (unclamped) */ +}; + +struct hpolyc_request_ctx { + + /* First part of data passed to the two Poly1305 hash steps */ + struct { + u8 rkey[HPOLYC_BLOCK_SIZE]; + u8 skey[HPOLYC_BLOCK_SIZE]; + __le32 tweak_len; + u8 tweak[HPOLYC_IV_SIZE]; + } hash_head; + struct scatterlist hash_sg[2]; + + /* + * Buffer for rightmost portion of data, i.e. the last 16-byte block + * + * P_R => P_M => C_M => C_R when encrypting, or + * C_R => C_M => P_M => P_R when decrypting. + * + * Also used to build the XChaCha IV as C_M || 1 || 0^63 || 0^64. + */ + u8 rbuf[XCHACHA_IV_SIZE]; + + bool enc; /* true if encrypting, false if decrypting */ + + /* Sub-requests, must be last */ + union { + struct ahash_request poly1305_req; + struct skcipher_request xchacha_req; + } u; +}; + +/* + * Given the 256-bit XChaCha stream key K_S, derive the 128-bit Poly1305 hash + * key K_H and the 256-bit block cipher key K_E as follows: + * + * K_H || K_E || ... = XChaCha(key=K_S, nonce=1||0^191) + * + * Note that this denotes using bits from the XChaCha keystream, which here we + * get indirectly by encrypting a buffer containing all 0's. + */ +static int hpolyc_setkey(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keylen) +{ + struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + struct { + u8 iv[XCHACHA_IV_SIZE]; + u8 derived_keys[HPOLYC_HASH_KEY_SIZE + + HPOLYC_BLKCIPHER_KEY_SIZE]; + struct scatterlist sg; + struct crypto_wait wait; + struct skcipher_request req; /* must be last */ + } *data; + int err; + + /* Set XChaCha key */ + crypto_skcipher_clear_flags(tctx->xchacha, CRYPTO_TFM_REQ_MASK); + crypto_skcipher_set_flags(tctx->xchacha, + crypto_skcipher_get_flags(tfm) & + CRYPTO_TFM_REQ_MASK); + err = crypto_skcipher_setkey(tctx->xchacha, key, keylen); + crypto_skcipher_set_flags(tfm, + crypto_skcipher_get_flags(tctx->xchacha) & + CRYPTO_TFM_RES_MASK); + if (err) + return err; + + /* Derive the Poly1305 and block cipher keys */ + data = kzalloc(sizeof(*data) + crypto_skcipher_reqsize(tctx->xchacha), + GFP_KERNEL); + if (!data) + return -ENOMEM; + data->iv[0] = 1; + sg_init_one(&data->sg, data->derived_keys, sizeof(data->derived_keys)); + crypto_init_wait(&data->wait); + skcipher_request_set_tfm(&data->req, tctx->xchacha); + skcipher_request_set_callback(&data->req, CRYPTO_TFM_REQ_MAY_SLEEP | + CRYPTO_TFM_REQ_MAY_BACKLOG, + crypto_req_done, &data->wait); + skcipher_request_set_crypt(&data->req, &data->sg, &data->sg, + sizeof(data->derived_keys), data->iv); + err = crypto_wait_req(crypto_skcipher_encrypt(&data->req), + &data->wait); + if (err) + goto out; + + /* + * Save the Poly1305 key. It is not clamped here, since that is handled + * by the Poly1305 implementation.
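+ * (Poly1305 clamping clears 22 fixed bits of the 16-byte "r" key; Linux's + * Poly1305 applies it when it consumes the rkey block from the data stream.)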
+ */ + memcpy(tctx->poly1305_key, data->derived_keys, HPOLYC_HASH_KEY_SIZE); + + /* Set block cipher key */ + crypto_cipher_clear_flags(tctx->blkcipher, CRYPTO_TFM_REQ_MASK); + crypto_cipher_set_flags(tctx->blkcipher, + crypto_skcipher_get_flags(tfm) & + CRYPTO_TFM_REQ_MASK); + err = crypto_cipher_setkey(tctx->blkcipher, + &data->derived_keys[HPOLYC_HASH_KEY_SIZE], + HPOLYC_BLKCIPHER_KEY_SIZE); + crypto_skcipher_set_flags(tfm, + crypto_cipher_get_flags(tctx->blkcipher) & + CRYPTO_TFM_RES_MASK); +out: + kzfree(data); + return err; +} + +static inline void async_done(struct crypto_async_request *areq, int err, + int (*next_step)(struct skcipher_request *, u32)) +{ + struct skcipher_request *req = areq->data; + + if (err) + goto out; + + err = next_step(req, req->base.flags & ~CRYPTO_TFM_REQ_MAY_SLEEP); + if (err == -EINPROGRESS || err == -EBUSY) + return; +out: + skcipher_request_complete(req, err); +} + +/* + * Following completion of the second hash step, do the second bitwise inversion + * to complete the identity a - b = ~(~a + b), then copy the result to the + * last block of the destination scatterlist. This completes HPolyC. + */ +static int hpolyc_finish(struct skcipher_request *req, u32 flags) +{ + struct hpolyc_request_ctx *rctx = skcipher_request_ctx(req); + int i; + + for (i = 0; i < HPOLYC_BLOCK_SIZE; i++) + rctx->rbuf[i] ^= 0xff; + + scatterwalk_map_and_copy(rctx->rbuf, req->dst, + req->cryptlen - HPOLYC_BLOCK_SIZE, + HPOLYC_BLOCK_SIZE, 1); + return 0; +} + +static void hpolyc_hash2_done(struct crypto_async_request *areq, int err) +{ + async_done(areq, err, hpolyc_finish); +} + +/* + * Following completion of the XChaCha step, do the second hash step to compute + * the last output block. Note that the last block needs to be subtracted + * rather than added, which isn't compatible with typical Poly1305 + * implementations. Thus, we use the identity a - b = ~(~a + b).
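+ * For example, with 8-bit values: 5 - 3 = ~(~5 + 3) = ~253 = 2 (mod 256). + * Here a is the block cipher output (C_M when encrypting, P_M when + * decrypting) and b is this second Poly1305 digest.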
+ */ +static int hpolyc_hash2_step(struct skcipher_request *req, u32 flags) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + struct hpolyc_request_ctx *rctx = skcipher_request_ctx(req); + int i; + + /* If decrypting, decrypt C_M with the block cipher to get P_M */ + if (!rctx->enc) + crypto_cipher_decrypt_one(tctx->blkcipher, rctx->rbuf, + rctx->rbuf); + + for (i = 0; i < HPOLYC_BLOCK_SIZE; i++) + rctx->hash_head.skey[i] = rctx->rbuf[i] ^ 0xff; + + sg_chain(rctx->hash_sg, 2, req->dst); + + ahash_request_set_tfm(&rctx->u.poly1305_req, tctx->poly1305); + ahash_request_set_crypt(&rctx->u.poly1305_req, rctx->hash_sg, + rctx->rbuf, sizeof(rctx->hash_head) + + req->cryptlen - HPOLYC_BLOCK_SIZE); + ahash_request_set_callback(&rctx->u.poly1305_req, flags, + hpolyc_hash2_done, req); + return crypto_ahash_digest(&rctx->u.poly1305_req) ?: + hpolyc_finish(req, flags); +} + +static void hpolyc_xchacha_done(struct crypto_async_request *areq, int err) +{ + async_done(areq, err, hpolyc_hash2_step); +} + +static int hpolyc_xchacha_step(struct skcipher_request *req, u32 flags) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + struct hpolyc_request_ctx *rctx = skcipher_request_ctx(req); + unsigned int xchacha_len; + + /* If encrypting, encrypt P_M with the block cipher to get C_M */ + if (rctx->enc) + crypto_cipher_encrypt_one(tctx->blkcipher, rctx->rbuf, + rctx->rbuf); + + /* Initialize the rest of the XChaCha IV (first part is C_M) */ + rctx->rbuf[HPOLYC_BLOCK_SIZE] = 1; + memset(&rctx->rbuf[HPOLYC_BLOCK_SIZE + 1], 0, + sizeof(rctx->rbuf) - (HPOLYC_BLOCK_SIZE + 1)); + + /* + * XChaCha needs to be done on all the data except the last 16 bytes; + * for disk encryption that usually means 4080 or 496 bytes. But ChaCha + * implementations tend to be most efficient when passed a whole number + * of 64-byte ChaCha blocks, or sometimes even a multiple of 256 bytes. + * And here it doesn't matter whether the last 16 bytes are written to, + * as the second hash step will overwrite them. Thus, round the XChaCha + * length up to the next 64-byte boundary if possible. + */ + xchacha_len = req->cryptlen - HPOLYC_BLOCK_SIZE; + if (round_up(xchacha_len, CHACHA_BLOCK_SIZE) <= req->cryptlen) + xchacha_len = round_up(xchacha_len, CHACHA_BLOCK_SIZE); + + skcipher_request_set_tfm(&rctx->u.xchacha_req, tctx->xchacha); + skcipher_request_set_crypt(&rctx->u.xchacha_req, req->src, req->dst, + xchacha_len, rctx->rbuf); + skcipher_request_set_callback(&rctx->u.xchacha_req, flags, + hpolyc_xchacha_done, req); + return crypto_skcipher_encrypt(&rctx->u.xchacha_req) ?: + hpolyc_hash2_step(req, flags); +} + +static void hpolyc_hash1_done(struct crypto_async_request *areq, int err) +{ + async_done(areq, err, hpolyc_xchacha_step); +} + +/* + * HPolyC encryption/decryption. + * + * The first step is to Poly1305-hash the tweak and source data to get P_M (if + * encrypting) or C_M (if decrypting), storing the result in rctx->rbuf. + * Linux's Poly1305 doesn't use the usual keying mechanism and instead + * interprets the data as (rkey, skey, real data), so we pass: + * + * 1. rkey = poly1305_key + * 2. skey = last block of data (P_R or C_R) + * 3. tweak block (assuming 12-byte tweak, so it fits in one block) + * 4. rest of the data + * + * We put 1-3 in rctx->hash_head and chain it to the rest from req->src. 
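+ * + * In these terms, one full HPolyC encryption is: + * + * P_M = Poly1305(K_H, T || P_L) + P_R + * C_M = E(K_E, P_M) + * C_L = P_L XOR XChaCha(K_S, IV = C_M || 1 || 0^63 || 0^64) + * C_R = C_M - Poly1305(K_H, T || C_L) + * + * with additions and the subtraction mod 2^128. Since Linux's Poly1305 + * returns hash + skey and skey here is the last data block, the first digest + * is already P_M (or C_M when decrypting), stored in rctx->rbuf.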
+ * + * Note: as a future optimization, a keyed version of Poly1305 that is keyed + * with the 'rkey' could be implemented, allowing vectorized implementations of + * Poly1305 to precompute powers of the key. Though, that would be most + * beneficial on small messages, whereas in the disk/file encryption use case, + * longer 512-byte or 4096-byte messages are the most performance-critical. + * + * Afterwards, we continue on to the XChaCha step. + */ +static int hpolyc_crypt(struct skcipher_request *req, bool enc) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + struct hpolyc_request_ctx *rctx = skcipher_request_ctx(req); + + if (req->cryptlen < HPOLYC_BLOCK_SIZE) + return -EINVAL; + + rctx->enc = enc; + + BUILD_BUG_ON(sizeof(rctx->hash_head) % HPOLYC_BLOCK_SIZE != 0); + BUILD_BUG_ON(HPOLYC_HASH_KEY_SIZE != HPOLYC_BLOCK_SIZE); + BUILD_BUG_ON(sizeof(__le32) + HPOLYC_IV_SIZE != HPOLYC_BLOCK_SIZE); + memcpy(rctx->hash_head.rkey, tctx->poly1305_key, HPOLYC_BLOCK_SIZE); + scatterwalk_map_and_copy(rctx->hash_head.skey, req->src, + req->cryptlen - HPOLYC_BLOCK_SIZE, + HPOLYC_BLOCK_SIZE, 0); + rctx->hash_head.tweak_len = cpu_to_le32(8 * HPOLYC_IV_SIZE); + memcpy(rctx->hash_head.tweak, req->iv, HPOLYC_IV_SIZE); + + sg_init_table(rctx->hash_sg, 2); + sg_set_buf(&rctx->hash_sg[0], &rctx->hash_head, + sizeof(rctx->hash_head)); + sg_chain(rctx->hash_sg, 2, req->src); + + ahash_request_set_tfm(&rctx->u.poly1305_req, tctx->poly1305); + ahash_request_set_crypt(&rctx->u.poly1305_req, rctx->hash_sg, + rctx->rbuf, sizeof(rctx->hash_head) + + req->cryptlen - HPOLYC_BLOCK_SIZE); + ahash_request_set_callback(&rctx->u.poly1305_req, req->base.flags, + hpolyc_hash1_done, req); + return crypto_ahash_digest(&rctx->u.poly1305_req) ?: + hpolyc_xchacha_step(req, req->base.flags); +} + +static int hpolyc_encrypt(struct skcipher_request *req) +{ + return hpolyc_crypt(req, true); +} + +static int hpolyc_decrypt(struct skcipher_request *req) +{ + return hpolyc_crypt(req, false); +} + +static int hpolyc_init_tfm(struct crypto_skcipher *tfm) +{ + struct skcipher_instance *inst = skcipher_alg_instance(tfm); + struct hpolyc_instance_ctx *ictx = skcipher_instance_ctx(inst); + struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + struct crypto_ahash *poly1305; + struct crypto_skcipher *xchacha; + struct crypto_cipher *blkcipher; + int err; + + poly1305 = crypto_spawn_ahash(&ictx->poly1305_spawn); + if (IS_ERR(poly1305)) + return PTR_ERR(poly1305); + + xchacha = crypto_spawn_skcipher(&ictx->xchacha_spawn); + if (IS_ERR(xchacha)) { + err = PTR_ERR(xchacha); + goto err_free_poly1305; + } + + blkcipher = crypto_spawn_cipher(&ictx->blkcipher_spawn); + if (IS_ERR(blkcipher)) { + err = PTR_ERR(blkcipher); + goto err_free_xchacha; + } + + tctx->poly1305 = poly1305; + tctx->xchacha = xchacha; + tctx->blkcipher = blkcipher; + + crypto_skcipher_set_reqsize(tfm, + offsetof(struct hpolyc_request_ctx, u) + + max(FIELD_SIZEOF(struct hpolyc_request_ctx, + u.poly1305_req) + + crypto_ahash_reqsize(poly1305), + FIELD_SIZEOF(struct hpolyc_request_ctx, + u.xchacha_req) + + crypto_skcipher_reqsize(xchacha))); + return 0; + +err_free_xchacha: + crypto_free_skcipher(xchacha); +err_free_poly1305: + crypto_free_ahash(poly1305); + return err; +} + +static void hpolyc_exit_tfm(struct crypto_skcipher *tfm) +{ + struct hpolyc_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); + + crypto_free_ahash(tctx->poly1305); + crypto_free_skcipher(tctx->xchacha); + 
crypto_free_cipher(tctx->blkcipher); +} + +static void hpolyc_free_instance(struct skcipher_instance *inst) +{ + struct hpolyc_instance_ctx *ictx = skcipher_instance_ctx(inst); + + crypto_drop_ahash(&ictx->poly1305_spawn); + crypto_drop_skcipher(&ictx->xchacha_spawn); + crypto_drop_spawn(&ictx->blkcipher_spawn); + kfree(inst); +} + +static int hpolyc_create(struct crypto_template *tmpl, struct rtattr **tb) +{ + struct crypto_attr_type *algt; + u32 mask; + const char *xchacha_name; + const char *blkcipher_name; + struct skcipher_instance *inst; + struct hpolyc_instance_ctx *ictx; + struct crypto_alg *poly1305_alg; + struct hash_alg_common *poly1305; + struct crypto_alg *blkcipher_alg; + struct skcipher_alg *xchacha_alg; + int err; + + algt = crypto_get_attr_type(tb); + if (IS_ERR(algt)) + return PTR_ERR(algt); + + if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask) + return -EINVAL; + + mask = crypto_requires_sync(algt->type, algt->mask); + + xchacha_name = crypto_attr_alg_name(tb[1]); + if (IS_ERR(xchacha_name)) + return PTR_ERR(xchacha_name); + + blkcipher_name = crypto_attr_alg_name(tb[2]); + if (IS_ERR(blkcipher_name)) + return PTR_ERR(blkcipher_name); + + inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL); + if (!inst) + return -ENOMEM; + ictx = skcipher_instance_ctx(inst); + + /* Poly1305 */ + + poly1305_alg = crypto_find_alg("poly1305", &crypto_ahash_type, 0, mask); + if (IS_ERR(poly1305_alg)) { + err = PTR_ERR(poly1305_alg); + goto out_free_inst; + } + poly1305 = __crypto_hash_alg_common(poly1305_alg); + err = crypto_init_ahash_spawn(&ictx->poly1305_spawn, poly1305, + skcipher_crypto_instance(inst)); + if (err) { + crypto_mod_put(poly1305_alg); + goto out_free_inst; + } + err = -EINVAL; + if (poly1305->digestsize != HPOLYC_BLOCK_SIZE) + goto out_drop_poly1305; + + /* XChaCha */ + + err = crypto_grab_skcipher(&ictx->xchacha_spawn, xchacha_name, 0, mask); + if (err) + goto out_drop_poly1305; + xchacha_alg = crypto_spawn_skcipher_alg(&ictx->xchacha_spawn); + err = -EINVAL; + if (xchacha_alg->min_keysize != HPOLYC_STREAM_KEY_SIZE || + xchacha_alg->max_keysize != HPOLYC_STREAM_KEY_SIZE) + goto out_drop_xchacha; + if (xchacha_alg->base.cra_blocksize != 1) + goto out_drop_xchacha; + if (crypto_skcipher_alg_ivsize(xchacha_alg) != XCHACHA_IV_SIZE) + goto out_drop_xchacha; + + /* Block cipher */ + + err = crypto_grab_spawn(&ictx->blkcipher_spawn, blkcipher_name, + CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK); + if (err) + goto out_drop_xchacha; + blkcipher_alg = ictx->blkcipher_spawn.alg; + err = -EINVAL; + if (blkcipher_alg->cra_blocksize != HPOLYC_BLOCK_SIZE) + goto out_drop_blkcipher; + if (blkcipher_alg->cra_cipher.cia_min_keysize > + HPOLYC_BLKCIPHER_KEY_SIZE || + blkcipher_alg->cra_cipher.cia_max_keysize < + HPOLYC_BLKCIPHER_KEY_SIZE) + goto out_drop_blkcipher; + + /* Instance fields */ + + err = -ENAMETOOLONG; + if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, + "hpolyc(%s,%s)", + xchacha_alg->base.cra_name, + blkcipher_alg->cra_name) >= CRYPTO_MAX_ALG_NAME) + goto out_drop_blkcipher; + if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, + "hpolyc(%s,%s,%s)", + poly1305_alg->cra_driver_name, + xchacha_alg->base.cra_driver_name, + blkcipher_alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) + goto out_drop_blkcipher; + + inst->alg.base.cra_blocksize = HPOLYC_BLOCK_SIZE; + inst->alg.base.cra_ctxsize = sizeof(struct hpolyc_tfm_ctx); + inst->alg.base.cra_alignmask = xchacha_alg->base.cra_alignmask | + poly1305_alg->cra_alignmask; + /* + * The block cipher 
is only invoked once per message, so for long + messages (e.g. sectors for disk encryption) its performance doesn't + matter nearly as much as that of XChaCha and Poly1305. Thus, weigh + the block cipher's ->cra_priority less. + */ + inst->alg.base.cra_priority = (2 * xchacha_alg->base.cra_priority + + 2 * poly1305_alg->cra_priority + + blkcipher_alg->cra_priority) / 5; + + inst->alg.setkey = hpolyc_setkey; + inst->alg.encrypt = hpolyc_encrypt; + inst->alg.decrypt = hpolyc_decrypt; + inst->alg.init = hpolyc_init_tfm; + inst->alg.exit = hpolyc_exit_tfm; + inst->alg.min_keysize = HPOLYC_STREAM_KEY_SIZE; + inst->alg.max_keysize = HPOLYC_STREAM_KEY_SIZE; + inst->alg.ivsize = HPOLYC_IV_SIZE; + + inst->free = hpolyc_free_instance; + + err = skcipher_register_instance(tmpl, inst); + if (err) + goto out_drop_blkcipher; + + return 0; + +out_drop_blkcipher: + crypto_drop_spawn(&ictx->blkcipher_spawn); +out_drop_xchacha: + crypto_drop_skcipher(&ictx->xchacha_spawn); +out_drop_poly1305: + crypto_drop_ahash(&ictx->poly1305_spawn); +out_free_inst: + kfree(inst); + return err; +} + +/* hpolyc(xchacha_name, blkcipher_name) */ +static struct crypto_template hpolyc_tmpl = { + .name = "hpolyc", + .create = hpolyc_create, + .module = THIS_MODULE, +}; + +static int __init hpolyc_module_init(void) +{ + return crypto_register_template(&hpolyc_tmpl); +} + +static void __exit hpolyc_module_exit(void) +{ + crypto_unregister_template(&hpolyc_tmpl); +} + +module_init(hpolyc_module_init); +module_exit(hpolyc_module_exit); + +MODULE_DESCRIPTION("HPolyC length-preserving encryption mode"); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Eric Biggers "); +MODULE_ALIAS_CRYPTO("hpolyc"); diff --git a/crypto/testmgr.c b/crypto/testmgr.c index c06aeb1f01bc..c0511ffd997e 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -3184,6 +3184,18 @@ static const struct alg_test_desc alg_test_descs[] = { .suite = { .hash = __VECS(hmac_sha512_tv_template) } + }, { + .alg = "hpolyc(xchacha12,aes)", + .test = alg_test_skcipher, + .suite = { + .cipher = __VECS(hpolyc_xchacha12_aes_tv_template) + }, + }, { + .alg = "hpolyc(xchacha20,aes)", + .test = alg_test_skcipher, + .suite = { + .cipher = __VECS(hpolyc_xchacha20_aes_tv_template) + }, }, { .alg = "jitterentropy_rng", .fips_allowed = 1, diff --git a/crypto/testmgr.h b/crypto/testmgr.h index ba5c31ada273..27242c74a1fb 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -32568,6 +32568,164 @@ static const struct cipher_testvec xchacha12_tv_template[] = { }, }; +/* + * Some HPolyC-XChaCha20-AES test vectors, taken from the reference code: + * https://github.com/google/hpolyc/blob/master/test_vectors/ours/HPolyC/HPolyC_XChaCha20_32_AES256.json + */ +static const struct cipher_testvec hpolyc_xchacha20_aes_tv_template[] = { + { + .key = "\x86\x1b\xc2\xf4\xa4\x19\xa7\x5f" + "\x86\xe4\xbd\x55\xc0\x36\x66\xae" + "\x1b\x79\x72\x6f\x95\xc5\x85\xb7" + "\xb7\xf6\x5d\xa4\xff\xef\xcd\x2f", + .klen = 32, + .iv = "\x23\x4f\xff\xd4\x5a\xcc\x74\x56" + "\x9c\x01\x08\xb8", + .ptext = "\xb1\x5b\x42\xc7\x95\xfa\x2f\xac" + "\xee\x90\xe0\xa2\x97\x1c\xba\x40", + .ctext = "\xa4\x02\xf0\xd1\x51\x69\x00\x5d" + "\x87\x61\x9b\xa2\x75\x23\x40\x94", + .len = 16, + }, { + .key = "\xce\x94\xdc\xc7\x33\xd6\x43\x99" + "\x03\x51\x3f\x6f\xee\x8e\xea\x83" + "\x1c\x99\x1a\x31\x88\xf9\x28\x81" + "\x10\xd9\x68\x8c\xfd\x36\x3f\x81", + .klen = 32, + .iv = "\xbb\x17\x6f\x18\xbc\x07\xb1\xbc" + "\x21\x16\xdf\x8e", + .ptext = "\x0a\xcc\x14\x3b\x1f\x4e\x69\x88" + "\xe7\xe5\x69\xbb\x0d\xa5\xd6\x28" + "\xfb\x14\xe1\xec\xa9\x4c\x1c\x0e"
+ "\xe6\x0e\xce\xa4\x0b\xcc\x12", + .ctext = "\xed\xfa\x38\x58\x8a\x9b\xd5\xb0" + "\xda\xd5\xe7\x10\xe0\xd5\xbb\x1f" + "\xe2\xd7\xe7\x61\x71\x2e\x58\xc7" + "\xd9\x2d\x49\xbc\x7b\xa3\x7e", + .len = 31, + }, { + .key = "\x59\x62\xdc\xdc\xd3\xb5\x6b\x49" + "\xc6\xc2\xc4\xdf\xbc\x23\x66\x7a" + "\x93\x9a\x11\xb6\x59\xd6\x60\x01" + "\x17\x18\x76\xe2\x60\x1c\x28\xad", + .klen = 32, + .iv = "\xac\x80\x02\x91\xfa\xd5\x31\xfe" + "\xfa\xff\xec\x00", + .ptext = "\xd5\x6d\x14\x2e\x21\xb4\x45\x69" + "\xf4\x48\x0d\x27\x09\x69\xba\xa0" + "\xe6\x2a\xd4\x23\xdb\xf4\xc3\xc6" + "\x1c\xab\x74\x74\x15\x7b\x95\x0c" + "\x13\x8b\x39\x74\x23\x12\x9e\xb2" + "\x2a\x6a\xf6\x82\x75\xcc\x97\x3c" + "\x74\xc7\x06\x90\x78\xca\x78\x7a" + "\x6b\x5b\xda\xf7\xec\x07\x13\xc6" + "\xd5\x51\xa2\x23\x20\x3d\xb8\x49" + "\x40\x12\x99\x88\x8e\x60\x0e\x0c" + "\x90\x51\x49\x5e\x52\x94\xbf\x47" + "\x86\xbe\xc8\x8e\x04\xc4\xd2\x8f" + "\x17\x9f\xdb\xc2\xf3\x8d\xbb\x36" + "\xe8\x97\x0c\xe8\x83\xf6\xa4\xd4" + "\x23\xdb\x5e\x64\xe8\x17\x80\xf5" + "\xe0\x4f\x33\xdf\xc5\xac\x79\x44", + .ctext = "\xde\x99\xb2\xb4\x53\xf7\xf4\xd6" + "\xdd\x2e\x42\x02\xd5\x05\x4d\x20" + "\xb1\xef\xa1\x6e\x9d\xa3\x58\x7c" + "\x25\xfa\xd5\x5e\x79\xb4\xd6\xd1" + "\x84\xad\x74\xa1\x27\x72\xc7\x37" + "\x4e\x0e\x1e\x94\xa0\x87\x2f\xfa" + "\xa5\xbf\xe2\xbd\x21\xd1\xe9\x16" + "\xc9\x19\xcf\xfa\x84\x0a\x19\x66" + "\x33\xf9\xbf\xff\xab\x6b\x87\xd2" + "\x92\x69\xc3\xeb\x54\xbc\x1b\xd9" + "\x58\x12\x17\xd4\x90\xa2\xc6\xe1" + "\xbe\x15\x8b\x9d\x06\xde\x80\x76" + "\x69\x03\xc7\x87\xff\x28\x03\x3a" + "\xbe\x11\x3a\xd3\x26\x27\x9d\x91" + "\x4a\x3f\x99\x10\x10\x51\xd3\x63" + "\x5e\x13\x41\xd2\x82\x16\xbc\xb7", + .len = 128, + }, +}; + +/* + * Some HPolyC-XChaCha12-AES test vectors, taken from the reference code: + * https://github.com/google/hpolyc/blob/master/test_vectors/ours/HPolyC/HPolyC_XChaCha12_32_AES256.json + */ +static const struct cipher_testvec hpolyc_xchacha12_aes_tv_template[] = { + { + .key = "\x86\x1b\xc2\xf4\xa4\x19\xa7\x5f" + "\x86\xe4\xbd\x55\xc0\x36\x66\xae" + "\x1b\x79\x72\x6f\x95\xc5\x85\xb7" + "\xb7\xf6\x5d\xa4\xff\xef\xcd\x2f", + .klen = 32, + .iv = "\x23\x4f\xff\xd4\x5a\xcc\x74\x56" + "\x9c\x01\x08\xb8", + .ptext = "\xb1\x5b\x42\xc7\x95\xfa\x2f\xac" + "\xee\x90\xe0\xa2\x97\x1c\xba\x40", + .ctext = "\x9d\xe4\x4b\xa8\x34\x89\x93\x19" + "\x7c\x89\x11\x0e\x50\x80\xa4\x8b", + .len = 16, + }, { + .key = "\xce\x94\xdc\xc7\x33\xd6\x43\x99" + "\x03\x51\x3f\x6f\xee\x8e\xea\x83" + "\x1c\x99\x1a\x31\x88\xf9\x28\x81" + "\x10\xd9\x68\x8c\xfd\x36\x3f\x81", + .klen = 32, + .iv = "\xbb\x17\x6f\x18\xbc\x07\xb1\xbc" + "\x21\x16\xdf\x8e", + .ptext = "\xf4\x4c\x81\xb5\x26\xf4\x59\x5f" + "\x5f\x8f\xa7\xc9\xa4\x3f\xf0\x5d" + "\x00\xd7\x58\xe4\x5a\xb8\xc3\xf5" + "\xe1\xf5\x7d\xff\xca\x8a\x00", + .ctext = "\xed\xfa\x38\x58\x8a\x9b\xd5\xb0" + "\xda\xd5\xe7\x10\xe0\xd5\xbb\x1f" + "\xe2\xd7\xe7\x61\x71\x2e\x58\xc7" + "\xd9\x2d\x49\xbc\x7b\xa3\x7e", + .len = 31, + }, { + .key = "\x59\x62\xdc\xdc\xd3\xb5\x6b\x49" + "\xc6\xc2\xc4\xdf\xbc\x23\x66\x7a" + "\x93\x9a\x11\xb6\x59\xd6\x60\x01" + "\x17\x18\x76\xe2\x60\x1c\x28\xad", + .klen = 32, + .iv = "\xac\x80\x02\x91\xfa\xd5\x31\xfe" + "\xfa\xff\xec\x00", + .ptext = "\xd5\x6d\x14\x2e\x21\xb4\x45\x69" + "\xf4\x48\x0d\x27\x09\x69\xba\xa0" + "\xe6\x2a\xd4\x23\xdb\xf4\xc3\xc6" + "\x1c\xab\x74\x74\x15\x7b\x95\x0c" + "\x13\x8b\x39\x74\x23\x12\x9e\xb2" + "\x2a\x6a\xf6\x82\x75\xcc\x97\x3c" + "\x74\xc7\x06\x90\x78\xca\x78\x7a" + "\x6b\x5b\xda\xf7\xec\x07\x13\xc6" + "\xd5\x51\xa2\x23\x20\x3d\xb8\x49" + "\x40\x12\x99\x88\x8e\x60\x0e\x0c" + 
"\x90\x51\x49\x5e\x52\x94\xbf\x47" + "\x86\xbe\xc8\x8e\x04\xc4\xd2\x8f" + "\x17\x9f\xdb\xc2\xf3\x8d\xbb\x36" + "\xe8\x97\x0c\xe8\x83\xf6\xa4\xd4" + "\x23\xdb\x5e\x64\xe8\x17\x80\xf5" + "\xe0\x4f\x33\xdf\xc5\xac\x79\x44", + .ctext = "\x04\x52\x8e\x33\xb7\xa5\x37\xd7" + "\x3a\x4d\x98\x3c\x25\x87\x8b\x7d" + "\xa8\x10\xfe\x20\x4e\xdc\x19\xf3" + "\x34\x01\xa6\x3d\xb7\xf3\x14\x4d" + "\x10\xcb\xae\x4b\xe6\x5f\xa9\x50" + "\xee\xe4\x41\x0c\xae\xa5\x51\x66" + "\x28\xa5\x16\xed\x55\xa3\x8a\x86" + "\x15\x33\x92\x98\xa3\x73\x70\x9d" + "\xeb\x11\xb6\xf4\xdd\x53\xa9\xa3" + "\x5e\x5f\x70\x0e\x50\x1b\xca\x34" + "\x5c\xa6\xd9\x05\x47\x0b\x6d\x74" + "\x64\xa1\x83\x59\x73\xfa\x83\x1c" + "\x35\x79\xa9\x9d\x2c\xaf\x54\x19" + "\x03\xff\x66\xfb\xb5\x55\xe4\x2b" + "\x7d\x93\x0e\x85\x62\x21\x20\xc0" + "\xb9\x7c\xa9\xd2\xd7\x5c\x50\x9a", + .len = 128, + }, +}; + /* * CTS (Cipher Text Stealing) mode tests */