From patchwork Sat Sep 8 11:42:13 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10593025
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel, tytso@mit.edu, herbert@gondor.apana.org.au,
	linux-arm-kernel@lists.infradead.org, ebiggers@google.com
Subject: [RFC/RFT PATCH] crypto: arm64/aes-ce - add support for CTS-CBC mode
Date: Sat, 8 Sep 2018 13:42:13 +0200
Message-Id: <20180908114213.9839-1-ard.biesheuvel@linaro.org>

Currently, we rely on the generic CTS chaining mode wrapper to instantiate
the cts(cbc(aes)) skcipher. Due to the high performance of the ARMv8 Crypto
Extensions AES instructions (~1 cycle per byte), any overhead in the
chaining mode layers is amplified, and so it pays off considerably to fold
the CTS handling into the core algorithm.

On Cortex-A53, this results in a ~50% speedup for smaller input sizes.

Signed-off-by: Ard Biesheuvel
---
Raw performance numbers after the patch.
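For reviewers who want a plain-C reference for what the new tail handling
does: the sketch below is an illustrative, unoptimized rendition of the
CBC-CS3 ciphertext stealing step that the new aes_cbc_cts_encrypt() routine
performs in assembly (CBC-encrypt the last full block, fold the zero-padded
final partial block into it, and emit the two resulting blocks swapped, with
the last one truncated). It is not part of the patch, and aes_encrypt_block()
stands in for a hypothetical single-block AES-ECB primitive.

  #include <stdint.h>
  #include <string.h>

  #define BLK 16

  /* Hypothetical single-block AES-ECB encryption primitive (not in the patch). */
  void aes_encrypt_block(uint8_t dst[BLK], const uint8_t src[BLK], const void *key);

  /*
   * Encrypt the final 17..32 bytes of a message with CBC-CS3 ciphertext
   * stealing.  'iv' is the last ciphertext block produced by the plain CBC
   * portion (or the original IV if the message fits in two blocks).
   */
  static void cbc_cs3_encrypt_tail(uint8_t *out, const uint8_t *in, size_t bytes,
                                   const uint8_t iv[BLK], const void *key)
  {
          size_t tail = bytes - BLK;      /* 1..16 bytes in the final block */
          uint8_t e[BLK], d[BLK] = { 0 };
          size_t i;

          /* CBC-encrypt the last full plaintext block */
          for (i = 0; i < BLK; i++)
                  e[i] = in[i] ^ iv[i];
          aes_encrypt_block(e, e, key);

          /* zero-pad the final partial block and CBC-chain it into the result */
          memcpy(d, in + BLK, tail);
          for (i = 0; i < BLK; i++)
                  d[i] ^= e[i];
          aes_encrypt_block(d, d, key);

          /* swap: the full block is emitted first, the stolen block is truncated */
          memcpy(out, d, BLK);
          memcpy(out + BLK, e, tail);
  }

This also illustrates why the glue code sets .walksize to 2 * AES_BLOCK_SIZE:
both of the final blocks must be available before either output block can be
written.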
 arch/arm64/crypto/aes-glue.c  | 142 ++++++++++++++++++++
 arch/arm64/crypto/aes-modes.S |  73 ++++++++++
 2 files changed, 215 insertions(+)

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index adcb83eb683c..0860feedbafe 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -31,6 +32,8 @@
 #define aes_ecb_decrypt		ce_aes_ecb_decrypt
 #define aes_cbc_encrypt		ce_aes_cbc_encrypt
 #define aes_cbc_decrypt		ce_aes_cbc_decrypt
+#define aes_cbc_cts_encrypt	ce_aes_cbc_cts_encrypt
+#define aes_cbc_cts_decrypt	ce_aes_cbc_cts_decrypt
 #define aes_ctr_encrypt		ce_aes_ctr_encrypt
 #define aes_xts_encrypt		ce_aes_xts_encrypt
 #define aes_xts_decrypt		ce_aes_xts_decrypt
@@ -45,6 +48,8 @@ MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions");
 #define aes_ecb_decrypt		neon_aes_ecb_decrypt
 #define aes_cbc_encrypt		neon_aes_cbc_encrypt
 #define aes_cbc_decrypt		neon_aes_cbc_decrypt
+#define aes_cbc_cts_encrypt	neon_aes_cbc_cts_encrypt
+#define aes_cbc_cts_decrypt	neon_aes_cbc_cts_decrypt
 #define aes_ctr_encrypt		neon_aes_ctr_encrypt
 #define aes_xts_encrypt		neon_aes_xts_encrypt
 #define aes_xts_decrypt		neon_aes_xts_decrypt
@@ -73,6 +78,11 @@ asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
 asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
 				int rounds, int blocks, u8 iv[]);
 
+asmlinkage void aes_cbc_cts_encrypt(u8 out[], u8 const in[], u8 const rk[],
+				    int rounds, int bytes, u8 iv[]);
+asmlinkage void aes_cbc_cts_decrypt(u8 out[], u8 const in[], u8 const rk[],
+				    int rounds, int bytes, u8 iv[]);
+
 asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
 				int rounds, int blocks, u8 ctr[]);
 
@@ -209,6 +219,120 @@ static int cbc_decrypt(struct skcipher_request *req)
 	return err;
 }
 
+static int cts_cbc_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err, rounds = 6 + ctx->key_length / 4;
+	int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+	struct skcipher_request subreq = *req;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct scatterlist *src = req->src, *dst = req->dst;
+	struct skcipher_walk walk;
+	unsigned int blocks;
+
+	if (req->cryptlen == AES_BLOCK_SIZE)
+		cbc_blocks = 1;
+
+	if (cbc_blocks > 0) {
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   cbc_blocks * AES_BLOCK_SIZE,
+					   req->iv);
+		err = skcipher_walk_virt(&walk, &subreq, false);
+
+		while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+			kernel_neon_begin();
+			aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					(u8 *)ctx->key_enc, rounds, blocks,
+					walk.iv);
+			kernel_neon_end();
+			err = skcipher_walk_done(&walk,
+						 walk.nbytes % AES_BLOCK_SIZE);
+		}
+		if (err)
+			return err;
+
+		if (req->cryptlen == AES_BLOCK_SIZE)
+			return 0;
+
+		src = scatterwalk_ffwd(sg_src, req->src, subreq.cryptlen);
+		dst = scatterwalk_ffwd(sg_dst, req->dst, subreq.cryptlen);
+	}
+
+	/* handle ciphertext stealing */
+	skcipher_request_set_crypt(&subreq, src, dst,
+				   req->cryptlen - cbc_blocks * AES_BLOCK_SIZE,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, &subreq, false);
+	if (err)
+		return err;
+
+	kernel_neon_begin();
+	aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+			    (u8 *)ctx->key_enc, rounds, walk.nbytes, walk.iv);
+	kernel_neon_end();
+
+	return skcipher_walk_done(&walk, 0);
+}
+
+static int cts_cbc_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err, rounds = 6 + ctx->key_length / 4;
+	int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+	struct skcipher_request subreq = *req;
+	struct scatterlist sg_src[2], sg_dst[2];
+	struct scatterlist *src = req->src, *dst = req->dst;
+	struct skcipher_walk walk;
+	unsigned int blocks;
+
+	if (req->cryptlen == AES_BLOCK_SIZE)
+		cbc_blocks = 1;
+
+	if (cbc_blocks > 0) {
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   cbc_blocks * AES_BLOCK_SIZE,
+					   req->iv);
+		err = skcipher_walk_virt(&walk, &subreq, false);
+
+		while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+			kernel_neon_begin();
+			aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+					(u8 *)ctx->key_dec, rounds, blocks,
+					walk.iv);
+			kernel_neon_end();
+			err = skcipher_walk_done(&walk,
+						 walk.nbytes % AES_BLOCK_SIZE);
+		}
+		if (err)
+			return err;
+
+		if (req->cryptlen == AES_BLOCK_SIZE)
+			return 0;
+
+		src = scatterwalk_ffwd(sg_src, req->src, subreq.cryptlen);
+		dst = scatterwalk_ffwd(sg_dst, req->dst, subreq.cryptlen);
+	}
+
+	/* handle ciphertext stealing */
+	skcipher_request_set_crypt(&subreq, src, dst,
+				   req->cryptlen - cbc_blocks * AES_BLOCK_SIZE,
+				   req->iv);
+
+	err = skcipher_walk_virt(&walk, &subreq, false);
+	if (err)
+		return err;
+
+	kernel_neon_begin();
+	aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+			    (u8 *)ctx->key_dec, rounds, walk.nbytes, walk.iv);
+	kernel_neon_end();
+
+	return skcipher_walk_done(&walk, 0);
+}
+
 static int ctr_encrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
@@ -334,6 +458,24 @@ static struct skcipher_alg aes_algs[] = { {
 	.setkey		= skcipher_aes_setkey,
 	.encrypt	= cbc_encrypt,
 	.decrypt	= cbc_decrypt,
+}, {
+	.base = {
+		.cra_name		= "__cts(cbc(aes))",
+		.cra_driver_name	= "__cts-cbc-aes-" MODE,
+		.cra_priority		= PRIO,
+		.cra_flags		= CRYPTO_ALG_INTERNAL,
+		.cra_blocksize		= 1,
+		.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+		.cra_module		= THIS_MODULE,
+	},
+	.min_keysize	= AES_MIN_KEY_SIZE,
+	.max_keysize	= AES_MAX_KEY_SIZE,
+	.ivsize		= AES_BLOCK_SIZE,
+	.chunksize	= AES_BLOCK_SIZE,
+	.walksize	= 2 * AES_BLOCK_SIZE,
+	.setkey		= skcipher_aes_setkey,
+	.encrypt	= cts_cbc_encrypt,
+	.decrypt	= cts_cbc_decrypt,
 }, {
 	.base = {
 		.cra_name		= "__ctr(aes)",
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 483a7130cf0e..61bab20de8da 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -205,6 +205,79 @@ AES_ENTRY(aes_cbc_decrypt)
 	ret
 AES_ENDPROC(aes_cbc_decrypt)
 
+	/*
+	 * aes_cbc_cts_encrypt(u8 out[], u8 const in[], u8 const rk[],
+	 *		       int rounds, int bytes, u8 iv[])
+	 * aes_cbc_cts_decrypt(u8 out[], u8 const in[], u8 const rk[],
+	 *		       int rounds, int bytes, u8 iv[])
+	 */
+
+AES_ENTRY(aes_cbc_cts_encrypt)
+	adr		x8, .Lcts_permute_table + 48
+	sub		x9, x8, x4
+	sub		x4, x4, #16
+	sub		x8, x8, #48
+	add		x8, x8, x4
+	ld1		{v6.16b}, [x9]
+	ld1		{v7.16b}, [x8]
+
+	ld1		{v4.16b}, [x5]			/* get iv */
+	enc_prepare	w3, x2, x6
+
+	ld1		{v0.16b}, [x1], x4		/* overlapping loads */
+	ld1		{v1.16b}, [x1]
+
+	eor		v0.16b, v0.16b, v4.16b		/* xor with iv */
+	tbl		v1.16b, {v1.16b}, v6.16b
+	encrypt_block	v0, w3, x2, x6, w7
+
+	eor		v1.16b, v1.16b, v0.16b
+	tbl		v0.16b, {v0.16b}, v7.16b
+	encrypt_block	v1, w3, x2, x6, w7
+
+	add		x4, x0, x4
+	st1		{v0.16b}, [x4]			/* overlapping stores */
+	st1		{v1.16b}, [x0]
+	ret
+AES_ENDPROC(aes_cbc_cts_encrypt)
+
+AES_ENTRY(aes_cbc_cts_decrypt)
+	adr		x8, .Lcts_permute_table + 48
+	sub		x9, x8, x4
+	sub		x4, x4, #16
+	sub		x8, x8, #48
+	add		x8, x8, x4
+	ld1		{v6.16b}, [x9]
+	ld1		{v7.16b}, [x8]
+
+	ld1		{v4.16b}, [x5]			/* get iv */
+	dec_prepare	w3, x2, x6
+
+	ld1		{v0.16b}, [x1], x4		/* overlapping loads */
+	ld1		{v1.16b}, [x1]
+
+	tbl		v2.16b, {v1.16b}, v6.16b
+	decrypt_block	v0, w3, x2, x6, w7
+	eor		v2.16b, v2.16b, v0.16b
+
+	tbx		v0.16b, {v1.16b}, v6.16b
+	tbl		v2.16b, {v2.16b}, v7.16b
+	decrypt_block	v0, w3, x2, x6, w7
+	eor		v0.16b, v0.16b, v4.16b		/* xor with iv */
+
+	add		x4, x0, x4
+	st1		{v2.16b}, [x4]			/* overlapping stores */
+	st1		{v0.16b}, [x0]
+	ret
+AES_ENDPROC(aes_cbc_cts_decrypt)
+
+.Lcts_permute_table:
+	.byte		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+	.byte		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+	.byte		 0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7
+	.byte		 0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf
+	.byte		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+	.byte		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
 
 	/*
 	 * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,