From patchwork Mon Sep 10 14:41:12 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10594283
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel, Theodore Ts'o, herbert@gondor.apana.org.au, Steve Capper, Eric Biggers, linux-arm-kernel@lists.infradead.org
Subject: [PATCH 1/4] crypto: arm64/aes-blk - remove pointless (u8 *) casts
Date: Mon, 10 Sep 2018 16:41:12 +0200
Message-Id: <20180910144115.25727-2-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>

For some reason, the asmlinkage prototypes of the NEON routines take
u8[] arguments for the round key arrays, while the actual round keys
are arrays of u32, and so passing them into those routines requires
u8* casts at each occurrence. Fix that.
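As a side note for readers outside the kernel tree, the effect of the prototype change can be seen in a minimal standalone C sketch (the u8/u32 typedefs and the ecb_encrypt_old/ecb_encrypt_new names below are illustrative stand-ins, not the kernel symbols):

#include <stdint.h>

typedef uint8_t u8;    /* stand-ins for the kernel's types */
typedef uint32_t u32;

/* old-style prototype: round keys declared as u8[] although they are u32 words */
static void ecb_encrypt_old(u8 out[], u8 const in[], u8 const rk[],
                            int rounds, int blocks)
{
        (void)out; (void)in; (void)rk; (void)rounds; (void)blocks;
}

/* fixed prototype: matches the u32 round key array in the key schedule */
static void ecb_encrypt_new(u8 out[], u8 const in[], u32 const rk[],
                            int rounds, int blocks)
{
        (void)out; (void)in; (void)rk; (void)rounds; (void)blocks;
}

void demo(u8 *dst, u8 const *src, u32 *key_enc, int rounds, int blocks)
{
        ecb_encrypt_old(dst, src, (u8 *)key_enc, rounds, blocks); /* cast at every call site */
        ecb_encrypt_new(dst, src, key_enc, rounds, blocks);       /* no cast needed */
}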
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 47 ++++++++++---------- 1 file changed, 23 insertions(+), 24 deletions(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index adcb83eb683c..1c6934544c1f 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -63,24 +63,24 @@ MODULE_AUTHOR("Ard Biesheuvel "); MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ -asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks); -asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks); -asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); -asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); -asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 ctr[]); -asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], - int rounds, int blocks, u8 const rk2[], u8 iv[], +asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u32 const rk1[], + int rounds, int blocks, u32 const rk2[], u8 iv[], int first); -asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u8 const rk1[], - int rounds, int blocks, u8 const rk2[], u8 iv[], +asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u32 const rk1[], + int rounds, int blocks, u32 const rk2[], u8 iv[], int first); asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds, @@ -142,7 +142,7 @@ static int ecb_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks); + ctx->key_enc, rounds, blocks); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -162,7 +162,7 @@ static int ecb_decrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks); + ctx->key_dec, rounds, blocks); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -182,7 +182,7 @@ static int cbc_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -202,7 +202,7 @@ static int cbc_decrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + ctx->key_dec, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -222,7 +222,7 @@ static int ctr_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); 
aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -238,7 +238,7 @@ static int ctr_encrypt(struct skcipher_request *req) blocks = -1; kernel_neon_begin(); - aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, + aes_ctr_encrypt(tail, NULL, ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); @@ -272,8 +272,8 @@ static int xts_encrypt(struct skcipher_request *req) for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key1.key_enc, rounds, blocks, - (u8 *)ctx->key2.key_enc, walk.iv, first); + ctx->key1.key_enc, rounds, blocks, + ctx->key2.key_enc, walk.iv, first); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -294,8 +294,8 @@ static int xts_decrypt(struct skcipher_request *req) for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key1.key_dec, rounds, blocks, - (u8 *)ctx->key2.key_enc, walk.iv, first); + ctx->key1.key_dec, rounds, blocks, + ctx->key2.key_enc, walk.iv, first); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -412,7 +412,6 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, { struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm); be128 *consts = (be128 *)ctx->consts; - u8 *rk = (u8 *)ctx->key.key_enc; int rounds = 6 + key_len / 4; int err; @@ -422,7 +421,8 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, ctx->key.key_enc, + rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -441,7 +441,6 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, }; struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm); - u8 *rk = (u8 *)ctx->key.key_enc; int rounds = 6 + key_len / 4; u8 key[AES_BLOCK_SIZE]; int err; @@ -451,8 +450,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); + aes_ecb_encrypt(key, ks[0], ctx->key.key_enc, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], ctx->key.key_enc, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); From patchwork Mon Sep 10 14:41:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 10594287 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAC85109C for ; Mon, 10 Sep 2018 14:45:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CA50128C86 for ; Mon, 10 Sep 2018 14:45:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BDEF228E53; Mon, 10 Sep 2018 14:45:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: 
[77.251.17.237]) by smtp.gmail.com with ESMTPSA id d35-v6sm8279487eda.25.2018.09.10.07.43.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Sep 2018 07:43:35 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [PATCH 2/4] crypto: arm64/aes-blk - revert NEON yield for skciphers Date: Mon, 10 Sep 2018 16:41:13 +0200 Message-Id: <20180910144115.25727-3-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org> References: <20180910144115.25727-1-ard.biesheuvel@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180910_074350_135911_6570A363 X-CRM114-Status: GOOD ( 12.88 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , Theodore Ts'o , herbert@gondor.apana.org.au, Steve Capper , Eric Biggers , linux-arm-kernel@lists.infradead.org MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP The reasoning of commit f10dc56c64bb ("crypto: arm64 - revert NEON yield for fast AEAD implementations") applies equally to skciphers: the walk API already guarantees that the input size of each call into the NEON code is bounded to the size of a page, and so there is no need for an additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 281 ++++++++------------ 1 file changed, 108 insertions(+), 173 deletions(-) diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 496c243de4ac..35632d11200f 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,71 +31,57 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - frame_push 5 + stp x29, x30, [sp, #-16]! + mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - -.Lecbencrestart: - enc_prepare w22, x21, x5 + enc_prepare w3, x2, x5 .LecbencloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x19], #64 - cond_yield_neon .Lecbencrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LecbencloopNx .Lecbenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x20], #16 /* get next pt block */ - encrypt_block v0, w22, x21, x5, w6 - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + ld1 {v0.16b}, [x1], #16 /* get next pt block */ + encrypt_block v0, w3, x2, x5, w6 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lecbencloop .Lecbencout: - frame_pop + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - frame_push 5 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - -.Lecbdecrestart: - dec_prepare w22, x21, x5 + dec_prepare w3, x2, x5 .LecbdecloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x19], #64 - cond_yield_neon .Lecbdecrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LecbdecloopNx .Lecbdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x20], #16 /* get next ct block */ - decrypt_block v0, w22, x21, x5, w6 - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + ld1 {v0.16b}, [x1], #16 /* get next ct block */ + decrypt_block v0, w3, x2, x5, w6 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lecbdecloop .Lecbdecout: - frame_pop + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_ecb_decrypt) @@ -108,100 +94,78 @@ AES_ENDPROC(aes_ecb_decrypt) */ AES_ENTRY(aes_cbc_encrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 - -.Lcbcencrestart: - ld1 {v4.16b}, [x24] /* get iv */ - enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x5] /* get iv */ + enc_prepare w3, x2, x6 .Lcbcencloop4x: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w22, x21, x6, w7 + encrypt_block v0, w3, x2, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w22, x21, x6, w7 + encrypt_block v1, w3, x2, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w22, x21, x6, w7 + encrypt_block v2, w3, x2, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w22, x21, x6, w7 - st1 {v0.16b-v3.16b}, [x19], #64 + encrypt_block v3, w3, x2, x6, w7 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v3.16b - st1 {v4.16b}, [x24] /* return iv */ - cond_yield_neon .Lcbcencrestart b .Lcbcencloop4x .Lcbcenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x20], #16 /* get next pt block */ + ld1 {v0.16b}, [x1], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w22, x21, x6, w7 - st1 {v4.16b}, [x19], #16 - subs w23, w23, #1 + encrypt_block v4, w3, x2, x6, w7 + st1 {v4.16b}, [x0], #16 + subs w4, w4, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x24] /* return iv */ - frame_pop + st1 {v4.16b}, [x5] /* return iv */ ret AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp -.Lcbcdecrestart: - ld1 {v7.16b}, [x24] /* get iv */ - dec_prepare w22, x21, x6 + ld1 {v7.16b}, [x5] /* get iv */ + dec_prepare w3, x2, x6 .LcbcdecloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x20, x20, #16 + sub x1, x1, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 - st1 {v7.16b}, [x24] /* return iv */ - cond_yield_neon .Lcbcdecrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LcbcdecloopNx .Lcbcdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x20], #16 /* get next ct block */ + ld1 {v1.16b}, [x1], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w22, x21, x6, w7 + decrypt_block v0, w3, x2, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x24] /* return iv */ - frame_pop + st1 {v7.16b}, [x5] /* return iv */ + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_cbc_decrypt) @@ -212,26 +176,19 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 + stp x29, x30, [sp, #-16]! + mov x29, sp -.Lctrrestart: - enc_prepare w22, x21, x6 - ld1 {v4.16b}, [x24] + enc_prepare w3, x2, x6 + ld1 {v4.16b}, [x5] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 + cmn w6, w4 /* 32 bit overflow? */ + bcs .Lctrloop .LctrloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lctr1x - cmn w6, #4 /* 32 bit overflow? */ - bcs .Lctr1x add w7, w6, #1 mov v0.16b, v4.16b add w8, w6, #2 @@ -245,27 +202,25 @@ AES_ENTRY(aes_ctr_encrypt) rev w9, w9 mov v2.s[3], w8 mov v3.s[3], w9 - ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x20], #16 /* get 1 input block */ + ld1 {v5.16b}, [x1], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w23, .Lctrout - st1 {v4.16b}, [x24] /* return next CTR value */ - cond_yield_neon .Lctrrestart + cbz w4, .Lctrout b .LctrloopNx .Lctr1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w22, x21, x8, w7 + encrypt_block v0, w3, x2, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -273,22 +228,22 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? 
*/ .Lctrcarrydone: - subs w23, w23, #1 + subs w4, w4, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x20], #16 + ld1 {v3.16b}, [x1], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x19], #16 + st1 {v3.16b}, [x0], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x24] /* return next CTR value */ -.Lctrret: - frame_pop + st1 {v4.16b}, [x5] /* return next CTR value */ + ldp x29, x30, [sp], #16 ret .Lctrtailblock: - st1 {v0.16b}, [x19] - b .Lctrret + st1 {v0.16b}, [x0] + ldp x29, x30, [sp], #16 + ret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -321,16 +276,10 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - frame_push 6 + stp x29, x30, [sp, #-16]! + mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x6 - - ld1 {v4.16b}, [x24] + ld1 {v4.16b}, [x6] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -339,17 +288,15 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx -.Lxtsencrestart: - ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w22, x21, x8 + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -362,43 +309,35 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v7.16b - cbz w23, .Lxtsencout - st1 {v4.16b}, [x24] - cond_yield_neon .Lxtsencrestart + cbz w4, .Lxtsencout b .LxtsencloopNx .Lxtsenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x20], #16 + ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w22, x21, x8, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x24] - frame_pop + st1 {v4.16b}, [x6] + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x6 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp - ld1 {v4.16b}, [x24] + ld1 {v4.16b}, [x6] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -407,17 +346,15 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx -.Lxtsdecrestart: - ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w22, x21, x8 + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -430,28 +367,26 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v7.16b - cbz w23, .Lxtsdecout - st1 {v4.16b}, [x24] - cond_yield_neon .Lxtsdecrestart + cbz w4, .Lxtsdecout b .LxtsdecloopNx .Lxtsdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x20], #16 + ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w22, x21, x8, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x24] - frame_pop + st1 {v4.16b}, [x6] + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_xts_decrypt) From patchwork Mon Sep 10 14:41:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 10594291 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E2E7921 for ; Mon, 10 Sep 2018 14:47:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6C35728C80 for ; Mon, 10 Sep 2018 14:47:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5FEB928CC2; Mon, 10 Sep 2018 14:47:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6633328C80 for ; Mon, 10 Sep 2018 14:47:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:MIME-Version:Cc:List-Subscribe: List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id:References: In-Reply-To:Message-Id:Date:Subject:To:From:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=mzPMLGX8V/An9bqexzBPh8dcQ34K8jPkewujktSGqKY=; b=UxgkYW1OPPGdbfEN1ntr924mI6 ziGDl4ysdrKNuTtG65lkTVnmVMgozHlh5vSxrfW7GjHnc0TTNXBQBUTscz4EJIj635202nPCosmov MCg/OUyMhfQVMpvDS+J+IiRQRwj2hb0lb+UOlTP02qtZG5GD6sjOMnB51BdQeWudpRgxCUblmD4l/ f3oxC1v1ZrSmakrrkXBZqitTMqJyGSppv+ZM3bdkjnPwNqugpLf7qC4+bdVduzBHJ5y8DZ8ti7VZD 
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel, Theodore Ts'o, herbert@gondor.apana.org.au, Steve Capper, Eric Biggers, linux-arm-kernel@lists.infradead.org
Subject: [PATCH 3/4] crypto: arm64/aes-blk - add support for CTS-CBC mode
Date: Mon, 10 Sep 2018 16:41:14 +0200
Message-Id: <20180910144115.25727-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>

Currently, we rely on the generic CTS chaining mode wrapper to
instantiate the cts(cbc(aes)) skcipher. Due to the high performance of
the ARMv8 Crypto Extensions AES instructions (~1 cycle per byte), any
overhead in the chaining mode layers is amplified, and so it pays off
considerably to fold the CTS handling into the SIMD routines.
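As an aside, the request split performed by the new glue code can be sketched outside the kernel: all but the final two blocks' worth of data go through the regular CBC path, and the remainder (the last full block plus the final, possibly partial, block) is handed to the CTS routine. A standalone sketch of that arithmetic (illustrative only, mirroring the cbc_blocks computation in the patch):

#include <stdio.h>

#define AES_BLOCK_SIZE 16
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
        unsigned int cryptlen = 100;  /* example request size in bytes */
        int cbc_blocks = DIV_ROUND_UP(cryptlen, AES_BLOCK_SIZE) - 2;

        if (cryptlen == AES_BLOCK_SIZE)   /* a single block needs no stealing */
                cbc_blocks = 1;

        printf("CBC bulk: %d blocks (%d bytes), CTS tail: %u bytes\n",
               cbc_blocks, cbc_blocks * AES_BLOCK_SIZE,
               cryptlen - cbc_blocks * AES_BLOCK_SIZE);
        /* prints: CBC bulk: 5 blocks (80 bytes), CTS tail: 20 bytes */
        return 0;
}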
On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel --- This patch supersedes '[RFC/RFT PATCH] crypto: arm64/aes-ce - add support for CTS-CBC mode' sent out last Saturday. Changes: - keep subreq and scatterlist in request ctx structure - optimize away second scatterwalk_ffwd() invocation when encrypting in-place - keep permute table in .rodata section - polish asm code (drop literal + offset reference, reorder insns) Raw performance numbers after the patch. arch/arm64/crypto/aes-glue.c | 165 ++++++++++++++++++++ arch/arm64/crypto/aes-modes.S | 79 +++++++++- 2 files changed, 243 insertions(+), 1 deletion(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 1c6934544c1f..26d2b0263ba6 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -31,6 +32,8 @@ #define aes_ecb_decrypt ce_aes_ecb_decrypt #define aes_cbc_encrypt ce_aes_cbc_encrypt #define aes_cbc_decrypt ce_aes_cbc_decrypt +#define aes_cbc_cts_encrypt ce_aes_cbc_cts_encrypt +#define aes_cbc_cts_decrypt ce_aes_cbc_cts_decrypt #define aes_ctr_encrypt ce_aes_ctr_encrypt #define aes_xts_encrypt ce_aes_xts_encrypt #define aes_xts_decrypt ce_aes_xts_decrypt @@ -45,6 +48,8 @@ MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions"); #define aes_ecb_decrypt neon_aes_ecb_decrypt #define aes_cbc_encrypt neon_aes_cbc_encrypt #define aes_cbc_decrypt neon_aes_cbc_decrypt +#define aes_cbc_cts_encrypt neon_aes_cbc_cts_encrypt +#define aes_cbc_cts_decrypt neon_aes_cbc_cts_decrypt #define aes_ctr_encrypt neon_aes_ctr_encrypt #define aes_xts_encrypt neon_aes_xts_encrypt #define aes_xts_decrypt neon_aes_xts_decrypt @@ -73,6 +78,11 @@ asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); +asmlinkage void aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[], + int rounds, int bytes, u8 const iv[]); +asmlinkage void aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[], + int rounds, int bytes, u8 const iv[]); + asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 ctr[]); @@ -87,6 +97,12 @@ asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds, int blocks, u8 dg[], int enc_before, int enc_after); +struct cts_cbc_req_ctx { + struct scatterlist sg_src[2]; + struct scatterlist sg_dst[2]; + struct skcipher_request subreq; +}; + struct crypto_aes_xts_ctx { struct crypto_aes_ctx key1; struct crypto_aes_ctx __aligned(8) key2; @@ -209,6 +225,136 @@ static int cbc_decrypt(struct skcipher_request *req) return err; } +static int cts_cbc_init_tfm(struct crypto_skcipher *tfm) +{ + crypto_skcipher_set_reqsize(tfm, sizeof(struct cts_cbc_req_ctx)); + return 0; +} + +static int cts_cbc_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req); + int err, rounds = 6 + ctx->key_length / 4; + int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2; + struct scatterlist *src = req->src, *dst = req->dst; + struct skcipher_walk walk; + + skcipher_request_set_tfm(&rctx->subreq, tfm); + + if (req->cryptlen == AES_BLOCK_SIZE) + cbc_blocks = 1; + + if (cbc_blocks > 0) { + unsigned int blocks; + + 
skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst, + cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); + aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); + err = skcipher_walk_done(&walk, + walk.nbytes % AES_BLOCK_SIZE); + } + if (err) + return err; + + if (req->cryptlen == AES_BLOCK_SIZE) + return 0; + + dst = src = scatterwalk_ffwd(rctx->sg_src, req->src, + rctx->subreq.cryptlen); + if (req->dst != req->src) + dst = scatterwalk_ffwd(rctx->sg_dst, req->dst, + rctx->subreq.cryptlen); + } + + /* handle ciphertext stealing */ + skcipher_request_set_crypt(&rctx->subreq, src, dst, + req->cryptlen - cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + if (err) + return err; + + kernel_neon_begin(); + aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_enc, rounds, walk.nbytes, walk.iv); + kernel_neon_end(); + + return skcipher_walk_done(&walk, 0); +} + +static int cts_cbc_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req); + int err, rounds = 6 + ctx->key_length / 4; + int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2; + struct scatterlist *src = req->src, *dst = req->dst; + struct skcipher_walk walk; + + skcipher_request_set_tfm(&rctx->subreq, tfm); + + if (req->cryptlen == AES_BLOCK_SIZE) + cbc_blocks = 1; + + if (cbc_blocks > 0) { + unsigned int blocks; + + skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst, + cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); + aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); + err = skcipher_walk_done(&walk, + walk.nbytes % AES_BLOCK_SIZE); + } + if (err) + return err; + + if (req->cryptlen == AES_BLOCK_SIZE) + return 0; + + dst = src = scatterwalk_ffwd(rctx->sg_src, req->src, + rctx->subreq.cryptlen); + if (req->dst != req->src) + dst = scatterwalk_ffwd(rctx->sg_dst, req->dst, + rctx->subreq.cryptlen); + } + + /* handle ciphertext stealing */ + skcipher_request_set_crypt(&rctx->subreq, src, dst, + req->cryptlen - cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + if (err) + return err; + + kernel_neon_begin(); + aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_dec, rounds, walk.nbytes, walk.iv); + kernel_neon_end(); + + return skcipher_walk_done(&walk, 0); +} + static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); @@ -334,6 +480,25 @@ static struct skcipher_alg aes_algs[] = { { .setkey = skcipher_aes_setkey, .encrypt = cbc_encrypt, .decrypt = cbc_decrypt, +}, { + .base = { + .cra_name = "__cts(cbc(aes))", + .cra_driver_name = "__cts-cbc-aes-" MODE, + .cra_priority = PRIO, + .cra_flags = CRYPTO_ALG_INTERNAL, + .cra_blocksize = 1, + .cra_ctxsize = sizeof(struct crypto_aes_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .walksize = 2 * AES_BLOCK_SIZE, 
+ .setkey = skcipher_aes_setkey, + .encrypt = cts_cbc_encrypt, + .decrypt = cts_cbc_decrypt, + .init = cts_cbc_init_tfm, }, { .base = { .cra_name = "__ctr(aes)", diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 35632d11200f..82931fba53d2 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -170,6 +170,84 @@ AES_ENTRY(aes_cbc_decrypt) AES_ENDPROC(aes_cbc_decrypt) + /* + * aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[], + * int rounds, int bytes, u8 const iv[]) + * aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[], + * int rounds, int bytes, u8 const iv[]) + */ + +AES_ENTRY(aes_cbc_cts_encrypt) + adr_l x8, .Lcts_permute_table + sub x4, x4, #16 + add x9, x8, #32 + add x8, x8, x4 + sub x9, x9, x4 + ld1 {v3.16b}, [x8] + ld1 {v4.16b}, [x9] + + ld1 {v0.16b}, [x1], x4 /* overlapping loads */ + ld1 {v1.16b}, [x1] + + ld1 {v5.16b}, [x5] /* get iv */ + enc_prepare w3, x2, x6 + + eor v0.16b, v0.16b, v5.16b /* xor with iv */ + tbl v1.16b, {v1.16b}, v4.16b + encrypt_block v0, w3, x2, x6, w7 + + eor v1.16b, v1.16b, v0.16b + tbl v0.16b, {v0.16b}, v3.16b + encrypt_block v1, w3, x2, x6, w7 + + add x4, x0, x4 + st1 {v0.16b}, [x4] /* overlapping stores */ + st1 {v1.16b}, [x0] + ret +AES_ENDPROC(aes_cbc_cts_encrypt) + +AES_ENTRY(aes_cbc_cts_decrypt) + adr_l x8, .Lcts_permute_table + sub x4, x4, #16 + add x9, x8, #32 + add x8, x8, x4 + sub x9, x9, x4 + ld1 {v3.16b}, [x8] + ld1 {v4.16b}, [x9] + + ld1 {v0.16b}, [x1], x4 /* overlapping loads */ + ld1 {v1.16b}, [x1] + + ld1 {v5.16b}, [x5] /* get iv */ + dec_prepare w3, x2, x6 + + tbl v2.16b, {v1.16b}, v4.16b + decrypt_block v0, w3, x2, x6, w7 + eor v2.16b, v2.16b, v0.16b + + tbx v0.16b, {v1.16b}, v4.16b + tbl v2.16b, {v2.16b}, v3.16b + decrypt_block v0, w3, x2, x6, w7 + eor v0.16b, v0.16b, v5.16b /* xor with iv */ + + add x4, x0, x4 + st1 {v2.16b}, [x4] /* overlapping stores */ + st1 {v0.16b}, [x0] + ret +AES_ENDPROC(aes_cbc_cts_decrypt) + + .section ".rodata", "a" + .align 6 +.Lcts_permute_table: + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7 + .byte 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .previous + + /* * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, * int blocks, u8 ctr[]) @@ -253,7 +331,6 @@ AES_ENTRY(aes_ctr_encrypt) ins v4.d[0], x7 b .Lctrcarrydone AES_ENDPROC(aes_ctr_encrypt) - .ltorg /* From patchwork Mon Sep 10 14:41:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 10594289 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03E70109C for ; Mon, 10 Sep 2018 14:46:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E745528C80 for ; Mon, 10 Sep 2018 14:46:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D8F4E28CC2; Mon, 10 Sep 2018 14:46:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=unavailable 
[77.251.17.237]) by smtp.gmail.com with ESMTPSA id d35-v6sm8279487eda.25.2018.09.10.07.43.38 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Sep 2018 07:43:38 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [PATCH 4/4] crypto: arm64/aes-blk - improve XTS mask handling Date: Mon, 10 Sep 2018 16:41:15 +0200 Message-Id: <20180910144115.25727-5-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org> References: <20180910144115.25727-1-ard.biesheuvel@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180910_074351_761517_25C075F9 X-CRM114-Status: GOOD ( 13.08 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , Theodore Ts'o , herbert@gondor.apana.org.au, Steve Capper , Eric Biggers , linux-arm-kernel@lists.infradead.org MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box. This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop. Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector. On Cortex-A53, this results in a ~4% speedup. Signed-off-by: Ard Biesheuvel --- Raw performance numbers after the patch. arch/arm64/crypto/aes-ce.S | 5 +++ arch/arm64/crypto/aes-modes.S | 40 ++++++++++---------- arch/arm64/crypto/aes-neon.S | 6 +++ 3 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 623e74ed1c67..143070510809 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -17,6 +17,11 @@ .arch armv8-a+crypto + xtsmask .req v16 + + .macro xts_reload_mask, tmp + .endm + /* preload all round keys */ .macro load_round_keys, rounds, rk cmp \rounds, #12 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 82931fba53d2..5c0fa7905d24 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -340,17 +340,19 @@ AES_ENDPROC(aes_ctr_encrypt) * int blocks, u8 const rk2[], u8 iv[], int first) */ - .macro next_tweak, out, in, const, tmp + .macro next_tweak, out, in, tmp sshr \tmp\().2d, \in\().2d, #63 - and \tmp\().16b, \tmp\().16b, \const\().16b + and \tmp\().16b, \tmp\().16b, xtsmask.16b add \out\().2d, \in\().2d, \in\().2d ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8 eor \out\().16b, \out\().16b, \tmp\().16b .endm -.Lxts_mul_x: -CPU_LE( .quad 1, 0x87 ) -CPU_BE( .quad 0x87, 1 ) + .macro xts_load_mask, tmp + movi xtsmask.2s, #0x1 + movi \tmp\().2s, #0x87 + uzp1 xtsmask.4s, xtsmask.4s, \tmp\().4s + .endm AES_ENTRY(aes_xts_encrypt) stp x29, x30, [sp, #-16]! 
@@ -362,24 +364,24 @@ AES_ENTRY(aes_xts_encrypt) enc_prepare w3, x5, x8 encrypt_block v4, w3, x5, x8, w7 /* first tweak */ enc_switch_key w3, x2, x8 - ldr q7, .Lxts_mul_x + xts_load_mask v8 b .LxtsencNx .Lxtsencnotfirst: enc_prepare w3, x2, x8 .LxtsencloopNx: - ldr q7, .Lxts_mul_x - next_tweak v4, v4, v7, v8 + xts_reload_mask v8 + next_tweak v4, v4, v8 .LxtsencNx: subs w4, w4, #4 bmi .Lxtsenc1x ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ - next_tweak v5, v4, v7, v8 + next_tweak v5, v4, v8 eor v0.16b, v0.16b, v4.16b - next_tweak v6, v5, v7, v8 + next_tweak v6, v5, v8 eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - next_tweak v7, v6, v7, v8 + next_tweak v7, v6, v8 eor v3.16b, v3.16b, v7.16b bl aes_encrypt_block4x eor v3.16b, v3.16b, v7.16b @@ -401,7 +403,7 @@ AES_ENTRY(aes_xts_encrypt) st1 {v0.16b}, [x0], #16 subs w4, w4, #1 beq .Lxtsencout - next_tweak v4, v4, v7, v8 + next_tweak v4, v4, v8 b .Lxtsencloop .Lxtsencout: st1 {v4.16b}, [x6] @@ -420,24 +422,24 @@ AES_ENTRY(aes_xts_decrypt) enc_prepare w3, x5, x8 encrypt_block v4, w3, x5, x8, w7 /* first tweak */ dec_prepare w3, x2, x8 - ldr q7, .Lxts_mul_x + xts_load_mask v8 b .LxtsdecNx .Lxtsdecnotfirst: dec_prepare w3, x2, x8 .LxtsdecloopNx: - ldr q7, .Lxts_mul_x - next_tweak v4, v4, v7, v8 + xts_reload_mask v8 + next_tweak v4, v4, v8 .LxtsdecNx: subs w4, w4, #4 bmi .Lxtsdec1x ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ - next_tweak v5, v4, v7, v8 + next_tweak v5, v4, v8 eor v0.16b, v0.16b, v4.16b - next_tweak v6, v5, v7, v8 + next_tweak v6, v5, v8 eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - next_tweak v7, v6, v7, v8 + next_tweak v7, v6, v8 eor v3.16b, v3.16b, v7.16b bl aes_decrypt_block4x eor v3.16b, v3.16b, v7.16b @@ -459,7 +461,7 @@ AES_ENTRY(aes_xts_decrypt) st1 {v0.16b}, [x0], #16 subs w4, w4, #1 beq .Lxtsdecout - next_tweak v4, v4, v7, v8 + next_tweak v4, v4, v8 b .Lxtsdecloop .Lxtsdecout: st1 {v4.16b}, [x6] diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S index 1c7b45b7268e..29100f692e8a 100644 --- a/arch/arm64/crypto/aes-neon.S +++ b/arch/arm64/crypto/aes-neon.S @@ -14,6 +14,12 @@ #define AES_ENTRY(func) ENTRY(neon_ ## func) #define AES_ENDPROC(func) ENDPROC(neon_ ## func) + xtsmask .req v7 + + .macro xts_reload_mask, tmp + xts_load_mask \tmp + .endm + /* multiply by polynomial 'x' in GF(2^8) */ .macro mul_by_x, out, in, temp, const sshr \temp, \in, #7
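As background for the next_tweak macro and the xtsmask constant it now uses: the tweak update is the usual multiply-by-x in GF(2^128), with the 0x87 byte of the reduction polynomial x^128 + x^7 + x^2 + x + 1 folded back in when a bit falls off the top. A standalone C sketch (illustrative only, assuming the tweak is held as two 64-bit little-endian halves with t[0] the low half):

#include <stdint.h>

/* Multiply the 128-bit XTS tweak by x, as next_tweak does: shift left by
 * one bit and reduce the carried-out bit with the 0x87 constant that
 * xts_load_mask composes into the xtsmask register. */
void xts_mul_x(uint64_t t[2])
{
        uint64_t carry_hi = t[1] >> 63;  /* bit shifted out of the high half */
        uint64_t carry_lo = t[0] >> 63;  /* bit shifted out of the low half  */

        t[1] = (t[1] << 1) | carry_lo;
        t[0] = (t[0] << 1) ^ (carry_hi ? 0x87 : 0);
}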