From patchwork Wed Dec 6 19:43:33 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10096899
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
    Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
    Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
    Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
    Thomas Gleixner
Subject: [PATCH v3 07/20] crypto: arm64/aes-blk - add 4 way interleave
 to CBC encrypt path
Date: Wed, 6 Dec 2017 19:43:33 +0000
Message-Id: <20171206194346.24393-8-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org>
References: <20171206194346.24393-1-ard.biesheuvel@linaro.org>

CBC encryption is strictly sequential, and so the current AES code simply
processes the input one block at a time. However, we are about to add
yield support, which adds a bit of overhead, and which we prefer to align
with the other modes in terms of granularity (i.e., it is better to have
all routines yield every 64 bytes than to make an exception for CBC
encrypt, which would yield every 16 bytes).

So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/aes-modes.S | 31 ++++++++++++++++----
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 27a235b2ddee..e86535a1329d 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -94,17 +94,36 @@ AES_ENDPROC(aes_ecb_decrypt)
 	 */
 
 AES_ENTRY(aes_cbc_encrypt)
-	ld1		{v0.16b}, [x5]			/* get iv */
+	ld1		{v4.16b}, [x5]			/* get iv */
 	enc_prepare	w3, x2, x6
 
-.Lcbcencloop:
-	ld1		{v1.16b}, [x1], #16		/* get next pt block */
-	eor		v0.16b, v0.16b, v1.16b		/* ..and xor with iv */
+.Lcbcencloop4x:
+	subs		w4, w4, #4
+	bmi		.Lcbcenc1x
+	ld1		{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
+	eor		v0.16b, v0.16b, v4.16b		/* ..and xor with iv */
 	encrypt_block	v0, w3, x2, x6, w7
-	st1		{v0.16b}, [x0], #16
+	eor		v1.16b, v1.16b, v0.16b
+	encrypt_block	v1, w3, x2, x6, w7
+	eor		v2.16b, v2.16b, v1.16b
+	encrypt_block	v2, w3, x2, x6, w7
+	eor		v3.16b, v3.16b, v2.16b
+	encrypt_block	v3, w3, x2, x6, w7
+	st1		{v0.16b-v3.16b}, [x0], #64
+	mov		v4.16b, v3.16b
+	b		.Lcbcencloop4x
+.Lcbcenc1x:
+	adds		w4, w4, #4
+	beq		.Lcbcencout
+.Lcbcencloop:
+	ld1		{v0.16b}, [x1], #16		/* get next pt block */
+	eor		v4.16b, v4.16b, v0.16b		/* ..and xor with iv */
+	encrypt_block	v4, w3, x2, x6, w7
+	st1		{v4.16b}, [x0], #16
 	subs		w4, w4, #1
 	bne		.Lcbcencloop
-	st1		{v0.16b}, [x5]			/* return iv */
+.Lcbcencout:
+	st1		{v4.16b}, [x5]			/* return iv */
 	ret
 AES_ENDPROC(aes_cbc_encrypt)
 
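For readers less used to the assembly, the following C sketch (not part of
the patch) illustrates the control flow the unrolled code implements: the
cipher itself stays sequential, since each ciphertext block is the chaining
value for the next block, but the loads and stores are merged into 64-byte
chunks, with a one-block tail loop for the remainder. aes_encrypt_block(),
xor_blocks() and cbc_encrypt_4x() are hypothetical helpers standing in for
the encrypt_block macro, the eor instructions and the aes_cbc_encrypt entry
point; they are not the kernel's actual C interface.

#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE	16

/* hypothetical single-block primitive (stands in for encrypt_block) */
void aes_encrypt_block(uint8_t blk[AES_BLOCK_SIZE], const uint8_t *rk, int rounds);

static void xor_blocks(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < AES_BLOCK_SIZE; i++)
		dst[i] ^= src[i];
}

static void cbc_encrypt_4x(uint8_t *out, const uint8_t *in, int blocks,
			   const uint8_t *rk, int rounds,
			   uint8_t iv[AES_BLOCK_SIZE])
{
	uint8_t buf[4 * AES_BLOCK_SIZE];

	/* main loop: one 64-byte load, four chained encryptions, one 64-byte store */
	while (blocks >= 4) {
		memcpy(buf, in, sizeof(buf));		/* ld1 {v0.16b-v3.16b}, #64 */

		xor_blocks(&buf[0], iv);		/* chain from previous iv */
		aes_encrypt_block(&buf[0], rk, rounds);
		xor_blocks(&buf[16], &buf[0]);		/* each block depends on the previous ct */
		aes_encrypt_block(&buf[16], rk, rounds);
		xor_blocks(&buf[32], &buf[16]);
		aes_encrypt_block(&buf[32], rk, rounds);
		xor_blocks(&buf[48], &buf[32]);
		aes_encrypt_block(&buf[48], rk, rounds);

		memcpy(out, buf, sizeof(buf));		/* st1 {v0.16b-v3.16b}, #64 */
		memcpy(iv, &buf[48], AES_BLOCK_SIZE);	/* mov v4.16b, v3.16b */

		in += sizeof(buf);
		out += sizeof(buf);
		blocks -= 4;
	}

	/* tail: 0-3 remaining blocks, one at a time (.Lcbcencloop) */
	while (blocks-- > 0) {
		xor_blocks(iv, in);			/* iv ^= pt */
		aes_encrypt_block(iv, rk, rounds);	/* iv = E(iv ^ pt) = ct */
		memcpy(out, iv, AES_BLOCK_SIZE);
		in += AES_BLOCK_SIZE;
		out += AES_BLOCK_SIZE;
	}
	/* iv now holds the last ciphertext block, returned to the caller */
}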