From patchwork Tue Mar 17 18:05:13 2015
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 6033521
From: Ard Biesheuvel
To: will.deacon@arm.com, catalin.marinas@arm.com, herbert@gondor.apana.org.au,
	linux-arm-kernel@lists.infradead.org, linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel
Subject: [PATCH] arm64/crypto: issue aese/aesmc instructions in pairs
Date: Tue, 17 Mar 2015 19:05:13 +0100
Message-Id: <1426615513-28587-1-git-send-email-ard.biesheuvel@linaro.org>

This changes the AES core transform implementations to issue aese/aesmc
(and aesd/aesimc) in pairs. This enables a micro-architectural optimization
in recent Cortex-A5x cores that improves performance by 50-90%.

Measured performance in cycles per byte (Cortex-A57):

		CBC enc		CBC dec		CTR
   before	3.64		1.34		1.32
   after	1.95		0.85		0.93

Note that this results in a ~5% performance decrease for older cores.

Signed-off-by: Ard Biesheuvel
---
Will,

This is the optimization you yourself mentioned to me about a year ago
(or perhaps even longer?). Anyway, we have now been able to confirm it
on a sample 'in the wild', i.e. a Galaxy S6 phone.
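For reference, the change boils down to the following reordering (a minimal
sketch using the same registers as the first hunk below; the benefit comes
from keeping each aese adjacent to the aesmc that consumes its result, which
recent Cortex-A5x cores can take advantage of, per the commit message above):

	/* before: the aese instructions for both blocks are grouped
	 * together, so no aese/aesmc pair is adjacent */
	aese	v0.16b, v4.16b
	aese	v1.16b, v4.16b
	aesmc	v0.16b, v0.16b
	aesmc	v1.16b, v1.16b

	/* after: each aese is immediately followed by the aesmc that
	 * consumes its result, exposing the pair to the core */
	aese	v0.16b, v4.16b
	aesmc	v0.16b, v0.16b
	aese	v1.16b, v4.16b
	aesmc	v1.16b, v1.16b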
 arch/arm64/crypto/aes-ce-ccm-core.S | 12 ++++++------
 arch/arm64/crypto/aes-ce.S          | 10 +++-------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 432e4841cd81..a2a7fbcacc14 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -101,19 +101,19 @@ ENTRY(ce_aes_ccm_final)
 0:	mov	v4.16b, v3.16b
 1:	ld1	{v5.2d}, [x2], #16	/* load next round key */
 	aese	v0.16b, v4.16b
-	aese	v1.16b, v4.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
 2:	ld1	{v3.2d}, [x2], #16	/* load next round key */
 	aese	v0.16b, v5.16b
-	aese	v1.16b, v5.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
 3:	ld1	{v4.2d}, [x2], #16	/* load next round key */
 	subs	w3, w3, #3
 	aese	v0.16b, v3.16b
-	aese	v1.16b, v3.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
 	aesmc	v1.16b, v1.16b
 	bpl	1b
 	aese	v0.16b, v4.16b
@@ -146,19 +146,19 @@ ENDPROC(ce_aes_ccm_final)
 	ld1	{v5.2d}, [x10], #16	/* load 2nd round key */
 2:	/* inner loop: 3 rounds, 2x interleaved */
 	aese	v0.16b, v4.16b
-	aese	v1.16b, v4.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
 	aesmc	v1.16b, v1.16b
 3:	ld1	{v3.2d}, [x10], #16	/* load next round key */
 	aese	v0.16b, v5.16b
-	aese	v1.16b, v5.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
 	aesmc	v1.16b, v1.16b
 4:	ld1	{v4.2d}, [x10], #16	/* load next round key */
 	subs	w7, w7, #3
 	aese	v0.16b, v3.16b
-	aese	v1.16b, v3.16b
 	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
 	aesmc	v1.16b, v1.16b
 	ld1	{v5.2d}, [x10], #16	/* load next round key */
 	bpl	2b
diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
index 685a18f731eb..78f3cfe92c08 100644
--- a/arch/arm64/crypto/aes-ce.S
+++ b/arch/arm64/crypto/aes-ce.S
@@ -45,18 +45,14 @@
 	.macro	do_enc_Nx, de, mc, k, i0, i1, i2, i3
 	aes\de	\i0\().16b, \k\().16b
-	.ifnb	\i1
-	aes\de	\i1\().16b, \k\().16b
-	.ifnb	\i3
-	aes\de	\i2\().16b, \k\().16b
-	aes\de	\i3\().16b, \k\().16b
-	.endif
-	.endif
 	aes\mc	\i0\().16b, \i0\().16b
 	.ifnb	\i1
+	aes\de	\i1\().16b, \k\().16b
 	aes\mc	\i1\().16b, \i1\().16b
 	.ifnb	\i3
+	aes\de	\i2\().16b, \k\().16b
 	aes\mc	\i2\().16b, \i2\().16b
+	aes\de	\i3\().16b, \k\().16b
 	aes\mc	\i3\().16b, \i3\().16b
 	.endif
 	.endif
 	.endm
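With the rework, do_enc_Nx keeps every aes\de/aes\mc pair adjacent at all
three interleave levels (1x, 2x, 4x). Expanding a hypothetical 4x invocation
by hand (the argument values here are illustrative only; the real call sites
are elsewhere in aes-ce.S), `do_enc_Nx e, mc, v4, v0, v1, v2, v3` yields:

	aese	v0.16b, v4.16b
	aesmc	v0.16b, v0.16b		/* pair 1 */
	aese	v1.16b, v4.16b
	aesmc	v1.16b, v1.16b		/* pair 2 */
	aese	v2.16b, v4.16b
	aesmc	v2.16b, v2.16b		/* pair 3 */
	aese	v3.16b, v4.16b
	aesmc	v3.16b, v3.16b		/* pair 4 */

i.e. both .ifnb branches are taken, and each aese result is consumed by the
immediately following aesmc.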