From patchwork Mon Sep 10 14:41:15 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10594289
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel, Theodore Ts'o, herbert@gondor.apana.org.au,
 Steve Capper, Eric Biggers, linux-arm-kernel@lists.infradead.org
Subject: [PATCH 4/4] crypto: arm64/aes-blk - improve XTS mask handling
Date: Mon, 10 Sep 2018 16:41:15 +0200
Message-Id: <20180910144115.25727-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>

The Crypto Extension instantiation of the aes-modes.S collection of
skciphers uses only 15 NEON registers for the round key array, whereas
the pure NEON flavor uses 16 NEON registers for the AES S-box.

This means we have a spare register available that we can use to hold
the XTS mask vector, removing the need to reload it at every iteration
of the inner loop.

Since the pure NEON version does not permit this optimization, tweak
the macros so we can factor out this functionality. Also, replace the
literal load with a short sequence to compose the mask vector.

On Cortex-A53, this results in a ~4% speedup.

Signed-off-by: Ard Biesheuvel
---
Raw performance numbers after the patch.
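For readers less familiar with XTS, the mask exists because next_tweak
multiplies the 128-bit tweak by x in GF(2^128): each 64-bit lane is
shifted left by one bit, the carry out of the low lane moves into the
high lane, and a carry out of the high lane is reduced modulo
x^128 + x^7 + x^2 + x + 1, i.e. XORed back in as the 0x87 byte held in
the mask vector. A minimal C sketch of the same update (illustrative
only, not part of the patch; the function name is made up):

  #include <stdint.h>

  /*
   * Model of the next_tweak macro: multiply the 128-bit tweak by x
   * in GF(2^128). in[0]/out[0] hold the low 64 bits (little-endian
   * lane order).
   */
  static void xts_next_tweak(uint64_t out[2], const uint64_t in[2])
  {
          /* all-ones if the lane's top bit is set, like sshr #63 */
          uint64_t hi_sign = -(in[1] >> 63);
          uint64_t lo_sign = -(in[0] >> 63);

          out[0] = (in[0] << 1) ^ (hi_sign & 0x87); /* reduction  */
          out[1] = (in[1] << 1) ^ (lo_sign & 0x1);  /* carry in   */
  }

The vector code computes the same thing: the and with the { 0x1, 0x87 }
mask picks those two constants per lane, and the ext #8 swaps them
across lanes before the final eor.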
 arch/arm64/crypto/aes-ce.S    |  5 +++
 arch/arm64/crypto/aes-modes.S | 40 ++++++++++----------
 arch/arm64/crypto/aes-neon.S  |  6 +++
 3 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
index 623e74ed1c67..143070510809 100644
--- a/arch/arm64/crypto/aes-ce.S
+++ b/arch/arm64/crypto/aes-ce.S
@@ -17,6 +17,11 @@

 	.arch		armv8-a+crypto

+	xtsmask		.req	v16
+
+	.macro		xts_reload_mask, tmp
+	.endm
+
 	/* preload all round keys */
 	.macro		load_round_keys, rounds, rk
 	cmp		\rounds, #12
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 82931fba53d2..5c0fa7905d24 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -340,17 +340,19 @@ AES_ENDPROC(aes_ctr_encrypt)
 	 *		   int blocks, u8 const rk2[], u8 iv[], int first)
 	 */

-	.macro		next_tweak, out, in, const, tmp
+	.macro		next_tweak, out, in, tmp
 	sshr		\tmp\().2d,  \in\().2d,   #63
-	and		\tmp\().16b, \tmp\().16b, \const\().16b
+	and		\tmp\().16b, \tmp\().16b, xtsmask.16b
 	add		\out\().2d,  \in\().2d,   \in\().2d
 	ext		\tmp\().16b, \tmp\().16b, \tmp\().16b, #8
 	eor		\out\().16b, \out\().16b, \tmp\().16b
 	.endm

-.Lxts_mul_x:
-CPU_LE(	.quad		1, 0x87		)
-CPU_BE(	.quad		0x87, 1		)
+	.macro		xts_load_mask, tmp
+	movi		xtsmask.2s, #0x1
+	movi		\tmp\().2s, #0x87
+	uzp1		xtsmask.4s, xtsmask.4s, \tmp\().4s
+	.endm

 AES_ENTRY(aes_xts_encrypt)
 	stp		x29, x30, [sp, #-16]!
@@ -362,24 +364,24 @@ AES_ENTRY(aes_xts_encrypt)
 	enc_prepare	w3, x5, x8
 	encrypt_block	v4, w3, x5, x8, w7		/* first tweak */
 	enc_switch_key	w3, x2, x8
-	ldr		q7, .Lxts_mul_x
+	xts_load_mask	v8
 	b		.LxtsencNx

 .Lxtsencnotfirst:
 	enc_prepare	w3, x2, x8
 .LxtsencloopNx:
-	ldr		q7, .Lxts_mul_x
-	next_tweak	v4, v4, v7, v8
+	xts_reload_mask	v8
+	next_tweak	v4, v4, v8
 .LxtsencNx:
 	subs		w4, w4, #4
 	bmi		.Lxtsenc1x
 	ld1		{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
-	next_tweak	v5, v4, v7, v8
+	next_tweak	v5, v4, v8
 	eor		v0.16b, v0.16b, v4.16b
-	next_tweak	v6, v5, v7, v8
+	next_tweak	v6, v5, v8
 	eor		v1.16b, v1.16b, v5.16b
 	eor		v2.16b, v2.16b, v6.16b
-	next_tweak	v7, v6, v7, v8
+	next_tweak	v7, v6, v8
 	eor		v3.16b, v3.16b, v7.16b
 	bl		aes_encrypt_block4x
 	eor		v3.16b, v3.16b, v7.16b
@@ -401,7 +403,7 @@ AES_ENTRY(aes_xts_encrypt)
 	st1		{v0.16b}, [x0], #16
 	subs		w4, w4, #1
 	beq		.Lxtsencout
-	next_tweak	v4, v4, v7, v8
+	next_tweak	v4, v4, v8
 	b		.Lxtsencloop
 .Lxtsencout:
 	st1		{v4.16b}, [x6]
@@ -420,24 +422,24 @@ AES_ENTRY(aes_xts_decrypt)
 	enc_prepare	w3, x5, x8
 	encrypt_block	v4, w3, x5, x8, w7		/* first tweak */
 	dec_prepare	w3, x2, x8
-	ldr		q7, .Lxts_mul_x
+	xts_load_mask	v8
 	b		.LxtsdecNx

 .Lxtsdecnotfirst:
 	dec_prepare	w3, x2, x8
 .LxtsdecloopNx:
-	ldr		q7, .Lxts_mul_x
-	next_tweak	v4, v4, v7, v8
+	xts_reload_mask	v8
+	next_tweak	v4, v4, v8
 .LxtsdecNx:
 	subs		w4, w4, #4
 	bmi		.Lxtsdec1x
 	ld1		{v0.16b-v3.16b}, [x1], #64	/* get 4 ct blocks */
-	next_tweak	v5, v4, v7, v8
+	next_tweak	v5, v4, v8
 	eor		v0.16b, v0.16b, v4.16b
-	next_tweak	v6, v5, v7, v8
+	next_tweak	v6, v5, v8
 	eor		v1.16b, v1.16b, v5.16b
 	eor		v2.16b, v2.16b, v6.16b
-	next_tweak	v7, v6, v7, v8
+	next_tweak	v7, v6, v8
 	eor		v3.16b, v3.16b, v7.16b
 	bl		aes_decrypt_block4x
 	eor		v3.16b, v3.16b, v7.16b
@@ -459,7 +461,7 @@ AES_ENTRY(aes_xts_decrypt)
 	st1		{v0.16b}, [x0], #16
 	subs		w4, w4, #1
 	beq		.Lxtsdecout
-	next_tweak	v4, v4, v7, v8
+	next_tweak	v4, v4, v8
 	b		.Lxtsdecloop
 .Lxtsdecout:
 	st1		{v4.16b}, [x6]
diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S
index 1c7b45b7268e..29100f692e8a 100644
--- a/arch/arm64/crypto/aes-neon.S
+++ b/arch/arm64/crypto/aes-neon.S
@@ -14,6 +14,12 @@
 #define AES_ENTRY(func)		ENTRY(neon_ ## func)
 #define AES_ENDPROC(func)	ENDPROC(neon_ ## func)

+	xtsmask		.req	v7
+
+	.macro		xts_reload_mask, tmp
+	xts_load_mask	\tmp
+	.endm
+
 	/* multiply by polynomial 'x' in GF(2^8) */
 	.macro		mul_by_x, out, in, temp, const
 	sshr		\temp, \in, #7
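One note on the composed mask, since uzp1 lane order can be hard to
follow: movi xtsmask.2s, #0x1 and movi \tmp\().2s, #0x87 each set the
lower two 32-bit lanes and clear the upper half of the register, and
uzp1 then concatenates the even-numbered lanes of its two inputs. Read
back as two 64-bit lanes the result is { 0x1, 0x87 }, the same value
the deleted .Lxts_mul_x literal supplied on little-endian. A small
host-side C check of that lane arithmetic (illustrative only, not part
of the patch):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          /* movi xtsmask.2s, #0x1 : lower two 32-bit lanes, rest 0 */
          uint32_t mask[4] = { 0x1, 0x1, 0, 0 };
          /* movi \tmp\().2s, #0x87 */
          uint32_t tmp[4] = { 0x87, 0x87, 0, 0 };
          /* uzp1 .4s : even lanes of op1, then even lanes of op2 */
          uint32_t res[4] = { mask[0], mask[2], tmp[0], tmp[2] };
          uint64_t q[2];

          /* reinterpret as .2d; matches NEON lane order on LE hosts */
          memcpy(q, res, sizeof(q));
          printf("%#llx %#llx\n", (unsigned long long)q[0],
                 (unsigned long long)q[1]); /* prints 0x1 0x87 */
          return 0;
  }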