From patchwork Thu Aug 23 16:48:45 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10574343
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Nick Desaulniers, will.deacon@arm.com, herbert@gondor.apana.org.au,
 linux-arm-kernel@lists.infradead.org, Ard Biesheuvel
Subject: [PATCH v2] crypto: arm64/aes-modes - get rid of literal load of
 addend vector
Date: Thu, 23 Aug 2018 17:48:45 +0100
Message-Id: <20180823164845.20055-1-ard.biesheuvel@linaro.org>

Replace the literal load of the addend vector with a sequence that
performs each add individually. This sequence is only 2 instructions
longer than the original, and 2% faster on Cortex-A53.

This is an improvement by itself, but it also works around an issue with
Clang, whose integrated assembler does not implement the GNU ARM asm
syntax completely and does not support the =literal notation for FP
registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642).

Cc: Nick Desaulniers
Signed-off-by: Ard Biesheuvel
Reviewed-by: Nick Desaulniers
---
v2: replace convoluted code involving a SIMD add to increment four BE
    counters at once with individual add/rev/mov instructions

 arch/arm64/crypto/aes-modes.S | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 483a7130cf0e..496c243de4ac 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -232,17 +232,19 @@ AES_ENTRY(aes_ctr_encrypt)
 	bmi		.Lctr1x
 	cmn		w6, #4			/* 32 bit overflow? */
 	bcs		.Lctr1x
-	ldr		q8, =0x30000000200000001	/* addends 1,2,3[,0] */
-	dup		v7.4s, w6
+	add		w7, w6, #1
 	mov		v0.16b, v4.16b
-	add		v7.4s, v7.4s, v8.4s
+	add		w8, w6, #2
 	mov		v1.16b, v4.16b
-	rev32		v8.16b, v7.16b
+	add		w9, w6, #3
 	mov		v2.16b, v4.16b
+	rev		w7, w7
 	mov		v3.16b, v4.16b
-	mov		v1.s[3], v8.s[0]
-	mov		v2.s[3], v8.s[1]
-	mov		v3.s[3], v8.s[2]
+	rev		w8, w8
+	mov		v1.s[3], w7
+	rev		w9, w9
+	mov		v2.s[3], w8
+	mov		v3.s[3], w9
 	ld1		{v5.16b-v7.16b}, [x20], #48	/* get 3 input blocks */
 	bl		aes_encrypt_block4x
 	eor		v0.16b, v5.16b, v0.16b
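
For readers who do not follow arm64 assembly, the effect of the new
add/rev/mov sequence can be modelled in C. This is only an illustrative
sketch, not kernel code: the helper names store_be32() and ctr_next4()
are made up for this note, and it assumes (as the hunk suggests) that w6
holds the low 32 bits of the counter in CPU byte order, that v4 holds the
current counter block, and that v0..v3 receive the four blocks passed to
aes_encrypt_block4x.

```c
#include <stdint.h>
#include <string.h>

/*
 * Store a 32-bit value as four big-endian bytes. This models the pair
 * "rev wM, wM" + "mov vN.s[3], wM": byte-swap, then insert into the
 * last 32-bit lane of the vector.
 */
static void store_be32(uint8_t *p, uint32_t x)
{
	p[0] = (uint8_t)(x >> 24);
	p[1] = (uint8_t)(x >> 16);
	p[2] = (uint8_t)(x >> 8);
	p[3] = (uint8_t)x;
}

/*
 * Hypothetical model of the patched sequence (not part of the patch).
 *   ctr: current 16-byte counter block            (v4 in the diff)
 *   lo:  low 32 bits of the counter, CPU order    (w6 in the diff)
 *   out: four consecutive counter blocks          (v0..v3 in the diff)
 *
 * The assembly only takes this path when "cmn w6, #4" does not set the
 * carry flag, so lo + 3 is assumed not to wrap around 32 bits here.
 */
static void ctr_next4(const uint8_t ctr[16], uint32_t lo, uint8_t out[4][16])
{
	uint32_t i;

	for (i = 0; i < 4; i++) {
		/* mov vN.16b, v4.16b */
		memcpy(out[i], ctr, 16);
		/* v0 keeps the counter unchanged; v1..v3 get lo + 1..3 */
		if (i != 0)
			store_be32(&out[i][12], lo + i);
	}
}
```

The scalar form needs no constant vector at all, which is why the
"ldr q8, =..." literal load (the construct Clang's integrated assembler
rejects) can be dropped.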