From patchwork Wed Dec 6 19:43:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 10096965 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id CE62F60327 for ; Wed, 6 Dec 2017 19:49:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BFCC6297C2 for ; Wed, 6 Dec 2017 19:49:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B3EA429930; Wed, 6 Dec 2017 19:49:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_MED autolearn=unavailable version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [65.50.211.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id ECAA8297C2 for ; Wed, 6 Dec 2017 19:49:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:MIME-Version:Cc:List-Subscribe: List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id:References: In-Reply-To:Message-Id:Date:Subject:To:From:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=0Dfhm1Figbw7iVqdy8Pju01gLgbRekFJwEJvi/Fc2WM=; b=jR7lvqRdJD6x+6/7a5YUb2cv4/ 55S7Kdoe181fdemrucOE7wUGc6uSCHGzYkUHwR3V9cRYd6rNVR0j/d3aPvc/5lOQ3AO8jZ/X2mBhn zSyCQM6H9ufOSNOJR20VuqKjluPIRKUymB1LDEJ3rgc64NRkamhUCQMAck74XE0AGm4Go8XllC26a sUj5UQNCbbsnJNCkWAwuxxzVOMzy7ndNRvBwbWg9PrAO3EU1Y0kCqC5bhi3ibTEiNmEAYpdAc7JnO UyJRi4LYQhNLtxRAfkX/tVaQx4x6TA1UNZ3+SxCrqj0C9b1vCkM0Ti90rfPe6H8OU9NH3UKWHg10Q CvmvG6Dw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.87 #1 (Red Hat Linux)) id 1eMfgb-0003Iu-Oe; Wed, 06 Dec 2017 19:48:53 +0000 Received: from merlin.infradead.org ([2001:8b0:10b:1231::1]) by bombadil.infradead.org with esmtps (Exim 4.87 #1 (Red Hat Linux)) id 1eMfd0-0007eb-D2 for linux-arm-kernel@bombadil.infradead.org; Wed, 06 Dec 2017 19:45:10 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=References:In-Reply-To:Message-Id:Date: Subject:Cc:To:From:Sender:Reply-To:MIME-Version:Content-Type: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=v6Kxl/UUbEqYTxgRE3N+cXpJEN/y9kmHE+RlumD6CuQ=; b=nh9jojNehE5dWRrFAPxRAsZS0 nt4o+cvWuBI4QP1JqrL4hSV+HVDn9s+db7SnTNED2dtfUD4tIGDMYst/qGCQ3GcH+83G32JdUY9Ql gEvD4b7XHvp26bWgMsAF8q2DT/0Ytkj4hrHzQ01TbPgEI01rSbKXxzTBGWQzwkOL885cpSQLFhwwX 450JwRzZgnGOvzaIAq1UNtCRDHLqEKdh71XXBcMzavEanv+REv6ZbSNsuUWl4zf7ENvaZQbdTVVOj y5SAGDtkqclozjK5w9eY9cbtHCZo/L4eljtPHsN2sbg8T2wnoGTJt/3BBwlID0iqX1sJNat+81SIr 56r4s7Pvg==; Received: from mail-wm0-x243.google.com ([2a00:1450:400c:c09::243]) by merlin.infradead.org with esmtps (Exim 4.87 #1 (Red Hat Linux)) id 1eMfcw-0002Ne-Fy for linux-arm-kernel@lists.infradead.org; Wed, 06 Dec 2017 19:45:08 +0000 Received: by mail-wm0-x243.google.com with SMTP id i11so9061674wmf.4 for ; Wed, 06 Dec 2017 11:44:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=v6Kxl/UUbEqYTxgRE3N+cXpJEN/y9kmHE+RlumD6CuQ=; b=kXYqo57p33Hy3/x16ckJAztWjdIpzA5b2nIezLW7E238hHMqzmmhAl7ZWr1SWzNx3S u72VLwhnHVzLuiB9LDSqVsviswiIpCIjWXSKJvPo0Thc7EJB5U+sbQppJn3+18Ct8ozA VGEhlGgrQM+rzP/ztrBFftBHrFnQ1+MJ3UOdA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=v6Kxl/UUbEqYTxgRE3N+cXpJEN/y9kmHE+RlumD6CuQ=; b=QJhVKf9ZVv+rzxcuREM3Y2slltN3a4Hl+ex3bpqJtYJexI/OAqcmTOgiuXZHAw+cSu AgU2bp9WMw4Y10wbBcTmjKM12PGEwylZbcGpA/tXQFDWNSr2qR+KLqmgmlbBI5VPVUd9 4KfaM7hPIqKm1D6WbcbI+nU4U58zSyf9SMiCTon+BKQjQhQb0ZON9pBLJi7AWwNvP+1u hvH7hLZ1qAfMXuNLgLW7hA7MIpqVCnpR9vpz8qxjYWZ+ktrA/kQ6xWhbu2Brsaf5drqC ulqkeWQpXxgFRz4eZuEy8AM+qED7cT2CuYD6/dXlnPKILVIEkZoTPLL1R5RQbFPO3rru K5gw== X-Gm-Message-State: AKGB3mL8VhOflx2dYM7f00LchMhEqoYsJBpOnk2kKBSKbgMJP6sNOx8/ eDzzQXNgbFTgFM/tWlDXLATY0g== X-Google-Smtp-Source: AGs4zMZ+8X5t2L3cuN9jV95EBiy12MV/UQ4T8Qbohu/oPGSbvxjyTXfFYzMamFQWD3LWqNqyQxklhw== X-Received: by 10.28.191.3 with SMTP id p3mr12783957wmf.81.1512589483914; Wed, 06 Dec 2017 11:44:43 -0800 (PST) Received: from localhost.localdomain ([105.150.171.234]) by smtp.gmail.com with ESMTPSA id b66sm3596594wmh.32.2017.12.06.11.44.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Dec 2017 11:44:43 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Subject: [PATCH v3 16/20] crypto: arm64/aes-bs - yield NEON after every block of input Date: Wed, 6 Dec 2017 19:43:42 +0000 Message-Id: <20171206194346.24393-17-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mark Rutland , herbert@gondor.apana.org.au, Ard Biesheuvel , Peter Zijlstra , Catalin Marinas , Sebastian Andrzej Siewior , Will Deacon , Russell King - ARM Linux , Steven Rostedt , Thomas Gleixner , Dave Martin , linux-arm-kernel@lists.infradead.org, linux-rt-users@vger.kernel.org MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 305 +++++++++++--------- 1 file changed, 170 insertions(+), 135 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..23659369da78 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,61 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99: mov x5, #1 - lsl x5, x5, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x5, x5, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnz x5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnz x5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnz x5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnz x5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnz x5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnz x5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnz x5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnz x5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnz x5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnz x5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnz x5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + cond_yield_neon + b 99b -1: ldp x29, x30, [sp], #16 +1: frame_pop 5 ret .endm @@ -632,43 +639,49 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 mov v25.16b, v0.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 mov v26.16b, v1.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 mov v27.16b, v2.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 mov v28.16b, v3.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 mov v29.16b, v4.16b tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 mov v30.16b, v5.16b tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 mov v31.16b, v6.16b tbnz x6, #7, 0f - ld1 {v7.16b}, [x1] + ld1 {v7.16b}, [x20] -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_decrypt8 - ld1 {v24.16b}, [x5] // load IV + ld1 {v24.16b}, [x24] // load IV eor v1.16b, v1.16b, v25.16b eor v6.16b, v6.16b, v26.16b @@ -679,34 +692,36 @@ ENTRY(aesbs_cbc_decrypt) eor v3.16b, v3.16b, v30.16b eor v5.16b, v5.16b, v31.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 mov v24.16b, v25.16b tbnz x6, #1, 1f - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 mov v24.16b, v26.16b tbnz x6, #2, 1f - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 mov v24.16b, v27.16b tbnz x6, #3, 1f - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 mov v24.16b, v28.16b tbnz x6, #4, 1f - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 mov v24.16b, v29.16b tbnz x6, #5, 1f - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 mov v24.16b, v30.16b tbnz x6, #6, 1f - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 mov v24.16b, v31.16b tbnz x6, #7, 1f - ld1 {v24.16b}, [x1], #16 - st1 {v5.16b}, [x0], #16 -1: st1 {v24.16b}, [x5] // store IV + ld1 {v24.16b}, [x20], #16 + st1 {v5.16b}, [x19], #16 +1: st1 {v24.16b}, [x24] // store IV - cbnz x4, 99b + cbz x23, 2f + cond_yield_neon + b 99b - ldp x29, x30, [sp], #16 +2: frame_pop 6 ret ENDPROC(aesbs_cbc_decrypt) @@ -731,87 +746,93 @@ CPU_BE( .quad 0x87, 1 ) */ __xts_crypt8: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 next_tweak v26, v25, v30, v31 eor v0.16b, v0.16b, v25.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 next_tweak v27, v26, v30, v31 eor v1.16b, v1.16b, v26.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 next_tweak v28, v27, v30, v31 eor v2.16b, v2.16b, v27.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 next_tweak v29, v28, v30, v31 eor v3.16b, v3.16b, v28.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 - str q29, [sp, #16] + ld1 {v4.16b}, [x20], #16 + str q29, [sp, #64] eor v4.16b, v4.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 - str q29, [sp, #32] + ld1 {v5.16b}, [x20], #16 + str q29, [sp, #80] eor v5.16b, v5.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 - str q29, [sp, #48] + ld1 {v6.16b}, [x20], #16 + str q29, [sp, #96] eor v6.16b, v6.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #7, 0f - ld1 {v7.16b}, [x1], #16 - str q29, [sp, #64] + ld1 {v7.16b}, [x20], #16 + str q29, [sp, #112] eor v7.16b, v7.16b, v29.16b next_tweak v29, v29, v30, v31 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 br x7 ENDPROC(__xts_crypt8) .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-80]! - mov x29, sp + frame_push 6, 64 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - ldr q30, .Lxts_mul_x - ld1 {v25.16b}, [x5] +0: ldr q30, .Lxts_mul_x + ld1 {v25.16b}, [x24] 99: adr x7, \do8 bl __xts_crypt8 - ldp q16, q17, [sp, #16] - ldp q18, q19, [sp, #48] + ldp q16, q17, [sp, #64] + ldp q18, q19, [sp, #96] eor \o0\().16b, \o0\().16b, v25.16b eor \o1\().16b, \o1\().16b, v26.16b eor \o2\().16b, \o2\().16b, v27.16b eor \o3\().16b, \o3\().16b, v28.16b - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 mov v25.16b, v26.16b tbnz x6, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 mov v25.16b, v27.16b tbnz x6, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 mov v25.16b, v28.16b tbnz x6, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 mov v25.16b, v29.16b tbnz x6, #4, 1f @@ -820,18 +841,22 @@ ENDPROC(__xts_crypt8) eor \o6\().16b, \o6\().16b, v18.16b eor \o7\().16b, \o7\().16b, v19.16b - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x6, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x6, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x6, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + st1 {v25.16b}, [x24] -1: st1 {v25.16b}, [x5] - ldp x29, x30, [sp], #80 + cond_yield_neon 0b + b 99b + +1: st1 {v25.16b}, [x24] + frame_pop 6, 64 ret .endm @@ -856,24 +881,31 @@ ENDPROC(aesbs_xts_decrypt) * int rounds, int blocks, u8 iv[], u8 final[]) */ ENTRY(aesbs_ctr_encrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp - - cmp x6, #0 - cset x10, ne - add x4, x4, x10 // do one extra block if final - - ldp x7, x8, [x5] - ld1 {v0.16b}, [x5] + frame_push 8 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + cmp x25, #0 + cset x26, ne + add x23, x23, x26 // do one extra block if final + +98: ldp x7, x8, [x24] + ld1 {v0.16b}, [x24] CPU_LE( rev x7, x7 ) CPU_LE( rev x8, x8 ) adds x8, x8, #1 adc x7, x7, xzr 99: mov x9, #1 - lsl x9, x9, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x9, x9, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x9, x9, xzr, le tbnz x9, #1, 0f @@ -891,82 +923,85 @@ CPU_LE( rev x8, x8 ) tbnz x9, #7, 0f next_ctr v7 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_encrypt8 - lsr x9, x9, x10 // disregard the extra block + lsr x9, x9, x26 // disregard the extra block tbnz x9, #0, 0f - ld1 {v8.16b}, [x1], #16 + ld1 {v8.16b}, [x20], #16 eor v0.16b, v0.16b, v8.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 tbnz x9, #1, 1f - ld1 {v9.16b}, [x1], #16 + ld1 {v9.16b}, [x20], #16 eor v1.16b, v1.16b, v9.16b - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 tbnz x9, #2, 2f - ld1 {v10.16b}, [x1], #16 + ld1 {v10.16b}, [x20], #16 eor v4.16b, v4.16b, v10.16b - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 tbnz x9, #3, 3f - ld1 {v11.16b}, [x1], #16 + ld1 {v11.16b}, [x20], #16 eor v6.16b, v6.16b, v11.16b - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 tbnz x9, #4, 4f - ld1 {v12.16b}, [x1], #16 + ld1 {v12.16b}, [x20], #16 eor v3.16b, v3.16b, v12.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 tbnz x9, #5, 5f - ld1 {v13.16b}, [x1], #16 + ld1 {v13.16b}, [x20], #16 eor v7.16b, v7.16b, v13.16b - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 tbnz x9, #6, 6f - ld1 {v14.16b}, [x1], #16 + ld1 {v14.16b}, [x20], #16 eor v2.16b, v2.16b, v14.16b - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 tbnz x9, #7, 7f - ld1 {v15.16b}, [x1], #16 + ld1 {v15.16b}, [x20], #16 eor v5.16b, v5.16b, v15.16b - st1 {v5.16b}, [x0], #16 + st1 {v5.16b}, [x19], #16 8: next_ctr v0 - cbnz x4, 99b + st1 {v0.16b}, [x24] + cbz x23, 0f + + cond_yield_neon 98b + b 99b -0: st1 {v0.16b}, [x5] - ldp x29, x30, [sp], #16 +0: frame_pop 8 ret /* * If we are handling the tail of the input (x6 != NULL), return the * final keystream block back to the caller. */ -1: cbz x6, 8b - st1 {v1.16b}, [x6] +1: cbz x25, 8b + st1 {v1.16b}, [x25] b 8b -2: cbz x6, 8b - st1 {v4.16b}, [x6] +2: cbz x25, 8b + st1 {v4.16b}, [x25] b 8b -3: cbz x6, 8b - st1 {v6.16b}, [x6] +3: cbz x25, 8b + st1 {v6.16b}, [x25] b 8b -4: cbz x6, 8b - st1 {v3.16b}, [x6] +4: cbz x25, 8b + st1 {v3.16b}, [x25] b 8b -5: cbz x6, 8b - st1 {v7.16b}, [x6] +5: cbz x25, 8b + st1 {v7.16b}, [x25] b 8b -6: cbz x6, 8b - st1 {v2.16b}, [x6] +6: cbz x25, 8b + st1 {v2.16b}, [x25] b 8b -7: cbz x6, 8b - st1 {v5.16b}, [x6] +7: cbz x25, 8b + st1 {v5.16b}, [x25] b 8b ENDPROC(aesbs_ctr_encrypt)