From patchwork Mon Apr 30 16:18:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10372095
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: will.deacon@arm.com, dave.martin@arm.com,
 linux-arm-kernel@lists.infradead.org, Ard Biesheuvel
Subject: [PATCH resend 09/10] crypto: arm64/sha3-ce - yield NEON after every
 block of input
Date: Mon, 30 Apr 2018 18:18:29 +0200
Message-Id: <20180430161830.14892-10-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.

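For readers who would rather not trace the numeric labels, the control
flow this patch introduces can be sketched in C. This is only a model
under stated assumptions: sketch_transform(), toy_absorb() and the
should_yield callback are hypothetical names that do not exist in the
kernel, toy_absorb() is a placeholder and not Keccak, and the memcpy()
calls stand in for the ld1/st1 sequences. The real code relies on the
if_will_cond_yield_neon / do_cond_yield_neon / endif_yield_neon macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 25	/* SHA-3 state: 25 lanes of 64 bits (v0-v24 in the patch) */

/* Hypothetical stand-in for one block of work; this is NOT Keccak. */
void toy_absorb(uint64_t regs[LANES], const uint8_t *in, size_t rate)
{
	size_t i;

	for (i = 0; i < rate; i++)		/* XOR the input into the state */
		regs[i / 8] ^= (uint64_t)in[i] << (8 * (i % 8));
	for (i = 0; i < LANES; i++)		/* placeholder for the 24 rounds */
		regs[i] = ((regs[i] << 1) | (regs[i] >> 63)) ^
			  regs[(i + 1) % LANES];
}

/* Hypothetical yield policies, standing in for the reschedule check. */
bool never_yield(void) { return false; }
bool always_yield(void) { return true; }

/* Returns how many times the (modelled) NEON unit was yielded. */
int sketch_transform(uint64_t st[LANES], const uint8_t *data, int blocks,
		     size_t rate, bool (*should_yield)(void))
{
	uint64_t regs[LANES];
	int yields = 0;

	/* Like the assembly, this assumes at least one block of input. */
	assert(blocks > 0 && rate <= 8u * LANES);

	memcpy(regs, st, sizeof(regs));			/* 0: load state */
	for (;;) {
		toy_absorb(regs, data, rate);		/* 1: .. 4: */
		data += rate;
		if (--blocks == 0)			/* cbz w21, 5f */
			break;
		if (should_yield()) {			/* if_will_cond_yield_neon */
			memcpy(st, regs, sizeof(regs));	/* spill state to memory */
			memset(regs, 0xaa, sizeof(regs)); /* registers are lost */
			yields++;			/* do_cond_yield_neon */
			memcpy(regs, st, sizeof(regs));	/* b 0b: reload state */
		}
	}
	memcpy(st, regs, sizeof(regs));			/* 5: save state */
	return yields;
}
```

Running the sketch once with never_yield and once with always_yield on
the same input should leave identical state in memory, which is the
property the spill and reload around the yield are meant to preserve.
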
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha3-ce-core.S | 77 +++++++++++++-------
 1 file changed, 50 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 332ad7530690..a7d587fa54f6 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -41,9 +41,16 @@
  */
 	.text
 ENTRY(sha3_ce_transform)
-	/* load state */
-	add	x8, x0, #32
-	ld1	{ v0.1d- v3.1d}, [x0]
+	frame_push	4
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+
+0:	/* load state */
+	add	x8, x19, #32
+	ld1	{ v0.1d- v3.1d}, [x19]
 	ld1	{ v4.1d- v7.1d}, [x8], #32
 	ld1	{ v8.1d-v11.1d}, [x8], #32
 	ld1	{v12.1d-v15.1d}, [x8], #32
@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
 	ld1	{v20.1d-v23.1d}, [x8], #32
 	ld1	{v24.1d}, [x8]
 
-0:	sub	w2, w2, #1
+1:	sub	w21, w21, #1
 	mov	w8, #24
 	adr_l	x9, .Lsha3_rcon
 
 	/* load input */
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v31.8b}, [x1], #24
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v31.8b}, [x20], #24
 	eor	v0.8b, v0.8b, v25.8b
 	eor	v1.8b, v1.8b, v26.8b
 	eor	v2.8b, v2.8b, v27.8b
@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
 	eor	v5.8b, v5.8b, v30.8b
 	eor	v6.8b, v6.8b, v31.8b
 
-	tbnz	x3, #6, 2f		// SHA3-512
+	tbnz	x22, #6, 3f		// SHA3-512
 
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v30.8b}, [x1], #16
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v30.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b
 	eor	v9.8b, v9.8b, v27.8b
@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
 	eor	v11.8b, v11.8b, v29.8b
 	eor	v12.8b, v12.8b, v30.8b
 
-	tbnz	x3, #4, 1f		// SHA3-384 or SHA3-224
+	tbnz	x22, #4, 2f		// SHA3-384 or SHA3-224
 
 	// SHA3-256
-	ld1	{v25.8b-v28.8b}, [x1], #32
+	ld1	{v25.8b-v28.8b}, [x20], #32
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
-	b	3f
+	b	4f
 
-1:	tbz	x3, #2, 3f		// bit 2 cleared? SHA-384
+2:	tbz	x22, #2, 4f		// bit 2 cleared? SHA-384
 
 	// SHA3-224
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b}, [x1], #8
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b}, [x20], #8
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
 	eor	v17.8b, v17.8b, v29.8b
-	b	3f
+	b	4f
 
 	// SHA3-512
-2:	ld1	{v25.8b-v26.8b}, [x1], #16
+3:	ld1	{v25.8b-v26.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b
 
-3:	sub	w8, w8, #1
+4:	sub	w8, w8, #1
 
 	eor3	v29.16b, v4.16b, v9.16b, v14.16b
 	eor3	v26.16b, v1.16b, v6.16b, v11.16b
@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)
 
 	eor	v0.16b, v0.16b, v31.16b
 
-	cbnz	w8, 3b
-	cbnz	w2, 0b
+	cbnz	w8, 4b
+	cbz	w21, 5f
+
+	if_will_cond_yield_neon
+	add	x8, x19, #32
+	st1	{ v0.1d- v3.1d}, [x19]
+	st1	{ v4.1d- v7.1d}, [x8], #32
+	st1	{ v8.1d-v11.1d}, [x8], #32
+	st1	{v12.1d-v15.1d}, [x8], #32
+	st1	{v16.1d-v19.1d}, [x8], #32
+	st1	{v20.1d-v23.1d}, [x8], #32
+	st1	{v24.1d}, [x8]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b
 
 	/* save state */
-	st1	{ v0.1d- v3.1d}, [x0], #32
-	st1	{ v4.1d- v7.1d}, [x0], #32
-	st1	{ v8.1d-v11.1d}, [x0], #32
-	st1	{v12.1d-v15.1d}, [x0], #32
-	st1	{v16.1d-v19.1d}, [x0], #32
-	st1	{v20.1d-v23.1d}, [x0], #32
-	st1	{v24.1d}, [x0]
+5:	st1	{ v0.1d- v3.1d}, [x19], #32
+	st1	{ v4.1d- v7.1d}, [x19], #32
+	st1	{ v8.1d-v11.1d}, [x19], #32
+	st1	{v12.1d-v15.1d}, [x19], #32
+	st1	{v16.1d-v19.1d}, [x19], #32
+	st1	{v20.1d-v23.1d}, [x19], #32
+	st1	{v24.1d}, [x19]
+	frame_pop
 	ret
 ENDPROC(sha3_ce_transform)
 