From patchwork Tue Nov 27 17:42:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 10701125
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Subject: [PATCH] arm64/lib: improve CRC32 performance for deep pipelines
Date: Tue, 27 Nov 2018 18:42:55 +0100
Message-Id: <20181127174255.24372-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.19.1
Cc: catalin.marinas@arm.com, Rui Sun, will.deacon@arm.com,
 linux-arm-kernel@lists.infradead.org, Ard Biesheuvel

Improve the performance of the crc32() asm routines by getting rid of most
of the branches and small-sized loads on the common path.

Instead, use a branchless code path involving overlapping 16-byte loads to
process the first (length % 32) bytes, and process the remainder using a
loop that processes 32 bytes at a time.

Tested using the following test program:

  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly.
On Cortex-A72 however, the performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun
Signed-off-by: Ard Biesheuvel
---
Cortex-A57 tcrypt results after the patch.

I ran Rui's code [0] as well. On Cortex-A57, performance regresses a bit
more (but not dramatically). On Cortex-A72, it executes at

  $ time ./crc32

  real  0m9.625s
  user  0m9.625s
  sys   0m0.000s

Rui, can you please benchmark this code on your system as well?

[0] https://lore.kernel.org/lkml/1542612560-10089-1-git-send-email-sunrui26@huawei.com/

 arch/arm64/lib/crc32.S | 54 ++++++++++++++++++--
 1 file changed, 49 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S
index 5bc1e85b4e1c..f132f2a7522e 100644
--- a/arch/arm64/lib/crc32.S
+++ b/arch/arm64/lib/crc32.S
@@ -15,15 +15,59 @@
 	.cpu		generic+crc
 
 	.macro		__crc32, c
-0:	subs		x2, x2, #16
-	b.mi		8f
-	ldp		x3, x4, [x1], #16
+	cmp		x2, #16
+	b.lt		8f			// less than 16 bytes
+
+	and		x7, x2, #0x1f
+	and		x2, x2, #~0x1f
+	cbz		x7, 32f			// multiple of 32 bytes
+
+	and		x8, x7, #0xf
+	ldp		x3, x4, [x1]
+	add		x8, x8, x1
+	add		x1, x1, x7
+	ldp		x5, x6, [x8]
 CPU_BE(	rev		x3, x3		)
 CPU_BE(	rev		x4, x4		)
+CPU_BE(	rev		x5, x5		)
+CPU_BE(	rev		x6, x6		)
+
+	tst		x7, #8
+	crc32\c\()x	w8, w0, x3
+	csel		x3, x3, x4, eq
+	csel		w0, w0, w8, eq
+	tst		x7, #4
+	lsr		x4, x3, #32
+	crc32\c\()w	w8, w0, w3
+	csel		x3, x3, x4, eq
+	csel		w0, w0, w8, eq
+	tst		x7, #2
+	lsr		w4, w3, #16
+	crc32\c\()h	w8, w0, w3
+	csel		w3, w3, w4, eq
+	csel		w0, w0, w8, eq
+	tst		x7, #1
+	crc32\c\()b	w8, w0, w3
+	csel		w0, w0, w8, eq
+	tst		x7, #16
+	crc32\c\()x	w8, w0, x5
+	crc32\c\()x	w8, w8, x6
+	csel		w0, w0, w8, eq
+	cbz		x2, 0f
+
+32:	ldp		x3, x4, [x1], #32
+	sub		x2, x2, #32
+	ldp		x5, x6, [x1, #-16]
+CPU_BE(	rev		x3, x3		)
+CPU_BE(	rev		x4, x4		)
+CPU_BE(	rev		x5, x5		)
+CPU_BE(	rev		x6, x6		)
 	crc32\c\()x	w0, w0, x3
 	crc32\c\()x	w0, w0, x4
-	b.ne		0b
-	ret
+	crc32\c\()x	w0, w0, x5
+	crc32\c\()x	w0, w0, x6
+	cbnz		x2, 32b
+0:	ret
 
 8:	tbz		x2, #3, 4f
 	ldr		x3, [x1], #8