From patchwork Tue May 11 16:12:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Murphy X-Patchwork-Id: 12251511 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34C4DC433B4 for ; Tue, 11 May 2021 16:32:07 +0000 (UTC) Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 973B66187E for ; Tue, 11 May 2021 16:32:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 973B66187E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding :Content-Type:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-Id:Date: Subject:Cc:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=zvtu6fKahSEHRwkYu3QVbCfKWyCrAYkGAA5Mur4E8cQ=; b=ipIYRrzNhqxhvZcDDDTfMReKZ Rvt3K6Z0eshcgpe1UWYK3KMdMKmN3nXCCNlZMt0YfznJxM6Q5Sa+CuKst2kBoR1t8eWdP6RuJ72TM QbYvRyyic07ZzJsTkvqRXWrN3tLSsHJ5gpMmwXcF1Wo64AkkggXybZdqldEwU7I2txvRxUYeNE8Vc au8srfRnHNQqc6GS2Pyb8Dbq1fuydXMBVZNzyK0NYKQHvqtI0Y2CXir9IAAp8J4XwY55VQYcoPPjC KDn4Ig+5rtxL6rhc7dyf9ADMzzleYpPIHkOQdrgBegCkxDMRcpGGYPc8gi/Rk2LyeAMMsua16L9v3 IVxq+hrRQ==; Received: from localhost ([::1] helo=desiato.infradead.org) by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1lgVGv-000kEX-VZ; Tue, 11 May 2021 16:30:16 +0000 Received: from bombadil.infradead.org ([2607:7c80:54:e::133]) by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lgV0Q-000go6-Mw for linux-arm-kernel@desiato.infradead.org; Tue, 11 May 2021 16:13:10 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=g8Ky6cdgSyuNrl2ID25llExQw3mgyvvWXZJejYyp+Ig=; b=EXhIMR7B7smlYps4zmR1y0rJbx EYkv61gqieDktoAg2+wRm29Mzps254YKQldFUeC0IjYV9YM69eC76QLX9vG9FcHEiYJ6qY3rv43TR BuLxXl9kFBrG7XlIZlUZ/098Cr5+lRECDb5CE1hnSXEZIcK76rGQPYMEI0VGCZt6eMTYv4GJjldNL 4Rk6jbsMLtoa4FBk43AmNXF6NBg2MjHAGzXZ9HLIc/Oxjy0UlNz2s3VhmLF0JEAMEAWRWUjWGjYIQ GTlgHgbfqcLmEXAdaBQh1e0nR7lnWKq2hqMrwd/ep+2+tAZ2IxnJlCn4ie999jDBctThVyGqd1tc1 yrxNBBlQ==; Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1lgV0N-009kXg-KY for linux-arm-kernel@lists.infradead.org; Tue, 11 May 2021 16:13:09 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CCF311756; Tue, 11 May 2021 09:12:50 -0700 (PDT) Received: from e110467-lin.cambridge.arm.com (e110467-lin.cambridge.arm.com [10.1.196.41]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 10A3F3F792; Tue, 11 May 2021 09:12:49 -0700 (PDT) From: Robin Murphy To: will@kernel.org, catalin.marinas@arm.com Cc: linux-arm-kernel@lists.infradead.org, yangyingliang@huawei.com, shenkai8@huawei.com Subject: [PATCH 7/8] arm64: Better optimised memchr() Date: Tue, 11 May 2021 17:12:37 +0100 Message-Id: <68d3229c208780fdde88ba48e14fcde8c50f6e76.1620738177.git.robin.murphy@arm.com> X-Mailer: git-send-email 2.21.0.dirty In-Reply-To: References: MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210511_091307_743123_3848779C X-CRM114-Status: GOOD ( 10.31 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Although we implement our own assembly version of memchr(), it turns out to be barely any better than what GCC can generate for the generic C version (and would go wrong if the size_t argument were ever large enough to be interpreted as negative). Unfortunately we can't import the tuned implementation from the Arm optimized-routines library, since that has some Advanced SIMD parts which are not really viable for general kernel library code. What we can do, however, is pep things up with some relatively straightforward word-at-a-time logic for larger calls. Adding some timing to optimized-routines' memchr() test for a simple benchmark, overall this version comes in around half as fast as the SIMD code, but still nearly 4x faster than our existing implementation. Signed-off-by: Robin Murphy --- arch/arm64/lib/memchr.S | 65 +++++++++++++++++++++++++++++++++-------- 1 file changed, 53 insertions(+), 12 deletions(-) diff --git a/arch/arm64/lib/memchr.S b/arch/arm64/lib/memchr.S index edf6b970a277..7c2276fdab54 100644 --- a/arch/arm64/lib/memchr.S +++ b/arch/arm64/lib/memchr.S @@ -1,9 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Based on arch/arm/lib/memchr.S - * - * Copyright (C) 1995-2000 Russell King - * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2021 Arm Ltd. */ #include @@ -19,16 +16,60 @@ * Returns: * x0 - address of first occurrence of 'c' or 0 */ + +#define L(label) .L ## label + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f + +#define srcin x0 +#define chrin w1 +#define cntin x2 + +#define result x0 + +#define wordcnt x3 +#define rep01 x4 +#define repchr x5 +#define cur_word x6 +#define cur_byte w6 +#define tmp x7 +#define tmp2 x8 + + .p2align 4 + nop SYM_FUNC_START_WEAK_PI(memchr) - and w1, w1, #0xff -1: subs x2, x2, #1 - b.mi 2f - ldrb w3, [x0], #1 - cmp w3, w1 - b.ne 1b - sub x0, x0, #1 + and chrin, chrin, #0xff + lsr wordcnt, cntin, #3 + cbz wordcnt, L(byte_loop) + mov rep01, #REP8_01 + mul repchr, x1, rep01 + and cntin, cntin, #7 +L(word_loop): + ldr cur_word, [srcin], #8 + sub wordcnt, wordcnt, #1 + eor cur_word, cur_word, repchr + sub tmp, cur_word, rep01 + orr tmp2, cur_word, #REP8_7f + bics tmp, tmp, tmp2 + b.ne L(found_word) + cbnz wordcnt, L(word_loop) +L(byte_loop): + cbz cntin, L(not_found) + ldrb cur_byte, [srcin], #1 + sub cntin, cntin, #1 + cmp cur_byte, chrin + b.ne L(byte_loop) + sub srcin, srcin, #1 ret -2: mov x0, #0 +L(found_word): +CPU_LE( rev tmp, tmp) + clz tmp, tmp + sub tmp, tmp, #64 + add result, srcin, tmp, asr #3 + ret +L(not_found): + mov result, #0 ret SYM_FUNC_END_PI(memchr) EXPORT_SYMBOL_NOKASAN(memchr)