From: Joey Gouly
Date: Tue, 15 Feb 2022 17:07:21 +0000
Subject: [PATCH v1 1/3] arm64: lib: Import latest version of Arm Optimized Routines' strcmp
Message-ID: <20220215170723.21266-2-joey.gouly@arm.com>
In-Reply-To: <20220215170723.21266-1-joey.gouly@arm.com>

Import the latest version of the Arm Optimized Routines strcmp function based
on the upstream code of string/aarch64/strcmp.S at commit 189dfefe37d5 from:
  https://github.com/ARM-software/optimized-routines

This latest version includes MTE support.
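As a rough illustration only, the following C sketch models the mutually aligned fast path of the imported routine (`L(mutual_align)`, `L(loop_aligned)` and `L(end)`) on a little-endian view of each 8-byte word. It sets the bytes that precede the start point to 0xff in both words, computes the NUL mask `(X - 1) & ~(X | 0x7f)`, ORs it with `data1 ^ data2` to form the syndrome, and uses the lowest set bit of the syndrome to locate the first differing or NUL byte. All function names are hypothetical, `__builtin_ctzll` (a GCC/Clang builtin) stands in for the `rev` + `clz` pair, and inputs that are not mutually aligned fall back to a plain byte loop rather than modelling `L(misaligned8)`. Like the assembly, it reads whole aligned words, so it is only well-defined where those neighbouring bytes exist; the test helper therefore uses 8-byte aligned scratch buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_7f 0x7f7f7f7f7f7f7f7fULL

/* Build a word with the earliest byte in the least significant position,
 * i.e. the little-endian view, independent of the host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t)p[i] << (8 * i);
	return w;
}

/* (X - 1) & ~(X | 0x7f): bit 7 is set in the first zero byte. Borrows only
 * move towards later (more significant) bytes, so bytes before the first
 * NUL are never flagged; bytes after it may be. */
static uint64_t nul_bits(uint64_t x)
{
	return (x - REP8_01) & ~(x | REP8_7f);
}

/* Hypothetical sketch of the mutually aligned path. It reads whole aligned
 * 8-byte words, so the bytes around s1/s2 in those words must be readable. */
static int strcmp_sketch(const char *s1, const char *s2)
{
	uintptr_t a1 = (uintptr_t)s1, a2 = (uintptr_t)s2;

	if ((a1 ^ a2) & 7) {
		/* Simple stand-in, NOT the routine's L(misaligned8) strategy. */
		const unsigned char *p = (const unsigned char *)s1;
		const unsigned char *q = (const unsigned char *)s2;

		while (*p && *p == *q) {
			p++;
			q++;
		}
		return *p - *q;
	}

	unsigned k = (unsigned)(a1 & 7);	/* bytes before the start point */
	const unsigned char *p1 = (const unsigned char *)s1 - k;
	const unsigned char *p2 = (const unsigned char *)s2 - k;
	/* Force the k leading bytes to 0xff in both words so they neither
	 * differ nor look like a NUL (cf. the orr with tmp in L(mutual_align)). */
	uint64_t mask = k ? ~0ULL >> (64 - 8 * k) : 0;
	uint64_t d1 = load_le64(p1) | mask;
	uint64_t d2 = load_le64(p2) | mask;

	for (;;) {
		uint64_t syndrome = (d1 ^ d2) | nul_bits(d1);

		if (syndrome) {
			/* The lowest set bit lies in the first differing or NUL byte;
			 * round down to that byte's starting bit. */
			unsigned shift = (unsigned)__builtin_ctzll(syndrome) & ~7u;

			return (int)((d1 >> shift) & 0xff) - (int)((d2 >> shift) & 0xff);
		}
		p1 += 8;
		p2 += 8;
		d1 = load_le64(p1);
		d2 = load_le64(p2);
	}
}

/* Hypothetical test helper: copy a and b into 8-byte aligned buffers at the
 * given offsets, surrounded by different junk bytes, and return the sign of
 * strcmp_sketch() on the copies as -1, 0 or 1. */
static int sketch_sign(const char *a, unsigned off1, const char *b, unsigned off2)
{
	_Alignas(8) char buf1[64];
	_Alignas(8) char buf2[64];

	assert(off1 < 8 && off2 < 8 && strlen(a) < 48 && strlen(b) < 48);
	memset(buf1, 0x00, sizeof(buf1));
	memset(buf2, 'z', sizeof(buf2));
	strcpy(buf1 + off1, a);
	strcpy(buf2 + off2, b);

	int r = strcmp_sketch(buf1 + off1, buf2 + off2);

	return (r > 0) - (r < 0);
}
```

One design difference: `L(end)` shifts by the exact bit index returned by `clz`, which is enough to get the correct sign because strcmp only guarantees the sign of its result; the sketch rounds that index down to a byte boundary so it returns the plain byte difference.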
Signed-off-by: Joey Gouly
Cc: Robin Murphy
Cc: Mark Rutland
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm64/lib/strcmp.S | 238 +++++++++++++++++++++-------------------
 1 file changed, 126 insertions(+), 112 deletions(-)

diff --git a/arch/arm64/lib/strcmp.S b/arch/arm64/lib/strcmp.S
index 83bcad72ec97..758de77afd2f 100644
--- a/arch/arm64/lib/strcmp.S
+++ b/arch/arm64/lib/strcmp.S
@@ -1,9 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (c) 2012-2021, Arm Limited.
+ * Copyright (c) 2012-2022, Arm Limited.
  *
  * Adapted from the original at:
- * https://github.com/ARM-software/optimized-routines/blob/afd6244a1f8d9229/string/aarch64/strcmp.S
+ * https://github.com/ARM-software/optimized-routines/blob/189dfefe37d54c5b/string/aarch64/strcmp.S
  */
 
 #include
@@ -11,161 +11,175 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
+ * ARMv8-a, AArch64.
+ * MTE compatible.
  */
 
 #define L(label) .L ## label
 
 #define REP8_01 0x0101010101010101
 #define REP8_7f 0x7f7f7f7f7f7f7f7f
-#define REP8_80 0x8080808080808080
 
-/* Parameters and result. */
 #define src1		x0
 #define src2		x1
 #define result		x0
 
-/* Internal variables. */
 #define data1		x2
 #define data1w		w2
 #define data2		x3
 #define data2w		w3
 #define has_nul		x4
 #define diff		x5
+#define off1		x5
 #define syndrome	x6
-#define tmp1		x7
-#define tmp2		x8
-#define tmp3		x9
-#define zeroones	x10
-#define pos		x11
-
-	/* Start of performance-critical section -- one 64B cache line. */
-	.align 6
+#define tmp		x6
+#define data3		x7
+#define zeroones	x8
+#define shift		x9
+#define off2		x10
+
+/* On big-endian early bytes are at MSB and on little-endian LSB.
+   LS_FW means shifting towards early bytes. */
+#ifdef __AARCH64EB__
+# define LS_FW lsl
+#else
+# define LS_FW lsr
+#endif
+
+/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
+   (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
+   can be done in parallel across the entire word.
+   Since carry propagation makes 0x1 bytes before a NUL byte appear
+   NUL too in big-endian, byte-reverse the data before the NUL check. */
+
+
 SYM_FUNC_START_WEAK_PI(strcmp)
-	eor	tmp1, src1, src2
-	mov	zeroones, #REP8_01
-	tst	tmp1, #7
+	sub	off2, src2, src1
+	mov	zeroones, REP8_01
+	and	tmp, src1, 7
+	tst	off2, 7
 	b.ne	L(misaligned8)
-	ands	tmp1, src1, #7
-	b.ne	L(mutual_align)
-	/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
-	   (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
-	   can be done in parallel across the entire word. */
+	cbnz	tmp, L(mutual_align)
+
+	.p2align 4
+
 L(loop_aligned):
-	ldr	data1, [src1], #8
-	ldr	data2, [src2], #8
+	ldr	data2, [src1, off2]
+	ldr	data1, [src1], 8
 L(start_realigned):
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
-	eor	diff, data1, data2	/* Non-zero if differences found. */
-	bic	has_nul, tmp1, tmp2	/* Non-zero if NUL terminator. */
+#ifdef __AARCH64EB__
+	rev	tmp, data1
+	sub	has_nul, tmp, zeroones
+	orr	tmp, tmp, REP8_7f
+#else
+	sub	has_nul, data1, zeroones
+	orr	tmp, data1, REP8_7f
+#endif
+	bics	has_nul, has_nul, tmp	/* Non-zero if NUL terminator. */
+	ccmp	data1, data2, 0, eq
+	b.eq	L(loop_aligned)
+#ifdef __AARCH64EB__
+	rev	has_nul, has_nul
+#endif
+	eor	diff, data1, data2
 	orr	syndrome, diff, has_nul
-	cbz	syndrome, L(loop_aligned)
-	/* End of performance-critical section -- one 64B cache line. */
-
 L(end):
-#ifndef	__AARCH64EB__
+#ifndef __AARCH64EB__
 	rev	syndrome, syndrome
 	rev	data1, data1
-	/* The MS-non-zero bit of the syndrome marks either the first bit
-	   that is different, or the top bit of the first zero byte.
-	   Shifting left now will bring the critical information into the
-	   top bits. */
-	clz	pos, syndrome
 	rev	data2, data2
-	lsl	data1, data1, pos
-	lsl	data2, data2, pos
-	/* But we need to zero-extend (char is unsigned) the value and then
-	   perform a signed 32-bit subtraction. */
-	lsr	data1, data1, #56
-	sub	result, data1, data2, lsr #56
-	ret
-#else
-	/* For big-endian we cannot use the trick with the syndrome value
-	   as carry-propagation can corrupt the upper bits if the trailing
-	   bytes in the string contain 0x01. */
-	/* However, if there is no NUL byte in the dword, we can generate
-	   the result directly. We can't just subtract the bytes as the
-	   MSB might be significant. */
-	cbnz	has_nul, 1f
-	cmp	data1, data2
-	cset	result, ne
-	cneg	result, result, lo
-	ret
-1:
-	/* Re-compute the NUL-byte detection, using a byte-reversed value. */
-	rev	tmp3, data1
-	sub	tmp1, tmp3, zeroones
-	orr	tmp2, tmp3, #REP8_7f
-	bic	has_nul, tmp1, tmp2
-	rev	has_nul, has_nul
-	orr	syndrome, diff, has_nul
-	clz	pos, syndrome
-	/* The MS-non-zero bit of the syndrome marks either the first bit
-	   that is different, or the top bit of the first zero byte.
+#endif
+	clz	shift, syndrome
+	/* The most-significant-non-zero bit of the syndrome marks either the
+	   first bit that is different, or the top bit of the first zero byte.
 	   Shifting left now will bring the critical information into the
 	   top bits. */
-	lsl	data1, data1, pos
-	lsl	data2, data2, pos
+	lsl	data1, data1, shift
+	lsl	data2, data2, shift
 	/* But we need to zero-extend (char is unsigned) the value and then
 	   perform a signed 32-bit subtraction. */
-	lsr	data1, data1, #56
-	sub	result, data1, data2, lsr #56
+	lsr	data1, data1, 56
+	sub	result, data1, data2, lsr 56
 	ret
-#endif
+
+	.p2align 4
 
 L(mutual_align):
 	/* Sources are mutually aligned, but are not currently at an
 	   alignment boundary. Round down the addresses and then mask off
-	   the bytes that preceed the start point. */
-	bic	src1, src1, #7
-	bic	src2, src2, #7
-	lsl	tmp1, tmp1, #3		/* Bytes beyond alignment -> bits. */
-	ldr	data1, [src1], #8
-	neg	tmp1, tmp1		/* Bits to alignment -64. */
-	ldr	data2, [src2], #8
-	mov	tmp2, #~0
-#ifdef __AARCH64EB__
-	/* Big-endian. Early bytes are at MSB. */
-	lsl	tmp2, tmp2, tmp1	/* Shift (tmp1 & 63). */
-#else
-	/* Little-endian. Early bytes are at LSB. */
-	lsr	tmp2, tmp2, tmp1	/* Shift (tmp1 & 63). */
-#endif
-	orr	data1, data1, tmp2
-	orr	data2, data2, tmp2
+	   the bytes that precede the start point. */
+	bic	src1, src1, 7
+	ldr	data2, [src1, off2]
+	ldr	data1, [src1], 8
+	neg	shift, src2, lsl 3	/* Bits to alignment -64. */
+	mov	tmp, -1
+	LS_FW	tmp, tmp, shift
+	orr	data1, data1, tmp
+	orr	data2, data2, tmp
 	b	L(start_realigned)
 
 L(misaligned8):
 	/* Align SRC1 to 8 bytes and then compare 8 bytes at a time, always
-	   checking to make sure that we don't access beyond page boundary in
-	   SRC2. */
-	tst	src1, #7
-	b.eq	L(loop_misaligned)
+	   checking to make sure that we don't access beyond the end of SRC2. */
+	cbz	tmp, L(src1_aligned)
 L(do_misaligned):
-	ldrb	data1w, [src1], #1
-	ldrb	data2w, [src2], #1
-	cmp	data1w, #1
-	ccmp	data1w, data2w, #0, cs	/* NZCV = 0b0000. */
+	ldrb	data1w, [src1], 1
+	ldrb	data2w, [src2], 1
+	cmp	data1w, 0
+	ccmp	data1w, data2w, 0, ne	/* NZCV = 0b0000. */
 	b.ne	L(done)
-	tst	src1, #7
+	tst	src1, 7
 	b.ne	L(do_misaligned)
 
-L(loop_misaligned):
-	/* Test if we are within the last dword of the end of a 4K page. If
-	   yes then jump back to the misaligned loop to copy a byte at a time. */
-	and	tmp1, src2, #0xff8
-	eor	tmp1, tmp1, #0xff8
-	cbz	tmp1, L(do_misaligned)
-	ldr	data1, [src1], #8
-	ldr	data2, [src2], #8
-
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
-	eor	diff, data1, data2	/* Non-zero if differences found. */
-	bic	has_nul, tmp1, tmp2	/* Non-zero if NUL terminator. */
+L(src1_aligned):
+	neg	shift, src2, lsl 3
+	bic	src2, src2, 7
+	ldr	data3, [src2], 8
+#ifdef __AARCH64EB__
+	rev	data3, data3
+#endif
+	lsr	tmp, zeroones, shift
+	orr	data3, data3, tmp
+	sub	has_nul, data3, zeroones
+	orr	tmp, data3, REP8_7f
+	bics	has_nul, has_nul, tmp
+	b.ne	L(tail)
+
+	sub	off1, src2, src1
+
+	.p2align 4
+
+L(loop_unaligned):
+	ldr	data3, [src1, off1]
+	ldr	data2, [src1, off2]
+#ifdef __AARCH64EB__
+	rev	data3, data3
+#endif
+	sub	has_nul, data3, zeroones
+	orr	tmp, data3, REP8_7f
+	ldr	data1, [src1], 8
+	bics	has_nul, has_nul, tmp
+	ccmp	data1, data2, 0, eq
+	b.eq	L(loop_unaligned)
+
+	lsl	tmp, has_nul, shift
+#ifdef __AARCH64EB__
+	rev	tmp, tmp
+#endif
+	eor	diff, data1, data2
+	orr	syndrome, diff, tmp
+	cbnz	syndrome, L(end)
+L(tail):
+	ldr	data1, [src1]
+	neg	shift, shift
+	lsr	data2, data3, shift
+	lsr	has_nul, has_nul, shift
+#ifdef __AARCH64EB__
+	rev	data2, data2
+	rev	has_nul, has_nul
+#endif
+	eor	diff, data1, data2
 	orr	syndrome, diff, has_nul
-	cbz	syndrome, L(loop_misaligned)
 	b	L(end)
 
 L(done):