From patchwork Tue Feb 15 17:07:22 2022
X-Patchwork-Submitter: Joey Gouly
X-Patchwork-Id: 12747370
From: Joey Gouly
Subject: [PATCH v1 2/3] arm64: lib: Import latest version of Arm Optimized Routines' strncmp
Date: Tue, 15 Feb 2022 17:07:22 +0000
Message-ID: <20220215170723.21266-3-joey.gouly@arm.com>
In-Reply-To: <20220215170723.21266-1-joey.gouly@arm.com>
References: <20220215170723.21266-1-joey.gouly@arm.com>
X-BeenThere: linux-arm-kernel@lists.infradead.org

Import the latest version of the Arm Optimized Routines strncmp function based
on the upstream code of string/aarch64/strncmp.S at commit 189dfefe37d5 from:

  https://github.com/ARM-software/optimized-routines

This latest version includes MTE support.

Signed-off-by: Joey Gouly
Cc: Robin Murphy
Cc: Mark Rutland
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm64/lib/strncmp.S | 234 +++++++++++++++++++++++----------------
 1 file changed, 141 insertions(+), 93 deletions(-)

diff --git a/arch/arm64/lib/strncmp.S b/arch/arm64/lib/strncmp.S
index e42bcfcd37e6..a4884b97e9a8 100644
--- a/arch/arm64/lib/strncmp.S
+++ b/arch/arm64/lib/strncmp.S
@@ -1,9 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Copyright (c) 2013-2021, Arm Limited.
+ * Copyright (c) 2013-2022, Arm Limited.
  *
  * Adapted from the original at:
- * https://github.com/ARM-software/optimized-routines/blob/e823e3abf5f89ecb/string/aarch64/strncmp.S
+ * https://github.com/ARM-software/optimized-routines/blob/189dfefe37d54c5b/string/aarch64/strncmp.S
  */
 
 #include
@@ -11,14 +11,14 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
+ * ARMv8-a, AArch64.
+ * MTE compatible.
  */
 
 #define L(label) .L ## label
 
 #define REP8_01 0x0101010101010101
 #define REP8_7f 0x7f7f7f7f7f7f7f7f
-#define REP8_80 0x8080808080808080
 
 /* Parameters and result. */
 #define src1		x0
@@ -39,10 +39,24 @@
 #define tmp3		x10
 #define zeroones	x11
 #define pos		x12
-#define limit_wd	x13
-#define mask		x14
-#define endloop		x15
+#define mask		x13
+#define endloop		x14
 #define count		mask
+#define offset		pos
+#define neg_offset	x15
+
+/* Define endian dependent shift operations.
+   On big-endian early bytes are at MSB and on little-endian LSB.
+   LS_FW means shifting towards early bytes.
+   LS_BK means shifting towards later bytes.
+   */
+#ifdef __AARCH64EB__
+#define LS_FW lsl
+#define LS_BK lsr
+#else
+#define LS_FW lsr
+#define LS_BK lsl
+#endif
 
 SYM_FUNC_START_WEAK_PI(strncmp)
 	cbz	limit, L(ret0)
@@ -52,9 +66,6 @@ SYM_FUNC_START_WEAK_PI(strncmp)
 	and	count, src1, #7
 	b.ne	L(misaligned8)
 	cbnz	count, L(mutual_align)
-	/* Calculate the number of full and partial words -1. */
-	sub	limit_wd, limit, #1	/* limit != 0, so no underflow. */
-	lsr	limit_wd, limit_wd, #3	/* Convert to Dwords. */
 
 	/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
 	   (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
@@ -64,56 +75,52 @@ L(loop_aligned):
 	ldr	data1, [src1], #8
 	ldr	data2, [src2], #8
 L(start_realigned):
-	subs	limit_wd, limit_wd, #1
+	subs	limit, limit, #8
 	sub	tmp1, data1, zeroones
 	orr	tmp2, data1, #REP8_7f
 	eor	diff, data1, data2	/* Non-zero if differences found. */
-	csinv	endloop, diff, xzr, pl	/* Last Dword or differences. */
+	csinv	endloop, diff, xzr, hi	/* Last Dword or differences. */
 	bics	has_nul, tmp1, tmp2	/* Non-zero if NUL terminator. */
 	ccmp	endloop, #0, #0, eq
 	b.eq	L(loop_aligned)
 	/* End of main loop */
 
-	/* Not reached the limit, must have found the end or a diff. */
-	tbz	limit_wd, #63, L(not_limit)
-
-	/* Limit % 8 == 0 => all bytes significant. */
-	ands	limit, limit, #7
-	b.eq	L(not_limit)
-
-	lsl	limit, limit, #3	/* Bits -> bytes. */
-	mov	mask, #~0
-#ifdef __AARCH64EB__
-	lsr	mask, mask, limit
-#else
-	lsl	mask, mask, limit
-#endif
-	bic	data1, data1, mask
-	bic	data2, data2, mask
-
-	/* Make sure that the NUL byte is marked in the syndrome. */
-	orr	has_nul, has_nul, mask
-
-L(not_limit):
+L(full_check):
+#ifndef __AARCH64EB__
 	orr	syndrome, diff, has_nul
-
-#ifndef __AARCH64EB__
+	add	limit, limit, 8	/* Rewind limit to before last subs. */
+L(syndrome_check):
+	/* Limit was reached. Check if the NUL byte or the difference
+	   is before the limit. */
 	rev	syndrome, syndrome
 	rev	data1, data1
-	/* The MS-non-zero bit of the syndrome marks either the first bit
-	   that is different, or the top bit of the first zero byte.
-	   Shifting left now will bring the critical information into the
-	   top bits. */
 	clz	pos, syndrome
 	rev	data2, data2
 	lsl	data1, data1, pos
+	cmp	limit, pos, lsr #3
 	lsl	data2, data2, pos
 	/* But we need to zero-extend (char is unsigned) the value and then
 	   perform a signed 32-bit subtraction. */
 	lsr	data1, data1, #56
 	sub	result, data1, data2, lsr #56
+	csel	result, result, xzr, hi
 	ret
 #else
+	/* Not reached the limit, must have found the end or a diff. */
+	tbz	limit, #63, L(not_limit)
+	add	tmp1, limit, 8
+	cbz	limit, L(not_limit)
+
+	lsl	limit, tmp1, #3	/* Bits -> bytes. */
+	mov	mask, #~0
+	lsr	mask, mask, limit
+	bic	data1, data1, mask
+	bic	data2, data2, mask
+
+	/* Make sure that the NUL byte is marked in the syndrome. */
+	orr	has_nul, has_nul, mask
+
+L(not_limit):
 	/* For big-endian we cannot use the trick with the syndrome value
 	   as carry-propagation can corrupt the upper bits if the trailing
 	   bytes in the string contain 0x01. */
@@ -134,10 +141,11 @@ L(not_limit):
 	rev	has_nul, has_nul
 	orr	syndrome, diff, has_nul
 	clz	pos, syndrome
-	/* The MS-non-zero bit of the syndrome marks either the first bit
-	   that is different, or the top bit of the first zero byte.
+	/* The most-significant-non-zero bit of the syndrome marks either the
+	   first bit that is different, or the top bit of the first zero byte.
 	   Shifting left now will bring the critical information into the
 	   top bits. */
+L(end_quick):
 	lsl	data1, data1, pos
 	lsl	data2, data2, pos
 	/* But we need to zero-extend (char is unsigned) the value and then
@@ -159,22 +167,12 @@ L(mutual_align):
 	neg	tmp3, count, lsl #3	/* 64 - bits(bytes beyond align). */
 	ldr	data2, [src2], #8
 	mov	tmp2, #~0
-	sub	limit_wd, limit, #1	/* limit != 0, so no underflow. */
-#ifdef __AARCH64EB__
-	/* Big-endian. Early bytes are at MSB. */
-	lsl	tmp2, tmp2, tmp3	/* Shift (count & 63). */
-#else
-	/* Little-endian. Early bytes are at LSB. */
-	lsr	tmp2, tmp2, tmp3	/* Shift (count & 63). */
-#endif
-	and	tmp3, limit_wd, #7
-	lsr	limit_wd, limit_wd, #3
-	/* Adjust the limit. Only low 3 bits used, so overflow irrelevant. */
-	add	limit, limit, count
-	add	tmp3, tmp3, count
+	LS_FW	tmp2, tmp2, tmp3	/* Shift (count & 63). */
+	/* Adjust the limit and ensure it doesn't overflow. */
+	adds	limit, limit, count
+	csinv	limit, limit, xzr, lo
 	orr	data1, data1, tmp2
 	orr	data2, data2, tmp2
-	add	limit_wd, limit_wd, tmp3, lsr #3
 	b	L(start_realigned)
 
 	.p2align 4
@@ -197,13 +195,11 @@ L(done):
 	/* Align the SRC1 to a dword by doing a bytewise compare and then do
 	   the dword loop. */
 L(try_misaligned_words):
-	lsr	limit_wd, limit, #3
-	cbz	count, L(do_misaligned)
+	cbz	count, L(src1_aligned)
 
 	neg	count, count
 	and	count, count, #7
 	sub	limit, limit, count
-	lsr	limit_wd, limit, #3
 
 L(page_end_loop):
 	ldrb	data1w, [src1], #1
@@ -214,48 +210,100 @@ L(page_end_loop):
 	subs	count, count, #1
 	b.hi	L(page_end_loop)
 
-L(do_misaligned):
-	/* Prepare ourselves for the next page crossing. Unlike the aligned
-	   loop, we fetch 1 less dword because we risk crossing bounds on
-	   SRC2. */
-	mov	count, #8
-	subs	limit_wd, limit_wd, #1
-	b.lo	L(done_loop)
-L(loop_misaligned):
-	and	tmp2, src2, #0xff8
-	eor	tmp2, tmp2, #0xff8
-	cbz	tmp2, L(page_end_loop)
+	/* The following diagram explains the comparison of misaligned strings.
+	   The bytes are shown in natural order. For little-endian, it is
+	   reversed in the registers. The "x" bytes are before the string.
+	   The "|" separates data that is loaded at one time.
+	   src1     | a a a a a a a a | b b b c c c c c | . . .
+	   src2     | x x x x x a a a   a a a a a b b b | c c c c c . . .
+
+	   After shifting in each step, the data looks like this:
+	                STEP_A              STEP_B              STEP_C
+	   data1    a a a a a a a a     b b b c c c c c     b b b c c c c c
+	   data2    a a a a a a a a     b b b 0 0 0 0 0     0 0 0 c c c c c
 
+	   The bytes with "0" are eliminated from the syndrome via mask.
+
+	   Align SRC2 down to 16 bytes. This way we can read 16 bytes at a
+	   time from SRC2. The comparison happens in 3 steps. After each step
+	   the loop can exit, or read from SRC1 or SRC2. */
+L(src1_aligned):
+	/* Calculate offset from 8 byte alignment to string start in bits. No
+	   need to mask offset since shifts are ignoring upper bits. */
+	lsl	offset, src2, #3
+	bic	src2, src2, #0xf
+	mov	mask, -1
+	neg	neg_offset, offset
 	ldr	data1, [src1], #8
-	ldr	data2, [src2], #8
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
-	eor	diff, data1, data2	/* Non-zero if differences found. */
-	bics	has_nul, tmp1, tmp2	/* Non-zero if NUL terminator. */
-	ccmp	diff, #0, #0, eq
-	b.ne	L(not_limit)
-	subs	limit_wd, limit_wd, #1
-	b.pl	L(loop_misaligned)
+	ldp	tmp1, tmp2, [src2], #16
+	LS_BK	mask, mask, neg_offset
+	and	neg_offset, neg_offset, #63	/* Need actual value for cmp later. */
+	/* Skip the first compare if data in tmp1 is irrelevant. */
+	tbnz	offset, 6, L(misaligned_mid_loop)
 
-L(done_loop):
-	/* We found a difference or a NULL before the limit was reached. */
-	and	limit, limit, #7
-	cbz	limit, L(not_limit)
-	/* Read the last word. */
-	sub	src1, src1, 8
-	sub	src2, src2, 8
-	ldr	data1, [src1, limit]
-	ldr	data2, [src2, limit]
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
+L(loop_misaligned):
+	/* STEP_A: Compare full 8 bytes when there is enough data from SRC2.*/
+	LS_FW	data2, tmp1, offset
+	LS_BK	tmp1, tmp2, neg_offset
+	subs	limit, limit, #8
+	orr	data2, data2, tmp1	/* 8 bytes from SRC2 combined from two regs.*/
+	sub	has_nul, data1, zeroones
 	eor	diff, data1, data2	/* Non-zero if differences found. */
-	bics	has_nul, tmp1, tmp2	/* Non-zero if NUL terminator. */
-	ccmp	diff, #0, #0, eq
-	b.ne	L(not_limit)
+	orr	tmp3, data1, #REP8_7f
+	csinv	endloop, diff, xzr, hi	/* If limit, set to all ones. */
+	bic	has_nul, has_nul, tmp3	/* Non-zero if NUL byte found in SRC1. */
+	orr	tmp3, endloop, has_nul
+	cbnz	tmp3, L(full_check)
+
+	ldr	data1, [src1], #8
+L(misaligned_mid_loop):
+	/* STEP_B: Compare first part of data1 to second part of tmp2. */
+	LS_FW	data2, tmp2, offset
+#ifdef __AARCH64EB__
+	/* For big-endian we do a byte reverse to avoid carry-propagation
+	   problem described above. This way we can reuse the has_nul in the
+	   next step and also use syndrome value trick at the end. */
+	rev	tmp3, data1
+	#define data1_fixed tmp3
+#else
+	#define data1_fixed data1
+#endif
+	sub	has_nul, data1_fixed, zeroones
+	orr	tmp3, data1_fixed, #REP8_7f
+	eor	diff, data2, data1	/* Non-zero if differences found. */
+	bic	has_nul, has_nul, tmp3	/* Non-zero if NUL terminator. */
+#ifdef __AARCH64EB__
+	rev	has_nul, has_nul
+#endif
+	cmp	limit, neg_offset, lsr #3
+	orr	syndrome, diff, has_nul
+	bic	syndrome, syndrome, mask	/* Ignore later bytes. */
+	csinv	tmp3, syndrome, xzr, hi	/* If limit, set to all ones. */
+	cbnz	tmp3, L(syndrome_check)
+
+	/* STEP_C: Compare second part of data1 to first part of tmp1. */
+	ldp	tmp1, tmp2, [src2], #16
+	cmp	limit, #8
+	LS_BK	data2, tmp1, neg_offset
+	eor	diff, data2, data1	/* Non-zero if differences found. */
+	orr	syndrome, diff, has_nul
+	and	syndrome, syndrome, mask	/* Ignore earlier bytes. */
+	csinv	tmp3, syndrome, xzr, hi	/* If limit, set to all ones. */
+	cbnz	tmp3, L(syndrome_check)
+
+	ldr	data1, [src1], #8
+	sub	limit, limit, #8
+	b	L(loop_misaligned)
+
+#ifdef __AARCH64EB__
+L(syndrome_check):
+	clz	pos, syndrome
+	cmp	pos, limit, lsl #3
+	b.lo	L(end_quick)
+#endif
 
 L(ret0):
 	mov	result, #0
 	ret
-
 SYM_FUNC_END_PI(strncmp)
 EXPORT_SYMBOL_NOHWKASAN(strncmp)
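
For readers following the NUL-detection comment and the little-endian
syndrome check in the patch above, here is a minimal C sketch of the same
idea. It is an illustration only, not the kernel or Arm Optimized Routines
code: the helper names load_le64, has_nul, first_marked_byte and cmp_chunk
are hypothetical, it handles a single 8-byte chunk (both pointers are assumed
to have 8 readable bytes), and it uses a byte loop where the assembly uses
rev and clz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_7f 0x7f7f7f7f7f7f7f7fULL

/* Assemble 8 bytes so that the earliest byte ends up in the least
 * significant position, as a little-endian 64-bit load would. */
static uint64_t load_le64(const char *p)
{
	const unsigned char *u = (const unsigned char *)p;
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | u[i];
	return v;
}

/* (X - 1) & ~(X | 0x7f) per byte: non-zero iff some byte of x is zero.
 * The lowest marked byte is always the first zero byte. */
static uint64_t has_nul(uint64_t x)
{
	return (x - REP8_01) & ~(x | REP8_7f);
}

/* Index (0..7) of the earliest byte that has any bit set. */
static unsigned first_marked_byte(uint64_t syndrome)
{
	unsigned i = 0;

	assert(syndrome != 0);
	while (((syndrome >> (8 * i)) & 0xff) == 0)
		i++;
	return i;
}

/* Compare one 8-byte chunk, honouring a byte limit (hypothetical helper). */
static int cmp_chunk(const char *s1, const char *s2, size_t limit)
{
	uint64_t data1 = load_le64(s1);
	uint64_t data2 = load_le64(s2);
	/* Marks the first differing byte or the first NUL in s1. */
	uint64_t syndrome = (data1 ^ data2) | has_nul(data1);
	unsigned pos;

	if (syndrome == 0)
		return 0;	/* 8 equal bytes, no terminator seen. */
	pos = first_marked_byte(syndrome);
	if (pos >= limit)
		return 0;	/* The event lies beyond the limit. */
	/* char is treated as unsigned, the subtraction is signed. */
	return (int)(unsigned char)s1[pos] - (int)(unsigned char)s2[pos];
}
```

For example, cmp_chunk("abcXefgh", "abcYefgh", 8) is negative, while the same
call with a limit of 3 returns 0 because the first difference sits at index 3,
which is outside the first three bytes.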