From: Jason Gunthorpe
To: Alexander Gordeev, Andrew Morton, Christian Borntraeger, Borislav Petkov, Dave Hansen, "David S. Miller", Eric Dumazet, Gerald Schaefer, Vasily Gorbik, Heiko Carstens, "H. Peter Anvin", Justin Stitt, Jakub Kicinski, Leon Romanovsky, linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org, llvm@lists.linux.dev, Ingo Molnar, Bill Wendling, Nathan Chancellor, Nick Desaulniers, netdev@vger.kernel.org, Paolo Abeni, Salil Mehta, Sven Schnelle, Thomas Gleixner, x86@kernel.org, Yisen Zhuang
Cc: Arnd Bergmann, Catalin Marinas, Leon Romanovsky, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Mark Rutland, Michael Guralnik, patches@lists.linux.dev, Niklas Schnelle, Jijie Shao, Will Deacon
Subject: [PATCH v3 0/6] Fix mlx5 write combining support on new ARM64 cores
Date: Thu, 11 Apr 2024 13:46:13 -0300
Message-ID: <0-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com>

mlx5 has a built-in self-test at driver startup to evaluate whether the platform's write combining can generate a 64-byte PCIe TLP. This has proven necessary because many common scenarios end up with broken write combining (especially inside virtual machines), and there is no other way to learn this information.

This self-test has been consistently failing on new ARM64 CPU designs (specifically NVIDIA Grace's implementation of Neoverse V2). The C loop around writeq() generates some pretty terrible ARM64 assembly, but until now it has worked on a lot of existing ARM64 CPUs. We see it succeed about 1 time in 10,000 on the worst affected systems. The CPU architects speculate that the load instructions interspersed with the stores make the test unreliable.

Arrange things so that ARM64 uses a predictable inline assembly block of 8 STR instructions.
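As an illustration of the store pattern the ARM64 change is after, here is a minimal user-space sketch that copies one 64-byte block by loading all eight 64-bit words first and then issuing eight consecutive 64-bit stores with no loads in between. It is hypothetical, not code from this series: the name wc_push_64b() is made up, the destination is ordinary memory standing in for a write-combining BAR mapping, and the 64-byte alignment check is an assumption about how a doorbell area would be laid out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not code from this series: push one 64-byte
 * block as eight back-to-back 64-bit stores. 'dst' stands in for a
 * write-combining BAR mapping; in this sketch it is ordinary memory.
 */
static void wc_push_64b(volatile uint64_t *dst, const uint64_t *src)
{
	/* Assumed layout: a 64-byte aligned doorbell, so all eight
	 * stores land in a single 64-byte write-combining line. */
	assert(((uintptr_t)dst & 63) == 0);

	/* Every load from 'src' is written before the first store. */
	uint64_t v0 = src[0], v1 = src[1], v2 = src[2], v3 = src[3];
	uint64_t v4 = src[4], v5 = src[5], v6 = src[6], v7 = src[7];

	/*
	 * Eight consecutive stores. 'volatile' keeps the stores in program
	 * order relative to each other, but the C language does not
	 * guarantee where the non-volatile loads above end up relative to
	 * them; only inline assembly pins down the exact instruction
	 * sequence, which is why the series uses an asm block on ARM64.
	 */
	dst[0] = v0; dst[1] = v1; dst[2] = v2; dst[3] = v3;
	dst[4] = v4; dst[5] = v5; dst[6] = v6; dst[7] = v7;
}
```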
Catalin suggested implementing this in terms of the obscure __iowrite64_copy() interface, which was long ago added to optimize write combining stores on Pathscale RDMA HW for x86. These copy routines have the advantage of requiring the caller to supply alignment, which allows an optimal assembly implementation.

This is a good suggestion because it turns out that S390 has much the same problem and already uses __iowrite64_copy() to try to make its WC operations work.

The first several patches modernize and improve the performance of __iowriteXX_copy() so that an ARM64 implementation can be provided which relies on __builtin_constant_p to generate fast inlined assembly code in a few common cases.

It looks ack'd enough now, so I plan to take this through the RDMA tree.

v3:
 - Rebase to 6.9-rc3
 - Fix copy-and-paste error in __const_memcpy_toio_aligned64() to use __raw_writeq()
v2: https://lore.kernel.org/r/0-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com
 - Rework everything to use __iowrite64_copy()
 - Don't use STP since that is not reliably supported in ARM VMs
 - New patches to tidy up __iowriteXX_copy() on x86 and s390
v1: https://lore.kernel.org/r/cover.1700766072.git.leon@kernel.org

Jason Gunthorpe (6):
  x86: Stop using weak symbols for __iowrite32_copy()
  s390: Implement __iowrite32_copy()
  s390: Stop using weak symbols for __iowrite64_copy()
  arm64/io: Provide a WC friendly __iowriteXX_copy()
  net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()
  IB/mlx5: Use __iowrite64_copy() for write combining stores

 arch/arm64/include/asm/io.h                 | 132 ++++++++++++++++++
 arch/arm64/kernel/io.c                      |  42 ++++++
 arch/s390/include/asm/io.h                  |  15 ++
 arch/s390/pci/pci.c                         |   6 -
 arch/x86/include/asm/io.h                   |  17 +++
 arch/x86/lib/Makefile                       |   1 -
 arch/x86/lib/iomap_copy_64.S                |  15 --
 drivers/infiniband/hw/mlx5/mem.c            |   8 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c |   4 -
 include/linux/io.h                          |   8 +-
 lib/iomap_copy.c                            |  13 +-
 11 files changed, 222 insertions(+), 39 deletions(-)
 delete mode 100644 arch/x86/lib/iomap_copy_64.S

base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
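The __iowrite64_copy() contract discussed in the cover letter (the count is given in 64-bit words and the caller guarantees alignment) and the __builtin_constant_p() dispatch can be sketched in user space roughly as follows. Everything here is hypothetical: sketch_iowrite64_copy(), its two helpers and the fast_path_calls counter are illustrative names, not the kernel's, and plain memory stands in for an __iomem mapping. It assumes GCC or Clang, which provide __builtin_constant_p().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter so a caller can see which path ran. */
static int fast_path_calls;

/* Generic path: one 64-bit store per word, in order. */
static void sketch_copy_generic(volatile uint64_t *to, const uint64_t *from,
				size_t count)
{
	for (size_t i = 0; i < count; i++)
		to[i] = from[i];
}

/* Specialised path for exactly 8 words (64 bytes): loads, then stores. */
static void sketch_copy_64b(volatile uint64_t *to, const uint64_t *from)
{
	uint64_t v[8];

	for (int i = 0; i < 8; i++)
		v[i] = from[i];
	for (int i = 0; i < 8; i++)
		to[i] = v[i];
	fast_path_calls++;
}

/*
 * 'count' is in 64-bit words, and both pointers must be 8-byte aligned
 * (the caller's job, checked here with assert()). A compile-time
 * constant count of 8 selects the specialised path; anything else uses
 * the loop. __builtin_constant_p() does not evaluate its argument.
 */
#define sketch_iowrite64_copy(to, from, count)                      \
	do {                                                        \
		assert(((uintptr_t)(to) & 7) == 0);                 \
		assert(((uintptr_t)(from) & 7) == 0);               \
		if (__builtin_constant_p(count) && (count) == 8)    \
			sketch_copy_64b((to), (from));              \
		else                                                \
			sketch_copy_generic((to), (from), (count)); \
	} while (0)
```

Because the check sits in a macro, a literal count of 8 is resolved at compile time and takes the unrolled path, while any other count falls back to the store loop. The series itself emits inline assembly for its constant-size cases rather than the plain C used here.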