From patchwork Tue May  7 22:35:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 13657897
From: Yang Shi
To: catalin.marinas@arm.com, will@kernel.org, scott@os.amperecomputing.com,
 cl@gentwo.org
Cc: yang@os.amperecomputing.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Tue, 7 May 2024 15:35:58 -0700
Message-ID: <20240507223558.3039562-1-yang@os.amperecomputing.com>
X-Mailer: git-send-email 2.41.0
X-BeenThere: linux-arm-kernel@lists.infradead.org
Precedence: list

Atomic RMW instructions, for example ldadd, perform a load, an add and
a store in a single instruction. Such an instruction may trigger two
page faults: first a read fault, then a write fault.

Some applications use atomic RMW instructions to populate memory. For
example, OpenJDK uses atomic-add-0 to pretouch (populate heap memory at
launch time) between v18 and v22.

The double page fault causes several problems:

1. Noticeable TLB overhead. For the read fault the kernel installs the
   zero page with a read-only PTE. The write then triggers a
   write-protection fault (CoW). CoW allocates a new page and makes the
   PTE point to it, which requires TLB invalidation. The TLB
   invalidation and the mandatory memory barriers may incur significant
   overhead, particularly on machines with many cores.

2. Huge pages are broken up. If THP is enabled, the read fault installs
   the huge zero page. The later CoW breaks up the huge page and
   allocates base pages instead of a huge page. Applications then have
   to rely on khugepaged (a kernel thread) to collapse huge pages
   asynchronously, which also incurs a noticeable performance penalty.

3. 512x page faults with huge pages. Because of #2, the application
   takes a write fault for every 4K area, so the speedup from using
   huge pages is lost.

Taking two page faults is pointless when we know the memory is going to
be written right away. Forcing a write fault for atomic RMW
instructions solves the problems above. First, the kernel just
allocates a zeroed page, with no TLB invalidation or memory barriers
needed. Second, it can populate writable huge pages in the first place
and does not break them up: one page fault is needed per 2M area
instead of 512 faults, and CPU time is saved by not relying on
khugepaged.

A simple micro benchmark that populates 1G of memory shows the number
of page faults reduced by half and the system time reduced by 60% on a
VM running on an Ampere Altra platform. Benchmarks for anonymous read
faults on 1G of memory and file read faults on a 1G file (cold and warm
page cache) don't show a noticeable regression.

Some other architectures also inspect the faulting instruction in the
page fault path, for example SPARC and x86.
Signed-off-by: Yang Shi
Reviewed-by: Christoph Lameter
---
 arch/arm64/include/asm/insn.h |  1 +
 arch/arm64/mm/fault.c         | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index db1aeacd4cd9..5d5a3fbeecc0 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -319,6 +319,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
  * "-" means "don't care"
  */
 __AARCH64_INSN_FUNCS(class_branch_sys,	0x1c000000, 0x14000000)
+__AARCH64_INSN_FUNCS(class_atomic,	0x3b200c00, 0x38200000)

 __AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
 __AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..f7bceedf5ef3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -529,6 +529,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
 	struct vm_area_struct *vma;
+	unsigned int insn;

 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -586,6 +587,24 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	if (!vma)
 		goto lock_mmap;

+	if (mm_flags & (FAULT_FLAG_WRITE | FAULT_FLAG_INSTRUCTION))
+		goto continue_fault;
+
+	pagefault_disable();
+
+	if (get_user(insn, (unsigned int __user *) instruction_pointer(regs))) {
+		pagefault_enable();
+		goto continue_fault;
+	}
+
+	if (aarch64_insn_is_class_atomic(insn)) {
+		vm_flags = VM_WRITE;
+		mm_flags |= FAULT_FLAG_WRITE;
+	}
+
+	pagefault_enable();
+
+continue_fault:
 	if (!(vma->vm_flags & vm_flags)) {
 		vma_end_read(vma);
 		goto lock_mmap;