From patchwork Thu Jan 16 02:30:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941163
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel
Subject: [PATCH v5 01/12] x86/mm: make MMU_GATHER_RCU_TABLE_FREE unconditional
Date: Wed, 15 Jan 2025 21:30:24 -0500
Message-ID: <20250116023127.1531583-2-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

Currently x86 uses CONFIG_MMU_GATHER_TABLE_FREE when using
paravirt, and not when running on bare metal. There is no real
good reason to do things differently for each setup. Make them
all the same.

Currently get_user_pages_fast synchronizes against page table
freeing in two different ways:
- on bare metal, by blocking IRQs, which block TLB flush IPIs
- on paravirt, with MMU_GATHER_RCU_TABLE_FREE

This is done because some paravirt TLB flush implementations
handle the TLB flush in the hypervisor, and will do the flush
even when the target CPU has interrupts disabled.

Always handle page table freeing with MMU_GATHER_RCU_TABLE_FREE.
Using RCU synchronization between page table freeing and
get_user_pages_fast() allows bare metal to also do TLB flushing
while interrupts are disabled.

That makes it safe to use INVLPGB on AMD CPUs.

Signed-off-by: Rik van Riel
Suggested-by: Peter Zijlstra
---
 arch/x86/Kconfig           | 2 +-
 arch/x86/kernel/paravirt.c | 7 +------
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9d7bd0ae48c4..e8743f8c9fd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,7 +274,7 @@ config X86
 	select HAVE_PCI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT
+	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_MERGE_VMAS
 	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 	select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..2b78a6b466ed 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -59,11 +59,6 @@ void __init native_pv_lock_init(void)
 		static_branch_enable(&virt_spin_lock_key);
 }
 
-static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	tlb_remove_page(tlb, table);
-}
-
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
@@ -191,7 +186,7 @@ struct paravirt_patch_template pv_ops = {
 	.mmu.flush_tlb_kernel = native_flush_tlb_global,
 	.mmu.flush_tlb_one_user = native_flush_tlb_one_user,
 	.mmu.flush_tlb_multi = native_flush_tlb_multi,
-	.mmu.tlb_remove_table = native_tlb_remove_table,
+	.mmu.tlb_remove_table = tlb_remove_table,
 
 	.mmu.exit_mmap = paravirt_nop,
 	.mmu.notify_page_enc_status_changed = paravirt_nop,

From patchwork Thu Jan 16 02:30:25 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941161
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel
Subject: [PATCH v5 02/12] x86/mm: remove pv_ops.mmu.tlb_remove_table call
Date: Wed, 15 Jan 2025 21:30:25 -0500
Message-ID: <20250116023127.1531583-3-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

Every pv_ops.mmu.tlb_remove_table call ends up calling
tlb_remove_table. Get rid of the indirection by simply calling
tlb_remove_table directly, and not going through the paravirt
function pointers.

Signed-off-by: Rik van Riel
Suggested-by: Qi Zheng
---
 arch/x86/hyperv/mmu.c                 |  1 -
 arch/x86/include/asm/paravirt.h       |  5 -----
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/kvm.c                 |  1 -
 arch/x86/kernel/paravirt.c            |  1 -
 arch/x86/mm/pgtable.c                 | 16 ++++------------
 arch/x86/xen/mmu_pv.c                 |  1 -
 7 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index 1cc113200ff5..cbe6c71e17c1 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -240,5 +240,4 @@ void hyperv_setup_mmu_ops(void)
 
 	pr_info("Using hypercall for remote TLB flush\n");
 	pv_ops.mmu.flush_tlb_multi = hyperv_flush_tlb_multi;
-	pv_ops.mmu.tlb_remove_table = tlb_remove_table;
 }
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d4eb9e1d61b8..794ba3647c6c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -91,11 +91,6 @@ static inline void __flush_tlb_multi(const struct cpumask *cpumask,
 	PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
 }
 
-static inline void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	PVOP_VCALL2(mmu.tlb_remove_table, tlb, table);
-}
-
 static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
 	PVOP_VCALL1(mmu.exit_mmap, mm);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8d4fbe1be489..13405959e4db 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -136,8 +136,6 @@ struct pv_mmu_ops {
 	void (*flush_tlb_multi)(const struct cpumask *cpus,
 				const struct flush_tlb_info *info);
 
-	void (*tlb_remove_table)(struct mmu_gather *tlb, void *table);
-
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
 	void (*notify_page_enc_status_changed)(unsigned long pfn, int npages, bool enc);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7a422a6c5983..3be9b3342c67 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -838,7 +838,6 @@ static void __init kvm_guest_init(void)
 #ifdef CONFIG_SMP
 	if (pv_tlb_flush_supported()) {
 		pv_ops.mmu.flush_tlb_multi = kvm_flush_tlb_multi;
-		pv_ops.mmu.tlb_remove_table = tlb_remove_table;
 		pr_info("KVM setup pv remote TLB flush\n");
 	}
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 2b78a6b466ed..c019771e0123 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -186,7 +186,6 @@ struct paravirt_patch_template pv_ops = {
 	.mmu.flush_tlb_kernel = native_flush_tlb_global,
 	.mmu.flush_tlb_one_user = native_flush_tlb_one_user,
 	.mmu.flush_tlb_multi = native_flush_tlb_multi,
-	.mmu.tlb_remove_table = tlb_remove_table,
 
 	.mmu.exit_mmap = paravirt_nop,
 	.mmu.notify_page_enc_status_changed = paravirt_nop,
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5745a354a241..3dc4af1f7868 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -18,14 +18,6 @@ EXPORT_SYMBOL(physical_mask);
 #define PGTABLE_HIGHMEM 0
 #endif
 
-#ifndef CONFIG_PARAVIRT
-static inline
-void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	tlb_remove_page(tlb, table);
-}
-#endif
-
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)
@@ -54,7 +46,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
 	pagetable_pte_dtor(page_ptdesc(pte));
 	paravirt_release_pte(page_to_pfn(pte));
-	paravirt_tlb_remove_table(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
@@ -70,7 +62,7 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 	tlb->need_flush_all = 1;
 #endif
 	pagetable_pmd_dtor(ptdesc);
-	paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
+	tlb_remove_table(tlb, ptdesc_page(ptdesc));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -80,14 +72,14 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 
 	pagetable_pud_dtor(ptdesc);
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
+	tlb_remove_table(tlb, virt_to_page(pud));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
-	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
+	tlb_remove_table(tlb, virt_to_page(p4d));
 }
 #endif /* CONFIG_PGTABLE_LEVELS > 4 */
 #endif /* CONFIG_PGTABLE_LEVELS > 3 */
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 55a4996d0c04..041e17282af0 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2137,7 +2137,6 @@ static const typeof(pv_ops) xen_mmu_ops __initconst = {
 		.flush_tlb_kernel = xen_flush_tlb,
 		.flush_tlb_one_user = xen_flush_tlb_one_user,
 		.flush_tlb_multi = xen_flush_tlb_multi,
-		.tlb_remove_table = tlb_remove_table,
 
 		.pgd_alloc = xen_pgd_alloc,
 		.pgd_free = xen_pgd_free,

From patchwork Thu Jan 16 02:30:26 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941160
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel,
 Dave Hansen
Subject: [PATCH v5 03/12] x86/mm: consolidate full flush threshold decision
Date: Wed, 15 Jan 2025 21:30:26 -0500
Message-ID: <20250116023127.1531583-4-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

Reduce code duplication by consolidating the decision point
for whether to do individual invalidations or a full flush
inside get_flush_tlb_info.

Signed-off-by: Rik van Riel
Suggested-by: Dave Hansen
---
 arch/x86/mm/tlb.c | 43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6cf881a942bb..2f38cf95dee3 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1009,6 +1009,15 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 	info->initiating_cpu = smp_processor_id();
 	info->trim_cpumask = 0;
 
+	/*
+	 * If the number of flushes is so large that a full flush
+	 * would be faster, do a full flush.
+	 */
+	if ((end - start) >> stride_shift > tlb_single_page_flush_ceiling) {
+		info->start = 0;
+		info->end = TLB_FLUSH_ALL;
+	}
+
 	return info;
 }
 
@@ -1026,17 +1035,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				bool freed_tables)
 {
 	struct flush_tlb_info *info;
+	int cpu = get_cpu();
 	u64 new_tlb_gen;
-	int cpu;
-
-	cpu = get_cpu();
-
-	/* Should we flush just the requested range? */
-	if ((end == TLB_FLUSH_ALL) ||
-	    ((end - start) >> stride_shift) > tlb_single_page_flush_ceiling) {
-		start = 0;
-		end = TLB_FLUSH_ALL;
-	}
 
 	/* This is also a barrier that synchronizes with switch_mm(). */
 	new_tlb_gen = inc_mm_tlb_gen(mm);
@@ -1089,22 +1089,19 @@ static void do_kernel_range_flush(void *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	/* Balance as user space task's flush, a bit conservative */
-	if (end == TLB_FLUSH_ALL ||
-	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
-		on_each_cpu(do_flush_tlb_all, NULL, 1);
-	} else {
-		struct flush_tlb_info *info;
+	struct flush_tlb_info *info;
 
-		preempt_disable();
-		info = get_flush_tlb_info(NULL, start, end, 0, false,
-					  TLB_GENERATION_INVALID);
+	guard(preempt)();
+
+	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
+				  TLB_GENERATION_INVALID);
 
+	if (end == TLB_FLUSH_ALL)
+		on_each_cpu(do_flush_tlb_all, NULL, 1);
+	else
 		on_each_cpu(do_kernel_range_flush, info, 1);
 
-		put_flush_tlb_info();
-		preempt_enable();
-	}
+	put_flush_tlb_info();
 }
 
 /*
@@ -1276,7 +1273,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 
 	int cpu = get_cpu();
 
-	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
+	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, PAGE_SHIFT, false,
 				  TLB_GENERATION_INVALID);
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only

From patchwork Thu Jan 16 02:30:27 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941162
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel
Subject: [PATCH v5 04/12] x86/mm: get INVLPGB count max from CPUID
Date: Wed, 15 Jan 2025 21:30:27 -0500
Message-ID: <20250116023127.1531583-5-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

The CPU advertises the maximum number of pages that can be shot down
with one INVLPGB instruction in the CPUID data.

Save that information for later use.

Signed-off-by: Rik van Riel
---
 arch/x86/Kconfig.cpu               | 5 +++++
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/tlbflush.h    | 7 +++++++
 arch/x86/kernel/cpu/amd.c          | 8 ++++++++
 4 files changed, 21 insertions(+)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2a7279d80460..bacdc502903f 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -395,6 +395,10 @@ config X86_VMX_FEATURE_NAMES
 	def_bool y
 	depends on IA32_FEAT_CTL
 
+config X86_BROADCAST_TLB_FLUSH
+	def_bool y
+	depends on CPU_SUP_AMD
+
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
 	help
@@ -431,6 +435,7 @@ config CPU_SUP_CYRIX_32
 config CPU_SUP_AMD
 	default y
 	bool "Support AMD processors" if PROCESSOR_SELECT
+	select X86_BROADCAST_TLB_FLUSH
 	help
 	  This enables detection, tunings and quirks for AMD processors
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 17b6590748c0..f9b832e971c5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -338,6 +338,7 @@
 #define X86_FEATURE_CLZERO (13*32+ 0) /* "clzero" CLZERO instruction */
 #define X86_FEATURE_IRPERF (13*32+ 1) /* "irperf" Instructions Retired Count */
 #define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* "xsaveerptr" Always save/restore FP error pointers */
+#define X86_FEATURE_INVLPGB (13*32+ 3) /* INVLPGB and TLBSYNC instruction supported. */
 #define X86_FEATURE_RDPRU (13*32+ 4) /* "rdpru" Read processor register at user level */
 #define X86_FEATURE_WBNOINVD (13*32+ 9) /* "wbnoinvd" WBNOINVD instruction */
 #define X86_FEATURE_AMD_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 02fc2aa06e9e..8fe3b2dda507 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -183,6 +183,13 @@ static inline void cr4_init_shadow(void)
 extern unsigned long mmu_cr4_features;
 extern u32 *trampoline_cr4_features;
 
+/* How many pages can we invalidate with one INVLPGB. */
+#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
+extern u16 invlpgb_count_max;
+#else
+#define invlpgb_count_max 1
+#endif
+
 extern void initialize_tlbstate_and_flush(void);
 
 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 79d2e17f6582..bcf73775b4f8 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -29,6 +29,8 @@
 
 #include "cpu.h"
 
+u16 invlpgb_count_max __ro_after_init;
+
 static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
 {
 	u32 gprs[8] = { 0 };
@@ -1135,6 +1137,12 @@ static void cpu_detect_tlb_amd(struct cpuinfo_x86 *c)
 		tlb_lli_2m[ENTRIES] = eax & mask;
 
 	tlb_lli_4m[ENTRIES] = tlb_lli_2m[ENTRIES] >> 1;
+
+	/* Max number of pages INVLPGB can invalidate in one shot */
+	if (boot_cpu_has(X86_FEATURE_INVLPGB)) {
+		cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
+		invlpgb_count_max = (edx & 0xffff) + 1;
+	}
 }
 
 static const struct cpu_dev amd_cpu_dev = {

From patchwork Thu Jan 16 02:30:28 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941158
ARC-Authentication-Results:
i=1; imf09.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf09.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736994765; a=rsa-sha256; cv=none; b=4GCDrBBqztdMXJLKVhJdzusH3RqOHGqoINnF1FEZj7ev6mCCXHzYAtBjIhmcPg2aG2/Lbr E8TlkZFmeSLBjNR3M2xE0krt6mkC+3jUS+5ZLIV2fMm+d5VDexGL3JQu5bpFFllWn+VEDQ 6l2z0ea4wa21aATg4ifEPdQXEtfh0L4= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3L0b; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 05/12] x86/mm: add INVLPGB support code Date: Wed, 15 Jan 2025 21:30:28 -0500 Message-ID: <20250116023127.1531583-6-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250116023127.1531583-1-riel@surriel.com> References: <20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: B2D8B140007 X-Stat-Signature: 1eqj46cwghihky78wbbto77zresyfj7q X-Rspamd-Server: rspam08 X-Rspam-User: X-HE-Tag: 1736994765-413159 X-HE-Meta: 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Add invlpgb.h with the helper functions and definitions needed to use
broadcast TLB invalidation on AMD EPYC 3 and newer CPUs.

Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/invlpgb.h  | 97 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/tlbflush.h |  1 +
 2 files changed, 98 insertions(+)
 create mode 100644 arch/x86/include/asm/invlpgb.h

diff --git a/arch/x86/include/asm/invlpgb.h b/arch/x86/include/asm/invlpgb.h
new file mode 100644
index 000000000000..4dfd09e65fa6
--- /dev/null
+++ b/arch/x86/include/asm/invlpgb.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_INVLPGB
+#define _ASM_X86_INVLPGB
+
+#include
+#include
+
+/*
+ * INVLPGB does broadcast TLB invalidation across all the CPUs in the system.
+ *
+ * The INVLPGB instruction is weakly ordered, and a batch of invalidations can
+ * be done in a parallel fashion.
+ *
+ * TLBSYNC is used to ensure that pending INVLPGB invalidations initiated from
+ * this CPU have completed.
+ */
+static inline void __invlpgb(unsigned long asid, unsigned long pcid,
+			     unsigned long addr, u16 extra_count,
+			     bool pmd_stride, unsigned long flags)
+{
+	u32 edx = (pcid << 16) | asid;
+	u32 ecx = (pmd_stride << 31) | extra_count;
+	u64 rax = addr | flags;
+
+	/* INVLPGB; supported in binutils >= 2.36. */
+	asm volatile(".byte 0x0f, 0x01, 0xfe" : : "a" (rax), "c" (ecx), "d" (edx));
+}
+
+/* Wait for INVLPGB originated by this CPU to complete. */
+static inline void tlbsync(void)
+{
+	cant_migrate();
+	/* TLBSYNC: supported in binutils >= 2.36. */
+	asm volatile(".byte 0x0f, 0x01, 0xff" ::: "memory");
+}
+
+/*
+ * INVLPGB can be targeted by virtual address, PCID, ASID, or any combination
+ * of the three. For example:
+ * - INVLPGB_VA | INVLPGB_INCLUDE_GLOBAL: invalidate all TLB entries at the address
+ * - INVLPGB_PCID:			  invalidate all TLB entries matching the PCID
+ *
+ * The first can be used to invalidate (kernel) mappings at a particular
+ * address across all processes.
+ *
+ * The latter invalidates all TLB entries matching a PCID.
+ */
+#define INVLPGB_VA			BIT(0)
+#define INVLPGB_PCID			BIT(1)
+#define INVLPGB_ASID			BIT(2)
+#define INVLPGB_INCLUDE_GLOBAL		BIT(3)
+#define INVLPGB_FINAL_ONLY		BIT(4)
+#define INVLPGB_INCLUDE_NESTED		BIT(5)
+
+/* Flush all mappings for a given pcid and addr, not including globals. */
+static inline void invlpgb_flush_user(unsigned long pcid,
+				      unsigned long addr)
+{
+	__invlpgb(0, pcid, addr, 0, 0, INVLPGB_PCID | INVLPGB_VA);
+	tlbsync();
+}
+
+static inline void invlpgb_flush_user_nr_nosync(unsigned long pcid,
+						unsigned long addr,
+						u16 nr,
+						bool pmd_stride)
+{
+	__invlpgb(0, pcid, addr, nr - 1, pmd_stride, INVLPGB_PCID | INVLPGB_VA);
+}
+
+/* Flush all mappings for a given PCID, not including globals. */
+static inline void invlpgb_flush_single_pcid_nosync(unsigned long pcid)
+{
+	__invlpgb(0, pcid, 0, 0, 0, INVLPGB_PCID);
+}
+
+/* Flush all mappings, including globals, for all PCIDs. */
+static inline void invlpgb_flush_all(void)
+{
+	__invlpgb(0, 0, 0, 0, 0, INVLPGB_INCLUDE_GLOBAL);
+	tlbsync();
+}
+
+/* Flush addr, including globals, for all PCIDs. */
+static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
+{
+	__invlpgb(0, 0, addr, nr - 1, 0, INVLPGB_INCLUDE_GLOBAL);
+}
+
+/* Flush all mappings for all PCIDs except globals. */
+static inline void invlpgb_flush_all_nonglobals(void)
+{
+	__invlpgb(0, 0, 0, 0, 0, 0);
+	tlbsync();
+}
+
+#endif /* _ASM_X86_INVLPGB */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 8fe3b2dda507..dba5caa4a9f4 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include

From patchwork Thu Jan 16 02:30:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13941159 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00656C02180 for ; Thu, 16 Jan 2025 02:32:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 520B96B008C; Wed, 15 Jan 2025 21:32:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4A70E6B0093; Wed, 15 Jan 2025 21:32:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1E7A2280001; Wed, 15 Jan 2025 21:32:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 0372A6B0089 for ; Wed, 15 Jan 2025 21:32:47 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id AF4B01A0669 for ; Thu, 16 Jan 2025 02:32:47 +0000 (UTC) X-FDA: 83011741974.30.5DAB336 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf24.hostedemail.com (Postfix) with ESMTP id 3DD0A18000C for ; Thu, 16 Jan 2025 02:32:46 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf24.hostedemail.com: domain of
riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736994766; a=rsa-sha256; cv=none; b=GOFSctjs60Jy+LanIIZAcogh0PWvUqapScZmCJahz55lggpojF/v570tR9nZ3ZA8ICz2Xk doHe9uebItuzSRZMOMADceBbYC++WXXhlrwcJuT7vRUN32EhHGYFy2yhirQTyXWkik41he KovG7d77JaNDUINjUcZhkN5UYD9/vOc= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf24.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736994766; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fxCD3xy+OxxqHxrPDR48qSO6ejbegAvHmdEDWAk2U4c=; b=AcOj4nprIMh2ynkZa99mZcp3fC9pfTTv2lHaiN/u9FPEfgsXFW9amEdnsApFEJHtAHU2a/ ZFveob5sIQ4sPoB/haj8rn5X8x2syErB8Er5j7GPFiEMerSw2OydIcGPX4wkh+UGbkz2Qe zecD2FXOQd6FNlV/cQO17UHiCWYB1sg= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3QRo; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 06/12] x86/mm: use INVLPGB for kernel TLB flushes Date: Wed, 15 Jan 2025 21:30:29 -0500 Message-ID: <20250116023127.1531583-7-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: 
<20250116023127.1531583-1-riel@surriel.com> References: <20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 3DD0A18000C X-Stat-Signature: mqacy8aok6ji8jx1zgjz5fbqugrncqaj X-HE-Tag: 1736994766-348451 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Use broadcast TLB invalidation for kernel addresses when available.
Remove the need to send IPIs for kernel TLB flushes.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 2f38cf95dee3..0761dd224e84 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1077,6 +1077,30 @@ void flush_tlb_all(void)
 	on_each_cpu(do_flush_tlb_all, NULL, 1);
 }
 
+static bool broadcast_kernel_range_flush(struct flush_tlb_info *info)
+{
+	unsigned long addr;
+	unsigned long nr;
+
+	if (!IS_ENABLED(CONFIG_X86_BROADCAST_TLB_FLUSH))
+		return false;
+
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		return false;
+
+	if (info->end == TLB_FLUSH_ALL) {
+		invlpgb_flush_all();
+		return true;
+	}
+
+	for (addr = info->start; addr < info->end; addr += nr << PAGE_SHIFT) {
+		nr = min((info->end - addr) >> PAGE_SHIFT, invlpgb_count_max);
+		invlpgb_flush_addr_nosync(addr, nr);
+	}
+	tlbsync();
+	return true;
+}
+
 static void do_kernel_range_flush(void *info)
 {
 	struct flush_tlb_info *f = info;
@@ -1096,7 +1120,9 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
 				  TLB_GENERATION_INVALID);
 
-	if (end == TLB_FLUSH_ALL)
+	if (broadcast_kernel_range_flush(info))
+		; /* Fall through. */
+	else if (end == TLB_FLUSH_ALL)
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
 	else
 		on_each_cpu(do_kernel_range_flush, info, 1);

From patchwork Thu Jan 16 02:30:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13941174 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 114F2C02180 for ; Thu, 16 Jan 2025 03:05:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 950D36B0082; Wed, 15 Jan 2025 22:05:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 900F86B0085; Wed, 15 Jan 2025 22:05:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7EF736B0088; Wed, 15 Jan 2025 22:05:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 6211D6B0082 for ; Wed, 15 Jan 2025 22:05:39 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 05546C0671 for ; Thu, 16 Jan 2025 03:05:38 +0000 (UTC) X-FDA: 83011824798.25.1BE45E0 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf22.hostedemail.com (Postfix) with ESMTP id 61109C0008 for ; Thu, 16 Jan 2025 03:05:37 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf22.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736996737; a=rsa-sha256; cv=none; b=0AiTPzcUaZ3lyOcDtC5aFBf39fGN3qqk/QlJ3PfDS2vEQsPpmn+6n3LaiWhd4N01hsPgoG
N2+jLGzJ5wgC9BnRQ5jnLpN4eekbpCzTmuTHYgNX3PaSJmulN/wW9mVcCuFPnqTLTjc8VX aIfXMVpGBuFV9BM9dTxIZ9h3KEyL+R8= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf22.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736996737; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=orlGaKw9nqEENaNpZxcL9zaVEFoWFuuRr5656SZ4f60=; b=oT3TSlRxQ3vK5mebSsb2NpN8CCB0ASGCfh0dyZ+sQnafvLFnhKicqtiz+rc7IB6v5CZu2q 111Y1PhkO5JlOmm6w47GOAHeYOrf691tBuy0k8R2/l8E9l94lN6KSu2gL3atMFuIs/uTzj FacdxBbOb77lZttUM2mHJtIBBWNy1o4= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3VsE; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 07/12] x86/tlb: use INVLPGB in flush_tlb_all Date: Wed, 15 Jan 2025 21:30:30 -0500 Message-ID: <20250116023127.1531583-8-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250116023127.1531583-1-riel@surriel.com> References: <20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Queue-Id: 61109C0008 X-Rspamd-Server: rspam10 X-Stat-Signature: 4t8y3xg5akxzimr94dzrit3wfog7kaq8 X-HE-Tag: 1736996737-101135 X-HE-Meta: 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

The flush_tlb_all() function is not used a whole lot, but we might as
well use broadcast TLB flushing there, too.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0761dd224e84..49b3e90503a3 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1065,6 +1065,19 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 }
 
 
+static bool broadcast_flush_tlb_all(void)
+{
+	if (!IS_ENABLED(CONFIG_X86_BROADCAST_TLB_FLUSH))
+		return false;
+
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		return false;
+
+	guard(preempt)();
+	invlpgb_flush_all();
+	return true;
+}
+
 static void do_flush_tlb_all(void *info)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
@@ -1073,6 +1086,8 @@ static void do_flush_tlb_all(void *info)
 
 void flush_tlb_all(void)
 {
+	if (broadcast_flush_tlb_all())
+		return;
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
 	on_each_cpu(do_flush_tlb_all, NULL, 1);
 }

From patchwork Thu Jan 16 02:30:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13941173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E545EC02180 for ; Thu, 16 Jan 2025 02:55:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6FF886B0082; Wed, 15 Jan 2025 21:55:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6881F6B0085; Wed, 15 Jan 2025 21:55:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 529156B0088; Wed, 15 Jan 2025 21:55:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 318386B0082 for ; Wed, 15 Jan 2025 21:55:43 -0500 (EST) Received: from smtpin24.hostedemail.com
(a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id E217BAEB6B for ; Thu, 16 Jan 2025 02:55:42 +0000 (UTC) X-FDA: 83011799724.24.67BC14C Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf14.hostedemail.com (Postfix) with ESMTP id 62861100007 for ; Thu, 16 Jan 2025 02:55:41 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=none; spf=pass (imf14.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736996141; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cQd0Jxm3wPpHkQ2qCXJZ2BF12qzYrqztWPzol4AzppY=; b=gazXpqypeBZzLtRu9KdhIZXX7PUcg9k44aEQGNr6zRmb4kFqZwjEg60b46szPB7m9kd9IL h4x70qMcTbWI3XqJEIcAEH/pL0mdXWMUs51P/7S5HBO68NKHPO8Vl2B+ulAz5C08aKbvpO sE5q9RwYOPo/19ExKQPdSWxPoehuoHM= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=none; spf=pass (imf14.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736996141; a=rsa-sha256; cv=none; b=HWeIfteldIA3MAQ8l6WQqP62AiFVRP4fYV4x40fguDZVIWowC6iUDpsgzvyAk+Y0HJf6wK tyErNsgJ3je3rK27V6eJETbbVAKRsUKA5sIROZUoVTmK9orbrKOkdRnMyDDI3+Cj7EBgQR H+LRS97ylGN9gREgns4T98kZvhDssrQ= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3c0c; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, 
dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 08/12] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing Date: Wed, 15 Jan 2025 21:30:31 -0500 Message-ID: <20250116023127.1531583-9-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250116023127.1531583-1-riel@surriel.com> References: <20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 62861100007 X-Stat-Signature: rg6yptebmk1c5xgpjru1b54g3jd1oh3y X-Rspam-User: X-HE-Tag: 1736996141-48568 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

In the page reclaim code, we only track the CPU(s) where the TLB needs
to be flushed, rather than all the individual mappings that may be
getting invalidated. Use broadcast TLB flushing when that is available.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 49b3e90503a3..746a89924f02 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1321,7 +1321,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+		invlpgb_flush_all_nonglobals();
+	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();

From patchwork Thu Jan 16 02:30:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13941164 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CBD0C02183 for ; Thu, 16 Jan 2025 02:33:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B3B4F6B0096; Wed, 15 Jan 2025 21:32:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A7527280001; Wed, 15 Jan 2025 21:32:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8813B6B0099; Wed, 15 Jan 2025 21:32:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 61A376B0096 for ; Wed, 15 Jan 2025 21:32:51 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 270C1A05A2 for ; Thu, 16 Jan 2025 02:32:51 +0000 (UTC) X-FDA: 83011742142.20.1378F5F Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf30.hostedemail.com (Postfix) with ESMTP id 870868001B for ; Thu, 16 Jan 2025 02:32:49 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf30.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender)
smtp.mailfrom=riel@shelob.surriel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736994769; a=rsa-sha256; cv=none; b=LVN+3OM7oR4wPf6ip5gQg1T+tBvVRWwaWIgtriX7p9zqMLIiNPKmYkOEH/CXqJfMbwYhs7 VZ3GpDfo7yFYfTKRlst86PjMW+6zuEToramKNRrWnDFXaMpZnbqHxEigVTwsYp4sROfFvm AZCcrw/d2A7mvZW8Vs9CDzHdr0BoSxk= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf30.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736994769; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u4+zTqlXVKz7yvP6+sB0bTYXyAbCMyH4KgkmKacFmAM=; b=goHVBo6b9t3u2sqmvxNs02lXOKUXp3QAs7qnQ7Pi27X3oE/blh+7u2lh7QGILMH81FL6wu C2mif5EQYtYUHGpxrxC53m7R7tWQDKnOQfMf0dkYGpVeHqGW0lokMHSu7NWb27MW9lS0nD lP+Y6NlL61pj1EiGvqjS2F/d2GWJzrU= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3ixR; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 09/12] x86/mm: enable broadcast TLB invalidation for multi-threaded processes Date: Wed, 15 Jan 2025 21:30:32 -0500 Message-ID: <20250116023127.1531583-10-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250116023127.1531583-1-riel@surriel.com> References: 
<20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Queue-Id: 870868001B X-Rspamd-Server: rspam10 X-Stat-Signature: g35syyht8gyb9nkzx5joyxxso76d5akb X-HE-Tag: 1736994769-418812 X-HE-Meta: U2FsdGVkX1/o0EVja2xa+7YNA0yU6P80GcvjaiTKgKEH49Msh5so3gNLdSz4PoWJFTQj/Ip8Ibf0Eq6oJ4fTM1SQpw1tQ8Q3g5UQhsqzaYJhZdgA87rC9zUp3hStqHSM1c18iRShT68bguDzP6bSBpYblDblV7dv8NeE1vkVed4JofEskUBYBQed4ZMUybl0afMvJQanrbCKC0fl9EpFCwu/J8e6UbXtp4z8wY9mueuK7wEXmwi7HS3iky3yJ6iy8NmcZJkcmyhQytA/NVCx2dir74KXC96KeS1EN+fqj3XcigR38lHYwrfoneMVCVUXjT6xzIlNx97NuKyswHA6gcrifx8PeMb7UKhYTpGlr4/MdUHf466ObmfNBHXTIF6fxRAIaYTTEO2USgSOZx9XIEgI0CYLnfHXAeXbevKq8sThrr8Taj5Rf1NkIILyBPqimKj3rNxOW+TwvbKLn91NhnEfxVgZ4794Jm/Bmmo7xZy48ickYfPu85rsWKDKHBNB/lHtxfYagGjHyV6kr0qGk3Vb2/bEYtF19u6O15svc1iV6+Lmkw+pZDLk2JuJoPhVachNj9kXxUSD3sbImHGaUzbe8loZnLjenccx9LxUsXuOQD62idd40Nh5YEhjt5xzPVfqxFFPEUlgDolU1CtewfEN71razwNQQs2E4I0NKZy4WsJ2uQ64WM7hR187Ywq+q2tQqufvNNXJTbUwEm7AZqfJQduLqN0RaoGY7Lt0wkKS8vqms3gqeyZcb2/CyEi6vBQJAA62noGUWhSj9L2MdSRSpTPbCR6wkQuW8Rk7Y7uLu+e1mJeO8n0hL14/3m2eL2sz9CBCLBEwGdNyW5bC4SqfMt5uganbGNIoZp4GVVlQBbUbpDAu3xfL+4pvaPOh6MyuIV9510FghiNJo8SYwUIZs+BzCeAZoKRyci9pvO7cXqMVVoGbqS3doPlpN4DtM48VDU1yJbNo+F36td9 EoE+PK1j dEPg/pupPettAJFcJfxQnkOr7AiXmRB4HrZMU76XQCgAIYwC0EeBtli+KOOClGxxJZThr7EwBtcC8L0OBFPvZNwwQ6SqctMg1JY+32VqpjM8uaYryOuHyChZEUIaPRO+EiUXLOSx7whhrnaYnl0tdyI6QsKRpNRZv+2G2H5hMxZFCQQGghKJls2z8j/vc2WAkgLgiJkocG7FLondvnR+Oxd615Oi70i1m/U0wdgtspY4WLNAHu3DnkWrzRDoKT6tvpUtrCaEtpBWYz/rQ3dYmqTnlp2L4pELs3A4GMcf9Dr7dVgVNRFlZ98WZn9JvPzt4kE4K6oePNllxMreQsSepW000MdULQ3ElXU3BI8+9JEnW0YcvVvzNqm2KofAJIZkw8jkN X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Use broadcast TLB invalidation, using the INVLPGB instruction, on AMD EPYC 3 and newer CPUs. 
In order to not exhaust PCID space, and keep TLB flushes local for single threaded processes, we only hand out broadcast ASIDs to processes active on 3 or more CPUs, and gradually increase the threshold as broadcast ASID space is depleted. Signed-off-by: Rik van Riel --- arch/x86/include/asm/mmu.h | 6 + arch/x86/include/asm/mmu_context.h | 14 ++ arch/x86/include/asm/tlbflush.h | 72 ++++++ arch/x86/mm/tlb.c | 362 ++++++++++++++++++++++++++++- 4 files changed, 442 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h index 3b496cdcb74b..d71cd599fec4 100644 --- a/arch/x86/include/asm/mmu.h +++ b/arch/x86/include/asm/mmu.h @@ -69,6 +69,12 @@ typedef struct { u16 pkey_allocation_map; s16 execute_only_pkey; #endif + +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH + u16 global_asid; + bool asid_transition; +#endif + } mm_context_t; #define INIT_MM_CONTEXT(mm) \ diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 795fdd53bd0a..d670699d32c2 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -139,6 +139,8 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm) #define enter_lazy_tlb enter_lazy_tlb extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk); +extern void destroy_context_free_global_asid(struct mm_struct *mm); + /* * Init a new mm. Used on mm copies, like at fork() * and on mm's that are brand-new, like at execve(). 
@@ -161,6 +163,14 @@ static inline int init_new_context(struct task_struct *tsk, mm->context.execute_only_pkey = -1; } #endif + +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH + if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) { + mm->context.global_asid = 0; + mm->context.asid_transition = false; + } +#endif + mm_reset_untag_mask(mm); init_new_context_ldt(mm); return 0; @@ -170,6 +180,10 @@ static inline int init_new_context(struct task_struct *tsk, static inline void destroy_context(struct mm_struct *mm) { destroy_context_ldt(mm); +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH + if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) + destroy_context_free_global_asid(mm); +#endif } extern void switch_mm(struct mm_struct *prev, struct mm_struct *next, diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index dba5caa4a9f4..5eae5c1aafa5 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -239,6 +239,78 @@ void flush_tlb_one_kernel(unsigned long addr); void flush_tlb_multi(const struct cpumask *cpumask, const struct flush_tlb_info *info); +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH +static inline bool is_dyn_asid(u16 asid) +{ + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB)) + return true; + + return asid < TLB_NR_DYN_ASIDS; +} + +static inline bool is_global_asid(u16 asid) +{ + return !is_dyn_asid(asid); +} + +static inline bool in_asid_transition(const struct flush_tlb_info *info) +{ + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB)) + return false; + + return info->mm && READ_ONCE(info->mm->context.asid_transition); +} + +static inline u16 mm_global_asid(struct mm_struct *mm) +{ + u16 asid; + + if (!cpu_feature_enabled(X86_FEATURE_INVLPGB)) + return 0; + + asid = READ_ONCE(mm->context.global_asid); + + /* mm->context.global_asid is either 0, or a global ASID */ + VM_WARN_ON_ONCE(is_dyn_asid(asid)); + + return asid; +} +#else +static inline bool is_dyn_asid(u16 asid) +{ + return true; +} + +static inline bool is_global_asid(u16 asid) +{ + 
return false; +} + +static inline bool in_asid_transition(const struct flush_tlb_info *info) +{ + return false; +} + +static inline u16 mm_global_asid(struct mm_struct *mm) +{ + return 0; +} + +static inline bool needs_global_asid_reload(struct mm_struct *next, u16 prev_asid) +{ + return false; +} + +static inline void broadcast_tlb_flush(struct flush_tlb_info *info) +{ + VM_WARN_ON_ONCE(1); +} + +static inline void consider_global_asid(struct mm_struct *mm) +{ +} +#endif + #ifdef CONFIG_PARAVIRT #include #endif diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 746a89924f02..bfc69ae4ea40 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -74,13 +74,15 @@ * use different names for each of them: * * ASID - [0, TLB_NR_DYN_ASIDS-1] - * the canonical identifier for an mm + * the canonical identifier for an mm, dynamically allocated on each CPU + * [TLB_NR_DYN_ASIDS, MAX_ASID_AVAILABLE-1] + * the canonical, global identifier for an mm, identical across all CPUs * - * kPCID - [1, TLB_NR_DYN_ASIDS] + * kPCID - [1, MAX_ASID_AVAILABLE] * the value we write into the PCID part of CR3; corresponds to the * ASID+1, because PCID 0 is special. * - * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS] + * uPCID - [2048 + 1, 2048 + MAX_ASID_AVAILABLE] * for KPTI each mm has two address spaces and thus needs two * PCID values, but we can still do with a single ASID denomination * for each mm. Corresponds to kPCID + 2048. @@ -225,6 +227,20 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, return; } + /* + * TLB consistency for global ASIDs is maintained with broadcast TLB + * flushing. The TLB is never outdated, and does not need flushing. 
+ */ + if (IS_ENABLED(CONFIG_X86_BROADCAST_TLB_FLUSH) && static_cpu_has(X86_FEATURE_INVLPGB)) { + u16 global_asid = mm_global_asid(next); + + if (global_asid) { + *new_asid = global_asid; + *need_flush = false; + return; + } + } + if (this_cpu_read(cpu_tlbstate.invalidate_other)) clear_asid_other(); @@ -251,6 +267,290 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, *need_flush = true; } +#ifdef CONFIG_X86_BROADCAST_TLB_FLUSH +/* + * Logic for broadcast TLB invalidation. + */ +static DEFINE_RAW_SPINLOCK(global_asid_lock); +static u16 last_global_asid = MAX_ASID_AVAILABLE; +static DECLARE_BITMAP(global_asid_used, MAX_ASID_AVAILABLE) = { 0 }; +static DECLARE_BITMAP(global_asid_freed, MAX_ASID_AVAILABLE) = { 0 }; +static int global_asid_available = MAX_ASID_AVAILABLE - TLB_NR_DYN_ASIDS - 1; + +static void reset_global_asid_space(void) +{ + lockdep_assert_held(&global_asid_lock); + + /* + * A global TLB flush guarantees that any stale entries from + * previously freed global ASIDs get flushed from the TLB + * everywhere, making these global ASIDs safe to reuse. + */ + invlpgb_flush_all_nonglobals(); + + /* + * Clear all the previously freed global ASIDs from the + * global_asid_used bitmap, now that the global TLB flush + * has made them actually available for re-use. + */ + bitmap_andnot(global_asid_used, global_asid_used, + global_asid_freed, MAX_ASID_AVAILABLE); + bitmap_clear(global_asid_freed, 0, MAX_ASID_AVAILABLE); + + /* + * ASIDs 0-TLB_NR_DYN_ASIDS are used for CPU-local ASID + * assignments, for tasks doing IPI based TLB shootdowns. + * Restart the search from the start of the global ASID space. 
+ */ + last_global_asid = TLB_NR_DYN_ASIDS; +} + +static u16 get_global_asid(void) +{ + lockdep_assert_held(&global_asid_lock); + + do { + u16 start = last_global_asid; + u16 asid = find_next_zero_bit(global_asid_used, MAX_ASID_AVAILABLE, start); + + if (asid >= MAX_ASID_AVAILABLE) { + reset_global_asid_space(); + continue; + } + + /* Claim this global ASID. */ + __set_bit(asid, global_asid_used); + last_global_asid = asid; + global_asid_available--; + return asid; + } while (1); +} + +/* + * Returns true if the mm is transitioning from a CPU-local ASID to a global + * (INVLPGB) ASID, or the other way around. + */ +static bool needs_global_asid_reload(struct mm_struct *next, u16 prev_asid) +{ + u16 global_asid = mm_global_asid(next); + + if (global_asid && prev_asid != global_asid) + return true; + + if (!global_asid && is_global_asid(prev_asid)) + return true; + + return false; +} + +void destroy_context_free_global_asid(struct mm_struct *mm) +{ + if (!mm->context.global_asid) + return; + + guard(raw_spinlock_irqsave)(&global_asid_lock); + + /* The global ASID can be re-used only after flush at wrap-around. */ + __set_bit(mm->context.global_asid, global_asid_freed); + + mm->context.global_asid = 0; + global_asid_available++; +} + +/* + * Check whether a process is currently active on more than "threshold" CPUs. + * This is a cheap estimation on whether or not it may make sense to assign + * a global ASID to this process, and use broadcast TLB invalidation. + */ +static bool mm_active_cpus_exceeds(struct mm_struct *mm, int threshold) +{ + int count = 0; + int cpu; + + /* This quick check should eliminate most single threaded programs. */ + if (cpumask_weight(mm_cpumask(mm)) <= threshold) + return false; + + /* Slower check to make sure. */ + for_each_cpu(cpu, mm_cpumask(mm)) { + /* Skip the CPUs that aren't really running this process. 
*/ + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) != mm) + continue; + + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu)) + continue; + + if (++count > threshold) + return true; + } + return false; +} + +/* + * Assign a global ASID to the current process, protecting against + * races between multiple threads in the process. + */ +static void use_global_asid(struct mm_struct *mm) +{ + guard(raw_spinlock_irqsave)(&global_asid_lock); + + /* This process is already using broadcast TLB invalidation. */ + if (mm->context.global_asid) + return; + + /* The last global ASID was consumed while waiting for the lock. */ + if (!global_asid_available) + return; + + /* + * The transition from IPI TLB flushing, with a dynamic ASID, + * to broadcast TLB flushing, using a global ASID, uses memory + * ordering for synchronization. + * + * While the process has threads still using a dynamic ASID, + * TLB invalidation IPIs continue to get sent. + * + * This code sets asid_transition first, before assigning the + * global ASID. + * + * The TLB flush code will only verify the ASID transition + * after it has seen the new global ASID for the process. + */ + WRITE_ONCE(mm->context.asid_transition, true); + WRITE_ONCE(mm->context.global_asid, get_global_asid()); +} + +/* + * Figure out whether to assign a global ASID to a process. + * We vary the threshold by how empty or full global ASID space is. + * 1/4 full: >= 4 active threads + * 1/2 full: >= 8 active threads + * 3/4 full: >= 16 active threads + * 7/8 full: >= 32 active threads + * etc + * + * This way we should never exhaust the global ASID space, even on very + * large systems, and the processes with the largest number of active + * threads should be able to use broadcast TLB invalidation. 
+ */ +#define HALFFULL_THRESHOLD 8 +static bool meets_global_asid_threshold(struct mm_struct *mm) +{ + int avail = global_asid_available; + int threshold = HALFFULL_THRESHOLD; + + if (!avail) + return false; + + if (avail > MAX_ASID_AVAILABLE * 3 / 4) { + threshold = HALFFULL_THRESHOLD / 4; + } else if (avail > MAX_ASID_AVAILABLE / 2) { + threshold = HALFFULL_THRESHOLD / 2; + } else if (avail < MAX_ASID_AVAILABLE / 3) { + do { + avail *= 2; + threshold *= 2; + } while ((avail + threshold) < MAX_ASID_AVAILABLE / 2); + } + + return mm_active_cpus_exceeds(mm, threshold); +} + +static void consider_global_asid(struct mm_struct *mm) +{ + if (!static_cpu_has(X86_FEATURE_INVLPGB)) + return; + + /* Check every once in a while. */ + if ((current->pid & 0x1f) != (jiffies & 0x1f)) + return; + + if (meets_global_asid_threshold(mm)) + use_global_asid(mm); +} + +static void finish_asid_transition(struct flush_tlb_info *info) +{ + struct mm_struct *mm = info->mm; + int bc_asid = mm_global_asid(mm); + int cpu; + + if (!READ_ONCE(mm->context.asid_transition)) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + /* + * The remote CPU is context switching. Wait for that to + * finish, to catch the unlikely case of it switching to + * the target mm with an out of date ASID. + */ + while (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm, cpu)) == LOADED_MM_SWITCHING) + cpu_relax(); + + if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm, cpu)) != mm) + continue; + + /* + * If at least one CPU is not using the global ASID yet, + * send a TLB flush IPI. The IPI should cause stragglers + * to transition soon. + * + * This can race with the CPU switching to another task; + * that results in a (harmless) extra IPI. + */ + if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm_asid, cpu)) != bc_asid) { + flush_tlb_multi(mm_cpumask(info->mm), info); + return; + } + } + + /* All the CPUs running this process are using the global ASID. 
*/ + WRITE_ONCE(mm->context.asid_transition, false); +} + +static void broadcast_tlb_flush(struct flush_tlb_info *info) +{ + bool pmd = info->stride_shift == PMD_SHIFT; + unsigned long maxnr = invlpgb_count_max; + unsigned long asid = info->mm->context.global_asid; + unsigned long addr = info->start; + unsigned long nr; + + /* Flushing multiple pages at once is not supported with 1GB pages. */ + if (info->stride_shift > PMD_SHIFT) + maxnr = 1; + + /* + * TLB flushes with INVLPGB are kicked off asynchronously. + * The inc_mm_tlb_gen() guarantees page table updates are done + * before these TLB flushes happen. + */ + if (info->end == TLB_FLUSH_ALL) { + invlpgb_flush_single_pcid_nosync(kern_pcid(asid)); + /* Do any CPUs supporting INVLPGB need PTI? */ + if (static_cpu_has(X86_FEATURE_PTI)) + invlpgb_flush_single_pcid_nosync(user_pcid(asid)); + } else for (; addr < info->end; addr += nr << info->stride_shift) { + /* + * Calculate how many pages can be flushed at once; if the + * remainder of the range is less than one page, flush one. + */ + nr = min(maxnr, (info->end - addr) >> info->stride_shift); + nr = max(nr, 1); + + invlpgb_flush_user_nr_nosync(kern_pcid(asid), addr, nr, pmd); + /* Do any CPUs supporting INVLPGB need PTI? */ + if (static_cpu_has(X86_FEATURE_PTI)) + invlpgb_flush_user_nr_nosync(user_pcid(asid), addr, nr, pmd); + } + + finish_asid_transition(info); + + /* Wait for the INVLPGBs kicked off above to finish. */ + tlbsync(); +} +#endif /* CONFIG_X86_BROADCAST_TLB_FLUSH */ + /* * Given an ASID, flush the corresponding user ASID. We can delay this * until the next time we switch to it. 
@@ -556,8 +856,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, */ if (prev == next) { /* Not actually switching mm's */ - VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != - next->context.ctx_id); + VM_WARN_ON(is_dyn_asid(prev_asid) && + this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != + next->context.ctx_id); /* * If this races with another thread that enables lam, 'new_lam' @@ -573,6 +874,23 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, !cpumask_test_cpu(cpu, mm_cpumask(next)))) cpumask_set_cpu(cpu, mm_cpumask(next)); + /* + * Check if the current mm is transitioning to a new ASID. + */ + if (needs_global_asid_reload(next, prev_asid)) { + next_tlb_gen = atomic64_read(&next->context.tlb_gen); + + choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); + goto reload_tlb; + } + + /* + * Broadcast TLB invalidation keeps this PCID up to date + * all the time. + */ + if (is_global_asid(prev_asid)) + return; + /* * If the CPU is not in lazy TLB mode, we are just switching * from one thread in a process to another thread in the same @@ -606,6 +924,13 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, */ cond_mitigation(tsk); + /* + * Let nmi_uaccess_okay() and finish_asid_transition() + * know that we're changing CR3. + */ + this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING); + barrier(); + /* * Leave this CPU in prev's mm_cpumask. Atomic writes to * mm_cpumask can be expensive under contention. The CPU @@ -620,14 +945,12 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, next_tlb_gen = atomic64_read(&next->context.tlb_gen); choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); - - /* Let nmi_uaccess_okay() know that we're changing CR3. 
*/ - this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING); - barrier(); } +reload_tlb: new_lam = mm_lam_cr3_mask(next); if (need_flush) { + VM_WARN_ON_ONCE(is_global_asid(new_asid)); this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); load_new_mm_cr3(next->pgd, new_asid, new_lam, true); @@ -746,7 +1069,7 @@ static void flush_tlb_func(void *info) const struct flush_tlb_info *f = info; struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm); u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); - u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen); + u64 local_tlb_gen; bool local = smp_processor_id() == f->initiating_cpu; unsigned long nr_invalidate = 0; u64 mm_tlb_gen; @@ -769,6 +1092,16 @@ static void flush_tlb_func(void *info) if (unlikely(loaded_mm == &init_mm)) return; + /* Reload the ASID if transitioning into or out of a global ASID */ + if (needs_global_asid_reload(loaded_mm, loaded_mm_asid)) { + switch_mm_irqs_off(NULL, loaded_mm, NULL); + loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + } + + /* Broadcast ASIDs are always kept up to date with INVLPGB. */ + if (is_global_asid(loaded_mm_asid)) + return; + VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) != loaded_mm->context.ctx_id); @@ -786,6 +1119,8 @@ static void flush_tlb_func(void *info) return; } + local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen); + if (unlikely(f->new_tlb_gen != TLB_GENERATION_INVALID && f->new_tlb_gen <= local_tlb_gen)) { /* @@ -953,7 +1288,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask, * up on the new contents of what used to be page tables, while * doing a speculative memory access. 
*/ - if (info->freed_tables) + if (info->freed_tables || in_asid_transition(info)) on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true); else on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func, @@ -1049,9 +1384,12 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, * a local TLB flush is needed. Optimize this use-case by calling * flush_tlb_func_local() directly in this case. */ - if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) { + if (mm_global_asid(mm)) { + broadcast_tlb_flush(info); + } else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) { info->trim_cpumask = should_trim_cpumask(mm); flush_tlb_multi(mm_cpumask(mm), info); + consider_global_asid(mm); } else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) { lockdep_assert_irqs_enabled(); local_irq_disable(); From patchwork Thu Jan 16 02:30:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13941157 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0534DC02180 for ; Thu, 16 Jan 2025 02:32:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7667C6B0082; Wed, 15 Jan 2025 21:32:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 716FF6B0085; Wed, 15 Jan 2025 21:32:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5DEE1280001; Wed, 15 Jan 2025 21:32:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3A80B6B0082 for ; Wed, 15 Jan 2025 21:32:47 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP 
id E69A01A067F for ; Thu, 16 Jan 2025 02:32:46 +0000 (UTC) X-FDA: 83011741932.29.1686B58 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf27.hostedemail.com (Postfix) with ESMTP id 544B340006 for ; Thu, 16 Jan 2025 02:32:45 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736994765; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QcE/FncXkWPWNC9C7QXkQ4tE/7UEtdm83co8kX6IdrM=; b=y/7OxsQsWMjIcfdw1XxNGfYnAvIFjNLcdHhKnombWb3JHkZ9IrZZP6yfpXC8GVZKuY45eE pJ2tVk2/5B7lPWjDBm3Em6FWcjlKN+aQy4imtDyHqYtkaNCgDvmtpQW63GHpwlymYLTHlm VmaWqeIcI6p1xZWSGwyAZ7as2cRcPrI= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of riel@shelob.surriel.com designates 96.67.55.147 as permitted sender) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736994765; a=rsa-sha256; cv=none; b=0qDmMqdGl9J5vmyCO74LjVq7NX/xI06xx3X4Bqf73x2YmdXbsqdwBXuHsa/0pfcUkIgWrn E0U+dVOByPMkVrHeHg7z49xEB4G7F6kbPeM4TkEAUoeFRJJ2M4KMG+4VJwg7lAM1wnBfHc ut8Tc1gomFzaNccZJ57axEoQ54/bXDk= Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1tYFfO-000000003nd-3pb4; Wed, 15 Jan 2025 21:31:30 -0500 From: Rik van Riel To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org, dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com, nadav.amit@gmail.com, thomas.lendacky@amd.com, 
kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com, mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel Subject: [PATCH v5 10/12] x86,tlb: do targeted broadcast flushing from tlbbatch code Date: Wed, 15 Jan 2025 21:30:33 -0500 Message-ID: <20250116023127.1531583-11-riel@surriel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250116023127.1531583-1-riel@surriel.com> References: <20250116023127.1531583-1-riel@surriel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 544B340006 X-Stat-Signature: sxte6rqzpfidzrkhqznuipfcb6d8ehnd X-HE-Tag: 1736994765-497859 X-HE-Meta: U2FsdGVkX19N/XtFdMdSbv6BeG1+XLLkO9Po3fRinmbIgBNwYG9O+F8jwpOW+1AUpNFJkCtWbHiOA+RQUX2uC+ZUQ8H1Bf7t06tmiNFXIy676+h6CL0YUIMGFb1OkAEhDIWXrTXYwz1vviUbSUOMmoz9285pu1L+VFHIU8geM5hqCUEg3H4MZxsVMvtDmnpHWQcz8sbjBJ4va9PxzI9QQgNfOxrMOes6hXSMojMmPjqN5lMQsZxBDv48FA2f+8VDuN0DiPGAjmGRBKEliHDIgWfJ+ZlROFfLSofSm4EZhLBomWlM6T7XGg3W3ZiQ28PXqE4u9tn38Bn+/lZlOuYVeU87AonT3UF2+eEU109Yz7bzKrfbuJE/dDUbj3hM8F0GXAbPFibtJJOzLi/b/FfUh3hqsobLQFyctqB0aTUHqNLsbfFGoq2p8FK/bPc52PlzYV/RtDWteJPXDbhbBZ274u73vbbo5xesRkOaTYYsC4tqrSE0Kj/Jf7zSUa2uzTRoQUaQ7ekPebXbo+abvU822RLlA5hUpx4WblI/YJGDP5dWh4ni345ObVCvNXv7Zq9VPjoBU4mnnfMtiW0vebMEqMf8FR+XbASeHuFcT/5ka2DqCDGG156ofGAY+PJgCt/o9e00+qHQslQVEx8M1tlyxD7sRcqfGwX8E5sFKNoubygH99foWSTT90AaHlr56XTR6ncVdY0Og/OnWRHBqSZBcThKp4yZVwExJ6oydqArF3nL3nQ9Szf6FulAG/x0vOHNwUd3YOkzM/EkNmLmy0hQlDJQvyqLg+79fuu7JLIVKO4fExooV/PxxJyTvl5//H0Sw7s7Ap6yAqQdyL1rpOypLTNNdkCHbxnUHUNf21eyes/gu88zEkKwPw4ZycTAB9WGDA9BOnfZGlSO0UWStxtBBiPo7JWh3gBRS4y7FoF+gmsfa7svQFXyR1U8qfyKzmB1QlPjd+lmK7WqVAdMfc/ shJgpv9l 
6HwNqGoVMAO96HH2zDEIkWktLWC9qlyShSsi/UOcpkGLGZvf9ZaMhJCw7gUzGfYm9VfI5aYsZVW2LneAICqvbZXoIoZINX2TVs20rpnNDjlSuYPhiy4tS75sDp8ZHxIyaGzoJ2/lcAOtSnmMSBltDA+1YMUlpSpIJi8FapbK/eiXVDfHd8LZi/iBoPOWYgRw+ZKtjqFotoIuN4bYhJNa0H4v+cx0zRhk44qOA1dHgXeA52+KZb9NsU39pEyxa6QParJzdvGT8eby5f/2uhEV31/GpoVeqyFkt2sO89yVkaDrizVtd8O9MqKvnyifobmlLd8Dr1J4MqlsnuZJIz3fgXAugSuMtqYT9GFM6Ch8FC58flZoUByHLPly7IVpPZWUmSnE3fUOdbxR48CzYAKnUpXT59uf+enYIySLfYi/+1l0EuAA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Instead of doing a system-wide TLB flush from arch_tlbbatch_flush, queue up asynchronous, targeted flushes from arch_tlbbatch_add_pending. This also allows us to avoid adding the CPUs of processes using broadcast flushing to the batch->cpumask, and will hopefully further reduce TLB flushing from the reclaim and compaction paths. Signed-off-by: Rik van Riel --- arch/x86/include/asm/tlbbatch.h | 1 + arch/x86/include/asm/tlbflush.h | 12 ++------ arch/x86/mm/tlb.c | 54 +++++++++++++++++++++++++++++++-- 3 files changed, 55 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/tlbbatch.h b/arch/x86/include/asm/tlbbatch.h index 1ad56eb3e8a8..f9a17edf63ad 100644 --- a/arch/x86/include/asm/tlbbatch.h +++ b/arch/x86/include/asm/tlbbatch.h @@ -10,6 +10,7 @@ struct arch_tlbflush_unmap_batch { * the PFNs being flushed.. 
*/ struct cpumask cpumask; + bool used_invlpgb; }; #endif /* _ARCH_X86_TLBBATCH_H */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 5eae5c1aafa5..e5516afdef7d 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -358,21 +358,15 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) return atomic64_inc_return(&mm->context.tlb_gen); } -static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, - struct mm_struct *mm, - unsigned long uaddr) -{ - inc_mm_tlb_gen(mm); - cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm)); - mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL); -} - static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm) { flush_tlb_mm(mm); } extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +extern void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, + struct mm_struct *mm, + unsigned long uaddr); static inline bool pte_flags_need_flush(unsigned long oldflags, unsigned long newflags, diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index bfc69ae4ea40..81f847c94321 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1659,9 +1659,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) * a local TLB flush is needed. Optimize this use-case by calling * flush_tlb_func_local() directly in this case. */ - if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) { - invlpgb_flush_all_nonglobals(); - } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) { + if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) { flush_tlb_multi(&batch->cpumask, info); } else if (cpumask_test_cpu(cpu, &batch->cpumask)) { lockdep_assert_irqs_enabled(); @@ -1670,12 +1668,62 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) local_irq_enable(); } + /* + * If we issued (asynchronous) INVLPGB flushes, wait for them here. 
+ * The cpumask above contains only CPUs that were running tasks + * not using broadcast TLB flushing. + */ + if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb) { + tlbsync(); + migrate_enable(); + batch->used_invlpgb = false; + } + cpumask_clear(&batch->cpumask); put_flush_tlb_info(); put_cpu(); } +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, + struct mm_struct *mm, + unsigned long uaddr) +{ + if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_global_asid(mm)) { + u16 asid = mm_global_asid(mm); + /* + * Queue up an asynchronous invalidation. The corresponding + * TLBSYNC is done in arch_tlbbatch_flush(), and must be done + * on the same CPU. + */ + if (!batch->used_invlpgb) { + batch->used_invlpgb = true; + migrate_disable(); + } + invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false); + /* Do any CPUs supporting INVLPGB need PTI? */ + if (static_cpu_has(X86_FEATURE_PTI)) + invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false); + + /* + * Some CPUs might still be using a local ASID for this + * process, and require IPIs, while others are using the + * global ASID. + * + * In this corner case we need to do both the broadcast + * TLB invalidation, and send IPIs. The IPIs will help + * stragglers transition to the broadcast ASID. 
+		 */
+		if (READ_ONCE(mm->context.asid_transition))
+			goto also_send_ipi;
+	} else {
+also_send_ipi:
+		inc_mm_tlb_gen(mm);
+		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
+	}
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
+}
+
 /*
  * Blindly accessing user memory from NMI context can be dangerous
  * if we're in the middle of switching the current user task or

From patchwork Thu Jan 16 02:30:34 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941165
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel
Subject: [PATCH v5 11/12] x86/mm: enable AMD translation cache extensions
Date: Wed, 15 Jan 2025 21:30:34 -0500
Message-ID: <20250116023127.1531583-12-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

With AMD TCE (translation cache extensions) only the intermediate mappings
that cover the address range zapped by INVLPG / INVLPGB get invalidated,
rather than
all intermediate mappings getting zapped at every TLB invalidation.

This can help reduce the TLB miss rate, by keeping more intermediate
mappings in the cache.

From the AMD manual:

Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit
to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on
TLB entries. When this bit is 0, these instructions remove the target PTE
from the TLB as well as all upper-level table entries that are cached in
the TLB, whether or not they are associated with the target PTE. When this
bit is set, these instructions will remove the target PTE and only those
upper-level entries that lead to the target PTE in the page table
hierarchy, leaving unrelated upper-level entries intact.

Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/msr-index.h       | 2 ++
 arch/x86/kernel/cpu/amd.c              | 4 ++++
 tools/arch/x86/include/asm/msr-index.h | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..dc1c1057f26e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -25,6 +25,7 @@
 #define _EFER_SVME		12 /* Enable virtualization */
 #define _EFER_LMSLE		13 /* Long Mode Segment Limit Enable */
 #define _EFER_FFXSR		14 /* Enable Fast FXSAVE/FXRSTOR */
+#define _EFER_TCE		15 /* Enable Translation Cache Extensions */
 #define _EFER_AUTOIBRS		21 /* Enable Automatic IBRS */

 #define EFER_SCE		(1<<_EFER_SCE)
@@ -34,6 +35,7 @@
 #define EFER_SVME		(1<<_EFER_SVME)
 #define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
+#define EFER_TCE		(1<<_EFER_TCE)
 #define EFER_AUTOIBRS		(1<<_EFER_AUTOIBRS)

 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index bcf73775b4f8..21076252a491 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1071,6 +1071,10 @@ static void init_amd(struct cpuinfo_x86 *c)

 	/* AMD CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
 	clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
+
+	/* Enable Translation Cache Extension */
+	if (cpu_feature_enabled(X86_FEATURE_TCE))
+		msr_set_bit(MSR_EFER, _EFER_TCE);
 }

 #ifdef CONFIG_X86_32
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..dc1c1057f26e 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -25,6 +25,7 @@
 #define _EFER_SVME		12 /* Enable virtualization */
 #define _EFER_LMSLE		13 /* Long Mode Segment Limit Enable */
 #define _EFER_FFXSR		14 /* Enable Fast FXSAVE/FXRSTOR */
+#define _EFER_TCE		15 /* Enable Translation Cache Extensions */
 #define _EFER_AUTOIBRS		21 /* Enable Automatic IBRS */

 #define EFER_SCE		(1<<_EFER_SCE)
@@ -34,6 +35,7 @@
 #define EFER_SVME		(1<<_EFER_SVME)
 #define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
+#define EFER_TCE		(1<<_EFER_TCE)
 #define EFER_AUTOIBRS		(1<<_EFER_AUTOIBRS)

 /*

From patchwork Thu Jan 16 02:30:35 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13941169
From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, bp@alien8.de, peterz@infradead.org,
 dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
 nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
 mhklinux@outlook.com, andrew.cooper3@citrix.com, Rik van Riel
Subject: [PATCH v5 12/12] x86/mm: only invalidate final translations with INVLPGB
Date: Wed, 15 Jan 2025 21:30:35 -0500
Message-ID: <20250116023127.1531583-13-riel@surriel.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250116023127.1531583-1-riel@surriel.com>
References: <20250116023127.1531583-1-riel@surriel.com>

Use the INVLPGB_FINAL_ONLY flag when invalidating mappings with INVLPGB.
This way only leaf mappings get removed from the TLB, leaving intermediate
translations cached.

On the (rare) occasions where we free page tables we do a full flush,
ensuring intermediate translations get flushed from the TLB.

Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/invlpgb.h | 10 ++++++++--
 arch/x86/mm/tlb.c              |  8 ++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/invlpgb.h b/arch/x86/include/asm/invlpgb.h
index 4dfd09e65fa6..418402535319 100644
--- a/arch/x86/include/asm/invlpgb.h
+++ b/arch/x86/include/asm/invlpgb.h
@@ -63,9 +63,15 @@ static inline void invlpgb_flush_user(unsigned long pcid,
 static inline void invlpgb_flush_user_nr_nosync(unsigned long pcid,
 						unsigned long addr,
 						u16 nr,
-						bool pmd_stride)
+						bool pmd_stride,
+						bool freed_tables)
 {
-	__invlpgb(0, pcid, addr, nr - 1, pmd_stride, INVLPGB_PCID | INVLPGB_VA);
+	unsigned long flags = INVLPGB_PCID | INVLPGB_VA;
+
+	if (!freed_tables)
+		flags |= INVLPGB_FINAL_ONLY;
+
+	__invlpgb(0, pcid, addr, nr - 1, pmd_stride, flags);
 }

 /* Flush all mappings for a given PCID, not including globals. */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 81f847c94321..82fed9fd5ac2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -538,10 +538,10 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		nr = min(maxnr, (info->end - addr) >> info->stride_shift);
 		nr = max(nr, 1);

-		invlpgb_flush_user_nr_nosync(kern_pcid(asid), addr, nr, pmd);
+		invlpgb_flush_user_nr_nosync(kern_pcid(asid), addr, nr, pmd, info->freed_tables);
 		/* Do any CPUs supporting INVLPGB need PTI? */
 		if (static_cpu_has(X86_FEATURE_PTI))
-			invlpgb_flush_user_nr_nosync(user_pcid(asid), addr, nr, pmd);
+			invlpgb_flush_user_nr_nosync(user_pcid(asid), addr, nr, pmd, info->freed_tables);
 	}

 	finish_asid_transition(info);
@@ -1700,10 +1700,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			batch->used_invlpgb = true;
 			migrate_disable();
 		}
-		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
+		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false, false);
 		/* Do any CPUs supporting INVLPGB need PTI? */
 		if (static_cpu_has(X86_FEATURE_PTI))
-			invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);
+			invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false, false);

 		/*
 		 * Some CPUs might still be using a local ASID for this