From patchwork Thu Nov 14 06:59:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13874617
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org,
	muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
	peterx@redhat.com
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	x86@kernel.org, lorenzo.stoakes@oracle.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, zokeefe@google.com, rientjes@google.com,
	Qi Zheng
Subject: [PATCH v3 8/9] x86: mm: free page table pages by RCU instead of semi RCU
Date: Thu, 14 Nov 2024 14:59:59 +0800
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table
pages will be freed by semi RCU, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

In this way, the page table can be traversed locklessly by disabling
IRQs in paths such as fast GUP. But this is not enough to free the empty
PTE page table pages in paths other than the munmap and exit_mmap paths,
because IPI cannot be synchronized with rcu_read_lock() in
pte_offset_map{_lock}().

In preparation for supporting empty PTE page table page reclamation, let
single table freeing also be done by RCU like batch table freeing. Then
we can also use pte_offset_map() etc. to prevent a PTE page from being
freed.

Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to
free the page table pages:

 - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.

 - For pt_list, it is used to manage the PGD page in x86. Fortunately
   tlb_remove_table() will not be used for freeing PGD pages, so it is
   safe to use pt_rcu_head.

 - For pmd_huge_pte, it is used for THPs, so it is safe.

After applying this patch, if CONFIG_PT_RECLAIM is enabled, the function
call of free_pte() is as follows:

free_pte
  pte_free_tlb
    __pte_free_tlb
      ___pte_free_tlb
        paravirt_tlb_remove_table
          tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM]
            [no-free-memory slowpath:]
              tlb_table_invalidate
              tlb_remove_table_one
                __tlb_remove_table_one [frees via RCU]
            [fastpath:]
              tlb_table_flush
                tlb_remove_table_free [frees via RCU]
          native_tlb_remove_table [CONFIG_PARAVIRT on native]
            tlb_remove_table [see above]

Signed-off-by: Qi Zheng
Cc: x86@kernel.org
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
---
 arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++
 arch/x86/kernel/paravirt.c |  7 +++++++
 arch/x86/mm/pgtable.c      | 10 +++++++++-
 include/linux/mm_types.h   |  4 +++-
 mm/mmu_gather.c            |  9 ++++++++-
 5 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257b..d134ecf1ada06 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table)
 	free_page_and_swap_cache(table);
 }
 
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	put_page(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+	struct page *page;
+
+	page = table;
+	call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#define __tlb_remove_table_one __tlb_remove_table_one
+#endif /* CONFIG_PT_RECLAIM */
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec3815335558..89688921ea62e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -59,10 +59,17 @@ void __init native_pv_lock_init(void)
 		static_branch_enable(&virt_spin_lock_key);
 }
 
+#ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5745a354a241c..69a357b15974a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
 #endif
 
 #ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PT_RECLAIM
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
-#endif
+#else
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif /* !CONFIG_PT_RECLAIM */
+#endif /* !CONFIG_PARAVIRT */
 
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 97e2f4fe1d6c4..266f53b2bb497 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -438,7 +438,9 @@ FOLIO_MATCH(compound_head, _head_2a);
  * struct ptdesc - Memory descriptor for page tables.
  * @__page_flags: Same as page flags. Powerpc only.
  * @pt_rcu_head: For freeing page table pages.
- * @pt_list: List of used page tables. Used for s390 and x86.
+ * @pt_list: List of used page tables. Used for s390 gmap shadow pages
+ *           (which are not linked into the user page tables) and x86
+ *           pgds.
  * @_pt_pad_1: Padding that aliases with page's compound head.
  * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
  * @__page_mapping: Aliases with page->mapping. Unused for page tables.
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0f..1e21022bcf339 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,11 +311,18 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 	}
 }
 
-static void tlb_remove_table_one(void *table)
+#ifndef __tlb_remove_table_one
+static inline void __tlb_remove_table_one(void *table)
 {
 	tlb_remove_table_sync_one();
 	__tlb_remove_table(table);
 }
+#endif
+
+static void tlb_remove_table_one(void *table)
+{
+	__tlb_remove_table_one(table);
+}
 
 static void tlb_table_flush(struct mmu_gather *tlb)
 {