From patchwork Mon Aug 5 12:55:09 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13753587
From: Qi Zheng
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
 muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
 zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [RFC PATCH v2 5/7] x86: mm: free page table pages by RCU instead of
 semi RCU
Date: Mon, 5 Aug 2024 20:55:09 +0800
Message-Id: <9a3deedc55947030db20a5ef8aca7b2741df2d9d.1722861064.git.zhengqi.arch@bytedance.com>
MIME-Version: 1.0
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages
will be freed by semi RCU, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

In this way, the page table can be traversed locklessly by disabling IRQs
in paths such as fast GUP.
But this is not enough to free the empty PTE page table pages in paths
other than the munmap and exit_mmap paths, because IPIs cannot be
synchronized with rcu_read_lock() in pte_offset_map{_lock}().

In preparation for supporting empty PTE page table page reclamation, let
single tables also be freed by RCU, like batch table freeing. Then we can
also use pte_offset_map() etc. to prevent the PTE page from being freed.

Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free
the page table pages:

 - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.

 - For pt_list, it is used to manage the PGD page in x86. Fortunately,
   tlb_remove_table() will not be used to free PGD pages, so it is safe
   to use pt_rcu_head.

 - For pmd_huge_pte, we do zap_deposited_table() before freeing the PMD
   page, so it is also safe.

Signed-off-by: Qi Zheng
---
 arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++
 arch/x86/kernel/paravirt.c |  7 +++++++
 arch/x86/mm/pgtable.c      | 10 +++++++++-
 mm/mmu_gather.c            |  9 ++++++++-
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257b..e223b53a8b190 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table)
 	free_page_and_swap_cache(table);
 }
 
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	free_page_and_swap_cache(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+	struct page *page;
+
+	page = table;
+	call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#define __tlb_remove_table_one __tlb_remove_table_one
+#endif /* CONFIG_PT_RECLAIM */
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886adc..199b9a3813b4a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -60,10 +60,17 @@ void __init native_pv_lock_init(void)
 		static_branch_disable(&virt_spin_lock_key);
 }
 
+#ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f5931499c2d6b..ea8522289c93d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
 #endif
 
 #ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PT_RECLAIM
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
-#endif
+#else
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif /* !CONFIG_PT_RECLAIM */
+#endif /* !CONFIG_PARAVIRT */
 
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0f..d948479ca09e6 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,10 +311,17 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 	}
 }
 
+#ifndef __tlb_remove_table_one
+static inline void __tlb_remove_table_one(void *table)
+{
+	__tlb_remove_table(table);
+}
+#endif
+
 static void tlb_remove_table_one(void *table)
 {
 	tlb_remove_table_sync_one();
-	__tlb_remove_table(table);
+	__tlb_remove_table_one(table);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)