From patchwork Thu Oct 31 08:13:22 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13857709
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org,
    mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org,
    akpm@linux-foundation.org, zokeefe@google.com, rientjes@google.com,
    peterx@redhat.com, catalin.marinas@arm.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, Qi Zheng
Subject: [PATCH v2 6/7] x86: mm: free page table pages by RCU instead of semi RCU
Date: Thu, 31 Oct 2024 16:13:22 +0800
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, page table pages
are freed by semi RCU, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

In this way, the page table can be traversed locklessly by disabling
IRQs in paths such as fast GUP. But this is not enough to free the empty
PTE page table pages in paths other than the munmap and exit_mmap paths,
because the IPI cannot be synchronized with rcu_read_lock() in
pte_offset_map{_lock}().
In preparation for supporting empty PTE page table page reclamation, let
single tables also be freed by RCU like batch table freeing. Then we can
also use pte_offset_map() etc. to prevent the PTE page from being freed.

Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to
free the page table pages:

 - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.

 - For pt_list, it is used to manage the PGD page in x86. Fortunately,
   tlb_remove_table() will not be used to free PGD pages, so it is safe
   to use pt_rcu_head.

 - For pmd_huge_pte, we will do zap_deposited_table() before freeing the
   PMD page, so it is also safe.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++
 arch/x86/kernel/paravirt.c |  7 +++++++
 arch/x86/mm/pgtable.c      | 10 +++++++++-
 mm/mmu_gather.c            |  9 ++++++++-
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257b..e223b53a8b190 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table)
 	free_page_and_swap_cache(table);
 }
 
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	free_page_and_swap_cache(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+	struct page *page;
+
+	page = table;
+	call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#define __tlb_remove_table_one __tlb_remove_table_one
+#endif /* CONFIG_PT_RECLAIM */
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec3815335558..89688921ea62e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -59,10 +59,17 @@ void __init native_pv_lock_init(void)
 		static_branch_enable(&virt_spin_lock_key);
 }
 
+#ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5745a354a241c..69a357b15974a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
 #endif
 
 #ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PT_RECLAIM
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
-#endif
+#else
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif /* !CONFIG_PT_RECLAIM */
+#endif /* !CONFIG_PARAVIRT */
 
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0f..d948479ca09e6 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,10 +311,17 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 	}
 }
 
+#ifndef __tlb_remove_table_one
+static inline void __tlb_remove_table_one(void *table)
+{
+	__tlb_remove_table(table);
+}
+#endif
+
 static void tlb_remove_table_one(void *table)
 {
 	tlb_remove_table_sync_one();
-	__tlb_remove_table(table);
+	__tlb_remove_table_one(table);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)
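[Editor's illustration, not part of the patch] The single-table path can be
sketched in plain userspace C: a toy call_rcu() queues the callback on a list
and a toy grace period runs it later, mirroring how __tlb_remove_table_one()
hands the page to __tlb_remove_table_one_rcu() via the embedded rcu_head.
Everything here (the queue-based "RCU", the struct layouts, the *_sketch
names) is an illustrative stand-in, not the kernel implementation.

```c
/*
 * Toy userspace sketch (NOT kernel code) of the deferred-free pattern
 * this patch applies to single page-table pages: instead of
 * "IPI + synchronous free", the page is queued via its embedded
 * rcu_head and freed only after a grace period.
 */
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Stand-in for struct page: the rcu_head is embedded in the object,
 * like ptdesc->pt_rcu_head unioned into the page-table page. */
struct page {
	int id;
	struct rcu_head rcu_head;
};

static struct rcu_head *rcu_pending;	/* toy callback queue */
static int freed_count;			/* pages actually freed so far */

/* Toy call_rcu(): defer the callback instead of running it now. */
static void call_rcu_sketch(struct rcu_head *head,
			    void (*func)(struct rcu_head *))
{
	head->func = func;
	head->next = rcu_pending;
	rcu_pending = head;
}

/* Toy grace period: all "readers" are done, run queued callbacks. */
static void run_grace_period(void)
{
	while (rcu_pending) {
		struct rcu_head *head = rcu_pending;

		rcu_pending = head->next;
		head->func(head);
	}
}

/* Mirrors __tlb_remove_table_one_rcu(): recover the page from its
 * embedded rcu_head, then free it. */
static void remove_table_one_rcu(struct rcu_head *head)
{
	struct page *page = container_of(head, struct page, rcu_head);

	free(page);
	freed_count++;
}

/* Mirrors __tlb_remove_table_one(): no IPI, no synchronous free;
 * just hand the page to the deferred callback. */
static void remove_table_one(void *table)
{
	struct page *page = table;

	call_rcu_sketch(&page->rcu_head, remove_table_one_rcu);
}
```

The point of the shape: between remove_table_one() and run_grace_period()
the page is still intact, so a concurrent lockless walker (the analogue of
pte_offset_map() under rcu_read_lock()) can still dereference it safely;
the free only becomes visible once the grace period has elapsed.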