From patchwork Thu Feb 20 05:20:21 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983337
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 20/26] mm, fs: skip tlb flushes for luf'd filemap that already has been done
Date: Thu, 20 Feb 2025 14:20:21 +0900
Message-Id: <20250220052027.58847-21-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

For a luf'd filemap, a tlb shootdown is currently performed whenever the
page cache is updated, regardless of whether the required tlb flushes
have already been done.  By storing luf metadata in struct address_space
and keeping that metadata up to date, the unnecessary tlb flushes can be
skipped.
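
To make the intended usage concrete, here is a minimal sketch (illustrative
only, not part of the patch) of how a page cache updater is expected to use
the per-mapping luf data.  The caller example_update_page_cache() is
hypothetical; luf_flush_mapping() is the helper introduced by this patch.

	/*
	 * Illustrative sketch only -- not part of this patch.
	 * example_update_page_cache() is a hypothetical caller.
	 */
	static void example_update_page_cache(struct address_space *mapping)
	{
		/* ... modify the page cache of @mapping ... */

		/*
		 * Flush only what this mapping still owes.
		 * luf_flush_mapping() folds mapping->luf_batch into the
		 * current task's batch and returns early, without flushing,
		 * when the pending flushes have already been covered
		 * (checked via arch_tlbbatch_diet()).
		 */
		luf_flush_mapping(mapping);
	}

Compared to luf_flush(0), which flushed everything accumulated in the global
0th luf batch, the per-mapping batch keeps the flush scoped to the
address_space actually being updated.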
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 fs/inode.c               |  1 +
 include/linux/fs.h       |  4 ++-
 include/linux/mm_types.h |  2 ++
 mm/memory.c              |  4 +--
 mm/rmap.c                | 59 +++++++++++++++++++++++++---------------
 mm/truncate.c            | 14 +++++-----
 mm/vmscan.c              |  2 +-
 7 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 46fbd5b234822..e155e51be2d28 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -404,6 +404,7 @@ static void __address_space_init_once(struct address_space *mapping)
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->i_private_list);
 	spin_lock_init(&mapping->i_private_lock);
+	luf_batch_init(&mapping->luf_batch);
 	mapping->i_mmap = RB_ROOT_CACHED;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec88270221bfe..0cc588c704cd1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -461,6 +461,7 @@ extern const struct address_space_operations empty_aops;
  * @i_private_lock: For use by the owner of the address_space.
  * @i_private_list: For use by the owner of the address_space.
  * @i_private_data: For use by the owner of the address_space.
+ * @luf_batch: Data to track need of tlb flush by luf.
  */
 struct address_space {
 	struct inode		*host;
@@ -482,6 +483,7 @@ struct address_space {
 	struct list_head	i_private_list;
 	struct rw_semaphore	i_mmap_rwsem;
 	void *			i_private_data;
+	struct luf_batch	luf_batch;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -508,7 +510,7 @@ static inline int mapping_write_begin(struct file *file,
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
 	if (!ret)
-		luf_flush(0);
+		luf_flush_mapping(mapping);
 
 	return ret;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8de4c190ad514..c50cfc1c6282f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1279,10 +1279,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb);
 void luf_flush(unsigned short luf_key);
 void luf_flush_mm(struct mm_struct *mm);
 void luf_flush_vma(struct vm_area_struct *vma);
+void luf_flush_mapping(struct address_space *mapping);
 #else
 static inline void luf_flush(unsigned short luf_key) {}
 static inline void luf_flush_mm(struct mm_struct *mm) {}
 static inline void luf_flush_vma(struct vm_area_struct *vma) {}
+static inline void luf_flush_mapping(struct address_space *mapping) {}
 #endif
 
 struct vm_fault;
diff --git a/mm/memory.c b/mm/memory.c
index b02f86b1adb91..c98af5e567e89 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6161,10 +6161,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	if (flush) {
 		/*
 		 * If it has a VM_SHARED mapping, all the mms involved
-		 * should be luf_flush'ed.
+		 * in the struct address_space should be luf_flush'ed.
 		 */
 		if (mapping)
-			luf_flush(0);
+			luf_flush_mapping(mapping);
 		luf_flush_mm(mm);
 	}
 
diff --git a/mm/rmap.c b/mm/rmap.c
index e0304dc74c3a7..0cb13e8fcd739 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -691,7 +691,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst,
 #define NR_LUF_BATCH (1 << (sizeof(short) * 8))
 
 /*
- * Use 0th entry as accumulated batch.
+ * XXX: Reserve the 0th entry for later use.
  */
 struct luf_batch luf_batch[NR_LUF_BATCH];
 
@@ -936,7 +936,7 @@ void luf_flush_vma(struct vm_area_struct *vma)
 		mapping = vma->vm_file->f_mapping;
 
 	if (mapping)
-		luf_flush(0);
+		luf_flush_mapping(mapping);
 	luf_flush_mm(mm);
 }
 
@@ -962,6 +962,29 @@ void luf_flush_mm(struct mm_struct *mm)
 	try_to_unmap_flush();
 }
 
+void luf_flush_mapping(struct address_space *mapping)
+{
+	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
+	struct luf_batch *lb;
+	unsigned long flags;
+	unsigned long lb_ugen;
+
+	if (!mapping)
+		return;
+
+	lb = &mapping->luf_batch;
+	read_lock_irqsave(&lb->lock, flags);
+	fold_batch(tlb_ubc, &lb->batch, false);
+	lb_ugen = lb->ugen;
+	read_unlock_irqrestore(&lb->lock, flags);
+
+	if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen))
+		return;
+
+	try_to_unmap_flush();
+}
+EXPORT_SYMBOL(luf_flush_mapping);
+
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
  * important if a PTE was dirty when it was unmapped that it's flushed
@@ -1010,7 +1033,8 @@ void try_to_unmap_flush_dirty(void)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 				      unsigned long uaddr,
-				      struct vm_area_struct *vma)
+				      struct vm_area_struct *vma,
+				      struct address_space *mapping)
 {
 	struct tlbflush_unmap_batch *tlb_ubc;
 	int batch;
@@ -1032,27 +1056,15 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 		tlb_ubc = &current->tlb_ubc;
 	else {
 		tlb_ubc = &current->tlb_ubc_ro;
+		fold_luf_batch_mm(&mm->luf_batch, mm);
+		if (mapping)
+			fold_luf_batch_mm(&mapping->luf_batch, mm);
 	}
 
 	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
 	tlb_ubc->flush_required = true;
 
-	if (can_luf_test()) {
-		struct luf_batch *lb;
-		unsigned long flags;
-
-		/*
-		 * Accumulate to the 0th entry right away so that
-		 * luf_flush(0) can be uesed to properly perform pending
-		 * TLB flush once this unmapping is observed.
-		 */
-		lb = &luf_batch[0];
-		write_lock_irqsave(&lb->lock, flags);
-		__fold_luf_batch(lb, tlb_ubc, new_luf_ugen());
-		write_unlock_irqrestore(&lb->lock, flags);
-	}
-
 	/*
 	 * Ensure compiler does not re-order the setting of tlb_flush_batched
 	 * before the PTE is cleared.
@@ -1134,7 +1146,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 				      unsigned long uaddr,
-				      struct vm_area_struct *vma)
+				      struct vm_area_struct *vma,
+				      struct address_space *mapping)
 {
 }
 
@@ -1503,7 +1516,7 @@ int folio_mkclean(struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return cleaned;
 }
@@ -2037,6 +2050,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	struct address_space *mapping = folio_mapping(folio);
 
 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -2174,7 +2188,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address, vma);
+			set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
@@ -2414,6 +2428,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	struct address_space *mapping = folio_mapping(folio);
 
 	/*
	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -2563,7 +2578,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address, vma);
+			set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
diff --git a/mm/truncate.c b/mm/truncate.c
index 14618c53f1910..f9a3416610231 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -128,7 +128,7 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(folio->mapping);
 }
 EXPORT_SYMBOL_GPL(folio_invalidate);
 
@@ -170,7 +170,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return 0;
 }
@@ -220,7 +220,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(folio->mapping);
 
 	if (!folio_test_large(folio))
 		return true;
@@ -282,7 +282,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return ret;
 }
@@ -417,7 +417,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
 
@@ -537,7 +537,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return count;
 }
@@ -704,7 +704,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ffc4a48710f1d..cbca027d2a10e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -836,7 +836,7 @@ long remove_mapping(struct address_space *mapping, struct folio *folio)
 	/*
 	 * Ensure to clean stale tlb entries for this mapping.
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return ret;
 }