From patchwork Tue Sep 3 15:19:28 2024
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13788895
Date: Tue, 3 Sep 2024 11:19:28 -0400
From: Rik van Riel
To: Hugh Dickins
Cc: Baolin Wang, Andrew Morton, "Darrick J. Wong", Vlastimil Babka,
 Matthew Wilcox, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v2] mm,tmpfs: consider end of file write in shmem_is_huge
Message-ID: <20240903111928.7171e60c@imladris.surriel.com>

Take the end of a file write into consideration when deciding whether
or not to use huge pages for tmpfs files when the tmpfs filesystem is
mounted with huge=within_size.

This allows large writes that append to the end of a file to
automatically use large pages.

Tested with fio, doing 4MB sequential writes, without fallocate, to a
16GB tmpfs file. The numbers without THP or with huge=always stay the
same, but the performance with huge=within_size now matches that of
huge=always.

huge            before          after
4kB pages       1560 MB/s       1560 MB/s
within_size     1560 MB/s       4720 MB/s
always          4720 MB/s       4720 MB/s

Signed-off-by: Rik van Riel
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
---
v2: rebased on mm-unstable, fixed up changelog as suggested by Andrew Morton

 fs/xfs/scrub/xfile.c     |  6 ++--
 fs/xfs/xfs_buf_mem.c     |  2 +-
 include/linux/shmem_fs.h |  8 +++---
 mm/huge_memory.c         |  2 +-
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 59 +++++++++++++++++++++-------------------
 mm/userfaultfd.c         |  2 +-
 7 files changed, 42 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 9b5d98fe1f8a..c753c79df203 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -126,7 +126,7 @@ xfile_load(
 		unsigned int	len;
 		unsigned int	offset;
 
-		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, 0, &folio,
 				SGP_READ) < 0)
 			break;
 		if (!folio) {
@@ -196,7 +196,7 @@ xfile_store(
 		unsigned int	len;
 		unsigned int	offset;
 
-		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, 0, &folio,
 				SGP_CACHE) < 0)
 			break;
 		if (filemap_check_wb_err(inode->i_mapping, 0)) {
@@ -267,7 +267,7 @@ xfile_get_folio(
 		i_size_write(inode, pos + len);
 
 	pflags = memalloc_nofs_save();
-	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, 0, &folio,
 			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
 	memalloc_nofs_restore(pflags);
 	if (error)
diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
index 9bb2d24de709..07bebbfb16ee 100644
--- a/fs/xfs/xfs_buf_mem.c
+++ b/fs/xfs/xfs_buf_mem.c
@@ -149,7 +149,7 @@ xmbuf_map_page(
 		return -ENOMEM;
 	}
 
-	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, 0, &folio, SGP_CACHE);
 	if (error)
 		return error;
 
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 1564d7d3ca61..515a9a6a3c6f 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -113,11 +113,11 @@ int shmem_unuse(unsigned int type);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
 				struct vm_area_struct *vma, pgoff_t index,
-				bool shmem_huge_force);
+				loff_t write_end, bool shmem_huge_force);
 #else
 static inline unsigned long shmem_allowable_huge_orders(struct inode *inode,
 				struct vm_area_struct *vma, pgoff_t index,
-				bool shmem_huge_force)
+				loff_t write_end, bool shmem_huge_force)
 {
 	return 0;
 }
@@ -143,8 +143,8 @@ enum sgp_type {
 	SGP_FALLOC,	/* like SGP_WRITE, but make existing page Uptodate */
 };
 
-int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp);
+int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
+		struct folio **foliop, enum sgp_type sgp);
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0993dfe9ae94..382938e46f96 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -164,7 +164,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	 */
 	if (!in_pf && shmem_file(vma->vm_file))
 		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
-						   vma, vma->vm_pgoff,
+						   vma, vma->vm_pgoff, 0,
 						   !enforce_sysfs);
 
 	if (!vma_is_anonymous(vma)) {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 32100041aef3..f9c39898eaff 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1870,7 +1870,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			if (xa_is_value(folio) || !folio_test_uptodate(folio)) {
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
-				if (shmem_get_folio(mapping->host, index,
+				if (shmem_get_folio(mapping->host, index, 0,
 						&folio, SGP_NOALLOC)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
diff --git a/mm/shmem.c b/mm/shmem.c
index 553b99cb265e..375ae3f170a3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -549,7 +549,8 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-					bool shmem_huge_force, struct vm_area_struct *vma,
+					loff_t write_end, bool shmem_huge_force,
+					struct vm_area_struct *vma,
 					unsigned long vm_flags)
 {
 	struct mm_struct *mm = vma ? vma->vm_mm : NULL;
@@ -569,7 +570,8 @@ static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 		return true;
 	case SHMEM_HUGE_WITHIN_SIZE:
 		index = round_up(index + 1, HPAGE_PMD_NR);
-		i_size = round_up(i_size_read(inode), PAGE_SIZE);
+		i_size = max(write_end, i_size_read(inode));
+		i_size = round_up(i_size, PAGE_SIZE);
 		if (i_size >> PAGE_SHIFT >= index)
 			return true;
 		fallthrough;
@@ -583,14 +585,14 @@ static bool __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 }
 
 static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-		   bool shmem_huge_force, struct vm_area_struct *vma,
-		   unsigned long vm_flags)
+		   loff_t write_end, bool shmem_huge_force,
+		   struct vm_area_struct *vma, unsigned long vm_flags)
 {
 	if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
 		return false;
 
-	return __shmem_huge_global_enabled(inode, index, shmem_huge_force,
-					   vma, vm_flags);
+	return __shmem_huge_global_enabled(inode, index, write_end,
+					   shmem_huge_force, vma, vm_flags);
 }
 
 #if defined(CONFIG_SYSFS)
@@ -770,8 +772,8 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 }
 
 static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-		bool shmem_huge_force, struct vm_area_struct *vma,
-		unsigned long vm_flags)
+		loff_t write_end, bool shmem_huge_force,
+		struct vm_area_struct *vma, unsigned long vm_flags)
 {
 	return false;
 }
@@ -978,7 +980,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
 	 * (although in some cases this is just a waste of time).
 	 */
 	folio = NULL;
-	shmem_get_folio(inode, index, &folio, SGP_READ);
+	shmem_get_folio(inode, index, 0, &folio, SGP_READ);
 	return folio;
 }
 
@@ -1166,7 +1168,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
 			STATX_ATTR_NODUMP);
 	generic_fillattr(idmap, request_mask, inode, stat);
 
-	if (shmem_huge_global_enabled(inode, 0, false, NULL, 0))
+	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
 		stat->blksize = HPAGE_PMD_SIZE;
 
 	if (request_mask & STATX_BTIME) {
@@ -1653,7 +1655,7 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 unsigned long shmem_allowable_huge_orders(struct inode *inode,
 				struct vm_area_struct *vma, pgoff_t index,
-				bool shmem_huge_force)
+				loff_t write_end, bool shmem_huge_force)
 {
 	unsigned long mask = READ_ONCE(huge_shmem_orders_always);
 	unsigned long within_size_orders = READ_ONCE(huge_shmem_orders_within_size);
@@ -1670,8 +1672,8 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
 		return 0;
 
-	global_huge = shmem_huge_global_enabled(inode, index, shmem_huge_force,
-						vma, vm_flags);
+	global_huge = shmem_huge_global_enabled(inode, index, write_end,
+					shmem_huge_force, vma, vm_flags);
 	if (!vma || !vma_is_anon_shmem(vma)) {
 		/*
 		 * For tmpfs, we now only support PMD sized THP if huge page
@@ -2231,8 +2233,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
  * vmf and fault_type are only supplied by shmem_fault: otherwise they are NULL.
  */
 static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
-		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
-		struct vm_fault *vmf, vm_fault_t *fault_type)
+		loff_t write_end, struct folio **foliop, enum sgp_type sgp,
+		gfp_t gfp, struct vm_fault *vmf, vm_fault_t *fault_type)
 {
 	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
 	struct mm_struct *fault_mm;
@@ -2312,7 +2314,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	}
 
 	/* Find hugepage orders that are allowed for anonymous shmem and tmpfs. */
-	orders = shmem_allowable_huge_orders(inode, vma, index, false);
+	orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
 	if (orders > 0) {
 		gfp_t huge_gfp;
 
@@ -2413,6 +2415,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
  * shmem_get_folio - find, and lock a shmem folio.
  * @inode:	inode to search
  * @index:	the page index.
+ * @write_end:	end of a write, could extend inode size
  * @foliop:	pointer to the folio if found
  * @sgp:	SGP_* flags to control behavior
  *
@@ -2432,10 +2435,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
  * Context: May sleep.
  * Return: 0 if successful, else a negative error code.
  */
-int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp)
+int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
+		struct folio **foliop, enum sgp_type sgp)
 {
-	return shmem_get_folio_gfp(inode, index, foliop, sgp,
+	return shmem_get_folio_gfp(inode, index, write_end, foliop, sgp,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(shmem_get_folio);
@@ -2530,7 +2533,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	}
 
 	WARN_ON_ONCE(vmf->page != NULL);
-	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
+	err = shmem_get_folio_gfp(inode, vmf->pgoff, 0, &folio, SGP_CACHE,
 				  gfp, vmf, &ret);
 	if (err)
 		return vmf_error(err);
@@ -3040,7 +3043,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			return -EPERM;
 	}
 
-	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+	ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
 	if (ret)
 		return ret;
 
@@ -3111,7 +3114,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 				break;
 		}
 
-		error = shmem_get_folio(inode, index, &folio, SGP_READ);
+		error = shmem_get_folio(inode, index, 0, &folio, SGP_READ);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -3287,7 +3290,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 		if (*ppos >= i_size_read(inode))
 			break;
 
-		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
+		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, 0, &folio,
 					SGP_READ);
 		if (error) {
 			if (error == -EINVAL)
@@ -3477,8 +3480,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
 			error = -ENOMEM;
 		else
-			error = shmem_get_folio(inode, index, &folio,
-						SGP_FALLOC);
+			error = shmem_get_folio(inode, index, offset + len,
+						&folio, SGP_FALLOC);
 		if (error) {
 			info->fallocend = undo_fallocend;
 			/* Remove the !uptodate folios we added */
@@ -3829,7 +3832,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	} else {
 		inode_nohighmem(inode);
 		inode->i_mapping->a_ops = &shmem_aops;
-		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+		error = shmem_get_folio(inode, 0, 0, &folio, SGP_WRITE);
 		if (error)
 			goto out_remove_offset;
 		inode->i_op = &shmem_symlink_inode_operations;
@@ -3875,7 +3878,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
 			return ERR_PTR(-ECHILD);
 		}
 	} else {
-		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+		error = shmem_get_folio(inode, 0, 0, &folio, SGP_READ);
 		if (error)
 			return ERR_PTR(error);
 		if (!folio)
@@ -5343,7 +5346,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 	struct folio *folio;
 	int error;
 
-	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
+	error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
 				    gfp, NULL, NULL);
 	if (error)
 		return ERR_PTR(error);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 6ef42d9cd482..ce13c4062647 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -391,7 +391,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 	struct page *page;
 	int ret;
 
-	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
+	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
 	if (ret == -ENOENT)
 		ret = -EFAULT;