From patchwork Fri Jun  2 22:24:38 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265930
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 1/7] iomap: Remove large folio handling in iomap_invalidate_folio()
Date: Fri, 2 Jun 2023 23:24:38 +0100
Message-Id: <20230602222445.2284892-2-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

We do not need to release the iomap_page in iomap_invalidate_folio()
to allow the folio to be split.  The splitting code will call
->release_folio() if there is still per-fs private data attached to
the folio.  At that point, we will check if the folio is still dirty
and decline to release the iomap_page.  It is possible to trigger the
warning in perfectly legitimate circumstances (eg if a disk read fails,
we do a partial write to the folio, then we truncate the folio), which
will cause those writes to be lost.

Fixes: 60d8231089f0 ("iomap: Support large folios in invalidatepage")
Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/iomap/buffered-io.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f4..08ee293c4117 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -508,11 +508,6 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 		WARN_ON_ONCE(folio_test_writeback(folio));
 		folio_cancel_dirty(folio);
 		iomap_page_release(folio);
-	} else if (folio_test_large(folio)) {
-		/* Must release the iop so the page can be split */
-		WARN_ON_ONCE(!folio_test_uptodate(folio) &&
-			     folio_test_dirty(folio));
-		iomap_page_release(folio);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
From patchwork Fri Jun  2 22:24:39 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265924
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 2/7] doc: Correct the description of ->release_folio
Date: Fri, 2 Jun 2023 23:24:39 +0100
Message-Id: <20230602222445.2284892-3-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

The filesystem ->release_folio method is called under more circumstances
now than when the documentation was written.  The second sentence
describing the interpretation of the return value is the wrong polarity
(false indicates failure, not success).  And the third sentence is also
wrong (the kernel calls try_to_free_buffers() instead).

So replace the entire paragraph with a detailed description of what the
state of the folio may be, the meaning of the gfp parameter, why the
method is being called and what the filesystem is expected to do.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 Documentation/filesystems/locking.rst | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index aa1a233b0fa8..91dc9d5bc602 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -374,10 +374,16 @@ invalidate_lock before invalidating page cache in truncate / hole punch path
 (and thus calling into ->invalidate_folio) to block races between page cache
 invalidation and page cache filling functions (fault, read, ...).
 
-->release_folio() is called when the kernel is about to try to drop the
-buffers from the folio in preparation for freeing it.  It returns false to
-indicate that the buffers are (or may be) freeable.  If ->release_folio is
-NULL, the kernel assumes that the fs has no private interest in the buffers.
+->release_folio() is called when the MM wants to make a change to the
+folio that would invalidate the filesystem's private data.  For example,
+it may be about to be removed from the address_space or split.  The folio
+is locked and not under writeback.  It may be dirty.  The gfp parameter
+is not usually used for allocation, but rather to indicate what the
+filesystem may do to attempt to free the private data.  The filesystem may
+return false to indicate that the folio's private data cannot be freed.
+If it returns true, it should have already removed the private data from
+the folio.  If a filesystem does not provide a ->release_folio method,
+the kernel will call try_to_free_buffers().
 
 ->free_folio() is called when the kernel has dropped the folio
 from the page cache.
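To make the documented contract concrete, a ->release_folio handler that
follows it might look like the sketch below.  This is illustrative only,
not part of the series; the "myfs" names are hypothetical, and a real
implementation would also consult the gfp argument before doing anything
that might block.

	/* Hypothetical per-folio private data. */
	struct myfs_meta {
		unsigned long state;
	};

	static bool myfs_release_folio(struct folio *folio, gfp_t gfp)
	{
		struct myfs_meta *meta = folio_get_private(folio);

		if (!meta)
			return true;	/* nothing of ours attached */
		if (folio_test_dirty(folio))
			return false;	/* private data still in use */
		/* Detach before returning true, as the doc requires. */
		folio_detach_private(folio);
		kfree(meta);
		return true;
	}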
From patchwork Fri Jun  2 22:24:40 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265928
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 3/7] iomap: Remove unnecessary test from iomap_release_folio()
Date: Fri, 2 Jun 2023 23:24:40 +0100
Message-Id: <20230602222445.2284892-4-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

The check for the folio being under writeback is unnecessary; the caller
has checked this and the folio is locked, so the folio cannot be under
writeback at this point.

The comment is somewhat misleading in that it talks about one specific
situation in which we can see a dirty folio.  There are others, so change
the comment to explain why we can't release the iomap_page.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Christoph Hellwig
---
 fs/iomap/buffered-io.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 08ee293c4117..2054b85c9d9b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -483,12 +483,10 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
 			folio_size(folio));
 
 	/*
-	 * mm accommodates an old ext3 case where clean folios might
-	 * not have had the dirty bit cleared. Thus, it can send actual
-	 * dirty folios to ->release_folio() via shrink_active_list();
-	 * skip those here.
+	 * If the folio is dirty, we refuse to release our metadata because
+	 * it may be partially dirty (FIXME, add a test for that).
 	 */
-	if (folio_test_dirty(folio) || folio_test_writeback(folio))
+	if (folio_test_dirty(folio))
 		return false;
 	iomap_page_release(folio);
 	return true;
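For reference, the caller-side guard that the commit message relies on
looks roughly like the following simplified sketch of mm/filemap.c from
this era (not part of the series): the folio is locked and writeback has
already been ruled out before ->release_folio is invoked.

	bool filemap_release_folio(struct folio *folio, gfp_t gfp)
	{
		struct address_space * const mapping = folio->mapping;

		BUG_ON(!folio_test_locked(folio));
		/* Never release a folio while it is under writeback. */
		if (folio_test_writeback(folio))
			return false;

		if (mapping && mapping->a_ops->release_folio)
			return mapping->a_ops->release_folio(folio, gfp);
		return try_to_free_buffers(folio);
	}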
From patchwork Fri Jun  2 22:24:41 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265925
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 4/7] filemap: Add fgp_t typedef
Date: Fri, 2 Jun 2023 23:24:41 +0100
Message-Id: <20230602222445.2284892-5-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

Similarly to gfp_t, define fgp_t as its own type to prevent various
misuses and confusion.  Move the documentation to the definition of the
type instead of burying it in the __filemap_get_folio() documentation.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/btrfs/file.c         |  6 +++---
 fs/f2fs/compress.c      |  2 +-
 fs/f2fs/f2fs.h          |  2 +-
 fs/iomap/buffered-io.c  |  2 +-
 include/linux/pagemap.h | 48 +++++++++++++++++++++++++++++++----------
 mm/filemap.c            | 19 ++-------------
 mm/folio-compat.c       |  2 +-
 7 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0..4bbbbdafb472 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -876,9 +876,9 @@ static int prepare_uptodate_page(struct inode *inode,
 	return 0;
 }
 
-static unsigned int get_prepare_fgp_flags(bool nowait)
+static fgp_t get_prepare_fgp_flags(bool nowait)
 {
-	unsigned int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
+	fgp_t fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
 
 	if (nowait)
 		fgp_flags |= FGP_NOWAIT;
@@ -910,7 +910,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 	int i;
 	unsigned long index = pos >> PAGE_SHIFT;
 	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
-	unsigned int fgp_flags = get_prepare_fgp_flags(nowait);
+	fgp_t fgp_flags = get_prepare_fgp_flags(nowait);
 	int err = 0;
 	int faili;
 
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 11653fa79289..ee358f5f2a86 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1019,7 +1019,7 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
 	struct address_space *mapping = cc->inode->i_mapping;
 	struct page *page;
 	sector_t last_block_in_bio;
-	unsigned fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT;
+	fgp_t fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT;
 	pgoff_t start_idx = start_idx_of_cluster(cc);
 	int i, ret;
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d211ee89c158..a03c3df75f7c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2715,7 +2715,7 @@ static inline struct page *f2fs_grab_cache_page(struct address_space *mapping,
 
 static inline struct page *f2fs_pagecache_get_page(
 				struct address_space *mapping, pgoff_t index,
-				int fgp_flags, gfp_t gfp_mask)
+				fgp_t fgp_flags, gfp_t gfp_mask)
 {
 	if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET))
 		return NULL;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2054b85c9d9b..30d53b50ee0f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -467,7 +467,7 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
  */
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
 {
-	unsigned fgp = FGP_WRITEBEGIN | FGP_NOFS;
+	fgp_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
 
 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..7ab57a2bb576 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -497,22 +497,48 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
 pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan);
 
-#define FGP_ACCESSED		0x00000001
-#define FGP_LOCK		0x00000002
-#define FGP_CREAT		0x00000004
-#define FGP_WRITE		0x00000008
-#define FGP_NOFS		0x00000010
-#define FGP_NOWAIT		0x00000020
-#define FGP_FOR_MMAP		0x00000040
-#define FGP_STABLE		0x00000080
+/**
+ * typedef fgp_t - Flags for getting folios from the page cache.
+ *
+ * Most users of the page cache will not need to use these flags;
+ * there are convenience functions such as filemap_get_folio() and
+ * filemap_lock_folio().  For users which need more control over exactly
+ * what is done with the folios, these flags to __filemap_get_folio()
+ * are available.
+ *
+ * * %FGP_ACCESSED - The folio will be marked accessed.
+ * * %FGP_LOCK - The folio is returned locked.
+ * * %FGP_CREAT - If no folio is present then a new folio is allocated,
+ *   added to the page cache and the VM's LRU list.  The folio is
+ *   returned locked.
+ * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
+ *   folio is already in cache.  If the folio was allocated, unlock it
+ *   before returning so the caller can do the same dance.
+ * * %FGP_WRITE - The folio will be written to by the caller.
+ * * %FGP_NOFS - __GFP_FS will get cleared in gfp.
+ * * %FGP_NOWAIT - Don't block on the folio lock.
+ * * %FGP_STABLE - Wait for the folio to be stable (finished writeback)
+ * * %FGP_WRITEBEGIN - The flags to use in a filesystem write_begin()
+ *   implementation.
+ */
+typedef unsigned int __bitwise fgp_t;
+
+#define FGP_ACCESSED		((__force fgp_t)0x00000001)
+#define FGP_LOCK		((__force fgp_t)0x00000002)
+#define FGP_CREAT		((__force fgp_t)0x00000004)
+#define FGP_WRITE		((__force fgp_t)0x00000008)
+#define FGP_NOFS		((__force fgp_t)0x00000010)
+#define FGP_NOWAIT		((__force fgp_t)0x00000020)
+#define FGP_FOR_MMAP		((__force fgp_t)0x00000040)
+#define FGP_STABLE		((__force fgp_t)0x00000080)
 
 #define FGP_WRITEBEGIN		(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp);
+		fgp_t fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp);
+		fgp_t fgp_flags, gfp_t gfp);
 
 /**
  * filemap_get_folio - Find and get a folio.
@@ -586,7 +612,7 @@ static inline struct page *find_get_page(struct address_space *mapping,
 }
 
 static inline struct page *find_get_page_flags(struct address_space *mapping,
-					pgoff_t offset, int fgp_flags)
+					pgoff_t offset, fgp_t fgp_flags)
 {
 	return pagecache_get_page(mapping, offset, fgp_flags, 0);
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..eb89a815f2f8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1887,30 +1887,15 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  *
  * Looks up the page cache entry at @mapping & @index.
  *
- * @fgp_flags can be zero or more of these flags:
- *
- * * %FGP_ACCESSED - The folio will be marked accessed.
- * * %FGP_LOCK - The folio is returned locked.
- * * %FGP_CREAT - If no page is present then a new page is allocated using
- *   @gfp and added to the page cache and the VM's LRU list.
- *   The page is returned locked and with an increased refcount.
- * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
- *   page is already in cache.  If the page was allocated, unlock it before
- *   returning so the caller can do the same dance.
- * * %FGP_WRITE - The page will be written to by the caller.
- * * %FGP_NOFS - __GFP_FS will get cleared in gfp.
- * * %FGP_NOWAIT - Don't get blocked by page lock.
- * * %FGP_STABLE - Wait for the folio to be stable (finished writeback)
- *
  * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
  * if the %GFP flags specified for %FGP_CREAT are atomic.
  *
- * If there is a page cache page, it is returned with an increased refcount.
+ * If this function returns a folio, it is returned with an increased refcount.
  *
  * Return: The found folio or an ERR_PTR() otherwise.
  */
 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp)
+		fgp_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
 
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index c6f056c20503..ae1a737d2533 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -92,7 +92,7 @@ EXPORT_SYMBOL(add_to_page_cache_lru);
 
 noinline
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp)
+		fgp_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
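The point of the __bitwise annotation is that sparse (make C=1) now
flags code which mixes up fgp_t and gfp_t values, which the old plain
"int fgp_flags" accepted silently.  A hypothetical misuse, for
illustration only (mapping, index and folio assumed in scope):

	/* Correct: fgp_t flags in the third slot, gfp_t mask in the fourth. */
	folio = __filemap_get_folio(mapping, index, FGP_LOCK | FGP_CREAT,
				    mapping_gfp_mask(mapping));

	/*
	 * Transposed arguments: this compiled cleanly when fgp_flags was
	 * an int; with the __bitwise typedef, sparse warns about the
	 * restricted-type mismatch in both positions.
	 */
	folio = __filemap_get_folio(mapping, index, GFP_KERNEL, FGP_LOCK);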
From patchwork Fri Jun  2 22:24:42 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265931
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 5/7] filemap: Allow __filemap_get_folio to allocate large folios
Date: Fri, 2 Jun 2023 23:24:42 +0100
Message-Id: <20230602222445.2284892-6-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

Allow callers of __filemap_get_folio() to specify a preferred folio
order in the FGP flags.  This is only honoured in the FGP_CREAT path;
if there is already a folio in the page cache that covers the index, we
will return it, no matter what its order is.  No create-around is
attempted; we will only create folios which start at the specified
index.  Unmodified callers will continue to allocate order 0 folios.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 include/linux/pagemap.h | 23 ++++++++++++++++++++++
 mm/filemap.c            | 42 ++++++++++++++++++++++++++++-------------
 mm/readahead.c          | 13 -------------
 3 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7ab57a2bb576..667ce668f438 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -466,6 +466,19 @@ static inline void *detach_page_private(struct page *page)
 	return folio_detach_private(page_folio(page));
 }
 
+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size.  I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
+#else
+#define MAX_PAGECACHE_ORDER	8
+#endif
+
 #ifdef CONFIG_NUMA
 struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
 #else
@@ -531,9 +544,19 @@ typedef unsigned int __bitwise fgp_t;
 #define FGP_NOWAIT		((__force fgp_t)0x00000020)
 #define FGP_FOR_MMAP		((__force fgp_t)0x00000040)
 #define FGP_STABLE		((__force fgp_t)0x00000080)
+#define FGP_GET_ORDER(fgp)	(((__force unsigned)fgp) >> 26)	/* top 6 bits */
 
 #define FGP_WRITEBEGIN		(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
 
+static inline fgp_t fgp_set_order(size_t size)
+{
+	unsigned int shift = ilog2(size);
+
+	if (shift <= PAGE_SHIFT)
+		return 0;
+	return (__force fgp_t)((shift - PAGE_SHIFT) << 26);
+}
+
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		fgp_t fgp_flags, gfp_t gfp);
diff --git a/mm/filemap.c b/mm/filemap.c
index eb89a815f2f8..10ea9321c36e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1937,7 +1937,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
+		unsigned order = FGP_GET_ORDER(fgp_flags);
 		int err;
+
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
 			gfp |= __GFP_WRITE;
 		if (fgp_flags & FGP_NOFS)
@@ -1946,26 +1948,40 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp &= ~GFP_KERNEL;
 			gfp |= GFP_NOWAIT | __GFP_NOWARN;
 		}
-
-		folio = filemap_alloc_folio(gfp, 0);
-		if (!folio)
-			return ERR_PTR(-ENOMEM);
-
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		/* Init accessed so avoid atomic mark_page_accessed later */
-		if (fgp_flags & FGP_ACCESSED)
-			__folio_set_referenced(folio);
+		if (!mapping_large_folio_support(mapping))
+			order = 0;
+		if (order > MAX_PAGECACHE_ORDER)
+			order = MAX_PAGECACHE_ORDER;
+		/* If we're not aligned, allocate a smaller folio */
+		if (index & ((1UL << order) - 1))
+			order = __ffs(index);
 
-		err = filemap_add_folio(mapping, folio, index, gfp);
-		if (unlikely(err)) {
+		do {
+			err = -ENOMEM;
+			if (order == 1)
+				order = 0;
+			folio = filemap_alloc_folio(gfp, order);
+			if (!folio)
+				continue;
+
+			/* Init accessed so avoid atomic mark_page_accessed later */
+			if (fgp_flags & FGP_ACCESSED)
+				__folio_set_referenced(folio);
+
+			err = filemap_add_folio(mapping, folio, index, gfp);
+			if (!err)
+				break;
 			folio_put(folio);
 			folio = NULL;
-			if (err == -EEXIST)
-				goto repeat;
-		}
+		} while (order-- > 0);
+		if (err == -EEXIST)
+			goto repeat;
+		if (err)
+			return ERR_PTR(err);
 		/*
 		 * filemap_add_folio locks the page, and for mmap
 		 * we expect an unlocked page.
diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..59a071badb90 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -462,19 +462,6 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
-/*
- * There are some parts of the kernel which assume that PMD entries
- * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
- * limit the maximum allocation order to PMD size.  I'm not aware of any
- * assumptions about maximum order if THP are disabled, but 8 seems like
- * a good order (that's 1MB if you're using 4kB pages)
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
-#else
-#define MAX_PAGECACHE_ORDER	8
-#endif
-
 static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 		pgoff_t mark, unsigned int order, gfp_t gfp)
 {
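As a usage sketch (illustrative here; patch 6 of this series does
exactly this in iomap_get_folio()), a write path can hint the folio
order from the number of bytes it is about to write:

	/* mapping, pos and len are assumed to be in scope. */
	fgp_t fgp = FGP_WRITEBEGIN | fgp_set_order(len);
	struct folio *folio;

	/*
	 * The order hint lives in the top bits of the fgp_t; if the
	 * large allocation fails or the index is unaligned, the
	 * fallback loop retries at smaller orders (skipping order 1)
	 * down to a single page.
	 */
	folio = __filemap_get_folio(mapping, pos >> PAGE_SHIFT, fgp,
				    mapping_gfp_mask(mapping));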
From patchwork Fri Jun  2 22:24:43 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265927
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 6/7] iomap: Create large folios in the buffered write path
Date: Fri, 2 Jun 2023 23:24:43 +0100
Message-Id: <20230602222445.2284892-7-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

Use the size of the write as a hint for the size of the folio to create.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/gfs2/bmap.c         | 2 +-
 fs/iomap/buffered-io.c | 6 ++++--
 include/linux/iomap.h  | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index c739b258a2d9..3702e5e47b0f 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -971,7 +971,7 @@ gfs2_iomap_get_folio(struct iomap_iter *iter, loff_t pos, unsigned len)
 	if (status)
 		return ERR_PTR(status);
 
-	folio = iomap_get_folio(iter, pos);
+	folio = iomap_get_folio(iter, pos, len);
 	if (IS_ERR(folio))
 		gfs2_trans_end(sdp);
 	return folio;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 30d53b50ee0f..a10f9c037515 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -461,16 +461,18 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
  * iomap_get_folio - get a folio reference for writing
  * @iter: iteration structure
  * @pos: start offset of write
+ * @len: length of write
  *
  * Returns a locked reference to the folio at @pos, or an error pointer if the
  * folio could not be obtained.
  */
-struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
+struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
 {
 	fgp_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
 
 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;
+	fgp |= fgp_set_order(len);
 
 	return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 			fgp, mapping_gfp_mask(iter->inode->i_mapping));
@@ -596,7 +598,7 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
 	if (folio_ops && folio_ops->get_folio)
 		return folio_ops->get_folio(iter, pos, len);
 	else
-		return iomap_get_folio(iter, pos);
+		return iomap_get_folio(iter, pos, len);
 }
 
 static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e2b836c2e119..80facb9c9e5b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -261,7 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
-struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
+struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
From patchwork Fri Jun  2 22:24:44 2023
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 13265929
From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-xfs@vger.kernel.org, Wang Yugui,
    Dave Chinner, Christoph Hellwig, "Darrick J. Wong"
Subject: [PATCH v2 7/7] iomap: Copy larger chunks from userspace
Date: Fri, 2 Jun 2023 23:24:44 +0100
Message-Id: <20230602222445.2284892-8-willy@infradead.org>
In-Reply-To: <20230602222445.2284892-1-willy@infradead.org>
References: <20230602222445.2284892-1-willy@infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
Start at the maximum page cache size and shrink by half every time we
hit the "we are short on memory" problem.

Signed-off-by: Matthew Wilcox (Oracle)
---
 fs/iomap/buffered-io.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a10f9c037515..10434b07e0f9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
+	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
@@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		size_t offset;		/* Offset into folio */
+		unsigned long bytes;	/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
 
-		offset = offset_in_page(pos);
-		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-						iov_iter_count(i));
 again:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, iov_iter_count(i));
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 							       bdp_flags);
 		if (unlikely(status))
@@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
-		page = folio_file_page(folio, pos >> PAGE_SHIFT);
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
+		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
 
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
@@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		 */
 		if (copied)
 			bytes = copied;
+		if (chunk > PAGE_SIZE)
+			chunk /= 2;
 		goto again;
 	}
 	pos += status;
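A worked example of the shrinking chunk (illustrative numbers): with
4kB pages and THP enabled, MAX_PAGECACHE_ORDER is HPAGE_PMD_ORDER (9),
so the first iteration attempts to copy up to PAGE_SIZE << 9 = 2MB at
once.  Each time the atomic copy comes up short (for example because
the user buffer must be faulted in), the retry path halves the chunk:
2MB -> 1MB -> 512kB -> ... -> 4kB, never going below PAGE_SIZE.

	/* The retry path from the hunk above, in isolation: */
	if (chunk > PAGE_SIZE)
		chunk /= 2;	/* degrade gracefully when copies fall short */
	goto again;		/* recompute offset and bytes from the new chunk */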