From patchwork Mon Jun 12 20:39:03 2023 X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277315 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 1/8] iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic() Date: Mon, 12 Jun 2023 21:39:03 +0100 Message-Id: <20230612203910.724378-2-willy@infradead.org> In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> X-Mailing-List: linux-xfs@vger.kernel.org copy_page_from_iter_atomic() already handles !highmem compound pages correctly, but if we are passed a highmem compound page, each base page needs to be mapped & unmapped individually. 
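For illustration only (not part of the patch): a highmem compound page is not guaranteed to be virtually contiguous once mapped, so the copy has to walk one base page at a time, while a lowmem compound page can be written through a single mapping. The condensed sketch below shows that pattern with kmap_local_page() and a plain memcpy() standing in for the iterate_and_advance()/copyin machinery; it assumes <linux/highmem.h>, <linux/mm.h> and <linux/minmax.h>.

static size_t copy_to_compound_page(struct page *page, size_t offset,
				    const char *src, size_t bytes)
{
	size_t copied = 0;

	while (copied < bytes) {
		size_t n = bytes - copied;
		char *kaddr;

		if (PageHighMem(page)) {
			/* Advance to the base page that contains @offset. */
			page += offset / PAGE_SIZE;
			offset %= PAGE_SIZE;
			/* A highmem mapping only covers one base page. */
			n = min_t(size_t, n, PAGE_SIZE - offset);
		}

		kaddr = kmap_local_page(page);
		memcpy(kaddr + offset, src + copied, n);
		kunmap_local(kaddr);

		copied += n;
		offset += n;
	}
	return copied;
}

For a !highmem page the loop runs once over the whole range, which is why the existing code was already correct there.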
Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- lib/iov_iter.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 960223ed9199..1a3fbda0c508 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -857,24 +857,37 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i) } EXPORT_SYMBOL(iov_iter_zero); -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes, - struct iov_iter *i) +size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, + size_t bytes, struct iov_iter *i) { - char *kaddr = kmap_atomic(page), *p = kaddr + offset; - if (!page_copy_sane(page, offset, bytes)) { - kunmap_atomic(kaddr); + size_t n, copied = 0; + + if (!page_copy_sane(page, offset, bytes)) return 0; - } - if (WARN_ON_ONCE(!i->data_source)) { - kunmap_atomic(kaddr); + if (WARN_ON_ONCE(!i->data_source)) return 0; - } - iterate_and_advance(i, bytes, base, len, off, - copyin(p + off, base, len), - memcpy_from_iter(i, p + off, base, len) - ) - kunmap_atomic(kaddr); - return bytes; + + do { + char *kaddr; + + n = bytes - copied; + if (PageHighMem(page)) { + page += offset / PAGE_SIZE; + offset %= PAGE_SIZE; + n = min_t(size_t, n, PAGE_SIZE - offset); + } + + kaddr = kmap_atomic(page) + offset; + iterate_and_advance(i, n, base, len, off, + copyin(kaddr + off, base, len), + memcpy_from_iter(i, kaddr + off, base, len) + ) + kunmap_atomic(kaddr); + copied += n; + offset += n; + } while (PageHighMem(page) && copied != bytes && n > 0); + + return copied; } EXPORT_SYMBOL(copy_page_from_iter_atomic); From patchwork Mon Jun 12 20:39:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3983C88CB7 for ; Mon, 12 Jun 2023 20:39:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230197AbjFLUjg (ORCPT ); Mon, 12 Jun 2023 16:39:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236056AbjFLUjd (ORCPT ); Mon, 12 Jun 2023 16:39:33 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7469910F9; Mon, 12 Jun 2023 13:39:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=TeUsmxkBd0c7/DlT0OTqD1unAL6e90bynyipv5XpnOQ=; b=Eb21bhxqv+1pOvIN1xPtnUGgrc PPOJryxpY1p/iYNFKlw1RZ11rOaT9lNjAxkh9r3SNkIYZpvUfD3eK+Ab0cAhWN5ItU0FdOJlQuEF/ hy552Xmt4waB2YUjfKak9qJ0yosfzPmw88pGFKmGwFTQVM3Bc6mK7EGUGvVGE9mB7bnDtKJmXKzyW xLdDzXY+4enE/0lYs9+8nUse+xBb3qgSLMT29+gUdF6LsLIeyDJ71wAlFvNw4s9arg/Jg6zkl+INK L6a25bzO2gZ03WOdMZePSNE0b9qz+l61GMsGM5EDB4VZfmv+9Skvxs8faWUDbcfP23xhBTMjPb2hT eGD/Yz7g==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1q8oJl-0032SX-GG; Mon, 12 Jun 2023 20:39:13 +0000 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew 
Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" , Christoph Hellwig Subject: [PATCH v3 2/8] iomap: Remove large folio handling in iomap_invalidate_folio() Date: Mon, 12 Jun 2023 21:39:04 +0100 Message-Id: <20230612203910.724378-3-willy@infradead.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org We do not need to release the iomap_page in iomap_invalidate_folio() to allow the folio to be split. The splitting code will call ->release_folio() if there is still per-fs private data attached to the folio. At that point, we will check if the folio is still dirty and decline to release the iomap_page. It is possible to trigger the warning in perfectly legitimate circumstances (eg if a disk read fails, we do a partial write to the folio, then we truncate the folio), which will cause those writes to be lost. Fixes: 60d8231089f0 ("iomap: Support large folios in invalidatepage") Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 063133ec77f4..08ee293c4117 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -508,11 +508,6 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len) WARN_ON_ONCE(folio_test_writeback(folio)); folio_cancel_dirty(folio); iomap_page_release(folio); - } else if (folio_test_large(folio)) { - /* Must release the iop so the page can be split */ - WARN_ON_ONCE(!folio_test_uptodate(folio) && - folio_test_dirty(folio)); - iomap_page_release(folio); } } EXPORT_SYMBOL_GPL(iomap_invalidate_folio); From patchwork Mon Jun 12 20:39:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277321 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A334C88CB7 for ; Mon, 12 Jun 2023 20:40:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233565AbjFLUjy (ORCPT ); Mon, 12 Jun 2023 16:39:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237566AbjFLUjf (ORCPT ); Mon, 12 Jun 2023 16:39:35 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E42C10DF; Mon, 12 Jun 2023 13:39:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=ssF3BpcDp/pd4uh0p5U90yCCwh+nvtfWyviLUBL7Jpw=; b=q2d/qurJytCsSFdrnw5vKAr+Qx tKw10k82Ch6E1wb2ElRPobGiG9jWM2z4NIpCFKaL/8ObKi5M7LMOBxxcfzebmGRjrlHYTXHVLE1qn O/Jx92PTTlh1c4Ol6qSf3hmM1PfNgweFHLcn0or5R31RuAnQ40cJN4QcXerIhBcWjI2li1Y+Nb74b M28zN5mYYTtoIt9np+fOHYT9Gk+Zhu/GN2P4gHwT/AxVL29a9QAdEq8d4pSND1fHBWkQDqpU1boG+ 
8xzGjfaiZIDpVenIy4+qFAH7WB4OV+qDACzGoa9+z0PUh1VHQzMPusnXu9itc50YPtIr65c7oICZO fEV6hpyw==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1q8oJl-0032SZ-K0; Mon, 12 Jun 2023 20:39:13 +0000 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 3/8] doc: Correct the description of ->release_folio Date: Mon, 12 Jun 2023 21:39:05 +0100 Message-Id: <20230612203910.724378-4-willy@infradead.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org The filesystem ->release_folio method is called under more circumstances now than when the documentation was written. The second sentence describing the interpretation of the return value is the wrong polarity (false indicates failure, not success). And the third sentence is also wrong (the kernel calls try_to_free_buffers() instead). So replace the entire paragraph with a detailed description of what the state of the folio may be, the meaning of the gfp parameter, why the method is being called and what the filesystem is expected to do. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- Documentation/filesystems/locking.rst | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index aa1a233b0fa8..b52ad5a79d94 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -374,10 +374,17 @@ invalidate_lock before invalidating page cache in truncate / hole punch path (and thus calling into ->invalidate_folio) to block races between page cache invalidation and page cache filling functions (fault, read, ...). -->release_folio() is called when the kernel is about to try to drop the -buffers from the folio in preparation for freeing it. It returns false to -indicate that the buffers are (or may be) freeable. If ->release_folio is -NULL, the kernel assumes that the fs has no private interest in the buffers. +->release_folio() is called when the MM wants to make a change to the +folio that would invalidate the filesystem's private data. For example, +it may be about to be removed from the address_space or split. The folio +is locked and not under writeback. It may be dirty. The gfp parameter +is not usually used for allocation, but rather to indicate what the +filesystem may do to attempt to free the private data. The filesystem may +return false to indicate that the folio's private data cannot be freed. +If it returns true, it should have already removed the private data from +the folio. If a filesystem does not provide a ->release_folio method, +the pagecache will assume that private data is buffer_heads and call +try_to_free_buffers(). ->free_folio() is called when the kernel has dropped the folio from the page cache. 
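To make the contract described above concrete, here is a minimal sketch of a ->release_folio implementation that follows it. The "myfs" name and its private structure are hypothetical; real filesystems have their own private-data representations (buffer_heads freed via try_to_free_buffers(), iomap's per-folio state, and so on).

static bool myfs_release_folio(struct folio *folio, gfp_t gfp)
{
	/* The caller guarantees the folio is locked and not under writeback. */
	if (!folio_test_private(folio))
		return true;		/* nothing of ours is attached */

	/*
	 * Private data may still describe dirty sub-folio state, so refuse
	 * to drop it while the folio is dirty.  @gfp only hints at what we
	 * may do to free the data; it is unused in this sketch.
	 */
	if (folio_test_dirty(folio))
		return false;

	/* Detach and free our private data, then report success. */
	kfree(folio_detach_private(folio));
	return true;
}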
From patchwork Mon Jun 12 20:39:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A00FC7EE2F for ; Mon, 12 Jun 2023 20:40:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232414AbjFLUjx (ORCPT ); Mon, 12 Jun 2023 16:39:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234959AbjFLUjg (ORCPT ); Mon, 12 Jun 2023 16:39:36 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7389AD3; Mon, 12 Jun 2023 13:39:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=vLEeKteqsDUAnkFvPBrhq1GjkuxkCUs9Dhv1jNbwvTQ=; b=uTDyHPMgDbNMidogvYUuYrumbK F9pYGCNI5bdp6qWYMV1b0WWTMu8+BS8ONfPcjgBf+lE+8gIn89rLN27cBg7qF1cW5GyYfMGt/Vo1o YP0og8HOoJM3/PGvYgYOVlyx7kJwuTHnr2JcEIkeNv8UJqoiUwKRB1hi7Jwo7bBjNn+IZR0+hjFHY otE6xH7l/AyQZLcun1Jwtyg9tPJPLQGCkfJDC8u1oLp9dQPaoICnBmlCRTyHRDKtchmA//FF+DUAK iPlBwI49gfhXUgpRO1jt4Yrt97UbVhHN0dM9UHWt3gTZmR9mqjF+Y3REhFfu8mdunH4xkBMukAWVa Wxw/KsKg==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1q8oJl-0032Sb-My; Mon, 12 Jun 2023 20:39:13 +0000 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 4/8] iomap: Remove unnecessary test from iomap_release_folio() Date: Mon, 12 Jun 2023 21:39:06 +0100 Message-Id: <20230612203910.724378-5-willy@infradead.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org The check for the folio being under writeback is unnecessary; the caller has checked this and the folio is locked, so the folio cannot be under writeback at this point. The comment is somewhat misleading in that it talks about one specific situation in which we can see a dirty folio. There are others, so change the comment to explain why we can't release the iomap_page. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 08ee293c4117..2054b85c9d9b 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -483,12 +483,10 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags) folio_size(folio)); /* - * mm accommodates an old ext3 case where clean folios might - * not have had the dirty bit cleared. Thus, it can send actual - * dirty folios to ->release_folio() via shrink_active_list(); - * skip those here. 
+ * If the folio is dirty, we refuse to release our metadata because + * it may be partially dirty (FIXME, add a test for that). */ - if (folio_test_dirty(folio) || folio_test_writeback(folio)) + if (folio_test_dirty(folio)) return false; iomap_page_release(folio); return true; From patchwork Mon Jun 12 20:39:07 2023 X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277316 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 5/8] filemap: Add fgf_t typedef Date: Mon, 12 Jun 2023 21:39:07 +0100 Message-Id: <20230612203910.724378-6-willy@infradead.org> In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> X-Mailing-List: linux-xfs@vger.kernel.org Similarly to gfp_t, define fgf_t as its own type to prevent various misuses and confusion. Leave the flags as FGP_* for now to reduce the size of this patch; they will be converted to FGF_* later. Move the documentation to the definition of the type instead of burying it in the __filemap_get_folio() documentation. 
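The practical effect of the __bitwise typedef is that sparse ("make C=1") now warns when the FGP_* namespace is confused with another flag space such as gfp_t. A small illustrative caller (the helper name is hypothetical, not taken from the patch):

static struct folio *grab_locked_folio(struct address_space *mapping,
				       pgoff_t index)
{
	fgf_t fgp = FGP_LOCK | FGP_CREAT | FGP_ACCESSED;

	/*
	 * Passing e.g. GFP_NOFS as the third argument compiled silently
	 * while fgp_flags was a plain int; with fgf_t, sparse reports a
	 * "different base types" warning for that mistake.
	 */
	return __filemap_get_folio(mapping, index, fgp,
				   mapping_gfp_mask(mapping));
}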
Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- fs/btrfs/file.c | 6 +++--- fs/f2fs/compress.c | 2 +- fs/f2fs/f2fs.h | 2 +- fs/iomap/buffered-io.c | 2 +- include/linux/pagemap.h | 48 +++++++++++++++++++++++++++++++---------- mm/filemap.c | 19 ++-------------- mm/folio-compat.c | 2 +- 7 files changed, 46 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f649647392e0..934a92ca4785 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -876,9 +876,9 @@ static int prepare_uptodate_page(struct inode *inode, return 0; } -static unsigned int get_prepare_fgp_flags(bool nowait) +static fgf_t get_prepare_fgp_flags(bool nowait) { - unsigned int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT; + fgf_t fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT; if (nowait) fgp_flags |= FGP_NOWAIT; @@ -910,7 +910,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages, int i; unsigned long index = pos >> PAGE_SHIFT; gfp_t mask = get_prepare_gfp_flags(inode, nowait); - unsigned int fgp_flags = get_prepare_fgp_flags(nowait); + fgf_t fgp_flags = get_prepare_fgp_flags(nowait); int err = 0; int faili; diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 11653fa79289..b42feec69175 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -1019,7 +1019,7 @@ static int prepare_compress_overwrite(struct compress_ctx *cc, struct address_space *mapping = cc->inode->i_mapping; struct page *page; sector_t last_block_in_bio; - unsigned fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT; + fgf_t fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT; pgoff_t start_idx = start_idx_of_cluster(cc); int i, ret; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d211ee89c158..13b35db3d9c6 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2715,7 +2715,7 @@ static inline struct page *f2fs_grab_cache_page(struct address_space *mapping, static inline struct page *f2fs_pagecache_get_page( struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp_mask) + fgf_t fgp_flags, gfp_t gfp_mask) { if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET)) return NULL; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 2054b85c9d9b..9af357d52e56 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -467,7 +467,7 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate); */ struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos) { - unsigned fgp = FGP_WRITEBEGIN | FGP_NOFS; + fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS; if (iter->flags & IOMAP_NOWAIT) fgp |= FGP_NOWAIT; diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index a56308a9d1a4..993242f0c1e1 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -497,22 +497,48 @@ pgoff_t page_cache_next_miss(struct address_space *mapping, pgoff_t page_cache_prev_miss(struct address_space *mapping, pgoff_t index, unsigned long max_scan); -#define FGP_ACCESSED 0x00000001 -#define FGP_LOCK 0x00000002 -#define FGP_CREAT 0x00000004 -#define FGP_WRITE 0x00000008 -#define FGP_NOFS 0x00000010 -#define FGP_NOWAIT 0x00000020 -#define FGP_FOR_MMAP 0x00000040 -#define FGP_STABLE 0x00000080 +/** + * typedef fgf_t - Flags for getting folios from the page cache. + * + * Most users of the page cache will not need to use these flags; + * there are convenience functions such as filemap_get_folio() and + * filemap_lock_folio(). For users which need more control over exactly + * what is done with the folios, these flags to __filemap_get_folio() + * are available. 
+ * + * * %FGP_ACCESSED - The folio will be marked accessed. + * * %FGP_LOCK - The folio is returned locked. + * * %FGP_CREAT - If no folio is present then a new folio is allocated, + * added to the page cache and the VM's LRU list. The folio is + * returned locked. + * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the + * folio is already in cache. If the folio was allocated, unlock it + * before returning so the caller can do the same dance. + * * %FGP_WRITE - The folio will be written to by the caller. + * * %FGP_NOFS - __GFP_FS will get cleared in gfp. + * * %FGP_NOWAIT - Don't block on the folio lock. + * * %FGP_STABLE - Wait for the folio to be stable (finished writeback) + * * %FGP_WRITEBEGIN - The flags to use in a filesystem write_begin() + * implementation. + */ +typedef unsigned int __bitwise fgf_t; + +#define FGP_ACCESSED ((__force fgf_t)0x00000001) +#define FGP_LOCK ((__force fgf_t)0x00000002) +#define FGP_CREAT ((__force fgf_t)0x00000004) +#define FGP_WRITE ((__force fgf_t)0x00000008) +#define FGP_NOFS ((__force fgf_t)0x00000010) +#define FGP_NOWAIT ((__force fgf_t)0x00000020) +#define FGP_FOR_MMAP ((__force fgf_t)0x00000040) +#define FGP_STABLE ((__force fgf_t)0x00000080) #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE) void *filemap_get_entry(struct address_space *mapping, pgoff_t index); struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp); + fgf_t fgp_flags, gfp_t gfp); struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp); + fgf_t fgp_flags, gfp_t gfp); /** * filemap_get_folio - Find and get a folio. @@ -586,7 +612,7 @@ static inline struct page *find_get_page(struct address_space *mapping, } static inline struct page *find_get_page_flags(struct address_space *mapping, - pgoff_t offset, int fgp_flags) + pgoff_t offset, fgf_t fgp_flags) { return pagecache_get_page(mapping, offset, fgp_flags, 0); } diff --git a/mm/filemap.c b/mm/filemap.c index b4c9bd368b7e..42353b82ebf6 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1887,30 +1887,15 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index) * * Looks up the page cache entry at @mapping & @index. * - * @fgp_flags can be zero or more of these flags: - * - * * %FGP_ACCESSED - The folio will be marked accessed. - * * %FGP_LOCK - The folio is returned locked. - * * %FGP_CREAT - If no page is present then a new page is allocated using - * @gfp and added to the page cache and the VM's LRU list. - * The page is returned locked and with an increased refcount. - * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the - * page is already in cache. If the page was allocated, unlock it before - * returning so the caller can do the same dance. - * * %FGP_WRITE - The page will be written to by the caller. - * * %FGP_NOFS - __GFP_FS will get cleared in gfp. - * * %FGP_NOWAIT - Don't get blocked by page lock. - * * %FGP_STABLE - Wait for the folio to be stable (finished writeback) - * * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even * if the %GFP flags specified for %FGP_CREAT are atomic. * - * If there is a page cache page, it is returned with an increased refcount. + * If this function returns a folio, it is returned with an increased refcount. * * Return: The found folio or an ERR_PTR() otherwise. 
*/ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp) + fgf_t fgp_flags, gfp_t gfp) { struct folio *folio; diff --git a/mm/folio-compat.c b/mm/folio-compat.c index c6f056c20503..10c3247542cb 100644 --- a/mm/folio-compat.c +++ b/mm/folio-compat.c @@ -92,7 +92,7 @@ EXPORT_SYMBOL(add_to_page_cache_lru); noinline struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp) + fgf_t fgp_flags, gfp_t gfp) { struct folio *folio; From patchwork Mon Jun 12 20:39:08 2023 X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277322 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 6/8] filemap: Allow __filemap_get_folio to allocate large folios Date: Mon, 12 Jun 2023 21:39:08 +0100 Message-Id: <20230612203910.724378-7-willy@infradead.org> In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> X-Mailing-List: linux-xfs@vger.kernel.org Allow callers of __filemap_get_folio() to specify a preferred folio order in the FGP flags. This is only honoured in the FGP_CREAT path; if there is already a folio in the page cache that covers the index, we will return it, no matter what its order is. No create-around is attempted; we will only create folios which start at the specified index. Unmodified callers will continue to allocate order 0 folios. 
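A worked example of the order encoding and of the limits applied inside __filemap_get_folio() by the hunks below (the helper name is hypothetical; numbers assume 4KiB pages, so PAGE_SHIFT == 12):

static struct folio *write_begin_folio(struct address_space *mapping,
				       pgoff_t index, size_t write_len)
{
	fgf_t fgp = FGP_WRITEBEGIN | fgf_set_order(write_len);

	/*
	 * write_len == 64KiB: ilog2(64KiB) == 16, so fgf_set_order()
	 * stores 16 - 12 == 4 in the top six bits and FGF_GET_ORDER()
	 * recovers an order-4 (sixteen page) hint; write_len <= PAGE_SIZE
	 * encodes order 0.
	 *
	 * The hint is only a ceiling: the order is forced to 0 if the
	 * mapping does not support large folios, clamped to
	 * MAX_PAGECACHE_ORDER and to the alignment of @index, and on
	 * allocation failure the loop retries with successively smaller
	 * orders (order 1 is skipped, falling straight to order 0).
	 */
	return __filemap_get_folio(mapping, index, fgp,
				   mapping_gfp_mask(mapping));
}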
Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 23 ++++++++++++++++++++++ mm/filemap.c | 42 ++++++++++++++++++++++++++++------------- mm/readahead.c | 13 ------------- 3 files changed, 52 insertions(+), 26 deletions(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 993242f0c1e1..b2ed80f91e5b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -466,6 +466,19 @@ static inline void *detach_page_private(struct page *page) return folio_detach_private(page_folio(page)); } +/* + * There are some parts of the kernel which assume that PMD entries + * are exactly HPAGE_PMD_ORDER. Those should be fixed, but until then, + * limit the maximum allocation order to PMD size. I'm not aware of any + * assumptions about maximum order if THP are disabled, but 8 seems like + * a good order (that's 1MB if you're using 4kB pages) + */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#define MAX_PAGECACHE_ORDER HPAGE_PMD_ORDER +#else +#define MAX_PAGECACHE_ORDER 8 +#endif + #ifdef CONFIG_NUMA struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order); #else @@ -531,9 +544,19 @@ typedef unsigned int __bitwise fgf_t; #define FGP_NOWAIT ((__force fgf_t)0x00000020) #define FGP_FOR_MMAP ((__force fgf_t)0x00000040) #define FGP_STABLE ((__force fgf_t)0x00000080) +#define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */ #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE) +static inline fgf_t fgf_set_order(size_t size) +{ + unsigned int shift = ilog2(size); + + if (shift <= PAGE_SHIFT) + return 0; + return (__force fgf_t)((shift - PAGE_SHIFT) << 26); +} + void *filemap_get_entry(struct address_space *mapping, pgoff_t index); struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, fgf_t fgp_flags, gfp_t gfp); diff --git a/mm/filemap.c b/mm/filemap.c index 42353b82ebf6..bd66398ae072 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1937,7 +1937,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, folio_wait_stable(folio); no_page: if (!folio && (fgp_flags & FGP_CREAT)) { + unsigned order = FGF_GET_ORDER(fgp_flags); int err; + if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping)) gfp |= __GFP_WRITE; if (fgp_flags & FGP_NOFS) @@ -1946,26 +1948,40 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, gfp &= ~GFP_KERNEL; gfp |= GFP_NOWAIT | __GFP_NOWARN; } - - folio = filemap_alloc_folio(gfp, 0); - if (!folio) - return ERR_PTR(-ENOMEM); - if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP)))) fgp_flags |= FGP_LOCK; - /* Init accessed so avoid atomic mark_page_accessed later */ - if (fgp_flags & FGP_ACCESSED) - __folio_set_referenced(folio); + if (!mapping_large_folio_support(mapping)) + order = 0; + if (order > MAX_PAGECACHE_ORDER) + order = MAX_PAGECACHE_ORDER; + /* If we're not aligned, allocate a smaller folio */ + if (index & ((1UL << order) - 1)) + order = __ffs(index); - err = filemap_add_folio(mapping, folio, index, gfp); - if (unlikely(err)) { + do { + err = -ENOMEM; + if (order == 1) + order = 0; + folio = filemap_alloc_folio(gfp, order); + if (!folio) + continue; + + /* Init accessed so avoid atomic mark_page_accessed later */ + if (fgp_flags & FGP_ACCESSED) + __folio_set_referenced(folio); + + err = filemap_add_folio(mapping, folio, index, gfp); + if (!err) + break; folio_put(folio); folio = NULL; - if (err == -EEXIST) - goto repeat; - } + } while (order-- > 0); + if (err == -EEXIST) + goto repeat; 
+ if (err) + return ERR_PTR(err); /* * filemap_add_folio locks the page, and for mmap * we expect an unlocked page. diff --git a/mm/readahead.c b/mm/readahead.c index 47afbca1d122..59a071badb90 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -462,19 +462,6 @@ static int try_context_readahead(struct address_space *mapping, return 1; } -/* - * There are some parts of the kernel which assume that PMD entries - * are exactly HPAGE_PMD_ORDER. Those should be fixed, but until then, - * limit the maximum allocation order to PMD size. I'm not aware of any - * assumptions about maximum order if THP are disabled, but 8 seems like - * a good order (that's 1MB if you're using 4kB pages) - */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define MAX_PAGECACHE_ORDER HPAGE_PMD_ORDER -#else -#define MAX_PAGECACHE_ORDER 8 -#endif - static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index, pgoff_t mark, unsigned int order, gfp_t gfp) { From patchwork Mon Jun 12 20:39:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4636C88CB2 for ; Mon, 12 Jun 2023 20:39:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237191AbjFLUjf (ORCPT ); Mon, 12 Jun 2023 16:39:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233336AbjFLUj3 (ORCPT ); Mon, 12 Jun 2023 16:39:29 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83AED13E; Mon, 12 Jun 2023 13:39:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=Y1wWS1OQ/HLUe2A+9l4XTUm8jeKJuf95z9T7/th56+o=; b=G6XsM81vR1q+slu7YqpB3UY7W6 QGLd1t9eZdHBlrY90rwprCoM6qvqz1toS1ek3Zdt/HwZhgPklkuDD4e/UOgLJXrAkjrT0/TKADgyY gpND+BwgD2LoP5vIr2D0Q1OAXE6vaczeRYaOeJbh/eL/uq5BYPYGzjAdaEYYrnazXdP8lZmW4tuYY PHftS6GBHnU11G4t4hJpfy31ArpZfsmCn+SkXsI8KW0SGIs84T9HJfmW4J8EwkaJPGCV9aT6ufgmz 1Kz8rvk1az8CkclfzlV6HaYEXslQNk1P+RZyV4oSZLuYIjaexi+cSlrDxH2k2NmZ7V92d1Xj1ZBGN SD9UHfqw==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1q8oJl-0032Sk-V7; Mon, 12 Jun 2023 20:39:13 +0000 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 7/8] iomap: Create large folios in the buffered write path Date: Mon, 12 Jun 2023 21:39:09 +0100 Message-Id: <20230612203910.724378-8-willy@infradead.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Use the size of the write as a hint for the size of the folio to create. 
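A rough illustration of the effect, under stated assumptions (x86-64 with 4KiB pages, CONFIG_TRANSPARENT_HUGEPAGE so MAX_PAGECACHE_ORDER == HPAGE_PMD_ORDER == 9, and an inode whose mapping has large folio support enabled); the function name is hypothetical and <linux/sizes.h> is assumed for SZ_2M:

static void folio_size_hint_example(struct iomap_iter *iter)
{
	/*
	 * A 2MiB write at a 2MiB-aligned position: fgf_set_order(SZ_2M)
	 * yields an order-9 hint and the index is aligned, so a single
	 * PMD-sized folio is attempted, with smaller folios only on
	 * allocation failure.  Had the write started at pos == 4KiB, the
	 * first index would be odd, so the first folio would be order 0
	 * and later, better-aligned iterations could go larger.
	 */
	struct folio *folio = iomap_get_folio(iter, SZ_2M, SZ_2M);

	if (!IS_ERR(folio)) {
		folio_unlock(folio);
		folio_put(folio);
	}
}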
Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- fs/gfs2/bmap.c | 2 +- fs/iomap/buffered-io.c | 6 ++++-- include/linux/iomap.h | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index c739b258a2d9..3702e5e47b0f 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -971,7 +971,7 @@ gfs2_iomap_get_folio(struct iomap_iter *iter, loff_t pos, unsigned len) if (status) return ERR_PTR(status); - folio = iomap_get_folio(iter, pos); + folio = iomap_get_folio(iter, pos, len); if (IS_ERR(folio)) gfs2_trans_end(sdp); return folio; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 9af357d52e56..a5d62c9640cf 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -461,16 +461,18 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate); * iomap_get_folio - get a folio reference for writing * @iter: iteration structure * @pos: start offset of write + * @len: length of write * * Returns a locked reference to the folio at @pos, or an error pointer if the * folio could not be obtained. */ -struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos) +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len) { fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS; if (iter->flags & IOMAP_NOWAIT) fgp |= FGP_NOWAIT; + fgp |= fgf_set_order(len); return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT, fgp, mapping_gfp_mask(iter->inode->i_mapping)); @@ -596,7 +598,7 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos, if (folio_ops && folio_ops->get_folio) return folio_ops->get_folio(iter, pos, len); else - return iomap_get_folio(iter, pos); + return iomap_get_folio(iter, pos, len); } static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret, diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e2b836c2e119..80facb9c9e5b 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -261,7 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode, int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); -struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos); +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len); bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, From patchwork Mon Jun 12 20:39:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Wilcox X-Patchwork-Id: 13277319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C693BC7EE43 for ; Mon, 12 Jun 2023 20:39:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229576AbjFLUjv (ORCPT ); Mon, 12 Jun 2023 16:39:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236989AbjFLUje (ORCPT ); Mon, 12 Jun 2023 16:39:34 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83A491739; Mon, 12 Jun 2023 13:39:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=EkLmyaB6DA8CYGOh1SGUaX5l8dYnTWVJEORmgXgZcN8=; b=R4++hr39ESEVaBhBKwvgXoki8+ XyhvyjhtQPZvlNvPojo9/BakFsoohm4U/VqtxhH+l6rJDjrin+mZtWzSsQokP8EsdChfXLjmhBHRp W/yQviM47ooc72QH5qvQv1YlgRxs6LBSvg+44G4KK1LCpgIVTOa0IdCOARbFlga8qqN10jjk0tk6h pYeWlKT28FEytcUzoOmdkKw0knvIW3y3z23uZ1P1AsxVtMp1GGualEk0jCc6jVzeCt/V6/hINkOO5 1ma2qPNvsq8Erv8MwWt/YWxDjlI8EHmk/B82dkR4p2RLtJPk3nZNarGjUwDnL/5zvT9QZrO5pB06L eXzATyjg==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1q8oJm-0032Sn-3S; Mon, 12 Jun 2023 20:39:14 +0000 From: "Matthew Wilcox (Oracle)" To: linux-fsdevel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" , linux-xfs@vger.kernel.org, Wang Yugui , Dave Chinner , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH v3 8/8] iomap: Copy larger chunks from userspace Date: Mon, 12 Jun 2023 21:39:10 +0100 Message-Id: <20230612203910.724378-9-willy@infradead.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230612203910.724378-1-willy@infradead.org> References: <20230612203910.724378-1-willy@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org If we have a large folio, we can copy in larger chunks than PAGE_SIZE. Start at the maximum page cache size and shrink by half every time we hit the "we are short on memory" problem. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index a5d62c9640cf..818dc350ffc5 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) { loff_t length = iomap_length(iter); + size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER; loff_t pos = iter->pos; ssize_t written = 0; long status = 0; @@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) do { struct folio *folio; - struct page *page; - unsigned long offset; /* Offset into pagecache page */ - unsigned long bytes; /* Bytes to write to page */ + size_t offset; /* Offset into folio */ + unsigned long bytes; /* Bytes to write to folio */ size_t copied; /* Bytes copied from user */ - offset = offset_in_page(pos); - bytes = min_t(unsigned long, PAGE_SIZE - offset, - iov_iter_count(i)); again: + offset = pos & (chunk - 1); + bytes = min(chunk - offset, iov_iter_count(i)); status = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags); if (unlikely(status)) @@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) if (iter->iomap.flags & IOMAP_F_STALE) break; - page = folio_file_page(folio, pos >> PAGE_SHIFT); + offset = offset_in_folio(folio, pos); + if (bytes > folio_size(folio) - offset) + bytes = folio_size(folio) - offset; + if (mapping_writably_mapped(mapping)) - flush_dcache_page(page); + flush_dcache_folio(folio); - copied = copy_page_from_iter_atomic(page, offset, bytes, i); + copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, 
i); status = iomap_write_end(iter, pos, bytes, copied, folio); @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) */ if (copied) bytes = copied; + if (chunk > PAGE_SIZE) + chunk /= 2; goto again; } pos += status;