From patchwork Thu Dec 7 07:26:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482791 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Wh70aY3q" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 239231AD; Wed, 6 Dec 2023 23:27:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=7reiZj9zur86NcKW9Z3NM8dpV8YTIvvmOrGho5B9fuE=; b=Wh70aY3qfNq/eoknmeAT/td7Ks XGIzQJ7jaItCrbqi+AmZKMTRKRIAR4ECsRA5Y8EqEes5jz2rlUgnfKFfGR2siHYX4XizKw5wrBZoM z5WRGkPdgY4YpDl5gwV/37KUjmTDKnzFa0ApseBqX30n1XE2vrytLbwy61xTYS/bJgt/IJ53Z2FEM qkZ2G5JXkh5W/16YdQAHECEtpMkUtevQ8x68/MmU/6QLJ2V0XswDpyNaeCTrOSxRe7YWKyDtCUnFh CknUX3XcqQWxDMyi3AdKxKF7PCSzn2U5LaqK0RO0RKkx8PtdQKig/wmWuH7YY/aRm4ZhgScxnxksc QIUvE3Xw==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8mz-00C4xo-1S; Thu, 07 Dec 2023 07:27:17 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 01/14] iomap: clear the per-folio dirty bits on all writeback failures Date: Thu, 7 Dec 2023 08:26:57 +0100 Message-Id: <20231207072710.176093-2-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html write_cache_pages always clears the page dirty bit before calling into the file system, and leaves folios with a writeback failure without the dirty bit after return. We also clear the per-block writeback bits for writeback failures unless no I/O has been submitted, which will leave the folio in an inconsistent state where it doesn't have the folio dirty bit, but has one or more per-block dirty bits. This seems to be due to the place where the iomap_clear_range_dirty call was inserted into the existing, not very clearly structured code when adding per-block dirty bit support, and not actually intentional. Switch to always clearing the dirty bits on writeback failure. 
Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance") Signed-off-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index f72df2babe561a..fc5c64712318aa 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1843,16 +1843,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, if (unlikely(error)) { /* * Let the filesystem know what portion of the current page - * failed to map. If the page hasn't been added to ioend, it - * won't be affected by I/O completion and we must unlock it - * now. + * failed to map. */ if (wpc->ops->discard_folio) wpc->ops->discard_folio(folio, pos); - if (!count) { - folio_unlock(folio); - goto done; - } } /* @@ -1861,6 +1855,16 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, * all the dirty bits in the folio here. */ iomap_clear_range_dirty(folio, 0, folio_size(folio)); + + /* + * If the page hasn't been added to the ioend, it won't be affected by + * I/O completion and we must unlock it now. 
+ */ + if (error && !count) { + folio_unlock(folio); + goto done; + } + folio_start_writeback(folio); folio_unlock(folio); From patchwork Thu Dec 7 07:26:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482792 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="URYQmOV/" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39CFED44; Wed, 6 Dec 2023 23:27:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=WP7kO0AnWsmxJgL9BTgKd8gFOHO+tz/SXOo3qvhpGqk=; b=URYQmOV/3fWZouPH4qcfUowKx3 EuEA3IwhDPwsVpneYJ2Z1ne+WPoHrUqAYw6Jw9KZprSh4IZlmFoNeOUlHV6vFwcsxKI8dv2uJGlSb TFYv3eWWBkjrN9Jq7xcCyxayY1mRKVTehIVfiVpUm2Cgc4wKXtjk2SoKbGqben+TmKO85C2YytsMD ak6t3wjc8qspDZZvuFnWYC8ihhYb9Z1mcO1ZdeWhw9eXyjPfzfqKMqOSrAh3mzSPq+pw3QP6NCuG4 0lYTh+X54P9f6YMPcPrmP1JIYxdDDxN8ot8h1RHf/B6i/akO+u/bnNW2QLTwIMbOo5GIYRoxmxO6n 6m1RDTiA==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8n2-00C4yI-0I; Thu, 07 Dec 2023 07:27:20 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 02/14] iomap: treat inline data in iomap_writepage_map as an I/O error Date: Thu, 7 Dec 2023 08:26:58 +0100 Message-Id: <20231207072710.176093-3-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html iomap_writepage_map already warns about inline data, but then just ignores it. Treat it as an error and return -EIO. Signed-off-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index fc5c64712318aa..2426cab70b7102 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1818,8 +1818,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, if (error) break; trace_iomap_writepage_map(inode, &wpc->iomap); - if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) - continue; + if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) { + error = -EIO; + break; + } if (wpc->iomap.type == IOMAP_HOLE) continue; iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc, From patchwork Thu Dec 7 07:26:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482794 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Vh0UFb5U" Received: from bombadil.infradead.org (bombadil.infradead.org 
[IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D926BD53; Wed, 6 Dec 2023 23:27:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=Zq1p2D6CIWYIAN1TsIMnd1JMbwcaUt5eEkAWi8KwBmU=; b=Vh0UFb5UqYlCnXfTGHim8JxHXr joTNw9ufG9VAmBSi1dyBPWk7nFDOj41pMvIEfV9Kh9PuiW4i4hCpJamF/ezTz3aRL2MlzT6fw7d22 WfkrF5WCE95qZaXcb+Ezkv+jMW1WPYgSeguUlh/t2g2yGyh4zYuZAg0Gx5kUouxY4bYbgyr1/JXw/ wLQkVX4PsQRryLceWm2PdgfkRLMX9UoGC43+e6/yvTORQYrNuEmKcmthDxm3BWG48MgVHRV6E7Sax DvtLvZCdya63/sjbRy0zvIhZE0f9eIwCzHVipndj8psejg6hcWeIOSYFZpHeuIpZUNT13meBljImJ ytVjE6DA==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8n5-00C4yV-1S; Thu, 07 Dec 2023 07:27:24 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 03/14] iomap: move the io_folios field out of struct iomap_ioend Date: Thu, 7 Dec 2023 08:26:59 +0100 Message-Id: <20231207072710.176093-4-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html The io_folios member in struct iomap_ioend counts the number of folios added to an ioend. 
It is only used at submission time and can thus be moved to iomap_writepage_ctx instead. Signed-off-by: Christoph Hellwig Reviewed-by: Ritesh Harjani (IBM) --- fs/iomap/buffered-io.c | 7 ++++--- include/linux/iomap.h | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 2426cab70b7102..dc5039cdacd928 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1685,10 +1685,11 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc, ioend->io_flags = wpc->iomap.flags; ioend->io_inode = inode; ioend->io_size = 0; - ioend->io_folios = 0; ioend->io_offset = offset; ioend->io_bio = bio; ioend->io_sector = sector; + + wpc->nr_folios = 0; return ioend; } @@ -1732,7 +1733,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset, * also prevents long tight loops ending page writeback on all the * folios in the ioend. */ - if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE) + if (wpc->nr_folios >= IOEND_BATCH_SIZE) return false; return true; } @@ -1829,7 +1830,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, count++; } if (count) - wpc->ioend->io_folios++; + wpc->nr_folios++; WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list)); WARN_ON_ONCE(!folio_test_locked(folio)); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 96dd0acbba44ac..b2a05dff914d0c 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -293,7 +293,6 @@ struct iomap_ioend { struct list_head io_list; /* next ioend in chain */ u16 io_type; u16 io_flags; /* IOMAP_F_* */ - u32 io_folios; /* folios added to ioend */ struct inode *io_inode; /* file being written to */ size_t io_size; /* size of the extent */ loff_t io_offset; /* offset in the file */ @@ -329,6 +328,7 @@ struct iomap_writepage_ctx { struct iomap iomap; struct iomap_ioend *ioend; const struct iomap_writeback_ops *ops; + u32 nr_folios; /* folios added to the ioend */ }; void iomap_finish_ioends(struct 
iomap_ioend *ioend, int error); From patchwork Thu Dec 7 07:27:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482793 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="IhXKePS7" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0506D5B; Wed, 6 Dec 2023 23:27:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=vbU0cp4U8G5+R2oYHimlhM4sTS1r6QMLrwyCKAbKiO8=; b=IhXKePS7ISnw5zMmzekQxqFqH2 eUqwjxBqC9/76MSF9WDD0JOv4oqyrME1z+CcS8wni4dXbjGllGKkRjPvH3FTUaEBz3ewPrQlPEIv0 aAR+9nY5QnGAN+ZZ8ziveU090vKiDqQZ2NObi6HWkoKUuBbfrBfnt3hB0MloNk5GYOI+0QJxFbbXR 756NxsYBczFYI70BPjoYo4jkx8naiUyJN8/oTn2ztQH2D7REpr8CDOf+HxCeh/W6RsKWoPodIwO83 BE0+Yf7TpBFoKBT3nHliSG96B9fTxs0IMQ/9QZZyM/qPpyzGyjtJNvUcONiucm5v/19rKCuORXNy6 edzcVvGQ==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8n8-00C4yx-0g; Thu, 07 Dec 2023 07:27:26 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 04/14] iomap: move the PF_MEMALLOC check to iomap_writepages Date: Thu, 7 Dec 2023 08:27:00 +0100 Message-Id: <20231207072710.176093-5-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html The iomap writepage implementation has been removed in commit 478af190cb6c ("iomap: remove iomap_writepage") and this code is now only called through ->writepages, which never happens from memory reclaim. Move the check from iomap_do_writepage to iomap_writepages so that it is only called once per ->writepages invocation. Signed-off-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index dc5039cdacd928..ef99418f5a7a73 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1912,20 +1912,6 @@ static int iomap_do_writepage(struct folio *folio, trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); - /* - * Refuse to write the folio out if we're called from reclaim context. - * - * This avoids stack overflows when called from deeply used stacks in - * random callers for direct reclaim or memcg reclaim. We explicitly - * allow reclaim from kswapd as the stack usage there is relatively low. - * - * This should never happen except in the case of a VM regression so - * warn about it. 
- */ - if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == - PF_MEMALLOC)) - goto redirty; - /* * Is this folio beyond the end of the file? * @@ -1991,8 +1977,6 @@ static int iomap_do_writepage(struct folio *folio, return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); -redirty: - folio_redirty_for_writepage(wbc, folio); unlock: folio_unlock(folio); return 0; @@ -2005,6 +1989,14 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc, { int ret; + /* + * Writeback from reclaim context should never happen except in the case + * of a VM regression so warn about it and refuse to write the data. + */ + if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) == + PF_MEMALLOC)) + return -EIO; + wpc->ops = ops; ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc); if (!wpc->ioend) From patchwork Thu Dec 7 07:27:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482795 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="lA4Os0kf" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74B64137; Wed, 6 Dec 2023 23:27:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=4AZ8QVVR/u3P7kc1HPNb0EAo5eAavI4T0bksv6UR5ec=; b=lA4Os0kf0iweQmWskFlPHZL65U S93sxUQidLzPQh3w7/Dqn3Z6XLZN17E8GYRGbd1ZQaRX3D3RXXJRPGxZ6oEugErFPiLML39LjTCTn DkAJkzzjxuJHtOLWFYx8h9xxLGXPS8pcDdRmumX84k+nJ1vKqxATRyCD9aUxUaCZAl8oi6zOp4B/H NCfghijGZFZCrVagRPFGtk7NzRq744C+MvfLdzivNR3P1lvUkOzDcGehdM9goCjcP4lh8QVqYBNDU 
fduNTIUcSZC/uLJqVLK/CworSUiAj8lSNRVJWS1SrKPh8z2iMDaDoGnKlMhop8DHRJ5UOpQko3kvy jBG4PIVA==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nB-00C4zA-2c; Thu, 07 Dec 2023 07:27:30 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 05/14] iomap: factor out a iomap_writepage_handle_eof helper Date: Thu, 7 Dec 2023 08:27:01 +0100 Message-Id: <20231207072710.176093-6-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Most of iomap_do_writepage is dedicated to handling a folio crossing or beyond i_size. Split this out into a separate helper and update the comments to deal with folios instead of pages and make them more readable. Signed-off-by: Christoph Hellwig Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 128 ++++++++++++++++++++--------------------- 1 file changed, 62 insertions(+), 66 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index ef99418f5a7a73..c011647239f84e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio, wbc_account_cgroup_owner(wbc, &folio->page, len); } +/* + * Check interaction of the folio with the file end. + * + * If the folio is entirely beyond i_size, return false. 
If it straddles + * i_size, adjust end_pos and zero all data beyond i_size. + */ +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, + u64 *end_pos) +{ + u64 isize = i_size_read(inode); + + if (*end_pos > isize) { + size_t poff = offset_in_folio(folio, isize); + pgoff_t end_index = isize >> PAGE_SHIFT; + + /* + * If the folio is entirely ouside of i_size, skip it. + * + * This can happen due to a truncate operation that is in + * progress and in that case truncate will finish it off once + * we've dropped the folio lock. + * + * Note that the pgoff_t used for end_index is an unsigned long. + * If the given offset is greater than 16TB on a 32-bit system, + * then if we checked if the folio is fully outside i_size with + * "if (folio->index >= end_index + 1)", "end_index + 1" would + * overflow and evaluate to 0. Hence this folio would be + * redirtied and written out repeatedly, which would result in + * an infinite loop; the user program performing this operation + * would hang. Instead, we can detect this situation by + * checking if the folio is totally beyond i_size or if its + * offset is just equal to the EOF. + */ + if (folio->index > end_index || + (folio->index == end_index && poff == 0)) + return false; + + /* + * The folio straddles i_size. + * + * It must be zeroed out on each and every writepage invocation + * because it may be mmapped: + * + * A file is mapped in multiples of the page size. For a + * file that is not a multiple of the page size, the + * remaining memory is zeroed when mapped, and writes to that + * region are not written out to the file. + * + * Also adjust the writeback range to skip all blocks entirely + * beyond i_size. 
+ */ + folio_zero_segment(folio, poff, folio_size(folio)); + *end_pos = isize; + } + + return true; +} + /* * We implement an immediate ioend submission policy here to avoid needing to * chain multiple ioends and hence nest mempool allocations which can violate @@ -1908,78 +1966,16 @@ static int iomap_do_writepage(struct folio *folio, { struct iomap_writepage_ctx *wpc = data; struct inode *inode = folio->mapping->host; - u64 end_pos, isize; + u64 end_pos = folio_pos(folio) + folio_size(folio); trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); - /* - * Is this folio beyond the end of the file? - * - * The folio index is less than the end_index, adjust the end_pos - * to the highest offset that this folio should represent. - * ----------------------------------------------------- - * | file mapping | | - * ----------------------------------------------------- - * | Page ... | Page N-2 | Page N-1 | Page N | | - * ^--------------------------------^----------|-------- - * | desired writeback range | see else | - * ---------------------------------^------------------| - */ - isize = i_size_read(inode); - end_pos = folio_pos(folio) + folio_size(folio); - if (end_pos > isize) { - /* - * Check whether the page to write out is beyond or straddles - * i_size or not. - * ------------------------------------------------------- - * | file mapping | | - * ------------------------------------------------------- - * | Page ... | Page N-2 | Page N-1 | Page N | Beyond | - * ^--------------------------------^-----------|--------- - * | | Straddles | - * ---------------------------------^-----------|--------| - */ - size_t poff = offset_in_folio(folio, isize); - pgoff_t end_index = isize >> PAGE_SHIFT; - - /* - * Skip the page if it's fully outside i_size, e.g. - * due to a truncate operation that's in progress. We've - * cleaned this page and truncate will finish things off for - * us. - * - * Note that the end_index is unsigned long. 
If the given - * offset is greater than 16TB on a 32-bit system then if we - * checked if the page is fully outside i_size with - * "if (page->index >= end_index + 1)", "end_index + 1" would - * overflow and evaluate to 0. Hence this page would be - * redirtied and written out repeatedly, which would result in - * an infinite loop; the user program performing this operation - * would hang. Instead, we can detect this situation by - * checking if the page is totally beyond i_size or if its - * offset is just equal to the EOF. - */ - if (folio->index > end_index || - (folio->index == end_index && poff == 0)) - goto unlock; - - /* - * The page straddles i_size. It must be zeroed out on each - * and every writepage invocation because it may be mmapped. - * "A file is mapped in multiples of the page size. For a file - * that is not a multiple of the page size, the remaining - * memory is zeroed when mapped, and writes to that region are - * not written out to the file." - */ - folio_zero_segment(folio, poff, folio_size(folio)); - end_pos = isize; + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { + folio_unlock(folio); + return 0; } return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); - -unlock: - folio_unlock(folio); - return 0; } int From patchwork Thu Dec 7 07:27:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482797 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="RIUcxFiW" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0CF0D53; Wed, 6 Dec 2023 23:27:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: 
MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=QONJ5rVMeXRLccdslDLpZ5HqIsxxIVMeDHe0C1ZUtgk=; b=RIUcxFiWEJ0V302AG0+hUHEj0v n9k1soiml3LNoAh5NB1kmRciUTAtP3XtynwvEjVTBgl07VlkkLBLEbYdVMwt4qMIvf2N4JtQ8IlZg LuMytH5CzCDmdpO5zWsVkmvNK73VX8TziCKRe11FZFvTopoy2Ak7ITh/XYlInBaw7qp+yqyMplxAd mMiSnQudZoyfYiqW7GJkFfcYvaD4zRYu8rlywkzr/BQVcuM+Y2z2db+rjlA61OlOvRpQmnCvrPO1A FyxR90xeyu1z5O226OnD/qioo8+KKZg4hImspan9q/G3UZCMQVaCtLElpHs9Cdbc9ydVYFqEqoO3k lXPS/Euw==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nE-00C4ze-26; Thu, 07 Dec 2023 07:27:33 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 06/14] iomap: move all remaining per-folio logic into iomap_writepage_map Date: Thu, 7 Dec 2023 08:27:02 +0100 Message-Id: <20231207072710.176093-7-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Move the tracepoint and the iomap check from iomap_do_writepage into iomap_writepage_map. This keeps all logic in one place, and leaves iomap_do_writepage just as the wrapper for the callback conventions of write_cache_pages, which will go away when that is converted to an iterator. Signed-off-by: Christoph Hellwig Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Darrick J. 
Wong --- fs/iomap/buffered-io.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index c011647239f84e..be23defc09c325 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1842,19 +1842,25 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, * At the end of a writeback pass, there will be a cached ioend remaining on the * writepage context that the caller will need to submit. */ -static int -iomap_writepage_map(struct iomap_writepage_ctx *wpc, - struct writeback_control *wbc, struct inode *inode, - struct folio *folio, u64 end_pos) +static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, + struct writeback_control *wbc, struct folio *folio) { struct iomap_folio_state *ifs = folio->private; + struct inode *inode = folio->mapping->host; struct iomap_ioend *ioend, *next; unsigned len = i_blocksize(inode); unsigned nblocks = i_blocks_per_folio(inode, folio); u64 pos = folio_pos(folio); + u64 end_pos = pos + folio_size(folio); int error = 0, count = 0, i; LIST_HEAD(submit_list); + trace_iomap_writepage(inode, pos, folio_size(folio)); + + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { + folio_unlock(folio); + return 0; + } WARN_ON_ONCE(end_pos <= pos); if (!ifs && nblocks > 1) { @@ -1954,28 +1960,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, return error; } -/* - * Write out a dirty page. - * - * For delalloc space on the page, we need to allocate space and flush it. - * For unwritten space on the page, we need to start the conversion to - * regular allocated space. 
- */ static int iomap_do_writepage(struct folio *folio, struct writeback_control *wbc, void *data) { - struct iomap_writepage_ctx *wpc = data; - struct inode *inode = folio->mapping->host; - u64 end_pos = folio_pos(folio) + folio_size(folio); - - trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); - - if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { - folio_unlock(folio); - return 0; - } - - return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); + return iomap_writepage_map(data, wbc, folio); } int From patchwork Thu Dec 7 07:27:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482796 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="DNUgHjXp" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AF80D44; Wed, 6 Dec 2023 23:27:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=GKowZw5EFVL4s9hWrxJpcn0QfCciMG1qQeTNv820mBU=; b=DNUgHjXpZgpJTjBvG55R9ml3pH 6nxEUkqFCCG4LEIw9i9YQU32uLeQwJ2Hy+l0JzvIaM8VDLh0/UU+y+rTl80nAArlH45FIORT0Ss+t l/iyDSeGp/nOKWB4hBOJ4B7beWtW4+WDf0zDHM1boROV/k1hf8koN/rBz6Kf/gEEM3X5wbjNzcoKR 8EMSAshbKSqVmCJvrneuRiKYkW5o8/ZNJL1phVMXXqEPH2ZfWG/b0NxWV3ubso+PsH3FCYozKryar 4eomZhpNNamJs9PJKA7GvBb2+X7ctHMS9cNQceHZTA7q0uN1tfwGP4OZXIETa7adaUS/XLn2fXHRG TrmAJShg==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nH-00C50v-2N; Thu, 07 Dec 2023 07:27:36 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 07/14] iomap: clean up the iomap_alloc_ioend calling convention Date: Thu, 7 Dec 2023 08:27:03 +0100 Message-Id: <20231207072710.176093-8-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Switch to the same argument order as iomap_writepage_map and remove the ifs argument that can be trivially recalculated. Signed-off-by: Christoph Hellwig Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index be23defc09c325..dc409ec85c3c0b 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1742,11 +1742,11 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset, * Test to see if we have an existing ioend structure that we could append to * first; otherwise finish off the current ioend and start another. 
*/ -static void -iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio, - struct iomap_folio_state *ifs, struct iomap_writepage_ctx *wpc, - struct writeback_control *wbc, struct list_head *iolist) +static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, + struct writeback_control *wbc, struct folio *folio, + struct inode *inode, loff_t pos, struct list_head *iolist) { + struct iomap_folio_state *ifs = folio->private; sector_t sector = iomap_sector(&wpc->iomap, pos); unsigned len = i_blocksize(inode); size_t poff = offset_in_folio(folio, pos); @@ -1889,8 +1889,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, } if (wpc->iomap.type == IOMAP_HOLE) continue; - iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc, - &submit_list); + iomap_add_to_ioend(wpc, wbc, folio, inode, pos, &submit_list); count++; } if (count) From patchwork Thu Dec 7 07:27:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482798 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="fNbbzkIJ" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5AF4F1AD; Wed, 6 Dec 2023 23:27:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=6w3rL/XbGr2VGomxDtHrk4OnPxYYtC/IRjrvVtWWkoI=; b=fNbbzkIJFR0OSjeyiWQ33ZtydJ kJFIH39tDS64REVitiAhGuko3wbdiptHHVuuc5CDFKlkWF2WSihFFa4VBwKV5Y9W9vQtrsHt1Sh42 HmaKXt5q0pZWFgenQh1CoFr4kQ5n2jvL2P6NlROUuda+Sg4o3EB6rvFZzVeH8JZsOETSMTYHx/wKS Ru+gd0fkvzggIUiV5RxmC27V04tRu5Tllh7fOlBhWyEdx0mnjPS4ZUuvy9tvb0zmuo35phXkFXFx0 
cDORwkyGqFI9zsvIbkxMXZ9Cct7Veb7HuxPkz7XzSDt0MRHNhMQdYYdvbNOQEXrjpcubyfgGgQj91 lyuTSEmA==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nK-00C51I-36; Thu, 07 Dec 2023 07:27:39 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 08/14] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend Date: Thu, 7 Dec 2023 08:27:04 +0100 Message-Id: <20231207072710.176093-9-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html The calculation in iomap_sector is pretty trivial and most of the time iomap_add_to_ioend only calls either iomap_can_add_to_ioend or iomap_alloc_ioend from a single invocation. Calculate the sector in the two lower level functions, stop passing it from iomap_add_to_ioend, and update the iomap_alloc_ioend argument passing order to match that of iomap_add_to_ioend.
Signed-off-by: Christoph Hellwig Reviewed-by: Ritesh Harjani (IBM) --- fs/iomap/buffered-io.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index dc409ec85c3c0b..78cd5c06ea9b77 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1666,9 +1666,8 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend, return 0; } -static struct iomap_ioend * -iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc, - loff_t offset, sector_t sector, struct writeback_control *wbc) +static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc, + struct writeback_control *wbc, struct inode *inode, loff_t pos) { struct iomap_ioend *ioend; struct bio *bio; @@ -1676,7 +1675,7 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc, bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS, REQ_OP_WRITE | wbc_to_write_flags(wbc), GFP_NOFS, &iomap_ioend_bioset); - bio->bi_iter.bi_sector = sector; + bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos); wbc_init_bio(wbc, bio); ioend = container_of(bio, struct iomap_ioend, io_inline_bio); @@ -1685,9 +1684,9 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc, ioend->io_flags = wpc->iomap.flags; ioend->io_inode = inode; ioend->io_size = 0; - ioend->io_offset = offset; + ioend->io_offset = pos; ioend->io_bio = bio; - ioend->io_sector = sector; + ioend->io_sector = bio->bi_iter.bi_sector; wpc->nr_folios = 0; return ioend; @@ -1715,18 +1714,17 @@ iomap_chain_bio(struct bio *prev) return new; } -static bool -iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset, - sector_t sector) +static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos) { if ((wpc->iomap.flags & IOMAP_F_SHARED) != (wpc->ioend->io_flags & IOMAP_F_SHARED)) return false; if (wpc->iomap.type != wpc->ioend->io_type) return false; - if (offset != 
wpc->ioend->io_offset + wpc->ioend->io_size) + if (pos != wpc->ioend->io_offset + wpc->ioend->io_size) return false; - if (sector != bio_end_sector(wpc->ioend->io_bio)) + if (iomap_sector(&wpc->iomap, pos) != + bio_end_sector(wpc->ioend->io_bio)) return false; /* * Limit ioend bio chain lengths to minimise IO completion latency. This @@ -1747,14 +1745,13 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t pos, struct list_head *iolist) { struct iomap_folio_state *ifs = folio->private; - sector_t sector = iomap_sector(&wpc->iomap, pos); unsigned len = i_blocksize(inode); size_t poff = offset_in_folio(folio, pos); - if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, sector)) { + if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) { if (wpc->ioend) list_add(&wpc->ioend->io_list, iolist); - wpc->ioend = iomap_alloc_ioend(inode, wpc, pos, sector, wbc); + wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos); } if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) { From patchwork Thu Dec 7 07:27:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482799 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="qeFFSPoK" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBB1E137; Wed, 6 Dec 2023 23:27:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=eP2j69RQkSSBlrpyFkvKUGmppCPnNSlAZvWEq85w0qQ=; b=qeFFSPoKvQ27gdmlEaaIiFWSkn w5Jos0UP7Zc8SSjkh4OdiocLaOk8kv2hzdp7lsM7Y2XgPLyTa4sF/NtA4VJu0Y6xh8FPLdl/rIZxX 
Tqyqrq6qxP4tkdtxn4YILtqUeQojcEwN2VAhm/Sm6fRMTNxsU2rtHHIKurqDkBdsOWAVcQeIXgaFR bqSvtKs7xkoLYAzBkyxK+2oYM0QpwPlC1UpJjpfCY/amDchRnhmWJf4haXHDlgjRo8B5mUtees6GB EAjWYHWyHFgIPMapBKPZSW3gKP6Dwgz/CwVnRBlQUDCfgzAjyZg/rBYWbtMOuJ14aivRKMKe9SHYh d4IhoQIA==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nO-00C528-0c; Thu, 07 Dec 2023 07:27:42 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 09/14] iomap: don't chain bios Date: Thu, 7 Dec 2023 08:27:05 +0100 Message-Id: <20231207072710.176093-10-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Back in the days when a single bio could only be filled to the hardware limits, and we scheduled a work item for each bio completion, chaining multiple bios for a single ioend made a lot of sense to reduce the number of completions. But these days bios can be filled until we reach the number of vectors or total size limit, which means we can always fit at least 1 megabyte worth of data in the worst case, but usually a lot more due to large folios. The only thing bio chaining is buying us now is to reduce the size of the allocation from an ioend with an embedded bio into a plain bio, which is a 52-byte difference on 64-bit systems.
This is not worth the added complexity, so remove the bio chaining and only use the bio embedded into the ioend. This will help to simplify further changes to the iomap writeback code. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 90 +++++++++++------------------------------- fs/xfs/xfs_aops.c | 6 +-- include/linux/iomap.h | 8 +++- 3 files changed, 32 insertions(+), 72 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 78cd5c06ea9b77..7ed11eca7c7c9e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1489,40 +1489,23 @@ static u32 iomap_finish_ioend(struct iomap_ioend *ioend, int error) { struct inode *inode = ioend->io_inode; - struct bio *bio = &ioend->io_inline_bio; - struct bio *last = ioend->io_bio, *next; - u64 start = bio->bi_iter.bi_sector; - loff_t offset = ioend->io_offset; - bool quiet = bio_flagged(bio, BIO_QUIET); + struct bio *bio = &ioend->io_bio; + struct folio_iter fi; u32 folio_count = 0; - for (bio = &ioend->io_inline_bio; bio; bio = next) { - struct folio_iter fi; - - /* - * For the last bio, bi_private points to the ioend, so we - * need to explicitly end the iteration here. 
- */ - if (bio == last) - next = NULL; - else - next = bio->bi_private; - - /* walk all folios in bio, ending page IO on them */ - bio_for_each_folio_all(fi, bio) { - iomap_finish_folio_write(inode, fi.folio, fi.length, - error); - folio_count++; - } - bio_put(bio); + /* walk all folios in bio, ending page IO on them */ + bio_for_each_folio_all(fi, bio) { + iomap_finish_folio_write(inode, fi.folio, fi.length, error); + folio_count++; } - /* The ioend has been freed by bio_put() */ - if (unlikely(error && !quiet)) { + if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) { printk_ratelimited(KERN_ERR "%s: writeback error on inode %lu, offset %lld, sector %llu", - inode->i_sb->s_id, inode->i_ino, offset, start); + inode->i_sb->s_id, inode->i_ino, + ioend->io_offset, ioend->io_sector); } + bio_put(bio); /* frees the ioend */ return folio_count; } @@ -1563,7 +1546,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_ioends); static bool iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next) { - if (ioend->io_bio->bi_status != next->io_bio->bi_status) + if (ioend->io_bio.bi_status != next->io_bio.bi_status) return false; if ((ioend->io_flags & IOMAP_F_SHARED) ^ (next->io_flags & IOMAP_F_SHARED)) @@ -1628,9 +1611,8 @@ EXPORT_SYMBOL_GPL(iomap_sort_ioends); static void iomap_writepage_end_bio(struct bio *bio) { - struct iomap_ioend *ioend = bio->bi_private; - - iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status)); + iomap_finish_ioend(iomap_ioend_from_bio(bio), + blk_status_to_errno(bio->bi_status)); } /* @@ -1645,9 +1627,6 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend, int error) { - ioend->io_bio->bi_private = ioend; - ioend->io_bio->bi_end_io = iomap_writepage_end_bio; - if (wpc->ops->prepare_ioend) error = wpc->ops->prepare_ioend(ioend, error); if (error) { @@ -1657,12 +1636,12 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend, * as there is only one reference to the ioend at this 
point in * time. */ - ioend->io_bio->bi_status = errno_to_blk_status(error); - bio_endio(ioend->io_bio); + ioend->io_bio.bi_status = errno_to_blk_status(error); + bio_endio(&ioend->io_bio); return error; } - submit_bio(ioend->io_bio); + submit_bio(&ioend->io_bio); return 0; } @@ -1676,44 +1655,22 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc, REQ_OP_WRITE | wbc_to_write_flags(wbc), GFP_NOFS, &iomap_ioend_bioset); bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos); + bio->bi_end_io = iomap_writepage_end_bio; wbc_init_bio(wbc, bio); - ioend = container_of(bio, struct iomap_ioend, io_inline_bio); + ioend = iomap_ioend_from_bio(bio); INIT_LIST_HEAD(&ioend->io_list); ioend->io_type = wpc->iomap.type; ioend->io_flags = wpc->iomap.flags; ioend->io_inode = inode; ioend->io_size = 0; ioend->io_offset = pos; - ioend->io_bio = bio; ioend->io_sector = bio->bi_iter.bi_sector; wpc->nr_folios = 0; return ioend; } -/* - * Allocate a new bio, and chain the old bio to the new one. - * - * Note that we have to perform the chaining in this unintuitive order - * so that the bi_private linkage is set up in the right direction for the - * traversal in iomap_finish_ioend(). 
- */ -static struct bio * -iomap_chain_bio(struct bio *prev) -{ - struct bio *new; - - new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS); - bio_clone_blkg_association(new, prev); - new->bi_iter.bi_sector = bio_end_sector(prev); - - bio_chain(prev, new); - bio_get(prev); /* for iomap_finish_ioend */ - submit_bio(prev); - return new; -} - static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos) { if ((wpc->iomap.flags & IOMAP_F_SHARED) != @@ -1724,7 +1681,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos) if (pos != wpc->ioend->io_offset + wpc->ioend->io_size) return false; if (iomap_sector(&wpc->iomap, pos) != - bio_end_sector(wpc->ioend->io_bio)) + bio_end_sector(&wpc->ioend->io_bio)) return false; /* * Limit ioend bio chain lengths to minimise IO completion latency. This @@ -1749,15 +1706,14 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, size_t poff = offset_in_folio(folio, pos); if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) { +new_ioend: if (wpc->ioend) list_add(&wpc->ioend->io_list, iolist); wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos); } - if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) { - wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio); - bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff); - } + if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff)) + goto new_ioend; if (ifs) atomic_add(len, &ifs->write_bytes_pending); @@ -1988,7 +1944,7 @@ EXPORT_SYMBOL_GPL(iomap_writepages); static int __init iomap_init(void) { return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE), - offsetof(struct iomap_ioend, io_inline_bio), + offsetof(struct iomap_ioend, io_bio), BIOSET_NEED_BVECS); } fs_initcall(iomap_init); diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 465d7630bb2185..b45ee6cbbdaab2 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -112,7 +112,7 @@ xfs_end_ioend( * longer dirty. 
If we don't remove delalloc blocks here, they become * stale and can corrupt free space accounting on unmount. */ - error = blk_status_to_errno(ioend->io_bio->bi_status); + error = blk_status_to_errno(ioend->io_bio.bi_status); if (unlikely(error)) { if (ioend->io_flags & IOMAP_F_SHARED) { xfs_reflink_cancel_cow_range(ip, offset, size, true); @@ -179,7 +179,7 @@ STATIC void xfs_end_bio( struct bio *bio) { - struct iomap_ioend *ioend = bio->bi_private; + struct iomap_ioend *ioend = iomap_ioend_from_bio(bio); struct xfs_inode *ip = XFS_I(ioend->io_inode); unsigned long flags; @@ -444,7 +444,7 @@ xfs_prepare_ioend( /* send ioends that might require a transaction to the completion wq */ if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN || (ioend->io_flags & IOMAP_F_SHARED)) - ioend->io_bio->bi_end_io = xfs_end_bio; + ioend->io_bio.bi_end_io = xfs_end_bio; return status; } diff --git a/include/linux/iomap.h b/include/linux/iomap.h index b2a05dff914d0c..b8d3b658ad2b03 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -297,10 +297,14 @@ struct iomap_ioend { size_t io_size; /* size of the extent */ loff_t io_offset; /* offset in the file */ sector_t io_sector; /* start sector of ioend */ - struct bio *io_bio; /* bio being built */ - struct bio io_inline_bio; /* MUST BE LAST! */ + struct bio io_bio; /* MUST BE LAST! 
*/ }; +static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio) +{ + return container_of(bio, struct iomap_ioend, io_bio); +} + struct iomap_writeback_ops { /* * Required, maps the blocks so that writeback can be performed on From patchwork Thu Dec 7 07:27:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482800 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Q8I2PfwF" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 97D241AD; Wed, 6 Dec 2023 23:27:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=XZ1A7mD8eN8GtDAlmqm1J/SEKdB95CFncu1BUvWex9k=; b=Q8I2PfwFoXdUCEtw5I4k640hRN jWOp1aSuWJfWVZRY+MuPvJU40g9Hc3x1D8lPqwdJU13mwEO1i8/5V7iT92XGwyquXN/xbxZx2tVxC 1DHH7W54b2HRZeuYIuS9aib9llJJyObdVlRhRMoa4ndE0D4Pz88XAEBT+YAhJZyrx5qLPNRbXS6Mw WmzF6OGzajCoaDa6GKGXZT9q6rK+bIwhsWGXyJTfMaknTnfMZDO7KCMCDXFFjssdn4P9BSVypVWqV hqCDTCGZNALBQxYmK3Iemz03m4w8BeSNW6GOHhy8x717dy8UBv7ayHwtT7qb1bw+g9Dqqf0ml7oZu tZGKbqXg==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nR-00C53U-02; Thu, 07 Dec 2023 07:27:45 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 10/14] iomap: only call mapping_set_error once for each failed bio Date: Thu, 7 Dec 2023 08:27:06 +0100 Message-Id: <20231207072710.176093-11-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Instead of calling mapping_set_error once per folio, only do that once per bio, and consolidate all the writeback error handling code in iomap_finish_ioend. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7ed11eca7c7c9e..21f5019c0fe762 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1464,15 +1464,10 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops) EXPORT_SYMBOL_GPL(iomap_page_mkwrite); static void iomap_finish_folio_write(struct inode *inode, struct folio *folio, - size_t len, int error) + size_t len) { struct iomap_folio_state *ifs = folio->private; - if (error) { - folio_set_error(folio); - mapping_set_error(inode->i_mapping, error); - } - WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs); WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0); @@ -1493,18 +1488,24 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) struct folio_iter fi; u32 folio_count = 0; + if (error) { + mapping_set_error(inode->i_mapping, error); + if
(!bio_flagged(bio, BIO_QUIET)) { + pr_err_ratelimited( +"%s: writeback error on inode %lu, offset %lld, sector %llu", + inode->i_sb->s_id, inode->i_ino, + ioend->io_offset, ioend->io_sector); + } + } + /* walk all folios in bio, ending page IO on them */ bio_for_each_folio_all(fi, bio) { - iomap_finish_folio_write(inode, fi.folio, fi.length, error); + if (error) + folio_set_error(fi.folio); + iomap_finish_folio_write(inode, fi.folio, fi.length); folio_count++; } - if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) { - printk_ratelimited(KERN_ERR -"%s: writeback error on inode %lu, offset %lld, sector %llu", - inode->i_sb->s_id, inode->i_ino, - ioend->io_offset, ioend->io_sector); - } bio_put(bio); /* frees the ioend */ return folio_count; } From patchwork Thu Dec 7 07:27:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482801 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="BWCJdGIy" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F017C137; Wed, 6 Dec 2023 23:27:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=h89Ver0p59pKQNx+TSpFffWpkZj+ggB4gAjtPUjFpJM=; b=BWCJdGIyjf27HdNtJDJSNq9e9/ +2tosYtJyfUolT5YsOn1Wd0NHvwcjtHGX73cGQxeDk7kN2wny0Xm/NnYKX1Q40INVp9+mk9YjenYU Pi63/ra3Yoac3kWbGx0/EzbywbDYX7MRhN4ch3mHPNfSjIZvsR8/OhyyJpRDYTS1fC9LRfyA9s4+V rNcu39p5BlRg5wdaWL8U6b993AYHJQ79m3zepBxLF3/UT8zSl5HMcODfw2aM9rswgnHiAKe/RoG09 s0yaO9aRmR81hy1HxUBV76MmQZGNS5AJ/0CTejaQqRoTYqCNdK07bt3GRsFQP4MOa1tM/5A+D+V9Y 1SFgUMfg==; Received: from 
[2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nT-00C54n-2o; Thu, 07 Dec 2023 07:27:48 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 11/14] iomap: factor out a iomap_writepage_map_block helper Date: Thu, 7 Dec 2023 08:27:07 +0100 Message-Id: <20231207072710.176093-12-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Split the loop body that calls into the file system to map a block and add it to the ioend into a separate helper to prepare for refactoring of the surrounding code. Note that this was the only place in iomap_writepage_map that could return an error, so include the call to ->discard_folio into the new helper as that will help to avoid code duplication in the future. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J.
Wong --- fs/iomap/buffered-io.c | 70 ++++++++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 27 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 21f5019c0fe762..622764ed649d57 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1722,6 +1722,45 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, wbc_account_cgroup_owner(wbc, &folio->page, len); } +static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc, + struct writeback_control *wbc, struct folio *folio, + struct inode *inode, u64 pos, unsigned *count, + struct list_head *submit_list) +{ + int error; + + error = wpc->ops->map_blocks(wpc, inode, pos); + if (error) + goto fail; + trace_iomap_writepage_map(inode, &wpc->iomap); + + switch (wpc->iomap.type) { + case IOMAP_INLINE: + WARN_ON_ONCE(1); + error = -EIO; + break; + case IOMAP_HOLE: + break; + default: + iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list); + (*count)++; + } + +fail: + /* + * We cannot cancel the ioend directly here on error. We may have + * already set other pages under writeback and hence we have to run I/O + * completion to mark the error state of the pages under writeback + * appropriately. + * + * Just let the file system know what portion of the folio failed to + * map. + */ + if (error && wpc->ops->discard_folio) + wpc->ops->discard_folio(folio, pos); + return error; +} + /* * Check interaction of the folio with the file end. 
* @@ -1806,7 +1845,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, unsigned nblocks = i_blocks_per_folio(inode, folio); u64 pos = folio_pos(folio); u64 end_pos = pos + folio_size(folio); - int error = 0, count = 0, i; + unsigned count = 0; + int error = 0, i; LIST_HEAD(submit_list); trace_iomap_writepage(inode, pos, folio_size(folio)); @@ -1832,19 +1872,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) { if (ifs && !ifs_block_is_dirty(folio, ifs, i)) continue; - - error = wpc->ops->map_blocks(wpc, inode, pos); + error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos, + &count, &submit_list); if (error) break; - trace_iomap_writepage_map(inode, &wpc->iomap); - if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) { - error = -EIO; - break; - } - if (wpc->iomap.type == IOMAP_HOLE) - continue; - iomap_add_to_ioend(wpc, wbc, folio, inode, pos, &submit_list); - count++; } if (count) wpc->nr_folios++; @@ -1854,21 +1885,6 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, WARN_ON_ONCE(folio_test_writeback(folio)); WARN_ON_ONCE(folio_test_dirty(folio)); - /* - * We cannot cancel the ioend directly here on error. We may have - * already set other pages under writeback and hence we have to run I/O - * completion to mark the error state of the pages under writeback - * appropriately. - */ - if (unlikely(error)) { - /* - * Let the filesystem know what portion of the current page - * failed to map. - */ - if (wpc->ops->discard_folio) - wpc->ops->discard_folio(folio, pos); - } - /* * We can have dirty bits set past end of file in page_mkwrite path * while mapping the last partial folio. 
Hence it's better to clear From patchwork Thu Dec 7 07:27:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13482802 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="VSyEfw8b" Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4825AD5E; Wed, 6 Dec 2023 23:27:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=8g4YTFdUiHNqzI7lwlY3NMfQbHY6rYGZCDFrK5kK1uM=; b=VSyEfw8brqy/BfvPooP2q95zsD BqoU0e5jih/XDAEdpZm5XGlOPyy1sm6Wz5nhSR4NnS4qNb6KHk7vnFQiwhTOOikrp/wUi53M/et39 w6XhAGR1kk4f9vy7IKVNqPytEg60xbYSAb1qZfLVFi0nr/TGPdX7znF3DpVzBEZJoLd7ygGg8QTLV BR8463cNhdg1sE4w7KXiiwye9yj9vH9gr+CR//PKtfMbj1o8yNlgDQ03KO7l7NRFx1an64ZiPLsGx QOyNMAvT0xiasAmjITDTYcxBoBVjlRq0RVJ4kKRgledrkTuHwoUqaCOZ7jMW1lY6Ax9kDH1lJGlcg yhfePY6w==; Received: from [2001:4bb8:191:e7ca:4bf6:cea4:9bbf:8b02] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1rB8nX-00C55Z-0E; Thu, 07 Dec 2023 07:27:51 +0000 From: Christoph Hellwig To: Christian Brauner Cc: "Darrick J. 
Wong" , Chandan Babu R , Zhang Yi , Ritesh Harjani , Jens Axboe , Andreas Gruenbacher , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev Subject: [PATCH 12/14] iomap: submit ioends immediately Date: Thu, 7 Dec 2023 08:27:08 +0100 Message-Id: <20231207072710.176093-13-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231207072710.176093-1-hch@lst.de> References: <20231207072710.176093-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Currently the writeback code delays submitting full ioends until we reach the end of the folio. The reason for that is that otherwise the end I/O handler could clear the writeback bit before we've even finished submitting all I/O for the folio. Add a bias to ifs->write_bytes_pending while we are submitting I/O for a folio so that it never reaches zero until all I/O is completed to prevent the early writeback bit clearing, and remove the now superfluous submit_list. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 162 +++++++++++++++++++---------------------- 1 file changed, 76 insertions(+), 86 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 622764ed649d57..4339bc422b245d 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1620,30 +1620,34 @@ static void iomap_writepage_end_bio(struct bio *bio) * Submit the final bio for an ioend. * * If @error is non-zero, it means that we have a situation where some part of - * the submission process has failed after we've marked pages for writeback - * and unlocked them. In this situation, we need to fail the bio instead of - * submitting it.
This typically only happens on a filesystem shutdown. + * the submission process has failed after we've marked pages for writeback. + * We cannot cancel ioend directly in that case, so call the bio end I/O handler + * with the error status here to run the normal I/O completion handler to clear + * the writeback bit and let the file system process the errors. */ -static int -iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend, - int error) +static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error) { + if (!wpc->ioend) + return error; + + /* + * Let the file systems prepare the I/O submission and hook in an I/O + * completion handler. This also needs to happen in case after a + * failure happened so that the file system end I/O handler gets called + * to clean up. + */ if (wpc->ops->prepare_ioend) - error = wpc->ops->prepare_ioend(ioend, error); + error = wpc->ops->prepare_ioend(wpc->ioend, error); + if (error) { - /* - * If we're failing the IO now, just mark the ioend with an - * error and finish it. This will run IO completion immediately - * as there is only one reference to the ioend at this point in - * time. - */ - ioend->io_bio.bi_status = errno_to_blk_status(error); - bio_endio(&ioend->io_bio); - return error; + wpc->ioend->io_bio.bi_status = errno_to_blk_status(error); + bio_endio(&wpc->ioend->io_bio); + } else { + submit_bio(&wpc->ioend->io_bio); } - submit_bio(&ioend->io_bio); - return 0; + wpc->ioend = NULL; + return error; } static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc, @@ -1697,19 +1701,28 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos) /* * Test to see if we have an existing ioend structure that we could append to * first; otherwise finish off the current ioend and start another. + * + * If a new ioend is created and cached, the old ioend is submitted to the block + * layer instantly.
Batching optimisations are provided by higher level block + * plugging. + * + * At the end of a writeback pass, there will be a cached ioend remaining on the + * writepage context that the caller will need to submit. */ -static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, +static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct folio *folio, - struct inode *inode, loff_t pos, struct list_head *iolist) + struct inode *inode, loff_t pos) { struct iomap_folio_state *ifs = folio->private; unsigned len = i_blocksize(inode); size_t poff = offset_in_folio(folio, pos); + int error; if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) { new_ioend: - if (wpc->ioend) - list_add(&wpc->ioend->io_list, iolist); + error = iomap_submit_ioend(wpc, 0); + if (error) + return error; wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos); } @@ -1720,12 +1733,12 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, atomic_add(len, &ifs->write_bytes_pending); wpc->ioend->io_size += len; wbc_account_cgroup_owner(wbc, &folio->page, len); + return 0; } static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct folio *folio, - struct inode *inode, u64 pos, unsigned *count, - struct list_head *submit_list) + struct inode *inode, u64 pos, unsigned *count) { int error; @@ -1742,8 +1755,9 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc, case IOMAP_HOLE: break; default: - iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list); - (*count)++; + error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos); + if (!error) + (*count)++; } fail: @@ -1819,35 +1833,21 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, return true; } -/* - * We implement an immediate ioend submission policy here to avoid needing to - * chain multiple ioends and hence nest mempool allocations which can violate - * the forward progress guarantees we 
need to provide. The current ioend we're - * adding blocks to is cached in the writepage context, and if the new block - * doesn't append to the cached ioend, it will create a new ioend and cache that - * instead. - * - * If a new ioend is created and cached, the old ioend is returned and queued - * locally for submission once the entire page is processed or an error has been - * detected. While ioends are submitted immediately after they are completed, - * batching optimisations are provided by higher level block plugging. - * - * At the end of a writeback pass, there will be a cached ioend remaining on the - * writepage context that the caller will need to submit. - */ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct folio *folio) { struct iomap_folio_state *ifs = folio->private; struct inode *inode = folio->mapping->host; - struct iomap_ioend *ioend, *next; unsigned len = i_blocksize(inode); unsigned nblocks = i_blocks_per_folio(inode, folio); u64 pos = folio_pos(folio); u64 end_pos = pos + folio_size(folio); unsigned count = 0; int error = 0, i; - LIST_HEAD(submit_list); + + WARN_ON_ONCE(!folio_test_locked(folio)); + WARN_ON_ONCE(folio_test_dirty(folio)); + WARN_ON_ONCE(folio_test_writeback(folio)); trace_iomap_writepage(inode, pos, folio_size(folio)); @@ -1857,12 +1857,27 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, } WARN_ON_ONCE(end_pos <= pos); - if (!ifs && nblocks > 1) { - ifs = ifs_alloc(inode, folio, 0); - iomap_set_range_dirty(folio, 0, end_pos - pos); + if (nblocks > 1) { + if (!ifs) { + ifs = ifs_alloc(inode, folio, 0); + iomap_set_range_dirty(folio, 0, end_pos - pos); + } + + /* + * Keep the I/O completion handler from clearing the writeback + * bit until we have submitted all blocks by adding a bias to + * ifs->write_bytes_pending, which is dropped after submitting + * all blocks. 
+ */ + WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0); + atomic_inc(&ifs->write_bytes_pending); } - WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0); + /* + * Set the writeback bit ASAP, as the I/O completion for the single + * block per folio case can happen as soon as we submit the bio. + */ + folio_start_writeback(folio); /* * Walk through the folio to find areas to write back. If we @@ -1873,18 +1888,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, if (ifs && !ifs_block_is_dirty(folio, ifs, i)) continue; error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos, - &count, &submit_list); + &count); if (error) break; } if (count) wpc->nr_folios++; - WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list)); - WARN_ON_ONCE(!folio_test_locked(folio)); - WARN_ON_ONCE(folio_test_writeback(folio)); - WARN_ON_ONCE(folio_test_dirty(folio)); - /* * We can have dirty bits set past end of file in page_mkwrite path * while mapping the last partial folio. Hence it's better to clear @@ -1893,38 +1903,20 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, iomap_clear_range_dirty(folio, 0, folio_size(folio)); /* - * If the page hasn't been added to the ioend, it won't be affected by - * I/O completion and we must unlock it now. + * Usually the writeback bit is cleared by the I/O completion handler. + * But we may end up either not actually writing any blocks, or (when + * there are multiple blocks in a folio) all I/O might have finished + * already at this point. In that case we need to clear the writeback + * bit ourselves right after unlocking the page. */ - if (error && !count) { - folio_unlock(folio); - goto done; - } - - folio_start_writeback(folio); folio_unlock(folio); - - /* - * Preserve the original error if there was one; catch - * submission errors here and propagate into subsequent ioend - * submissions.
- */ - list_for_each_entry_safe(ioend, next, &submit_list, io_list) { - int error2; - - list_del_init(&ioend->io_list); - error2 = iomap_submit_ioend(wpc, ioend, error); - if (error2 && !error) - error = error2; + if (ifs) { + if (atomic_dec_and_test(&ifs->write_bytes_pending)) + folio_end_writeback(folio); + } else { + if (!count) + folio_end_writeback(folio); } - - /* - * We can end up here with no error and nothing to write only if we race - * with a partial page truncate on a sub-page block sized filesystem. - */ - if (!count) - folio_end_writeback(folio); -done: mapping_set_error(inode->i_mapping, error); return error; } @@ -1952,9 +1944,7 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc, wpc->ops = ops; ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc); - if (!wpc->ioend) - return ret; - return iomap_submit_ioend(wpc, wpc->ioend, ret); + return iomap_submit_ioend(wpc, ret); } EXPORT_SYMBOL_GPL(iomap_writepages);

From patchwork Thu Dec 7 07:27:09 2023
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13482803
From: Christoph Hellwig
To: Christian Brauner
Cc: "Darrick J. Wong", Chandan Babu R, Zhang Yi, Ritesh Harjani, Jens Axboe, Andreas Gruenbacher, Damien Le Moal, Naohiro Aota, Johannes Thumshirn, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev
Subject: [PATCH 13/14] iomap: map multiple blocks at a time
Date: Thu, 7 Dec 2023 08:27:09 +0100
Message-Id: <20231207072710.176093-14-hch@lst.de>
In-Reply-To: <20231207072710.176093-1-hch@lst.de>
References: <20231207072710.176093-1-hch@lst.de>

The ->map_blocks interface returns a valid range for writeback, but we still call back into it for every block, which is a bit inefficient. Change iomap_writepage_map to use the valid range in the map until the end of the folio or the dirty range inside the folio instead of calling back into every block. Note that the range is not used over folio boundaries as we need to be able to check the mapping sequence count under the folio lock.
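
For readers following along, the dirty-range walk described above can be modelled in userspace. The sketch below is not the kernel code: struct model_folio and the model_* helpers are hypothetical stand-ins that assume at most 64 blocks per folio and a plain 64-bit bitmap in place of the per-block dirty bits in iomap_folio_state. It only illustrates how a run of contiguous dirty blocks collapses into one range per lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a folio plus its per-block dirty bits. */
struct model_folio {
	uint64_t pos;		/* file offset of the first byte of the folio */
	unsigned blkbits;	/* log2 of the block size */
	unsigned nblocks;	/* blocks per folio, at most 64 in this model */
	uint64_t dirty;		/* bit i set means block i is dirty */
};

static bool model_block_is_dirty(const struct model_folio *f, unsigned blk)
{
	return (f->dirty >> blk) & 1;
}

/*
 * Find the next run of dirty blocks in [*range_start, range_end).
 * Returns the run length in bytes and moves *range_start to the first
 * byte of the run, or returns 0 if no dirty block is left in the range.
 */
static unsigned model_find_dirty_range(const struct model_folio *f,
		uint64_t *range_start, uint64_t range_end)
{
	uint64_t blksize = 1ULL << f->blkbits;
	unsigned start_blk, end_blk, nblks = 1;

	assert(f->nblocks >= 1 && f->nblocks <= 64);
	assert(*range_start >= f->pos);
	if (*range_start >= range_end)
		return 0;

	start_blk = (*range_start - f->pos) >> f->blkbits;
	/* Round the end up to a block boundary and clamp it to the folio. */
	end_blk = (range_end - f->pos + blksize - 1) >> f->blkbits;
	if (end_blk > f->nblocks)
		end_blk = f->nblocks;

	/* Skip clean blocks in front of the run. */
	while (start_blk < end_blk && !model_block_is_dirty(f, start_blk))
		start_blk++;
	if (start_blk >= end_blk)
		return 0;

	/* Extend the run while the following blocks are dirty as well. */
	while (start_blk + nblks < end_blk &&
	       model_block_is_dirty(f, start_blk + nblks))
		nblks++;

	*range_start = f->pos + ((uint64_t)start_blk << f->blkbits);
	return nblks << f->blkbits;
}

/* Walk the whole folio range by range and count the dirty runs found. */
static unsigned model_count_dirty_ranges(const struct model_folio *f)
{
	uint64_t pos = f->pos;
	uint64_t end_pos = f->pos + ((uint64_t)f->nblocks << f->blkbits);
	unsigned rlen, runs = 0;

	while ((rlen = model_find_dirty_range(f, &pos, end_pos))) {
		runs++;		/* one mapping pass per run, not per block */
		pos += rlen;
	}
	return runs;
}
```

With 4k blocks and blocks 3-5 and 9-10 dirty in a 16-block folio, the model finds two ranges, where a per-block walk would have looked at five blocks individually.
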
Signed-off-by: Christoph Hellwig --- fs/iomap/buffered-io.c | 116 ++++++++++++++++++++++++++++------------- include/linux/iomap.h | 7 +++ 2 files changed, 88 insertions(+), 35 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 4339bc422b245d..d8f56968962b97 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2010 Red Hat, Inc. - * Copyright (C) 2016-2019 Christoph Hellwig. + * Copyright (C) 2016-2023 Christoph Hellwig. */ #include #include @@ -95,6 +95,44 @@ static inline bool ifs_block_is_dirty(struct folio *folio, return test_bit(block + blks_per_folio, ifs->state); } +static unsigned ifs_find_dirty_range(struct folio *folio, + struct iomap_folio_state *ifs, u64 *range_start, u64 range_end) +{ + struct inode *inode = folio->mapping->host; + unsigned start_blk = + offset_in_folio(folio, *range_start) >> inode->i_blkbits; + unsigned end_blk = min_not_zero( + offset_in_folio(folio, range_end) >> inode->i_blkbits, + i_blocks_per_folio(inode, folio)); + unsigned nblks = 1; + + while (!ifs_block_is_dirty(folio, ifs, start_blk)) + if (++start_blk == end_blk) + return 0; + + while (start_blk + nblks < end_blk) { + if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks)) + break; + nblks++; + } + + *range_start = folio_pos(folio) + (start_blk << inode->i_blkbits); + return nblks << inode->i_blkbits; +} + +static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start, + u64 range_end) +{ + struct iomap_folio_state *ifs = folio->private; + + if (*range_start >= range_end) + return 0; + + if (ifs) + return ifs_find_dirty_range(folio, ifs, range_start, range_end); + return range_end - *range_start; +} + static void ifs_clear_range_dirty(struct folio *folio, struct iomap_folio_state *ifs, size_t off, size_t len) { @@ -1711,10 +1749,9 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos) */ static int 
iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct folio *folio, - struct inode *inode, loff_t pos) + struct inode *inode, loff_t pos, unsigned len) { struct iomap_folio_state *ifs = folio->private; - unsigned len = i_blocksize(inode); size_t poff = offset_in_folio(folio, pos); int error; @@ -1738,29 +1775,41 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct folio *folio, - struct inode *inode, u64 pos, unsigned *count) + struct inode *inode, u64 pos, unsigned dirty_len, + unsigned *count) { int error; - error = wpc->ops->map_blocks(wpc, inode, pos); - if (error) - goto fail; - trace_iomap_writepage_map(inode, &wpc->iomap); - - switch (wpc->iomap.type) { - case IOMAP_INLINE: - WARN_ON_ONCE(1); - error = -EIO; - break; - case IOMAP_HOLE: - break; - default: - error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos); - if (!error) - (*count)++; - } + do { + unsigned map_len; + + error = wpc->ops->map_blocks(wpc, inode, pos); + if (error) + break; + trace_iomap_writepage_map(inode, &wpc->iomap); + + map_len = min_t(u64, dirty_len, + wpc->iomap.offset + wpc->iomap.length - pos); + WARN_ON_ONCE(!folio->private && map_len < dirty_len); + + switch (wpc->iomap.type) { + case IOMAP_INLINE: + WARN_ON_ONCE(1); + error = -EIO; + break; + case IOMAP_HOLE: + break; + default: + error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos, + map_len); + if (!error) + (*count)++; + break; + } + dirty_len -= map_len; + pos += map_len; + } while (dirty_len && !error); -fail: /* * We cannot cancel the ioend directly here on error. We may have * already set other pages under writeback and hence we have to run I/O @@ -1827,7 +1876,7 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, * beyond i_size. 
*/ folio_zero_segment(folio, poff, folio_size(folio)); - *end_pos = isize; + *end_pos = round_up(isize, i_blocksize(inode)); } return true; @@ -1838,12 +1887,11 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, { struct iomap_folio_state *ifs = folio->private; struct inode *inode = folio->mapping->host; - unsigned len = i_blocksize(inode); - unsigned nblocks = i_blocks_per_folio(inode, folio); u64 pos = folio_pos(folio); u64 end_pos = pos + folio_size(folio); unsigned count = 0; - int error = 0, i; + int error = 0; + u32 rlen; WARN_ON_ONCE(!folio_test_locked(folio)); WARN_ON_ONCE(folio_test_dirty(folio)); @@ -1857,7 +1905,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, } WARN_ON_ONCE(end_pos <= pos); - if (nblocks > 1) { + if (i_blocks_per_folio(inode, folio) > 1) { if (!ifs) { ifs = ifs_alloc(inode, folio, 0); iomap_set_range_dirty(folio, 0, end_pos - pos); @@ -1880,18 +1928,16 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc, folio_start_writeback(folio); /* - * Walk through the folio to find areas to write back. If we - * run off the end of the current map or find the current map - * invalid, grab a new one. + * Walk through the folio to find dirty areas to write back. */ - for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) { - if (ifs && !ifs_block_is_dirty(folio, ifs, i)) - continue; - error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos, - &count); + while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) { + error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, + pos, rlen, &count); if (error) break; + pos += rlen; } + if (count) wpc->nr_folios++; diff --git a/include/linux/iomap.h b/include/linux/iomap.h index b8d3b658ad2b03..49d93f53878565 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -309,6 +309,13 @@ struct iomap_writeback_ops { /* * Required, maps the blocks so that writeback can be performed on * the range starting at offset. 
+ * + * Can return arbitrarily large regions, but we need to call into it at + * least once per folio to allow the file systems to synchronize with + * the write path that could be invalidating mappings. + * + * An existing mapping from a previous call to this method can be reused + * by the file system if it is still valid. */ int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t offset);

From patchwork Thu Dec 7 07:27:10 2023
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13482804
From: Christoph Hellwig
To: Christian Brauner
Cc: "Darrick J. Wong", Chandan Babu R, Zhang Yi, Ritesh Harjani, Jens Axboe, Andreas Gruenbacher, Damien Le Moal, Naohiro Aota, Johannes Thumshirn, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, gfs2@lists.linux.dev
Subject: [PATCH 14/14] iomap: pass the length of the dirty region to ->map_blocks
Date: Thu, 7 Dec 2023 08:27:10 +0100
Message-Id: <20231207072710.176093-15-hch@lst.de>
In-Reply-To: <20231207072710.176093-1-hch@lst.de>
References: <20231207072710.176093-1-hch@lst.de>

Let the file system know how much dirty data exists at the passed in offset. This allows file systems to allocate the right amount of space that actually is written back if they can't eagerly convert (e.g. because they don't support unwritten extents).
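
As a rough illustration of how a file system might use the new argument, the sketch below sizes a mapping from the dirty length instead of guessing. It is an assumption-laden userspace model, not any in-tree implementation: struct model_map and model_map_blocks() are hypothetical names, and the policy (cover exactly the block-aligned span of the dirty data) is just one plausible choice for a file system without unwritten extents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping result, loosely modelled on offset/length in struct iomap. */
struct model_map {
	uint64_t offset;	/* start of the mapped range in the file */
	uint64_t length;	/* number of bytes covered by the mapping */
};

/*
 * Sketch of a ->map_blocks style helper: allocate only the block-aligned
 * span that covers the dirty bytes [offset, offset + len), so that no
 * space is allocated for data that is not going to be written back.
 */
static struct model_map model_map_blocks(uint64_t offset, unsigned len,
		unsigned blkbits)
{
	uint64_t blksize = 1ULL << blkbits;
	uint64_t start = offset & ~(blksize - 1);
	uint64_t end = (offset + len + blksize - 1) & ~(blksize - 1);
	struct model_map map = { .offset = start, .length = end - start };

	assert(len > 0);
	return map;
}
```

A caller that passes a 12k dirty range at offset 8k with 4k blocks gets back a mapping of exactly 12k, while a one-byte dirty length still rounds up to a full block.
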
Signed-off-by: Christoph Hellwig --- block/fops.c | 2 +- fs/gfs2/bmap.c | 2 +- fs/iomap/buffered-io.c | 2 +- fs/xfs/xfs_aops.c | 3 ++- fs/zonefs/file.c | 3 ++- include/linux/iomap.h | 2 +- 6 files changed, 8 insertions(+), 6 deletions(-) diff --git a/block/fops.c b/block/fops.c index 0abaac705dafb0..7e921f999182dc 100644 --- a/block/fops.c +++ b/block/fops.c @@ -467,7 +467,7 @@ static void blkdev_readahead(struct readahead_control *rac) } static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc, - struct inode *inode, loff_t offset) + struct inode *inode, loff_t offset, unsigned int len) { loff_t isize = i_size_read(inode); diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index d9ccfd27e4f11f..789af5c8fade9d 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -2465,7 +2465,7 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length) } static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode, - loff_t offset) + loff_t offset, unsigned int len) { int ret; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index d8f56968962b97..e0c9cede82ee4b 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1783,7 +1783,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc, do { unsigned map_len; - error = wpc->ops->map_blocks(wpc, inode, pos); + error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len); if (error) break; trace_iomap_writepage_map(inode, &wpc->iomap); diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index b45ee6cbbdaab2..bf26a91ecdbbc6 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -276,7 +276,8 @@ static int xfs_map_blocks( struct iomap_writepage_ctx *wpc, struct inode *inode, - loff_t offset) + loff_t offset, + unsigned int len) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c index b2c9b35df8f76d..1526e0ec6bfeaf 100644 --- a/fs/zonefs/file.c +++ b/fs/zonefs/file.c @@ -125,7 +125,8 @@ static 
void zonefs_readahead(struct readahead_control *rac) * which implies that the page range can only be within the fixed inode size. */ static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc, - struct inode *inode, loff_t offset) + struct inode *inode, loff_t offset, + unsigned int len) { struct zonefs_zone *z = zonefs_inode_zone(inode); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 49d93f53878565..6fc1c858013d1e 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -318,7 +318,7 @@ struct iomap_writeback_ops { * by the file system if it is still valid. */ int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, - loff_t offset); + loff_t offset, unsigned len); /* * Optional, allows the file systems to perform actions just before