[19/19] writeback: simplify writeback iteration

Message ID	20240125085758.2393327-20-hch@lst.de (mailing list archive)
State	New
Headers	show Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7E104594E; Thu, 25 Jan 2024 08:59:19 +0000 (UTC) From: Christoph Hellwig <hch@lst.de> To: linux-mm@kvack.org Cc: Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.com>, David Howells <dhowells@redhat.com>, Brian Foster <bfoster@redhat.com>, Christian Brauner <brauner@kernel.org>, "Darrick J. Wong" <djwong@kernel.org>, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 19/19] writeback: simplify writeback iteration Date: Thu, 25 Jan 2024 09:57:58 +0100 Message-Id: <20240125085758.2393327-20-hch@lst.de> In-Reply-To: <20240125085758.2393327-1-hch@lst.de> References: <20240125085758.2393327-1-hch@lst.de> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[01/19] writeback: fix done_index when hitting the wbc->nr_to_write \| expand [01/19] writeback: fix done_index when hitting the wbc->nr_to_write [02/19] writeback: also update wbc->nr_to_write on writeback failure [03/19] writeback: rework the loop termination condition in write_cache_pages [04/19] writeback: only update ->writeback_index for range_cyclic writeback [05/19] writeback: remove a duplicate prototype for tag_pages_for_writeback [06/19] writeback: Factor out writeback_finish() [07/19] writeback: Factor writeback_get_batch() out of write_cache_pages() [08/19] writeback: Factor folio_prepare_writeback() out of write_cache_pages() [09/19] writeback: Simplify the loops in write_cache_pages() [10/19] pagevec: Add ability to iterate a queue [11/19] writeback: Use the folio_batch queue iterator [12/19] writeback: Factor writeback_iter_init() out of write_cache_pages() [13/19] writeback: Move the folio_prepare_writeback loop out of write_cache_pages() [14/19] writeback: Factor writeback_iter_next() out of write_cache_pages() [15/19] writeback: Add for_each_writeback_folio() [16/19] writeback: Remove a use of write_cache_pages() from do_writepages() [17/19] writeback: update the kerneldoc comment for tag_pages_for_writeback [18/19] iomap: Convert iomap_writepages() to use for_each_writeback_folio() [19/19] writeback: simplify writeback iteration

Christoph Hellwig Jan. 25, 2024, 8:57 a.m. UTC

Based on the feedback from Jan I've tried to figure out how to
avoid the error magic in the for_each_writeback_folio.  This patch
tries to implement this by switching to an open while loop over a
single writeback_iter() function:

	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
		...
	}

the twist here is that the error value is passed by reference, so that
the iterator can restore it when breaking out of the loop.

Additionally it moves the AOP_WRITEPAGE_ACTIVATE out of the iterator
and into the callers, in preparation for eventually killing it off
with the phase out of write_cache_pages().

To me this form of the loop feels easier to follow, and it has the
added advantage that writeback_iter() can actually be nicely used in
nested loops, which should help with further iterizing the iomap
writeback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c    |   7 +-
 include/linux/writeback.h |  11 +--
 mm/page-writeback.c       | 174 +++++++++++++++++++++-----------------
 3 files changed, 102 insertions(+), 90 deletions(-)

Jan Kara Jan. 30, 2024, 10:46 a.m. UTC | #1

On Thu 25-01-24 09:57:58, Christoph Hellwig wrote:
> Based on the feedback from Jan I've tried to figure out how to
> avoid the error magic in the for_each_writeback_folio.  This patch
> tries to implement this by switching to an open while loop over a
> single writeback_iter() function:
> 
> 	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
> 		...
> 	}
> 
> the twist here is that the error value is passed by reference, so that
> the iterator can restore it when breaking out of the loop.
> 
> Additionally it moves the AOP_WRITEPAGE_ACTIVATE out of the iterator
> and into the callers, in preparation for eventually killing it off
> with the phase out of write_cache_pages().
> 
> To me this form of the loop feels easier to follow, and it has the
> added advantage that writeback_iter() can actually be nicely used in
> nested loops, which should help with further iterizing the iomap
> writeback code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looking at it now I'm thinking whether we would not be better off to
completely dump the 'error' argument of writeback_iter() /
writeback_iter_next() and just make all .writepage implementations set
wbc->err directly. But that means touching all the ~20 writepage
implementations we still have...

Couple of comments regarding this implementation below (overall I agree it
seems somewhat easier to follow the code).

> +/**
> + * writepage_iter - iterate folio of a mapping for writeback
> + * @mapping: address space structure to write
> + * @wbc: writeback context
> + * @folio: previously iterated folio (%NULL to start)
> + * @error: in-out pointer for writeback errors (see below)
> + *
> + * This function should be called in a while loop in the ->writepages
> + * implementation and returns the next folio for the writeback operation
> + * described by @wbc on @mapping.
> + *
> + * To start writeback @folio should be passed as NULL, for every following
> + * iteration the folio returned by this function previously should be passed.
> + * @error should contain the error from the previous writeback iteration when
> + * calling writeback_iter.
> + *
> + * Once the writeback described in @wbc has finished, this function will return
> + * %NULL and if there was an error in any iteration restore it to @error.
> + *
> + * Note: callers should not manually break out of the loop using break or goto.
> + */
> +struct folio *writeback_iter(struct address_space *mapping,
> +		struct writeback_control *wbc, struct folio *folio, int *error)
>  {
> -	unsigned long nr = folio_nr_pages(folio);
> +	if (folio) {
> +		wbc->nr_to_write -= folio_nr_pages(folio);
> +		if (*error && !wbc->err)
> +			wbc->err = *error;
>  
> -	wbc->nr_to_write -= nr;
> -
> -	/*
> -	 * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> -	 * Eventually all instances should just unlock the folio themselves and
> -	 * return 0;
> -	 */
> -	if (error == AOP_WRITEPAGE_ACTIVATE) {
> -		folio_unlock(folio);
> -		error = 0;
> +		/*
> +		 * For integrity sync  we have to keep going until we have
> +		 * written all the folios we tagged for writeback prior to
> +		 * entering the writeback loop, even if we run past
> +		 * wbc->nr_to_write or encounter errors.
> +		 *
> +		 * This is because the file system may still have state to clear
> +		 * for each folio.  We'll eventually return the first error
> +		 * encountered.
> +		 *
> +		 * For background writeback just push done_index past this folio
> +		 * so that we can just restart where we left off and media
> +		 * errors won't choke writeout for the entire file.
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_NONE &&
> +		    (wbc->err || wbc->nr_to_write <= 0))
> +			goto finish;

I think it would be a bit more comprehensible if we replace the goto with:
			folio_batch_release(&wbc->fbatch);
			if (wbc->range_cyclic)
				mapping->writeback_index =
					folio->index + folio_nr_pages(folio);
			*error = wbc->err;
			return NULL;

> +	} else {
> +		if (wbc->range_cyclic)
> +			wbc->index = mapping->writeback_index; /* prev offset */
> +		else
> +			wbc->index = wbc->range_start >> PAGE_SHIFT;
> +		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> +			tag_pages_for_writeback(mapping, wbc->index,
> +					wbc_end(wbc));
> +		folio_batch_init(&wbc->fbatch);
> +		wbc->err = 0;
>  	}
>  
> -	if (error && !wbc->err)
> -		wbc->err = error;
> +	folio = writeback_get_folio(mapping, wbc);
> +	if (!folio)
> +		goto finish;

And here we just need to do:
		if (wbc->range_cyclic)
			mapping->writeback_index = 0;
		*error = wbc->err;
		return NULL;

> +	return folio;
> +
> +finish:
> +	folio_batch_release(&wbc->fbatch);
>  
>  	/*
> -	 * For integrity sync  we have to keep going until we have written all
> -	 * the folios we tagged for writeback prior to entering the writeback
> -	 * loop, even if we run past wbc->nr_to_write or encounter errors.
> -	 * This is because the file system may still have state to clear for
> -	 * each folio.   We'll eventually return the first error encountered.
> +	 * For range cyclic writeback we need to remember where we stopped so
> +	 * that we can continue there next time we are called.  If  we hit the
> +	 * last page and there is more work to be done, wrap back to the start
> +	 * of the file.
>  	 *
> -	 * For background writeback just push done_index past this folio so that
> -	 * we can just restart where we left off and media errors won't choke
> -	 * writeout for the entire file.
> +	 * For non-cyclic writeback we always start looking up at the beginning
> +	 * of the file if we are called again, which can only happen due to
> +	 * -ENOMEM from the file system.
>  	 */
> -	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    (wbc->err || wbc->nr_to_write <= 0)) {
> -		writeback_finish(mapping, wbc, folio->index + nr);
> -		return NULL;
> +	if (wbc->range_cyclic) {
> +		WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
> +		if (wbc->err || wbc->nr_to_write <= 0)
> +			mapping->writeback_index =
> +				folio->index + folio_nr_pages(folio);
> +		else
> +			mapping->writeback_index = 0;
>  	}
> -
> -	return writeback_get_folio(mapping, wbc);
> +	*error = wbc->err;
> +	return NULL;
>  }
>  
>  /**
> @@ -2563,13 +2575,17 @@ static int writeback_use_writepage(struct address_space *mapping,
>  		struct writeback_control *wbc)
>  {
>  	struct blk_plug plug;
> -	struct folio *folio;
> -	int err;
> +	struct folio *folio = 0;
			     ^^ NULL please


> +	int err = 0;
>  
>  	blk_start_plug(&plug);
> -	for_each_writeback_folio(mapping, wbc, folio, err) {
> +	while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
>  		err = mapping->a_ops->writepage(&folio->page, wbc);
>  		mapping_set_error(mapping, err);
> +		if (err == AOP_WRITEPAGE_ACTIVATE) {
> +			folio_unlock(folio);
> +			err = 0;
> +		}
>  	}
>  	blk_finish_plug(&plug);
>  
> @@ -2590,6 +2606,8 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
>  			ret = mapping->a_ops->writepages(mapping, wbc);
>  		} else if (mapping->a_ops->writepage) {
>  			ret = writeback_use_writepage(mapping, wbc);
> +			if (!ret)
> +				ret = wbc->err;

AFAICT this should not be needed as writeback_iter() made sure wbc->err is
returned when set?

								Honza

Christoph Hellwig Jan. 30, 2024, 2:16 p.m. UTC | #2

On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> Looking at it now I'm thinking whether we would not be better off to
> completely dump the 'error' argument of writeback_iter() /
> writeback_iter_next() and just make all .writepage implementations set
> wbc->err directly. But that means touching all the ~20 writepage
> implementations we still have...

Heh.  I actually had an earlier version that looked at wbc->err in
the ->writepages callers.  But it felt a bit too ugly.

> > +		 */
> > +		if (wbc->sync_mode == WB_SYNC_NONE &&
> > +		    (wbc->err || wbc->nr_to_write <= 0))
> > +			goto finish;
> 
> I think it would be a bit more comprehensible if we replace the goto with:
> 			folio_batch_release(&wbc->fbatch);
> 			if (wbc->range_cyclic)
> 				mapping->writeback_index =
> 					folio->index + folio_nr_pages(folio);
> 			*error = wbc->err;
> 			return NULL;

I agree that keeping the logic on when to break and when to set the
writeback_index is good, but duplicating the batch release and error
assignment seems a bit suboptimal.  Let me know what you think of the
alternatіve variant below.

> > +	struct folio *folio = 0;
> 			     ^^ NULL please

Fixed.

> >  			ret = writeback_use_writepage(mapping, wbc);
> > +			if (!ret)
> > +				ret = wbc->err;
> 
> AFAICT this should not be needed as writeback_iter() made sure wbc->err is
> returned when set?

Heh.  That's a leftover from my above mentioned different attempt at
error handling and shouldn't have stayed in.

Christoph Hellwig Jan. 30, 2024, 2:22 p.m. UTC | #3

On Tue, Jan 30, 2024 at 03:16:01PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> > Looking at it now I'm thinking whether we would not be better off to
> > completely dump the 'error' argument of writeback_iter() /
> > writeback_iter_next() and just make all .writepage implementations set
> > wbc->err directly. But that means touching all the ~20 writepage
> > implementations we still have...
> 
> Heh.  I actually had an earlier version that looked at wbc->err in
> the ->writepages callers.  But it felt a bit too ugly.
> 
> > > +		 */
> > > +		if (wbc->sync_mode == WB_SYNC_NONE &&
> > > +		    (wbc->err || wbc->nr_to_write <= 0))
> > > +			goto finish;
> > 
> > I think it would be a bit more comprehensible if we replace the goto with:
> > 			folio_batch_release(&wbc->fbatch);
> > 			if (wbc->range_cyclic)
> > 				mapping->writeback_index =
> > 					folio->index + folio_nr_pages(folio);
> > 			*error = wbc->err;
> > 			return NULL;
> 
> I agree that keeping the logic on when to break and when to set the
> writeback_index is good, but duplicating the batch release and error
> assignment seems a bit suboptimal.  Let me know what you think of the
> alternatіve variant below.

And now for real:


diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d8fcbac2d72310..3e4aa5bfe75819 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2475,10 +2475,23 @@ struct folio *writeback_iter(struct address_space *mapping,
 		 * For background writeback just push done_index past this folio
 		 * so that we can just restart where we left off and media
 		 * errors won't choke writeout for the entire file.
+		 *
+		 * For range cyclic writeback we need to remember where we
+		 * stopped so that we can continue there next time we are
+		 * called.  If we hit the last page and there is more work
+		 * to be done, wrap back to the start of the file.
+		 *
+		 * For non-cyclic writeback we always start looking up at the
+		 * beginning of the file if we are called again, which can only
+		 * happen due to -ENOMEM from the file system.
 		 */
 		if (wbc->sync_mode == WB_SYNC_NONE &&
-		    (wbc->err || wbc->nr_to_write <= 0))
+		    (wbc->err || wbc->nr_to_write <= 0)) {
+			if (wbc->range_cyclic)
+				mapping->writeback_index =
+					folio->index + folio_nr_pages(folio);
 			goto finish;
+		}
 	} else {
 		if (wbc->range_cyclic)
 			wbc->index = mapping->writeback_index; /* prev offset */
@@ -2492,31 +2505,15 @@ struct folio *writeback_iter(struct address_space *mapping,
 	}
 
 	folio = writeback_get_folio(mapping, wbc);
-	if (!folio)
+	if (!folio) {
+		if (wbc->range_cyclic)
+			mapping->writeback_index = 0;
 		goto finish;
+	}
 	return folio;
 
 finish:
 	folio_batch_release(&wbc->fbatch);
-
-	/*
-	 * For range cyclic writeback we need to remember where we stopped so
-	 * that we can continue there next time we are called.  If  we hit the
-	 * last page and there is more work to be done, wrap back to the start
-	 * of the file.
-	 *
-	 * For non-cyclic writeback we always start looking up at the beginning
-	 * of the file if we are called again, which can only happen due to
-	 * -ENOMEM from the file system.
-	 */
-	if (wbc->range_cyclic) {
-		WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
-		if (wbc->err || wbc->nr_to_write <= 0)
-			mapping->writeback_index =
-				folio->index + folio_nr_pages(folio);
-		else
-			mapping->writeback_index = 0;
-	}
 	*error = wbc->err;
 	return NULL;
 }

Christoph Hellwig Jan. 30, 2024, 2:32 p.m. UTC | #4

On Tue, Jan 30, 2024 at 03:22:05PM +0100, Christoph Hellwig wrote:
> And now for real:

Another slight variant of this would be to move the nr_to_write || err
check for WB_SYNC_NONE out of the branch.  This adds extra tests for
the initial iteration, but unindents a huge comment, and moves it closer
to the other branch that it also describes.  The downside at least to
me is that the normal non-termination path is not the straight line
through function.  Thoughs?

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 973f57ad9ee548..ff6e73453aa8c4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2461,24 +2461,6 @@ struct folio *writeback_iter(struct address_space *mapping,
 		wbc->nr_to_write -= folio_nr_pages(folio);
 		if (*error && !wbc->err)
 			wbc->err = *error;
-
-		/*
-		 * For integrity sync  we have to keep going until we have
-		 * written all the folios we tagged for writeback prior to
-		 * entering the writeback loop, even if we run past
-		 * wbc->nr_to_write or encounter errors.
-		 *
-		 * This is because the file system may still have state to clear
-		 * for each folio.  We'll eventually return the first error
-		 * encountered.
-		 *
-		 * For background writeback just push done_index past this folio
-		 * so that we can just restart where we left off and media
-		 * errors won't choke writeout for the entire file.
-		 */
-		if (wbc->sync_mode == WB_SYNC_NONE &&
-		    (wbc->err || wbc->nr_to_write <= 0))
-			goto finish;
 	} else {
 		if (wbc->range_cyclic)
 			wbc->index = mapping->writeback_index; /* prev offset */
@@ -2491,17 +2473,20 @@ struct folio *writeback_iter(struct address_space *mapping,
 		wbc->err = 0;
 	}
 
-	folio = writeback_get_folio(mapping, wbc);
-	if (!folio)
-		goto finish;
-	return folio;
-
-finish:
-	folio_batch_release(&wbc->fbatch);
-
 	/*
+	 * For integrity sync  we have to keep going until we have written all
+	 * the folios we tagged for writeback prior to entering the writeback
+	 * loop, even if we run past wbc->nr_to_write or encounter errors.
+	 *
+	 * This is because the file system may still have state to clear for
+	 * each folio.  We'll eventually return the first error encountered.
+	 *
+	 * For background writeback just push done_index past this folio so that
+	 * we can just restart where we left off and media errors won't choke
+	 * writeout for the entire file.
+	 *
 	 * For range cyclic writeback we need to remember where we stopped so
-	 * that we can continue there next time we are called.  If  we hit the
+	 * that we can continue there next time we are called.  If we hit the
 	 * last page and there is more work to be done, wrap back to the start
 	 * of the file.
 	 *
@@ -2509,14 +2494,21 @@ struct folio *writeback_iter(struct address_space *mapping,
 	 * of the file if we are called again, which can only happen due to
 	 * -ENOMEM from the file system.
 	 */
-	if (wbc->range_cyclic) {
-		WARN_ON_ONCE(wbc->sync_mode != WB_SYNC_NONE);
-		if (wbc->err || wbc->nr_to_write <= 0)
+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    (wbc->err || wbc->nr_to_write <= 0)) {
+		if (wbc->range_cyclic)
 			mapping->writeback_index =
 				folio->index + folio_nr_pages(folio);
-		else
+	} else {
+		folio = writeback_get_folio(mapping, wbc);
+		if (folio)
+			return folio;
+
+		if (wbc->range_cyclic)
 			mapping->writeback_index = 0;
 	}
+
+	folio_batch_release(&wbc->fbatch);
 	*error = wbc->err;
 	return NULL;
 }

Jan Kara Jan. 30, 2024, 9:50 p.m. UTC | #5

On Tue 30-01-24 15:16:01, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 11:46:05AM +0100, Jan Kara wrote:
> > Looking at it now I'm thinking whether we would not be better off to
> > completely dump the 'error' argument of writeback_iter() /
> > writeback_iter_next() and just make all .writepage implementations set
> > wbc->err directly. But that means touching all the ~20 writepage
> > implementations we still have...
> 
> Heh.  I actually had an earlier version that looked at wbc->err in
> the ->writepages callers.  But it felt a bit too ugly.

OK.

> > > +		 */
> > > +		if (wbc->sync_mode == WB_SYNC_NONE &&
> > > +		    (wbc->err || wbc->nr_to_write <= 0))
> > > +			goto finish;
> > 
> > I think it would be a bit more comprehensible if we replace the goto with:
> > 			folio_batch_release(&wbc->fbatch);
> > 			if (wbc->range_cyclic)
> > 				mapping->writeback_index =
> > 					folio->index + folio_nr_pages(folio);
> > 			*error = wbc->err;
> > 			return NULL;
> 
> I agree that keeping the logic on when to break and when to set the
> writeback_index is good, but duplicating the batch release and error
> assignment seems a bit suboptimal.  Let me know what you think of the
> alternatіve variant below.

Well, batch release needs to be only here because if writeback_get_folio()
returns NULL, the batch has been already released by it. So what would be
duplicated is only the error assignment. But I'm fine with the version in
the following email and actually somewhat prefer it compared the yet
another variant you've sent.

								Honza

Christoph Hellwig Jan. 31, 2024, 7:14 a.m. UTC | #6

On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> Well, batch release needs to be only here because if writeback_get_folio()
> returns NULL, the batch has been already released by it.

Indeed.

> So what would be
> duplicated is only the error assignment. But I'm fine with the version in
> the following email and actually somewhat prefer it compared the yet
> another variant you've sent.

So how about another variant, this is closer to your original suggestion.
But I've switched around the ordered of the folio or not branches
from my original patch, and completely reworked and (IMHO) improved the
comments.  it replaces patch 19 instead of being incremental to be
readable:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58b3661f5eac9e..1593a783176ca2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1985,12 +1985,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 		struct iomap_writepage_ctx *wpc,
 		const struct iomap_writeback_ops *ops)
 {
-	struct folio *folio;
-	int ret;
+	struct folio *folio = NULL;
+	int ret = 0;
 
 	wpc->ops = ops;
-	for_each_writeback_folio(mapping, wbc, folio, ret)
+	while ((folio = writeback_iter(mapping, wbc, folio, &ret)))
 		ret = iomap_do_writepage(folio, wbc, wpc);
+
 	if (!wpc->ioend)
 		return ret;
 	return iomap_submit_ioend(wpc, wpc->ioend, ret);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 2416da933440e2..fc4605627496fc 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -367,15 +367,8 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
 
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
-struct folio *writeback_iter_init(struct address_space *mapping,
-		struct writeback_control *wbc);
-struct folio *writeback_iter_next(struct address_space *mapping,
-		struct writeback_control *wbc, struct folio *folio, int error);
-
-#define for_each_writeback_folio(mapping, wbc, folio, error)		\
-	for (folio = writeback_iter_init(mapping, wbc);			\
-	     folio || ((error = wbc->err), false);			\
-	     folio = writeback_iter_next(mapping, wbc, folio, error))
+struct folio *writeback_iter(struct address_space *mapping,
+		struct writeback_control *wbc, struct folio *folio, int *error);
 
 typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
 				void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0763c4353a676a..eefcb00cb7b227 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,29 +2360,6 @@ void tag_pages_for_writeback(struct address_space *mapping,
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
-static void writeback_finish(struct address_space *mapping,
-		struct writeback_control *wbc, pgoff_t done_index)
-{
-	folio_batch_release(&wbc->fbatch);
-
-	/*
-	 * For range cyclic writeback we need to remember where we stopped so
-	 * that we can continue there next time we are called.  If  we hit the
-	 * last page and there is more work to be done, wrap back to the start
-	 * of the file.
-	 *
-	 * For non-cyclic writeback we always start looking up at the beginning
-	 * of the file if we are called again, which can only happen due to
-	 * -ENOMEM from the file system.
-	 */
-	if (wbc->range_cyclic) {
-		if (wbc->err || wbc->nr_to_write <= 0)
-			mapping->writeback_index = done_index;
-		else
-			mapping->writeback_index = 0;
-	}
-}
-
 static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
 {
 	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
@@ -2442,10 +2419,8 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
 		filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
 				wbc_to_tag(wbc), &wbc->fbatch);
 		folio = folio_batch_next(&wbc->fbatch);
-		if (!folio) {
-			writeback_finish(mapping, wbc, 0);
+		if (!folio)
 			return NULL;
-		}
 	}
 
 	folio_lock(folio);
@@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
 	return folio;
 }
 
-struct folio *writeback_iter_init(struct address_space *mapping,
-		struct writeback_control *wbc)
+/**
+ * writepage_iter - iterate folio of a mapping for writeback
+ * @mapping: address space structure to write
+ * @wbc: writeback context
+ * @folio: previously iterated folio (%NULL to start)
+ * @error: in-out pointer for writeback errors (see below)
+ *
+ * This function should be called in a while loop in the ->writepages
+ * implementation and returns the next folio for the writeback operation
+ * described by @wbc on @mapping.
+ *
+ * To start writeback @folio should be passed as NULL, for every following
+ * iteration the folio returned by this function previously should be passed.
+ * @error should contain the error from the previous writeback iteration when
+ * calling writeback_iter.
+ *
+ * Once the writeback described in @wbc has finished, this function will return
+ * %NULL and if there was an error in any iteration restore it to @error.
+ *
+ * Note: callers should not manually break out of the loop using break or goto.
+ */
+struct folio *writeback_iter(struct address_space *mapping,
+		struct writeback_control *wbc, struct folio *folio, int *error)
 {
-	if (wbc->range_cyclic)
-		wbc->index = mapping->writeback_index; /* prev offset */
-	else
-		wbc->index = wbc->range_start >> PAGE_SHIFT;
-
-	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
-		tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
-
-	wbc->err = 0;
-	folio_batch_init(&wbc->fbatch);
-	return writeback_get_folio(mapping, wbc);
-}
+	if (!folio) {
+		folio_batch_init(&wbc->fbatch);
+		wbc->err = 0;
 
-struct folio *writeback_iter_next(struct address_space *mapping,
-		struct writeback_control *wbc, struct folio *folio, int error)
-{
-	unsigned long nr = folio_nr_pages(folio);
+		/*
+		 * For range cyclic writeback we remember where we stopped so
+		 * that we can continue where we stopped.
+		 *
+		 * For non-cyclic writeback we always start at the beginning of
+		 * the passed in range.
+		 */
+		if (wbc->range_cyclic)
+			wbc->index = mapping->writeback_index;
+		else
+			wbc->index = wbc->range_start >> PAGE_SHIFT;
 
-	wbc->nr_to_write -= nr;
+		/*
+		 * To avoid livelocks when other processes dirty new pages, we
+		 * first tag pages which should be written back and only then
+		 * start writing them.
+		 *
+		 * For data-integrity sync we have to be careful so that we do
+		 * not miss some pages (e.g., because some other process has
+		 * cleared the TOWRITE tag we set).  The rule we follow is that
+		 * TOWRITE tag can be cleared only by the process clearing the
+		 * DIRTY tag (and submitting the page for I/O).
+		 */
+		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+			tag_pages_for_writeback(mapping, wbc->index,
+					wbc_end(wbc));
+	} else {
+		wbc->nr_to_write -= folio_nr_pages(folio);
 
-	/*
-	 * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
-	 * Eventually all instances should just unlock the folio themselves and
-	 * return 0;
-	 */
-	if (error == AOP_WRITEPAGE_ACTIVATE) {
-		folio_unlock(folio);
-		error = 0;
+		/*
+		 * For integrity writeback  we have to keep going until we have
+		 * written all the folios we tagged for writeback prior to
+		 * entering the writeback loop, even if we run past
+		 * wbc->nr_to_write or encounter errors, and just stash away
+		 * the first error we encounter in wbc->err so that it can
+		 * be retrieved on return.
+		 *
+		 * This is because the file system may still have state to clear
+		 * for each folio.  We'll eventually return the first error
+		 * encountered.
+		 */
+		if (wbc->sync_mode == WB_SYNC_ALL) {
+			if (*error && !wbc->err)
+				wbc->err = *error;
+		} else {
+			if (*error || wbc->nr_to_write <= 0)
+				goto done;
+		}
 	}
 
-	if (error && !wbc->err)
-		wbc->err = error;
+	folio = writeback_get_folio(mapping, wbc);
+	if (!folio) {
+		/*
+		 * For range cyclic writeback not finding another folios means
+		 * that we are at the end of the file.  In that case go back
+		 * to the start of the file for the next call.
+		 */
+		if (wbc->range_cyclic)
+			mapping->writeback_index = 0;
 
-	/*
-	 * For integrity sync  we have to keep going until we have written all
-	 * the folios we tagged for writeback prior to entering the writeback
-	 * loop, even if we run past wbc->nr_to_write or encounter errors.
-	 * This is because the file system may still have state to clear for
-	 * each folio.   We'll eventually return the first error encountered.
-	 *
-	 * For background writeback just push done_index past this folio so that
-	 * we can just restart where we left off and media errors won't choke
-	 * writeout for the entire file.
-	 */
-	if (wbc->sync_mode == WB_SYNC_NONE &&
-	    (wbc->err || wbc->nr_to_write <= 0)) {
-		writeback_finish(mapping, wbc, folio->index + nr);
-		return NULL;
+		/*
+		 * Return the first error we encountered (if there was any) to
+		 * the caller now that we are done.
+		 */
+		*error = wbc->err;
 	}
+	return folio;
 
-	return writeback_get_folio(mapping, wbc);
+done:
+	if (wbc->range_cyclic)
+		mapping->writeback_index = folio->index + folio_nr_pages(folio);
+	folio_batch_release(&wbc->fbatch);
+	return NULL;
 }
 
 /**
@@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
 		      struct writeback_control *wbc, writepage_t writepage,
 		      void *data)
 {
-	struct folio *folio;
-	int error;
+	struct folio *folio = NULL;
+	int error = 0;
 
-	for_each_writeback_folio(mapping, wbc, folio, error)
+	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
 		error = writepage(folio, wbc, data);
+		if (error == AOP_WRITEPAGE_ACTIVATE) {
+			folio_unlock(folio);
+			error = 0;
+		}
+	}
 
-	return wbc->err;
+	return error;
 }
 EXPORT_SYMBOL(write_cache_pages);
 
@@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
 		struct writeback_control *wbc)
 {
 	struct blk_plug plug;
-	struct folio *folio;
-	int err;
+	struct folio *folio = NULL;
+	int err = 0;
 
 	blk_start_plug(&plug);
-	for_each_writeback_folio(mapping, wbc, folio, err) {
+	while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
 		err = mapping->a_ops->writepage(&folio->page, wbc);
 		mapping_set_error(mapping, err);
+		if (err == AOP_WRITEPAGE_ACTIVATE) {
+			folio_unlock(folio);
+			err = 0;
+		}
 	}
 	blk_finish_plug(&plug);

Jan Kara Jan. 31, 2024, 10:40 a.m. UTC | #7

On Wed 31-01-24 08:14:37, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> > Well, batch release needs to be only here because if writeback_get_folio()
> > returns NULL, the batch has been already released by it.
> 
> Indeed.
> 
> > So what would be
> > duplicated is only the error assignment. But I'm fine with the version in
> > the following email and actually somewhat prefer it compared the yet
> > another variant you've sent.
> 
> So how about another variant, this is closer to your original suggestion.
> But I've switched around the ordered of the folio or not branches
> from my original patch, and completely reworked and (IMHO) improved the
> comments.  it replaces patch 19 instead of being incremental to be
> readable:

Yes, this looks very good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 58b3661f5eac9e..1593a783176ca2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1985,12 +1985,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
>  		struct iomap_writepage_ctx *wpc,
>  		const struct iomap_writeback_ops *ops)
>  {
> -	struct folio *folio;
> -	int ret;
> +	struct folio *folio = NULL;
> +	int ret = 0;
>  
>  	wpc->ops = ops;
> -	for_each_writeback_folio(mapping, wbc, folio, ret)
> +	while ((folio = writeback_iter(mapping, wbc, folio, &ret)))
>  		ret = iomap_do_writepage(folio, wbc, wpc);
> +
>  	if (!wpc->ioend)
>  		return ret;
>  	return iomap_submit_ioend(wpc, wpc->ioend, ret);
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 2416da933440e2..fc4605627496fc 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -367,15 +367,8 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
>  
>  bool wb_over_bg_thresh(struct bdi_writeback *wb);
>  
> -struct folio *writeback_iter_init(struct address_space *mapping,
> -		struct writeback_control *wbc);
> -struct folio *writeback_iter_next(struct address_space *mapping,
> -		struct writeback_control *wbc, struct folio *folio, int error);
> -
> -#define for_each_writeback_folio(mapping, wbc, folio, error)		\
> -	for (folio = writeback_iter_init(mapping, wbc);			\
> -	     folio || ((error = wbc->err), false);			\
> -	     folio = writeback_iter_next(mapping, wbc, folio, error))
> +struct folio *writeback_iter(struct address_space *mapping,
> +		struct writeback_control *wbc, struct folio *folio, int *error);
>  
>  typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
>  				void *data);
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0763c4353a676a..eefcb00cb7b227 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2360,29 +2360,6 @@ void tag_pages_for_writeback(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(tag_pages_for_writeback);
>  
> -static void writeback_finish(struct address_space *mapping,
> -		struct writeback_control *wbc, pgoff_t done_index)
> -{
> -	folio_batch_release(&wbc->fbatch);
> -
> -	/*
> -	 * For range cyclic writeback we need to remember where we stopped so
> -	 * that we can continue there next time we are called.  If  we hit the
> -	 * last page and there is more work to be done, wrap back to the start
> -	 * of the file.
> -	 *
> -	 * For non-cyclic writeback we always start looking up at the beginning
> -	 * of the file if we are called again, which can only happen due to
> -	 * -ENOMEM from the file system.
> -	 */
> -	if (wbc->range_cyclic) {
> -		if (wbc->err || wbc->nr_to_write <= 0)
> -			mapping->writeback_index = done_index;
> -		else
> -			mapping->writeback_index = 0;
> -	}
> -}
> -
>  static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
>  {
>  	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> @@ -2442,10 +2419,8 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
>  		filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
>  				wbc_to_tag(wbc), &wbc->fbatch);
>  		folio = folio_batch_next(&wbc->fbatch);
> -		if (!folio) {
> -			writeback_finish(mapping, wbc, 0);
> +		if (!folio)
>  			return NULL;
> -		}
>  	}
>  
>  	folio_lock(folio);
> @@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
>  	return folio;
>  }
>  
> -struct folio *writeback_iter_init(struct address_space *mapping,
> -		struct writeback_control *wbc)
> +/**
> + * writepage_iter - iterate folio of a mapping for writeback
> + * @mapping: address space structure to write
> + * @wbc: writeback context
> + * @folio: previously iterated folio (%NULL to start)
> + * @error: in-out pointer for writeback errors (see below)
> + *
> + * This function should be called in a while loop in the ->writepages
> + * implementation and returns the next folio for the writeback operation
> + * described by @wbc on @mapping.
> + *
> + * To start writeback @folio should be passed as NULL, for every following
> + * iteration the folio returned by this function previously should be passed.
> + * @error should contain the error from the previous writeback iteration when
> + * calling writeback_iter.
> + *
> + * Once the writeback described in @wbc has finished, this function will return
> + * %NULL and if there was an error in any iteration restore it to @error.
> + *
> + * Note: callers should not manually break out of the loop using break or goto.
> + */
> +struct folio *writeback_iter(struct address_space *mapping,
> +		struct writeback_control *wbc, struct folio *folio, int *error)
>  {
> -	if (wbc->range_cyclic)
> -		wbc->index = mapping->writeback_index; /* prev offset */
> -	else
> -		wbc->index = wbc->range_start >> PAGE_SHIFT;
> -
> -	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> -		tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
> -
> -	wbc->err = 0;
> -	folio_batch_init(&wbc->fbatch);
> -	return writeback_get_folio(mapping, wbc);
> -}
> +	if (!folio) {
> +		folio_batch_init(&wbc->fbatch);
> +		wbc->err = 0;
>  
> -struct folio *writeback_iter_next(struct address_space *mapping,
> -		struct writeback_control *wbc, struct folio *folio, int error)
> -{
> -	unsigned long nr = folio_nr_pages(folio);
> +		/*
> +		 * For range cyclic writeback we remember where we stopped so
> +		 * that we can continue where we stopped.
> +		 *
> +		 * For non-cyclic writeback we always start at the beginning of
> +		 * the passed in range.
> +		 */
> +		if (wbc->range_cyclic)
> +			wbc->index = mapping->writeback_index;
> +		else
> +			wbc->index = wbc->range_start >> PAGE_SHIFT;
>  
> -	wbc->nr_to_write -= nr;
> +		/*
> +		 * To avoid livelocks when other processes dirty new pages, we
> +		 * first tag pages which should be written back and only then
> +		 * start writing them.
> +		 *
> +		 * For data-integrity sync we have to be careful so that we do
> +		 * not miss some pages (e.g., because some other process has
> +		 * cleared the TOWRITE tag we set).  The rule we follow is that
> +		 * TOWRITE tag can be cleared only by the process clearing the
> +		 * DIRTY tag (and submitting the page for I/O).
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> +			tag_pages_for_writeback(mapping, wbc->index,
> +					wbc_end(wbc));
> +	} else {
> +		wbc->nr_to_write -= folio_nr_pages(folio);
>  
> -	/*
> -	 * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> -	 * Eventually all instances should just unlock the folio themselves and
> -	 * return 0;
> -	 */
> -	if (error == AOP_WRITEPAGE_ACTIVATE) {
> -		folio_unlock(folio);
> -		error = 0;
> +		/*
> +		 * For integrity writeback  we have to keep going until we have
> +		 * written all the folios we tagged for writeback prior to
> +		 * entering the writeback loop, even if we run past
> +		 * wbc->nr_to_write or encounter errors, and just stash away
> +		 * the first error we encounter in wbc->err so that it can
> +		 * be retrieved on return.
> +		 *
> +		 * This is because the file system may still have state to clear
> +		 * for each folio.  We'll eventually return the first error
> +		 * encountered.
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_ALL) {
> +			if (*error && !wbc->err)
> +				wbc->err = *error;
> +		} else {
> +			if (*error || wbc->nr_to_write <= 0)
> +				goto done;
> +		}
>  	}
>  
> -	if (error && !wbc->err)
> -		wbc->err = error;
> +	folio = writeback_get_folio(mapping, wbc);
> +	if (!folio) {
> +		/*
> +		 * For range cyclic writeback not finding another folios means
> +		 * that we are at the end of the file.  In that case go back
> +		 * to the start of the file for the next call.
> +		 */
> +		if (wbc->range_cyclic)
> +			mapping->writeback_index = 0;
>  
> -	/*
> -	 * For integrity sync  we have to keep going until we have written all
> -	 * the folios we tagged for writeback prior to entering the writeback
> -	 * loop, even if we run past wbc->nr_to_write or encounter errors.
> -	 * This is because the file system may still have state to clear for
> -	 * each folio.   We'll eventually return the first error encountered.
> -	 *
> -	 * For background writeback just push done_index past this folio so that
> -	 * we can just restart where we left off and media errors won't choke
> -	 * writeout for the entire file.
> -	 */
> -	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    (wbc->err || wbc->nr_to_write <= 0)) {
> -		writeback_finish(mapping, wbc, folio->index + nr);
> -		return NULL;
> +		/*
> +		 * Return the first error we encountered (if there was any) to
> +		 * the caller now that we are done.
> +		 */
> +		*error = wbc->err;
>  	}
> +	return folio;
>  
> -	return writeback_get_folio(mapping, wbc);
> +done:
> +	if (wbc->range_cyclic)
> +		mapping->writeback_index = folio->index + folio_nr_pages(folio);
> +	folio_batch_release(&wbc->fbatch);
> +	return NULL;
>  }
>  
>  /**
> @@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
>  		      struct writeback_control *wbc, writepage_t writepage,
>  		      void *data)
>  {
> -	struct folio *folio;
> -	int error;
> +	struct folio *folio = NULL;
> +	int error = 0;
>  
> -	for_each_writeback_folio(mapping, wbc, folio, error)
> +	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
>  		error = writepage(folio, wbc, data);
> +		if (error == AOP_WRITEPAGE_ACTIVATE) {
> +			folio_unlock(folio);
> +			error = 0;
> +		}
> +	}
>  
> -	return wbc->err;
> +	return error;
>  }
>  EXPORT_SYMBOL(write_cache_pages);
>  
> @@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
>  		struct writeback_control *wbc)
>  {
>  	struct blk_plug plug;
> -	struct folio *folio;
> -	int err;
> +	struct folio *folio = NULL;
> +	int err = 0;
>  
>  	blk_start_plug(&plug);
> -	for_each_writeback_folio(mapping, wbc, folio, err) {
> +	while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
>  		err = mapping->a_ops->writepage(&folio->page, wbc);
>  		mapping_set_error(mapping, err);
> +		if (err == AOP_WRITEPAGE_ACTIVATE) {
> +			folio_unlock(folio);
> +			err = 0;
> +		}
>  	}
>  	blk_finish_plug(&plug);
>  
>

Christoph Hellwig Jan. 31, 2024, 10:43 a.m. UTC | #8

On Wed, Jan 31, 2024 at 11:40:45AM +0100, Jan Kara wrote:
> Yes, this looks very good to me. Feel free to add:

Thanks!  I'd also really love to hear from Brian and Matthew before
reposting.  Or from anyone else interested..

Brian Foster Jan. 31, 2024, 12:50 p.m. UTC | #9

On Wed, Jan 31, 2024 at 08:14:37AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 30, 2024 at 10:50:16PM +0100, Jan Kara wrote:
> > Well, batch release needs to be only here because if writeback_get_folio()
> > returns NULL, the batch has been already released by it.
> 
> Indeed.
> 
> > So what would be
> > duplicated is only the error assignment. But I'm fine with the version in
> > the following email and actually somewhat prefer it compared the yet
> > another variant you've sent.
> 
> So how about another variant, this is closer to your original suggestion.
> But I've switched around the ordered of the folio or not branches
> from my original patch, and completely reworked and (IMHO) improved the
> comments.  it replaces patch 19 instead of being incremental to be
> readable:
> 

Not a thorough review but at first glance I like it. I think the direct
function call makes things easier to follow. Just one quick thought...

...
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0763c4353a676a..eefcb00cb7b227 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
...
> @@ -2458,60 +2433,107 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
>  	return folio;
>  }
>  
...
> +struct folio *writeback_iter(struct address_space *mapping,
> +		struct writeback_control *wbc, struct folio *folio, int *error)
>  {
> -	if (wbc->range_cyclic)
> -		wbc->index = mapping->writeback_index; /* prev offset */
> -	else
> -		wbc->index = wbc->range_start >> PAGE_SHIFT;
> -
> -	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> -		tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
> -
> -	wbc->err = 0;
> -	folio_batch_init(&wbc->fbatch);
> -	return writeback_get_folio(mapping, wbc);
> -}
> +	if (!folio) {
> +		folio_batch_init(&wbc->fbatch);
> +		wbc->err = 0;

The implied field initialization via !folio feels a little wonky to me
just because it's not clear from the client code that both fields must
be initialized. Even though the interface is simpler, I wonder if it's
still worth having a dumb/macro type init function that at least does
the batch and error field initialization.

Or on second thought maybe having writeback_iter() reset *error as well
on folio == NULL might be a little cleaner without changing the
interface....

Brian

>  
> -struct folio *writeback_iter_next(struct address_space *mapping,
> -		struct writeback_control *wbc, struct folio *folio, int error)
> -{
> -	unsigned long nr = folio_nr_pages(folio);
> +		/*
> +		 * For range cyclic writeback we remember where we stopped so
> +		 * that we can continue where we stopped.
> +		 *
> +		 * For non-cyclic writeback we always start at the beginning of
> +		 * the passed in range.
> +		 */
> +		if (wbc->range_cyclic)
> +			wbc->index = mapping->writeback_index;
> +		else
> +			wbc->index = wbc->range_start >> PAGE_SHIFT;
>  
> -	wbc->nr_to_write -= nr;
> +		/*
> +		 * To avoid livelocks when other processes dirty new pages, we
> +		 * first tag pages which should be written back and only then
> +		 * start writing them.
> +		 *
> +		 * For data-integrity sync we have to be careful so that we do
> +		 * not miss some pages (e.g., because some other process has
> +		 * cleared the TOWRITE tag we set).  The rule we follow is that
> +		 * TOWRITE tag can be cleared only by the process clearing the
> +		 * DIRTY tag (and submitting the page for I/O).
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> +			tag_pages_for_writeback(mapping, wbc->index,
> +					wbc_end(wbc));
> +	} else {
> +		wbc->nr_to_write -= folio_nr_pages(folio);
>  
> -	/*
> -	 * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> -	 * Eventually all instances should just unlock the folio themselves and
> -	 * return 0;
> -	 */
> -	if (error == AOP_WRITEPAGE_ACTIVATE) {
> -		folio_unlock(folio);
> -		error = 0;
> +		/*
> +		 * For integrity writeback  we have to keep going until we have
> +		 * written all the folios we tagged for writeback prior to
> +		 * entering the writeback loop, even if we run past
> +		 * wbc->nr_to_write or encounter errors, and just stash away
> +		 * the first error we encounter in wbc->err so that it can
> +		 * be retrieved on return.
> +		 *
> +		 * This is because the file system may still have state to clear
> +		 * for each folio.  We'll eventually return the first error
> +		 * encountered.
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_ALL) {
> +			if (*error && !wbc->err)
> +				wbc->err = *error;
> +		} else {
> +			if (*error || wbc->nr_to_write <= 0)
> +				goto done;
> +		}
>  	}
>  
> -	if (error && !wbc->err)
> -		wbc->err = error;
> +	folio = writeback_get_folio(mapping, wbc);
> +	if (!folio) {
> +		/*
> +		 * For range cyclic writeback not finding another folios means
> +		 * that we are at the end of the file.  In that case go back
> +		 * to the start of the file for the next call.
> +		 */
> +		if (wbc->range_cyclic)
> +			mapping->writeback_index = 0;
>  
> -	/*
> -	 * For integrity sync  we have to keep going until we have written all
> -	 * the folios we tagged for writeback prior to entering the writeback
> -	 * loop, even if we run past wbc->nr_to_write or encounter errors.
> -	 * This is because the file system may still have state to clear for
> -	 * each folio.   We'll eventually return the first error encountered.
> -	 *
> -	 * For background writeback just push done_index past this folio so that
> -	 * we can just restart where we left off and media errors won't choke
> -	 * writeout for the entire file.
> -	 */
> -	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    (wbc->err || wbc->nr_to_write <= 0)) {
> -		writeback_finish(mapping, wbc, folio->index + nr);
> -		return NULL;
> +		/*
> +		 * Return the first error we encountered (if there was any) to
> +		 * the caller now that we are done.
> +		 */
> +		*error = wbc->err;
>  	}
> +	return folio;
>  
> -	return writeback_get_folio(mapping, wbc);
> +done:
> +	if (wbc->range_cyclic)
> +		mapping->writeback_index = folio->index + folio_nr_pages(folio);
> +	folio_batch_release(&wbc->fbatch);
> +	return NULL;
>  }
>  
>  /**
> @@ -2549,13 +2571,18 @@ int write_cache_pages(struct address_space *mapping,
>  		      struct writeback_control *wbc, writepage_t writepage,
>  		      void *data)
>  {
> -	struct folio *folio;
> -	int error;
> +	struct folio *folio = NULL;
> +	int error = 0;
>  
> -	for_each_writeback_folio(mapping, wbc, folio, error)
> +	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
>  		error = writepage(folio, wbc, data);
> +		if (error == AOP_WRITEPAGE_ACTIVATE) {
> +			folio_unlock(folio);
> +			error = 0;
> +		}
> +	}
>  
> -	return wbc->err;
> +	return error;
>  }
>  EXPORT_SYMBOL(write_cache_pages);
>  
> @@ -2563,13 +2590,17 @@ static int writeback_use_writepage(struct address_space *mapping,
>  		struct writeback_control *wbc)
>  {
>  	struct blk_plug plug;
> -	struct folio *folio;
> -	int err;
> +	struct folio *folio = NULL;
> +	int err = 0;
>  
>  	blk_start_plug(&plug);
> -	for_each_writeback_folio(mapping, wbc, folio, err) {
> +	while ((folio = writeback_iter(mapping, wbc, folio, &err))) {
>  		err = mapping->a_ops->writepage(&folio->page, wbc);
>  		mapping_set_error(mapping, err);
> +		if (err == AOP_WRITEPAGE_ACTIVATE) {
> +			folio_unlock(folio);
> +			err = 0;
> +		}
>  	}
>  	blk_finish_plug(&plug);
>  
>

Christoph Hellwig Jan. 31, 2024, 1:01 p.m. UTC | #10

On Wed, Jan 31, 2024 at 07:50:25AM -0500, Brian Foster wrote:
> The implied field initialization via !folio feels a little wonky to me
> just because it's not clear from the client code that both fields must
> be initialized. Even though the interface is simpler, I wonder if it's
> still worth having a dumb/macro type init function that at least does
> the batch and error field initialization.
> 
> Or on second thought maybe having writeback_iter() reset *error as well
> on folio == NULL might be a little cleaner without changing the
> interface....

I like that second idea.  An initialization helper I could live with,
but if only folio needs a defined state, that seems superflous.

[19/19] writeback: simplify writeback iteration

Commit Message

Comments

Patch