
[v2,block/for-linus] writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()

Message ID 20150825181152.GA26785@mtj.duckdns.org (mailing list archive)
State New, archived

Commit Message

Tejun Heo Aug. 25, 2015, 6:11 p.m. UTC
e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
updated the writeback path to avoid kicking off writeback work items
when there are no inodes to write out.  Unfortunately, the avoidance
logic was too aggressive and broke sync_inodes_sb() in two ways:

* sync_inodes_sb() must write out I_DIRTY_TIME inodes, but I_DIRTY_TIME
  inodes don't contribute to the bdi/wb_has_dirty_io() tests and were
  being skipped over.

* Inodes are taken off the wb->b_dirty/io/more_io lists once writeback
  starts on them.  When bdi_has_dirty_io() was false, sync_inodes_sb()
  skipped wait_sb_inodes() and could therefore return while writebacks
  were still in flight.

This patch fixes both breakages by:

* Removing the bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs().
  The callers already test the condition.

* Removing the bdi_has_dirty_io() shortcut from sync_inodes_sb() so that
  it always calls into bdi_split_work_to_wbs() and wait_sb_inodes().

* Making bdi_split_work_to_wbs() consider the b_dirty_time list for
  WB_SYNC_ALL writebacks.

Kudos to Eryu, Dave and Jan for tracking down the issue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com
Reported-and-bisected-by: Eryu Guan <eguan@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Ted Ts'o <tytso@google.com>
---
 fs/fs-writeback.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jens Axboe Aug. 25, 2015, 8:37 p.m. UTC | #1
On 08/25/2015 12:11 PM, Tejun Heo wrote:
> [...]

Added for 4.2.

Jan Kara Aug. 26, 2015, 9 a.m. UTC | #2
On Tue 25-08-15 14:11:52, Tejun Heo wrote:
> [...]

The patch looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza


Patch

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -844,14 +844,15 @@  static void bdi_split_work_to_wbs(struct
 	struct wb_iter iter;
 
 	might_sleep();
-
-	if (!bdi_has_dirty_io(bdi))
-		return;
 restart:
 	rcu_read_lock();
 	bdi_for_each_wb(wb, bdi, &iter, next_blkcg_id) {
-		if (!wb_has_dirty_io(wb) ||
-		    (skip_if_busy && writeback_in_progress(wb)))
+		/* SYNC_ALL writes out I_DIRTY_TIME too */
+		if (!wb_has_dirty_io(wb) &&
+		    (base_work->sync_mode == WB_SYNC_NONE ||
+		     list_empty(&wb->b_dirty_time)))
+			continue;
+		if (skip_if_busy && writeback_in_progress(wb))
 			continue;
 
 		base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages);
@@ -899,8 +900,7 @@  static void bdi_split_work_to_wbs(struct
 {
 	might_sleep();
 
-	if (bdi_has_dirty_io(bdi) &&
-	    (!skip_if_busy || !writeback_in_progress(&bdi->wb))) {
+	if (!skip_if_busy || !writeback_in_progress(&bdi->wb)) {
 		base_work->auto_free = 0;
 		base_work->single_wait = 0;
 		base_work->single_done = 0;
@@ -2275,8 +2275,12 @@  void sync_inodes_sb(struct super_block *
 	};
 	struct backing_dev_info *bdi = sb->s_bdi;
 
-	/* Nothing to do? */
-	if (!bdi_has_dirty_io(bdi) || bdi == &noop_backing_dev_info)
+	/*
+	 * Can't skip on !bdi_has_dirty() because we should wait for !dirty
+	 * inodes under writeback and I_DIRTY_TIME inodes ignored by
+	 * bdi_has_dirty() need to be written out too.
+	 */
+	if (bdi == &noop_backing_dev_info)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 