
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts

Message ID 20190911145542.1125-1-fdmanana@kernel.org (mailing list archive)
State Superseded, archived

Commit Message

Filipe Manana Sept. 11, 2019, 2:55 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

lock_extent_buffer_for_io() returns 1 to tell the caller that everything
went fine and that it needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller that everything went fine but
that it does not need to start writeback for the extent buffer, and a
negative value if some error happened.
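
As an illustration of that three-way contract, here is a minimal userspace sketch of how a caller could dispatch on the return value. The enum and the helper name classify_lock_result() are hypothetical and exist only for this example; they are not part of fs/btrfs/extent_io.c.

```c
/* Hypothetical stand-ins for illustration; not kernel definitions. */
enum eb_action {
	EB_SKIP,	/* everything fine, no writeback needed */
	EB_WRITE,	/* everything fine, caller must start writeback */
	EB_FAIL		/* an error happened, propagate it */
};

/* Map the tri-state return value onto what the caller should do next. */
static enum eb_action classify_lock_result(int ret)
{
	if (ret < 0)
		return EB_FAIL;
	if (ret == 0)
		return EB_SKIP;
	return EB_WRITE;
}
```

The point of the sketch is that 0 and 1 are both "success" values with different meanings, so replacing one with the other silently changes what the caller does.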

When it is about to return 1, it tries to lock all the pages. If a trylock
on a page fails and we have not yet flushed any existing bio in our "epd",
it calls flush_write_bio(epd) and overwrites the return value of 1 with 0
or an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by code outside
btrfs, such as page migration. A failed trylock therefore does not mean
that writeback of the extent buffer was already started by some other
task, yet returning 0 tells the caller (btree_write_cache_pages()) not to
start writeback for the extent buffer. Note that epd might currently have
either no bio, in which case flush_write_bio() returns 0 (success), or a
bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to write back the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such a hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo #2
  [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer.
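
The difference between the old and the new behaviour can be modelled in userspace as follows. This is only a sketch under simplifying assumptions: struct model_eb, model_flush(), lock_for_io_buggy() and lock_for_io_fixed() are hypothetical names, a single bool stands in for EXTENT_BUFFER_WRITEBACK, and the page loop is reduced to one trylock outcome.

```c
#include <stdbool.h>

/* Hypothetical model of an extent buffer; not the kernel structure. */
struct model_eb {
	bool writeback;	/* stands in for EXTENT_BUFFER_WRITEBACK */
};

/* Stands in for flush_write_bio(): 0 on success, negative on error. */
static int model_flush(int flush_result)
{
	return flush_result;
}

/* Old behaviour: the flush result overwrites ret. */
static int lock_for_io_buggy(struct model_eb *eb, bool trylock_ok,
			     int flush_result)
{
	int ret = 1;

	eb->writeback = true;
	if (!trylock_ok) {
		ret = model_flush(flush_result);
		if (ret < 0)
			return ret;	/* writeback bit is left set */
	}
	return ret;	/* 0 after a successful flush: caller skips the eb */
}

/* New behaviour: keep ret, and clear the bit when the flush fails. */
static int lock_for_io_fixed(struct model_eb *eb, bool trylock_ok,
			     int flush_result)
{
	int ret = 1;

	eb->writeback = true;
	if (!trylock_ok) {
		int err = model_flush(flush_result);

		if (err < 0) {
			eb->writeback = false;	/* models end_extent_buffer_writeback() */
			return err;
		}
	}
	return ret;
}
```

With a failed trylock and a successful flush, the buggy variant returns 0 while the writeback flag stays set, which is the state that later waiters block on. The fixed variant returns 1 in the same situation.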

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/extent_io.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Comments

Chris Mason Sept. 11, 2019, 4:04 p.m. UTC | #1
On 11 Sep 2019, at 15:55, fdmanana@kernel.org wrote:

> From: Filipe Manana <fdmanana@suse.com>
>
> So fix this by not overwriting the return value (ret) with the result
> from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
> bit in case flush_write_bio() returns an error, otherwise it will hang
> any future attempts to writeback the extent buffer.
>
> This is a regression introduced in the 5.2 kernel.
>
> Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
> Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
> Reported-by: Zdenek Sojka <zsojka@seznam.cz>
> Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
> Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
> Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
> Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
> Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/extent_io.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1ff438fd5bc2..1311ba0fc031 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3628,6 +3628,13 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
>  		       TASK_UNINTERRUPTIBLE);
>  }
>
> +static void end_extent_buffer_writeback(struct extent_buffer *eb)
> +{
> +	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> +	smp_mb__after_atomic();
> +	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> +}
> +
>  /*
>   * Lock eb pages and flush the bio if we can't the locks
>   *
> @@ -3699,8 +3706,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
>
>  		if (!trylock_page(p)) {
>  			if (!flush) {
> -				ret = flush_write_bio(epd);
> -				if (ret < 0) {
> +				int err;
> +
> +				err = flush_write_bio(epd);
> +				if (err < 0) {
> +					ret = err;
>  					failed_page_nr = i;
>  					goto err_unlock;
>  				}


Dennis (cc'd) has been trying a similar fix against this in production, 
but sending it was interrupted by plumbing conferences.  I think he 
found that it needs to undo this as well:

                 percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
                                          -eb->len,
                                          fs_info->dirty_metadata_batch);

With the IO errors, we should end up aborting the FS.  This function 
also flips the extent buffer written and dirty flags, and his patch 
resets them as well.  Given that we're aborting anyway, it's not 
critical, but it's probably a good idea to fix things up in the goto 
err_unlock just to make future bugs less likely.
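
A rough userspace model of the state that would need undoing is sketched below. It is an assumption-laden illustration, not the kernel code: struct acct_fs, struct acct_eb, model_prepare() and model_undo() are hypothetical names, a plain long replaces the percpu counter, and no locking is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; a plain long replaces the percpu counter. */
struct acct_fs {
	long dirty_metadata_bytes;
};

struct acct_eb {
	long len;
	bool dirty;	/* models EXTENT_BUFFER_DIRTY */
	bool written;	/* models BTRFS_HEADER_FLAG_WRITTEN */
	bool writeback;	/* models EXTENT_BUFFER_WRITEBACK */
};

/* What the function does before it starts locking pages. */
static void model_prepare(struct acct_fs *fs, struct acct_eb *eb)
{
	eb->writeback = true;
	eb->dirty = false;
	eb->written = true;
	fs->dirty_metadata_bytes -= eb->len;
}

/* The inverse, to be run on the error path. */
static void model_undo(struct acct_fs *fs, struct acct_eb *eb)
{
	fs->dirty_metadata_bytes += eb->len;
	eb->written = false;
	eb->dirty = true;
	eb->writeback = false;
}
```

Running model_undo() after model_prepare() leaves the counter and all three flags as they were before, which is the property the error path is meant to restore.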

-chris
Filipe Manana Sept. 11, 2019, 4:13 p.m. UTC | #2
On Wed, Sep 11, 2019 at 5:04 PM Chris Mason <clm@fb.com> wrote:
>
> Dennis (cc'd) has been trying a similar fix against this in production,
> but sending it was interrupted by plumbing conferences.  I think he
> found that it needs to undo this as well:
>
>                  percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
>                                           -eb->len,
>                                           fs_info->dirty_metadata_batch);
>
> With the IO errors, we should end up aborting the FS.  This function
> also flips the extent buffer written and dirty flags, and his patch
> resets them as well.  Given that we're aborting anyway, it's not
> critical, but it's probably a good idea to fix things up in the goto
> err_unlock just to make future bugs less likely.

Yes, I considered that at some point (undo everything done so far,
under the locks, etc) and thought it was pointless as well because we
abort.

But we may not abort: if we never start writeback for an eb because
flush_write_bio() returned an error, we can leave
btree_write_cache_pages() without noticing the error.
Since writeback never started, and btree_write_cache_pages() didn't
return the error, the transaction commit path may never get an error
from filemap_fdatawrite_range(), and we can commit the transaction
despite failing to start writeback for some extent buffer.

That problem existed before the 5.2 regression anyway, so I'm sending
the fix for it separately.
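
To make the error-propagation concern concrete, here is a small userspace sketch of a write loop that stops and reports the first negative result instead of carrying on. The function write_all() and its array-of-results interface are hypothetical; each array entry stands in for what lock_extent_buffer_for_io() would return for one extent buffer.

```c
#include <stddef.h>

/*
 * Hypothetical model of the loop over extent buffers. results[i] stands
 * in for the per-buffer return value: 1 = write it, 0 = skip it,
 * negative = error. *written counts the buffers we would have submitted.
 */
static int write_all(const int *results, size_t n, size_t *written)
{
	size_t i;

	*written = 0;
	for (i = 0; i < n; i++) {
		int ret = results[i];

		if (ret < 0)
			return ret;	/* stop and let the caller see the error */
		if (ret == 0)
			continue;	/* nothing to do for this buffer */
		(*written)++;
	}
	return 0;
}
```

If the negative branch were missing, the loop would finish and return 0, and the caller would have no way to learn that one buffer was never written.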

I'll include the undo of all operations in the patch, however; it
certainly doesn't hurt.

Thanks.

>
> -chris
Dennis Zhou Sept. 11, 2019, 4:54 p.m. UTC | #3
On Wed, Sep 11, 2019 at 05:13:15PM +0100, Filipe Manana wrote:
> On Wed, Sep 11, 2019 at 5:04 PM Chris Mason <clm@fb.com> wrote:
> >
> > Dennis (cc'd) has been trying a similar fix against this in production,
> > but sending it was interrupted by plumbing conferences.  I think he
> > found that it needs to undo this as well:
> >
> >                  percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> >                                           -eb->len,
> >                                           fs_info->dirty_metadata_batch);
> >
> > With the IO errors, we should end up aborting the FS.  This function
> > also flips the extent buffer written and dirty flags, and his patch
> > resets them as well.  Given that we're aborting anyway, it's not
> > critical, but it's probably a good idea to fix things up in the goto
> > err_unlock just to make future bugs less likely.
> 
> Yes, I considered that at some point (undo everything done so far,
> under the locks, etc) and thought it was pointless as well because we
> abort.
> 
> But we may not abort - if we never start the writeback for an eb
> because we returned error from flush_write_bio(), we can leave
> btree_write_cache_pages() without noticing the error.
> Since writeback never started, and btree_write_cache_pages() didn't
> return the error, the transaction commit path may never get an error
> from filemap_fdatawrite_range,
> and we can commit the transaction despite failure to start writeback
> for some extent buffer.
> 
> A problem that existed before that regression in 5.2 anyway. Sending
> it as separate.
> 
> I'll include the undo of all operations in patch however, it doesn't
> hurt for sure.
> 
> Thanks.
> 
> >
> > -chris

Hello,

I should have pushed this upstream sooner. I was hoping to have one of
my test hosts hit my WARN_ON() (in a separate patch), but none has so far.

The following is what I have to unblock 5.2 testing.

I think your patch is missing the reset of the header bits and of the
percpu metadata counter. I also think that on error we should break out
of btree_write_cache_pages() and return the error.

Thanks,
Dennis

-----
From 1a57b5ee6e52c63bf7c8e3ae969c0df406e3cf69 Mon Sep 17 00:00:00 2001
From: Dennis Zhou <dennis@kernel.org>
Date: Wed, 4 Sep 2019 10:49:53 -0700
Subject: [PATCH] btrfs: fix stall on writeback bit extent buffer

In lock_extent_buffer_for_io(), if we encounter a blocking action, we
try to flush the bio we are currently holding. The failure mode here used
to be a BUG_ON(). f4340622e022 moved the BUG_ON() up and incorrectly
reset the current ret code. However, lock_extent_buffer_for_io() returns
1 when we should write out the pages. This caused the buffer to be
skipped while keeping the writeback bit set.

Now that we can fail here, we also need to fix up dirty_metadata_bytes,
clear BTRFS_HEADER_FLAG_WRITTEN and EXTENT_BUFFER_WRITEBACK, and set
EXTENT_BUFFER_DIRTY again.

Fixes: f4340622e022 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/extent_io.c | 52 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 43af8245c06e..4ba3cd972a2a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3636,6 +3636,13 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		       TASK_UNINTERRUPTIBLE);
 }
 
+static void end_extent_buffer_writeback(struct extent_buffer *eb)
+{
+	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
+	smp_mb__after_atomic();
+	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
+}
+
 /*
  * Lock eb pages and flush the bio if we can't the locks
  *
@@ -3707,9 +3714,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				ret = flush_write_bio(epd);
-				if (ret < 0) {
+				int flush_ret = flush_write_bio(epd);
+
+				if (flush_ret < 0) {
 					failed_page_nr = i;
+					ret = flush_ret;
 					goto err_unlock;
 				}
 				flush = 1;
@@ -3723,24 +3732,45 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	/* Unlock already locked pages */
 	for (i = 0; i < failed_page_nr; i++)
 		unlock_page(eb->pages[i]);
-	return ret;
-}
 
-static void end_extent_buffer_writeback(struct extent_buffer *eb)
-{
-	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
-	smp_mb__after_atomic();
-	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
+	/* undo the above because we failed */
+	btrfs_tree_lock(eb);
+
+	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
+					 eb->len,
+					 fs_info->dirty_metadata_batch);
+
+	btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
+
+	spin_lock(&eb->refs_lock);
+	set_bit(EXTENT_BUFFER_DIRTY, &eb->bflags);
+	spin_unlock(&eb->refs_lock);
+
+	btrfs_tree_unlock(eb);
+
+	end_extent_buffer_writeback(eb);
+
+	return ret;
 }
 
 static void set_btree_ioerr(struct page *page)
 {
 	struct extent_buffer *eb = (struct extent_buffer *)page->private;
+	struct btrfs_fs_info *fs_info;
 
 	SetPageError(page);
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
 		return;
 
+	/*
+	 * We just marked the extent as bad, which means we need to retry
+	 * in the future, so fix up the dirty_metadata_bytes accounting.
+	 */
+	fs_info = eb->fs_info;
+	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
+				 eb->len,
+				 fs_info->dirty_metadata_batch);
+
 	/*
 	 * If writeback for a btree extent that doesn't belong to a log tree
 	 * failed, increment the counter transaction->eb_write_errors.
@@ -3977,6 +4007,10 @@ int btree_write_cache_pages(struct address_space *mapping,
 			if (!ret) {
 				free_extent_buffer(eb);
 				continue;
+			} else if (ret < 0) {
+				done = 1;
+				free_extent_buffer(eb);
+				break;
 			}
 
 			ret = write_one_eb(eb, wbc, &epd);
Filipe Manana Sept. 11, 2019, 5:02 p.m. UTC | #4
On Wed, Sep 11, 2019 at 5:54 PM Dennis Zhou <dennis@kernel.org> wrote:
>
> On Wed, Sep 11, 2019 at 05:13:15PM +0100, Filipe Manana wrote:
> > On Wed, Sep 11, 2019 at 5:04 PM Chris Mason <clm@fb.com> wrote:
> > >
> > > Dennis (cc'd) has been trying a similar fix against this in production,
> > > but sending it was interrupted by plumbing conferences.  I think he
> > > found that it needs to undo this as well:
> > >
> > >                  percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> > >                                           -eb->len,
> > >                                           fs_info->dirty_metadata_batch);
> > >
> > > With the IO errors, we should end up aborting the FS.  This function
> > > > also flips the extent buffer written and dirty flags, and his patch
> > > resets them as well.  Given that we're aborting anyway, it's not
> > > critical, but it's probably a good idea to fix things up in the goto
> > > err_unlock just to make future bugs less likely.
> >
> > Yes, I considered that at some point (undo everything done so far,
> > under the locks, etc) and thought it was pointless as well because we
> > abort.
> >
> > But we may not abort - if we never start the writeback for an eb
> > because we returned error from flush_write_bio(), we can leave
> > btree_write_cache_pages() without noticing the error.
> > Since writeback never started, and btree_write_cache_pages() didn't
> > return the error, the transaction commit path may never get an error
> > from filemap_fdatawrite_range,
> > and we can commit the transaction despite failure to start writeback
> > for some extent buffer.
> >
> > A problem that existed before that regression in 5.2 anyway. Sending
> > it as separate.
> >
> > I'll include the undo of all operations in patch however, it doesn't
> > hurt for sure.
> >
> > Thanks.
> >
> > >
> > > -chris
>
> Hello,
>
> I should have pushed this upstream sooner, I was hoping to have one of
> my test hosts hit my WARN_ON() in a separate patch, but it hasn't.
>
> The following is what I have to unblock 5.2 testing.
>
> I think your patch is missing resetting the header bits + the percpu
> metadata counter.

Yes, even though it's not a problem since we will end up aborting the
transaction later.
The v2 I sent some minutes ago does it:
https://patchwork.kernel.org/patch/11141559/

> I think on error we should break out of
> btree_write_cache_pages() and return it too.

Yes, but that's a separate change.
It makes the code clearer but it doesn't fix any problem: the errors
will be marked in the end io callback, and transaction and log commits
will see them and abort.
Sent a patch for that as well some minutes ago:
https://patchwork.kernel.org/patch/11141561/

>
> Thanks,
> Dennis
>
> -----
> From 1a57b5ee6e52c63bf7c8e3ae969c0df406e3cf69 Mon Sep 17 00:00:00 2001
> From: Dennis Zhou <dennis@kernel.org>
> Date: Wed, 4 Sep 2019 10:49:53 -0700
> Subject: [PATCH] btrfs: fix stall on writeback bit extent buffer
>
> In lock_extent_buffer_for_io(), if we encounter a blocking action, we
> try and flush the currently held onto bio. The failure mode here used to
> be a BUG_ON(). f4340622e022 changed this to move BUG_ON() up and
> incorrectly reset the current ret code. However,
> lock_extent_buffer_for_io() returns 1 on we should write out the pages.
> This caused the buffer to be skipped while keeping the writeback bit
> set.
>
> Now that we can fail here, we also need to fix up dirty_metadata_bytes,
> clear BTRFS_HEADER_FLAG_WRITTEN and EXTENT_BUFFER_WRITEBACK, and set
> EXTENT_BUFFER_DIRTY again.
>
> Fixes: f4340622e022 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
>  fs/btrfs/extent_io.c | 52 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 43 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 43af8245c06e..4ba3cd972a2a 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3636,6 +3636,13 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
>                        TASK_UNINTERRUPTIBLE);
>  }
>
> +static void end_extent_buffer_writeback(struct extent_buffer *eb)
> +{
> +       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> +       smp_mb__after_atomic();
> +       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> +}
> +
>  /*
>   * Lock eb pages and flush the bio if we can't the locks
>   *
> @@ -3707,9 +3714,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
>
>                 if (!trylock_page(p)) {
>                         if (!flush) {
> -                               ret = flush_write_bio(epd);
> -                               if (ret < 0) {
> +                               int flush_ret = flush_write_bio(epd);
> +
> +                               if (flush_ret < 0) {
>                                         failed_page_nr = i;
> +                                       ret = flush_ret;
>                                         goto err_unlock;
>                                 }
>                                 flush = 1;
> @@ -3723,24 +3732,45 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
>         /* Unlock already locked pages */
>         for (i = 0; i < failed_page_nr; i++)
>                 unlock_page(eb->pages[i]);
> -       return ret;
> -}
>
> -static void end_extent_buffer_writeback(struct extent_buffer *eb)
> -{
> -       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> -       smp_mb__after_atomic();
> -       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> +       /* undo the above above because we failed */
> +       btrfs_tree_lock(eb);
> +
> +       percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> +                                        eb->len,
> +                                        fs_info->dirty_metadata_batch);
> +
> +       btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
> +
> +       spin_lock(&eb->refs_lock);
> +       set_bit(EXTENT_BUFFER_DIRTY, &eb->bflags);
> +       spin_unlock(&eb->refs_lock);

Clearing the writeback bit should also be done while holding eb->refs_lock.
Everything else is equivalent to what I sent in v2.

> +
> +       btrfs_tree_unlock(eb);
> +
> +       end_extent_buffer_writeback(eb);
> +
> +       return ret;
>  }
>
>  static void set_btree_ioerr(struct page *page)
>  {
>         struct extent_buffer *eb = (struct extent_buffer *)page->private;
> +       struct btrfs_fs_info *fs_info;
>
>         SetPageError(page);
>         if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
>                 return;
>
> +       /*
> +        * We just marked the extent as bad, that means we need retry
> +        * in the future, so fix up the dirty_metadata_bytes accounting.
> +        */
> +       fs_info = eb->fs_info;
> +       percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> +                                eb->len,
> +                                fs_info->dirty_metadata_batch);
> +

This is a separate change from the 5.2 regression. It should be a
separate patch with a specific changelog IMHO.
Can you please submit that?

>         /*
>          * If writeback for a btree extent that doesn't belong to a log tree
>          * failed, increment the counter transaction->eb_write_errors.
> @@ -3977,6 +4007,10 @@ int btree_write_cache_pages(struct address_space *mapping,
>                         if (!ret) {
>                                 free_extent_buffer(eb);
>                                 continue;
> +                       } else if (ret < 0) {
> +                               done = 1;
> +                               free_extent_buffer(eb);
> +                               break;
>                         }

This is a separate change as well, not caused by the 5.2 regression, so
it should also be a separate patch with a specific changelog IMHO.
Anyway, this hunk is exactly like the separate patch I sent some minutes ago.

Thanks!

>
>                         ret = write_one_eb(eb, wbc, &epd);
> --
> 2.17.1
>
Dennis Zhou Sept. 11, 2019, 5:37 p.m. UTC | #5
On Wed, Sep 11, 2019 at 06:02:58PM +0100, Filipe Manana wrote:
> On Wed, Sep 11, 2019 at 5:54 PM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > On Wed, Sep 11, 2019 at 05:13:15PM +0100, Filipe Manana wrote:
> > > On Wed, Sep 11, 2019 at 5:04 PM Chris Mason <clm@fb.com> wrote:
> > > >
> > > > On 11 Sep 2019, at 15:55, fdmanana@kernel.org wrote:
> > > >
> > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > >
> > > > > So fix this by not overwriting the return value (ret) with the result
> > > > > from flush_write_bio(). We also need to clear the
> > > > > EXTENT_BUFFER_WRITEBACK
> > > > > bit in case flush_write_bio() returns an error, otherwise it will hang
> > > > > any future attempts to writeback the extent buffer.
> > > > >
> > > > > This is a regression introduced in the 5.2 kernel.
> > > > >
> > > > > Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to
> > > > > lock_extent_buffer_for_io()")
> > > > > Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in
> > > > > flush_write_bio() one level up")
> > > > > Reported-by: Zdenek Sojka <zsojka@seznam.cz>
> > > > > Link:
> > > > > https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
> > > > > Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
> > > > > Link:
> > > > > https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
> > > > > Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
> > > > > Link:
> > > > > https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
> > > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
> > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > > > ---
> > > > >  fs/btrfs/extent_io.c | 23 ++++++++++++++---------
> > > > >  1 file changed, 14 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > > index 1ff438fd5bc2..1311ba0fc031 100644
> > > > > --- a/fs/btrfs/extent_io.c
> > > > > +++ b/fs/btrfs/extent_io.c
> > > > > @@ -3628,6 +3628,13 @@ void wait_on_extent_buffer_writeback(struct
> > > > > extent_buffer *eb)
> > > > >                      TASK_UNINTERRUPTIBLE);
> > > > >  }
> > > > >
> > > > > +static void end_extent_buffer_writeback(struct extent_buffer *eb)
> > > > > +{
> > > > > +     clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> > > > > +     smp_mb__after_atomic();
> > > > > +     wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Lock eb pages and flush the bio if we can't the locks
> > > > >   *
> > > > > @@ -3699,8 +3706,11 @@ static noinline_for_stack int
> > > > > lock_extent_buffer_for_io(struct extent_buffer *eb
> > > > >
> > > > >               if (!trylock_page(p)) {
> > > > >                       if (!flush) {
> > > > > -                             ret = flush_write_bio(epd);
> > > > > -                             if (ret < 0) {
> > > > > +                             int err;
> > > > > +
> > > > > +                             err = flush_write_bio(epd);
> > > > > +                             if (err < 0) {
> > > > > +                                     ret = err;
> > > > >                                       failed_page_nr = i;
> > > > >                                       goto err_unlock;
> > > > >                               }
> > > >
> > > >
> > > > Dennis (cc'd) has been trying a similar fix against this in production,
> > > > but sending it was interrupted by plumbing conferences.  I think he
> > > > found that it needs to undo this as well:
> > > >
> > > >                  percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> > > >                                           -eb->len,
> > > >                                           fs_info->dirty_metadata_batch);
> > > >
> > > > With the IO errors, we should end up aborting the FS.  This function
> > > > also flips the the extent buffer written and dirty flags, and his patch
> > > > resets them as well.  Given that we're aborting anyway, it's not
> > > > critical, but it's probably a good idea to fix things up in the goto
> > > > err_unlock just to make future bugs less likely.
> > >
> > > Yes, I considered that at some point (undo everything done so far,
> > > under the locks, etc) and thought it was pointless as well because we
> > > abort.
> > >
> > > But we may not abort - if we never start the writeback for an eb
> > > because we returned error from flush_write_bio(), we can leave
> > > btree_write_cache_pages() without noticing the error.
> > > Since writeback never started, and btree_write_cache_pages() didn't
> > > return the error, the transaction commit path may never get an error
> > > from filemap_fdatawrite_range,
> > > and we can commit the transaction despite failure to start writeback
> > > for some extent buffer.
> > >
> > > A problem that existed before that regression in 5.2 anyway. Sending
> > > it as separate.
> > >
> > > I'll include the undo of all operations in patch however, it doesn't
> > > hurt for sure.
> > >
> > > Thanks.
> > >
> > > >
> > > > -chris
> >
> > Hello,
> >
> > I should have pushed this upstream sooner, I was hoping to have one of
> > my test hosts hit my WARN_ON() in a separate patch, but it hasn't.
> >
> > The following is what I have to unblock 5.2 testing.
> >
> > I think your patch is missing resetting the header bits + the percpu
> > metadata counter.
> 
> Yes, even though it's not a problem since we will end up aborting the
> transaction later.
> The v2 I sent some minutes ago does it:
> https://patchwork.kernel.org/patch/11141559/
> 

I saw it as I refreshed the page after I sent my email :).

> > I think on error we should break out of
> > btree_write_cache_pages() and return it too.
> 
> Yes, but that's a separate change.
> It makes the code clear but it doesn't fix any problem, the errors
> will be marked in the end io callback and transaction and log commits
> will see them and abort.
> Sent a patch for that as well some minutes ago:
> https://patchwork.kernel.org/patch/11141561/
> 
> >
> > Thanks,
> > Dennis
> >
> > -----
> > From 1a57b5ee6e52c63bf7c8e3ae969c0df406e3cf69 Mon Sep 17 00:00:00 2001
> > From: Dennis Zhou <dennis@kernel.org>
> > Date: Wed, 4 Sep 2019 10:49:53 -0700
> > Subject: [PATCH] btrfs: fix stall on writeback bit extent buffer
> >
> > In lock_extent_buffer_for_io(), if we encounter a blocking action, we
> > try and flush the currently held onto bio. The failure mode here used to
> > be a BUG_ON(). f4340622e022 changed this to move BUG_ON() up and
> > incorrectly reset the current ret code. However,
> > lock_extent_buffer_for_io() returns 1 on we should write out the pages.
> > This caused the buffer to be skipped while keeping the writeback bit
> > set.
> >
> > Now that we can fail here, we also need to fix up dirty_metadata_bytes,
> > clear BTRFS_HEADER_FLAG_WRITTEN and EXTENT_BUFFER_WRITEBACK, and set
> > EXTENT_BUFFER_DIRTY again.
> >
> > Fixes: f4340622e022 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > ---
> >  fs/btrfs/extent_io.c | 52 ++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 43 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 43af8245c06e..4ba3cd972a2a 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -3636,6 +3636,13 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
> >                        TASK_UNINTERRUPTIBLE);
> >  }
> >
> > +static void end_extent_buffer_writeback(struct extent_buffer *eb)
> > +{
> > +       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> > +       smp_mb__after_atomic();
> > +       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> > +}
> > +
> >  /*
> >   * Lock eb pages and flush the bio if we can't the locks
> >   *
> > @@ -3707,9 +3714,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
> >
> >                 if (!trylock_page(p)) {
> >                         if (!flush) {
> > -                               ret = flush_write_bio(epd);
> > -                               if (ret < 0) {
> > +                               int flush_ret = flush_write_bio(epd);
> > +
> > +                               if (flush_ret < 0) {
> >                                         failed_page_nr = i;
> > +                                       ret = flush_ret;
> >                                         goto err_unlock;
> >                                 }
> >                                 flush = 1;
> > @@ -3723,24 +3732,45 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
> >         /* Unlock already locked pages */
> >         for (i = 0; i < failed_page_nr; i++)
> >                 unlock_page(eb->pages[i]);
> > -       return ret;
> > -}
> >
> > -static void end_extent_buffer_writeback(struct extent_buffer *eb)
> > -{
> > -       clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
> > -       smp_mb__after_atomic();
> > -       wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
> > +       /* undo the above above because we failed */
> > +       btrfs_tree_lock(eb);
> > +
> > +       percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> > +                                        eb->len,
> > +                                        fs_info->dirty_metadata_batch);
> > +
> > +       btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
> > +
> > +       spin_lock(&eb->refs_lock);
> > +       set_bit(EXTENT_BUFFER_DIRTY, &eb->bflags);
> > +       spin_unlock(&eb->refs_lock);
> 
> Clearing the writeback bit should also be done while holding eb->refs_lock.
> Everything else is equivalent to what I sent in v2.
> 
> > +
> > +       btrfs_tree_unlock(eb);
> > +
> > +       end_extent_buffer_writeback(eb);
> > +
> > +       return ret;
> >  }
> >
> >  static void set_btree_ioerr(struct page *page)
> >  {
> >         struct extent_buffer *eb = (struct extent_buffer *)page->private;
> > +       struct btrfs_fs_info *fs_info;
> >
> >         SetPageError(page);
> >         if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
> >                 return;
> >
> > +       /*
> > +        * We just marked the extent as bad, that means we need retry
> > +        * in the future, so fix up the dirty_metadata_bytes accounting.
> > +        */
> > +       fs_info = eb->fs_info;
> > +       percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> > +                                eb->len,
> > +                                fs_info->dirty_metadata_batch);
> > +
> 
> This is a separate change from the 5.2 regression. Should be separate
> patch with a specific changelog IMHO.
> Can you please submit that?
> 

Will do.

> >         /*
> >          * If writeback for a btree extent that doesn't belong to a log tree
> >          * failed, increment the counter transaction->eb_write_errors.
> > @@ -3977,6 +4007,10 @@ int btree_write_cache_pages(struct address_space *mapping,
> >                         if (!ret) {
> >                                 free_extent_buffer(eb);
> >                                 continue;
> > +                       } else if (ret < 0) {
> > +                               done = 1;
> > +                               free_extent_buffer(eb);
> > +                               break;
> >                         }
> 
> Separate change as well, not cause by the 5.2 regression. So separate
> patch with a specific changelog as well IMHO.
> Anyway, this hunk is exactly like the separate patch I sent some minutes ago.
> 
> Thanks!
> 
> >
> >                         ret = write_one_eb(eb, wbc, &epd);
> > --
> > 2.17.1
> >

Thanks,
Dennis
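
For readers following the thread, the return-value problem discussed above
can be reduced to a small userspace model. This is only a sketch under stated
assumptions: toy_eb, toy_flush() and the two toy_lock_for_io_*() functions are
hypothetical stand-ins invented for illustration, not kernel code. The
"writeback" field models EXTENT_BUFFER_WRITEBACK, and "page_busy" models a
failed trylock_page(). The undo of the dirty accounting and header flags
discussed in the thread is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer; not the kernel structure. */
struct toy_eb {
	bool writeback;	/* models the EXTENT_BUFFER_WRITEBACK bit */
};

/* Models flush_write_bio(): 0 on success, negative errno on failure. */
static int toy_flush(int flush_result)
{
	return flush_result;
}

/*
 * Models the 5.2 behaviour: the "caller must start writeback" value (1)
 * is overwritten with the flush result, so a successful flush makes the
 * function return 0 while the writeback flag stays set.
 */
static int toy_lock_for_io_buggy(struct toy_eb *eb, bool page_busy,
				 int flush_result)
{
	int ret = 1;

	eb->writeback = true;
	if (page_busy) {
		ret = toy_flush(flush_result);
		if (ret < 0)
			return ret;	/* flag is left set here as well */
	}
	return ret;
}

/*
 * Models the fix: the flush result goes into a separate variable, and
 * the writeback flag is cleared again when the flush fails.
 */
static int toy_lock_for_io_fixed(struct toy_eb *eb, bool page_busy,
				 int flush_result)
{
	int ret = 1;

	eb->writeback = true;
	if (page_busy) {
		int err = toy_flush(flush_result);

		if (err < 0) {
			eb->writeback = false;
			return err;
		}
	}
	return ret;
}
```

With page_busy set and a successful flush, the buggy variant returns 0 with
the flag still set, which is the state that later waiters hang on. The fixed
variant returns 1 in that case, and on a flush error it returns the error
with the flag cleared.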

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1ff438fd5bc2..1311ba0fc031 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3628,6 +3628,13 @@  void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		       TASK_UNINTERRUPTIBLE);
 }
 
+static void end_extent_buffer_writeback(struct extent_buffer *eb)
+{
+	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
+	smp_mb__after_atomic();
+	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
+}
+
 /*
  * Lock eb pages and flush the bio if we can't the locks
  *
@@ -3699,8 +3706,11 @@  static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				ret = flush_write_bio(epd);
-				if (ret < 0) {
+				int err;
+
+				err = flush_write_bio(epd);
+				if (err < 0) {
+					ret = err;
 					failed_page_nr = i;
 					goto err_unlock;
 				}
@@ -3715,16 +3725,11 @@  static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	/* Unlock already locked pages */
 	for (i = 0; i < failed_page_nr; i++)
 		unlock_page(eb->pages[i]);
+	/* Clear EXTENT_BUFFER_WRITEBACK and wake up anyone waiting on it. */
+	end_extent_buffer_writeback(eb);
 	return ret;
 }
 
-static void end_extent_buffer_writeback(struct extent_buffer *eb)
-{
-	clear_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
-	smp_mb__after_atomic();
-	wake_up_bit(&eb->bflags, EXTENT_BUFFER_WRITEBACK);
-}
-
 static void set_btree_ioerr(struct page *page)
 {
 	struct extent_buffer *eb = (struct extent_buffer *)page->private;