btrfs: flush write bio if we loop in extent_write_cache_pages

Message ID 20200123203302.2180124-1-josef@toxicpanda.com
State New

Commit Message

Josef Bacik Jan. 23, 2020, 8:33 p.m. UTC
There exists a long-standing deadlock with range_cyclic.  If we loop
around with a bio already built we can deadlock with a writer that
holds the lock on a page we're attempting to write while waiting for a
page in our bio to be written out.  The task traces are as follows:

PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
 #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
 #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
 #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
 #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd

PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
"aio-dio-invalid"
 #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
 #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
 #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032

I used drgn to find the respective pages we were stuck on

page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874

As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148.  aio-dio-invalid holds the lock on index 7680 (it waits for the
writeback to complete while holding the page lock, in prepare_pages()),
and the kworker's epd looks like the following

crash> struct extent_page_data ffffc900297bbbb0
struct extent_page_data {
  bio = 0xffff889f747ed830,
  tree = 0xffff889eed6ba448,
  extent_locked = 0,
  sync_io = 0
}

and using drgn I walked the bio pages looking for page
0xffffea00fbfc7500, which is the one we're waiting for writeback on

# Run inside the drgn REPL, where prog and Object are predefined
bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
for i in range(0, bio.bi_vcnt.value_()):
    bv = bio.bi_io_vec[i]
    if bv.bv_page.value_() == 0xffffea00fbfc7500:
        print("FOUND IT")

which validated what I suspected.

The fix for this is simple: flush the epd before we loop back around to
the beginning of the file during writeout.

Fixes: b293f02e1423 ("Btrfs: Add writepages support")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Filipe Manana Jan. 24, 2020, 9:28 a.m. UTC | #1
On Thu, Jan 23, 2020 at 8:51 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> There exists a deadlock with range_cyclic that has existed forever.  If
> we loop around with a bio already built we could deadlock with a writer
> who has the page locked that we're attempting to write but is waiting on
> a page in our bio to be written out.  The task traces are as follows
>
> PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
>  #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
>  #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
>  #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
>  #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
>  #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
>  #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
>  #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
>  #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
>  #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd
>
> PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
> "aio-dio-invalid"
>  #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
>  #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
>  #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
>  #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
>  #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
>  #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
>  #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
>  #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
>  #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032
>
> I used drgn to find the respective pages we were stuck on
>
> page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
> page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874
>
> As you can see the kworker is waiting for bit 0 (PG_locked) on index
> 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
> 8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
> following

Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).
Anyway, it's a very minor thing and easy to figure out.

>
> crash> struct extent_page_data ffffc900297bbbb0
> struct extent_page_data {
>   bio = 0xffff889f747ed830,
>   tree = 0xffff889eed6ba448,
>   extent_locked = 0,
>   sync_io = 0
> }
>
> and using drgn I walked the bio pages looking for page
> 0xffffea00fbfc7500 which is the one we're waiting for writeback on
>
> bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
> for i in range(0, bio.bi_vcnt.value_()):
>     bv = bio.bi_io_vec[i]
>     if bv.bv_page.value_() == 0xffffea00fbfc7500:
>         print("FOUND IT")
>
> which validated what I suspected.
>
> The fix for this is simple, flush the epd before we loop back around to
> the beginning of the file during writeout.
>
> Fixes: b293f02e1423 ("Btrfs: Add writepages support")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

That drgn, where is it? A quick google search pointed me only to
blockchain stuff on the first page of results.

Thanks.

> ---
>  fs/btrfs/extent_io.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0b5513f98a67..e3295f2d2975 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4181,7 +4181,16 @@ static int extent_write_cache_pages(struct address_space *mapping,
>                  */
>                 scanned = 1;
>                 index = 0;
> -               goto retry;
> +
> +               /*
> +                * If we're looping we could run into a page that is locked by a
> +                * writer and that writer could be waiting on writeback for a
> +                * page in our current bio, and thus deadlock, so flush the
> +                * write bio here.
> +                */
> +               ret = flush_write_bio(epd);
> +               if (!ret)
> +                       goto retry;
>         }
>
>         if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
> --
> 2.24.1
>
Holger Hoffstätte Jan. 24, 2020, 10:25 a.m. UTC | #2
On 1/24/20 10:28 AM, Filipe Manana wrote:
> That drgn, where is it? A quick google search pointed me only to
> blockchain stuff on the first page of results.

drgn aka "dragon", the author should sound familiar ;)

https://drgn.readthedocs.io/en/latest/

Sounds incredibly useful!

-h
Chris Mason Jan. 24, 2020, 12:45 p.m. UTC | #3
On 24 Jan 2020, at 4:28, Filipe Manana wrote:

> On Thu, Jan 23, 2020 at 8:51 PM Josef Bacik <josef@toxicpanda.com> 
> wrote:
>>
>> There exists a deadlock with range_cyclic that has existed forever.  
>> If
>> we loop around with a bio already built we could deadlock with a 
>> writer
>> who has the page locked that we're attempting to write but is waiting 
>> on
>> a page in our bio to be written out.  The task traces are as follows
>>
>> PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: 
>> "kworker/u113:5"
>>  #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
>>  #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
>>  #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
>>  #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
>>  #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
>>  #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
>>  #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
>>  #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
>>  #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
>>  #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd
>>
>> PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
>> "aio-dio-invalid"
>>  #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
>>  #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
>>  #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
>>  #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
>>  #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
>>  #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
>>  #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
>>  #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
>>  #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
>>  #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032
>>
>> I used drgn to find the respective pages we were stuck on
>>
>> page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
>> page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874
>>
>> As you can see the kworker is waiting for bit 0 (PG_locked) on index
>> 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on 
>> index
>> 8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
>> following
>
> Probably worth mentioning as well that it waits for writeback of the
> page to complete while holding a lock on it (at prepare_pages()).
> Anyway, it's a very minor thing and easy to figure that out.

Yeah, I was wondering if we should drop the lock in prepare_pages() and 
make a goto again, but I don't think more complexity there is a good 
idea.

>
>>
>> crash> struct extent_page_data ffffc900297bbbb0
>> struct extent_page_data {
>>   bio = 0xffff889f747ed830,
>>   tree = 0xffff889eed6ba448,
>>   extent_locked = 0,
>>   sync_io = 0
>> }
>>
>> and using drgn I walked the bio pages looking for page
>> 0xffffea00fbfc7500 which is the one we're waiting for writeback on
>>
>> bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
>> for i in range(0, bio.bi_vcnt.value_()):
>>     bv = bio.bi_io_vec[i]
>>     if bv.bv_page.value_() == 0xffffea00fbfc7500:
>>         print("FOUND IT")
>>
>> which validated what I suspected.
>>
>> The fix for this is simple, flush the epd before we loop back around 
>> to
>> the beginning of the file during writeout.
>>
>> Fixes: b293f02e1423 ("Btrfs: Add writepages support")
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> That drgn, where is it? A quick google search pointed me only to
> blockchain stuff on the first page of results.

Holger had the right link for the docs, but 
https://github.com/osandov/drgn works too.

Josef intermixes crash and drgn, but drgn supports most of the things 
crash can do these days.

For this command:

crash> struct extent_page_data ffffc900297bbbb0

You can just ask drgn for the register you care about instead of having 
to find it on the stack (see class drgn.StackFrame in the docs).

Tejun and Roman both shipped drgn scripts to help monitor bits of the 
kernel:

https://lkml.org/lkml/2019/7/10/793
https://lore.kernel.org/linux-mm/20191016001504.1099106-1-guro@fb.com/t/

-chris
David Sterba Jan. 28, 2020, 4:32 p.m. UTC | #4
On Thu, Jan 23, 2020 at 03:33:02PM -0500, Josef Bacik wrote:
> There exists a deadlock with range_cyclic that has existed forever.  If
> we loop around with a bio already built we could deadlock with a writer
> who has the page locked that we're attempting to write but is waiting on
> a page in our bio to be written out.  The task traces are as follows
> 
> PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
>  #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
>  #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
>  #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
>  #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
>  #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
>  #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
>  #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
>  #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
>  #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd
> 
> PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
> "aio-dio-invalid"
>  #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
>  #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
>  #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
>  #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
>  #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
>  #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
>  #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
>  #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
>  #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032
> 
> I used drgn to find the respective pages we were stuck on
> 
> page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
> page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874
> 
> As you can see the kworker is waiting for bit 0 (PG_locked) on index
> 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
> 8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
> following
> 
> crash> struct extent_page_data ffffc900297bbbb0
> struct extent_page_data {
>   bio = 0xffff889f747ed830,
>   tree = 0xffff889eed6ba448,
>   extent_locked = 0,
>   sync_io = 0
> }
> 
> and using drgn I walked the bio pages looking for page
> 0xffffea00fbfc7500 which is the one we're waiting for writeback on
> 
> bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
> for i in range(0, bio.bi_vcnt.value_()):
>     bv = bio.bi_io_vec[i]
>     if bv.bv_page.value_() == 0xffffea00fbfc7500:
>         print("FOUND IT")
> 
> which validated what I suspected.
> 
> The fix for this is simple, flush the epd before we loop back around to
> the beginning of the file during writeout.
> 
> Fixes: b293f02e1423 ("Btrfs: Add writepages support")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0b5513f98a67..e3295f2d2975 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4181,7 +4181,16 @@ static int extent_write_cache_pages(struct address_space *mapping,
 		 */
 		scanned = 1;
 		index = 0;
-		goto retry;
+
+		/*
+		 * If we're looping we could run into a page that is locked by a
+		 * writer and that writer could be waiting on writeback for a
+		 * page in our current bio, and thus deadlock, so flush the
+		 * write bio here.
+		 */
+		ret = flush_write_bio(epd);
+		if (!ret)
+			goto retry;
 	}
 
 	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))