Message ID | 20200123203302.2180124-1-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: flush write bio if we loop in extent_write_cache_pages | expand |
On Thu, Jan 23, 2020 at 8:51 PM Josef Bacik <josef@toxicpanda.com> wrote: > > There exists a deadlock with range_cyclic that has existed forever. If > we loop around with a bio already built we could deadlock with a writer > who has the page locked that we're attempting to write but is waiting on > a page in our bio to be written out. The task traces are as follows > > PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" > #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f > #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 > #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 > #3 [ffffc900297bb708] __lock_page at ffffffff811f145b > #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 > #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 > #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff > #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 > #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 > #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd > > PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: > "aio-dio-invalid" > #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f > #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 > #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 > #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 > #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 > #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 > #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 > #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 > #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d > #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 > > I used drgn to find the respective pages we were stuck on > > page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 > page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 > > As you can see the kworker is waiting for bit 0 (PG_locked) on index > 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index > 8148. aio-dio-invalid has 7680, and the kworker epd looks like the > following Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Anyway, it's a very minor thing and easy to figure that out. > > crash> struct extent_page_data ffffc900297bbbb0 > struct extent_page_data { > bio = 0xffff889f747ed830, > tree = 0xffff889eed6ba448, > extent_locked = 0, > sync_io = 0 > } > > and using drgn I walked the bio pages looking for page > 0xffffea00fbfc7500 which is the one we're waiting for writeback on > > bio = Object(prog, 'struct bio', address=0xffff889f747ed830) > for i in range(0, bio.bi_vcnt.value_()): > bv = bio.bi_io_vec[i] > if bv.bv_page.value_() == 0xffffea00fbfc7500: > print("FOUND IT") > > which validated what I suspected. > > The fix for this is simple, flush the epd before we loop back around to > the beginning of the file during writeout. > > Fixes: b293f02e1423 ("Btrfs: Add writepages support") > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> That dgrn, where is it? A quick google search pointed me only to blockchain stuff on the first page of results. Thanks. > --- > fs/btrfs/extent_io.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 0b5513f98a67..e3295f2d2975 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -4181,7 +4181,16 @@ static int extent_write_cache_pages(struct address_space *mapping, > */ > scanned = 1; > index = 0; > - goto retry; > + > + /* > + * If we're looping we could run into a page that is locked by a > + * writer and that writer could be waiting on writeback for a > + * page in our current bio, and thus deadlock, so flush the > + * write bio here. > + */ > + ret = flush_write_bio(epd); > + if (!ret) > + goto retry; > } > > if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) > -- > 2.24.1 >
On 1/24/20 10:28 AM, Filipe Manana wrote: > That dgrn, where is it? A quick google search pointed me only to > blockchain stuff on the first page of results. drgn aka "dragon", the author should sound familiar ;) https://drgn.readthedocs.io/en/latest/ Sounds incredibly useful! -h
On 24 Jan 2020, at 4:28, Filipe Manana wrote: > On Thu, Jan 23, 2020 at 8:51 PM Josef Bacik <josef@toxicpanda.com> > wrote: >> >> There exists a deadlock with range_cyclic that has existed forever. >> If >> we loop around with a bio already built we could deadlock with a >> writer >> who has the page locked that we're attempting to write but is waiting >> on >> a page in our bio to be written out. The task traces are as follows >> >> PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: >> "kworker/u113:5" >> #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f >> #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 >> #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 >> #3 [ffffc900297bb708] __lock_page at ffffffff811f145b >> #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 >> #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 >> #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff >> #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 >> #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 >> #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd >> >> PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: >> "aio-dio-invalid" >> #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f >> #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 >> #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 >> #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 >> #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 >> #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 >> #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 >> #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 >> #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d >> #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 >> >> I used drgn to find the respective pages we were stuck on >> >> page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 >> page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 >> >> As you can see the kworker is waiting for bit 0 (PG_locked) on index >> 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on >> index >> 8148. aio-dio-invalid has 7680, and the kworker epd looks like the >> following > > Probably worth mentioning as well that it waits for writeback of the > page to complete while holding a lock on it (at prepare_pages()). > Anyway, it's a very minor thing and easy to figure that out. Yeah, I was wondering if we should drop the lock in prepare_pages() and make a goto again, but I don't think more complexity there is a good idea. > >> >> crash> struct extent_page_data ffffc900297bbbb0 >> struct extent_page_data { >> bio = 0xffff889f747ed830, >> tree = 0xffff889eed6ba448, >> extent_locked = 0, >> sync_io = 0 >> } >> >> and using drgn I walked the bio pages looking for page >> 0xffffea00fbfc7500 which is the one we're waiting for writeback on >> >> bio = Object(prog, 'struct bio', address=0xffff889f747ed830) >> for i in range(0, bio.bi_vcnt.value_()): >> bv = bio.bi_io_vec[i] >> if bv.bv_page.value_() == 0xffffea00fbfc7500: >> print("FOUND IT") >> >> which validated what I suspected. >> >> The fix for this is simple, flush the epd before we loop back around >> to >> the beginning of the file during writeout. >> >> Fixes: b293f02e1423 ("Btrfs: Add writepages support") >> Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > Reviewed-by: Filipe Manana <fdmanana@suse.com> > > That dgrn, where is it? A quick google search pointed me only to > blockchain stuff on the first page of results. Holger had the right link for the docs, but https://github.com/osandov/drgn works too. Josef intermixes crash and drgn, but drgn supports most of the things crash can do these days. For this command: crash> struct extent_page_data ffffc900297bbbb0 You can just ask drgn for the register you care about instead of having to find it on the stack (see class drgn.StackFrame in the docs). Tejun and Roman both shipped drgn scripts to help monitor bits of the kernel: https://lkml.org/lkml/2019/7/10/793 https://lore.kernel.org/linux-mm/20191016001504.1099106-1-guro@fb.com/t/ -chris
On Thu, Jan 23, 2020 at 03:33:02PM -0500, Josef Bacik wrote: > There exists a deadlock with range_cyclic that has existed forever. If > we loop around with a bio already built we could deadlock with a writer > who has the page locked that we're attempting to write but is waiting on > a page in our bio to be written out. The task traces are as follows > > PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" > #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f > #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 > #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 > #3 [ffffc900297bb708] __lock_page at ffffffff811f145b > #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 > #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 > #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff > #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 > #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 > #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd > > PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: > "aio-dio-invalid" > #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f > #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 > #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 > #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 > #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 > #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 > #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 > #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 > #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d > #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 > > I used drgn to find the respective pages we were stuck on > > page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 > page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 > > As you can see the kworker is waiting for bit 0 (PG_locked) on index > 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index > 8148. aio-dio-invalid has 7680, and the kworker epd looks like the > following > > crash> struct extent_page_data ffffc900297bbbb0 > struct extent_page_data { > bio = 0xffff889f747ed830, > tree = 0xffff889eed6ba448, > extent_locked = 0, > sync_io = 0 > } > > and using drgn I walked the bio pages looking for page > 0xffffea00fbfc7500 which is the one we're waiting for writeback on > > bio = Object(prog, 'struct bio', address=0xffff889f747ed830) > for i in range(0, bio.bi_vcnt.value_()): > bv = bio.bi_io_vec[i] > if bv.bv_page.value_() == 0xffffea00fbfc7500: > print("FOUND IT") > > which validated what I suspected. > > The fix for this is simple, flush the epd before we loop back around to > the beginning of the file during writeout. > > Fixes: b293f02e1423 ("Btrfs: Add writepages support") > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Added to misc-next, thanks.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 0b5513f98a67..e3295f2d2975 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4181,7 +4181,16 @@ static int extent_write_cache_pages(struct address_space *mapping, */ scanned = 1; index = 0; - goto retry; + + /* + * If we're looping we could run into a page that is locked by a + * writer and that writer could be waiting on writeback for a + * page in our current bio, and thus deadlock, so flush the + * write bio here. + */ + ret = flush_write_bio(epd); + if (!ret) + goto retry; } if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } and using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: b293f02e1423 ("Btrfs: Add writepages support") Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/extent_io.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)