btrfs: fix improper setting of scanned
diff mbox series

Message ID 20200103153844.13852-1-josef@toxicpanda.com
State New
Headers show
Series
  • btrfs: fix improper setting of scanned
Related show

Commit Message

Josef Bacik Jan. 3, 2020, 3:38 p.m. UTC
We noticed that we were having regular CG OOM kills in cases where there
was still enough dirty pages to avoid OOM'ing.  It turned out there's
this corner case in btrfs's handling of range_cyclic where files that
were being redirtied were not getting fully written out because of how
we do range_cyclic writeback.

We unconditionally were setting scanned = 1; the first time we found any
pages in the inode.  This isn't actually what we want, we want it to be
set if we've scanned the entire file.  For range_cyclic we could be
starting in the middle or towards the end of the file, so we could write
one page and then not write any of the other dirty pages in the file
because we set scanned = 1.

Fix this by not setting scanned = 1 if we find pages.  The rules for
setting scanned should be

1) !range_cyclic.  In this case we have a specified range to write out.
2) range_cyclic && index == 0.  In this case we've started at the
   beginning and there is no need to loop around a second time.
3) range_cyclic && we started at index > 0 and we've reached the end of
   the file without satisfying our nr_to_write.

This patch fixes both of our writepages implementations to make sure
these rules hold true.  This fixed our over zealous CG OOMs in
production.

Fixes: d1310b2e0cd9 ("Btrfs: Split the extent_map code into two parts")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Sterba Jan. 8, 2020, 5:44 p.m. UTC | #1
As 'scanning' has another meaning in btrfs, the subject should say
something about writeback or at least write cache range scanning.

On Fri, Jan 03, 2020 at 10:38:44AM -0500, Josef Bacik wrote:
> We noticed that we were having regular CG OOM kills in cases where there
> was still enough dirty pages to avoid OOM'ing.  It turned out there's
> this corner case in btrfs's handling of range_cyclic where files that
> were being redirtied were not getting fully written out because of how
> we do range_cyclic writeback.
> 
> We unconditionally were setting scanned = 1; the first time we found any
> pages in the inode.  This isn't actually what we want, we want it to be
> set if we've scanned the entire file.  For range_cyclic we could be
> starting in the middle or towards the end of the file, so we could write
> one page and then not write any of the other dirty pages in the file
> because we set scanned = 1.
> 
> Fix this by not setting scanned = 1 if we find pages.  The rules for
> setting scanned should be
> 
> 1) !range_cyclic.  In this case we have a specified range to write out.
> 2) range_cyclic && index == 0.  In this case we've started at the
>    beginning and there is no need to loop around a second time.
> 3) range_cyclic && we started at index > 0 and we've reached the end of
>    the file without satisfying our nr_to_write.
> 
> This patch fixes both of our writepages implementations to make sure
> these rules hold true.  This fixed our over zealous CG OOMs in
> production.
> 
> Fixes: d1310b2e0cd9 ("Btrfs: Split the extent_map code into two parts")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent_io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 410f5a64d3a6..d634cb0c39e3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3965,6 +3965,7 @@ int btree_write_cache_pages(struct address_space *mapping,
>  	if (wbc->range_cyclic) {
>  		index = mapping->writeback_index; /* Start from prev offset */
>  		end = -1;
> +		scanned = (index == 0);

It's explained in the changelog but I think a comment would be good here
too. I'll add it at commit time.

Patch
diff mbox series

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 410f5a64d3a6..d634cb0c39e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3965,6 +3965,7 @@  int btree_write_cache_pages(struct address_space *mapping,
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
+		scanned = (index == 0);
 	} else {
 		index = wbc->range_start >> PAGE_SHIFT;
 		end = wbc->range_end >> PAGE_SHIFT;
@@ -3982,7 +3983,6 @@  int btree_write_cache_pages(struct address_space *mapping,
 			tag))) {
 		unsigned i;
 
-		scanned = 1;
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
@@ -4111,6 +4111,7 @@  static int extent_write_cache_pages(struct address_space *mapping,
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
+		scanned = (index == 0);
 	} else {
 		index = wbc->range_start >> PAGE_SHIFT;
 		end = wbc->range_end >> PAGE_SHIFT;
@@ -4144,7 +4145,6 @@  static int extent_write_cache_pages(struct address_space *mapping,
 						&index, end, tag))) {
 		unsigned i;
 
-		scanned = 1;
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];