diff mbox

[v2] Btrfs: fix eb memory leak due to readpage failure

Message ID 1464980918-8365-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Superseded
Headers show

Commit Message

Liu Bo June 3, 2016, 7:08 p.m. UTC
eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
 and ends up with a memory leak eventually.

This lets __do_readpage propagate errors to callers and adds the
 'atomic_dec(&eb->io_pages)'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2:
  - Move 'dec io_pages' to the caller so that we're consistent with
    write_one_eb()

 fs/btrfs/extent_io.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Comments

David Sterba June 13, 2016, 3:36 p.m. UTC | #1
On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
> 
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
> 
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
> 
> This lets __do_readpage propagate errors to callers and adds the
>  'atomic_dec(&eb->io_pages)'.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

I'm adding this to for-next, but a review is needed if this is supposed
to go to 4.7.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba July 8, 2016, 4:01 p.m. UTC | #2
On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
> 
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
> 
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
> 
> This lets __do_readpage propagate errors to callers and adds the
>  'atomic_dec(&eb->io_pages)'.

I'm not sure, but could we lose some error values from __do_readpage?
Ie. return 0 even if there was an error in a page that's in the middle
(not the first, not the last).

The loop in __do_readpage iterates while (cur <= end), and ret is only
set by submit_extent_page, but the loop does not exit immediatelly. So
we can detect error, set page error state bit, but next loop will
overwrite ret with 0 (if the page submission was ok).

Then we still don't decrement the io_pages as needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo July 8, 2016, 9:23 p.m. UTC | #3
On Fri, Jul 08, 2016 at 06:01:49PM +0200, David Sterba wrote:
> On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> > eb->io_pages is set in read_extent_buffer_pages().
> > 
> > In case of readpage failure, for pages that have been added to bio,
> > it calls bio_endio and later readpage_io_failed_hook() does the work.
> > 
> > When this eb's page (couldn't be the 1st page) fails to add itself to bio
> > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> >  and ends up with a memory leak eventually.
> > 
> > This lets __do_readpage propagate errors to callers and adds the
> >  'atomic_dec(&eb->io_pages)'.
> 
> I'm not sure, but could we lose some error values from __do_readpage?
> Ie. return 0 even if there was an error in a page that's in the middle
> (not the first, not the last).
> 
> The loop in __do_readpage iterates while (cur <= end), and ret is only
> set by submit_extent_page, but the loop does not exit immediatelly. So
> we can detect error, set page error state bit, but next loop will
> overwrite ret with 0 (if the page submission was ok).
> 
> Then we still don't decrement the io_pages as needed.

Right, it still has that problem, then the possible way I can see is to break
the while (cur <= end) loop when we fail on submit_extent_page() and
pass an error up to its caller and we can do the rest eb->io_pages cleanup work in
read_extent_buffer_pages(), just like how we did in write_one_eb()
(this was already suggested by Josef, but seems I was off the right track).

This also assumes that if one page fails on submit_extent_page(), it's
likely for the rest pages to fail as well.

What do you think?

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d247fc0..0309388 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2869,6 +2869,7 @@  __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset,
  * into the tree that are removed when the IO is done (by the end_io
  * handlers)
  * XXX JDM: This needs looking at to ensure proper page locking
+ * return 0 on success, otherwise return error
  */
 static int __do_readpage(struct extent_io_tree *tree,
 			 struct page *page,
@@ -2890,7 +2891,7 @@  static int __do_readpage(struct extent_io_tree *tree,
 	sector_t sector;
 	struct extent_map *em;
 	struct block_device *bdev;
-	int ret;
+	int ret = 0;
 	int nr = 0;
 	size_t pg_offset = 0;
 	size_t iosize;
@@ -3081,7 +3082,7 @@  out:
 			SetPageUptodate(page);
 		unlock_page(page);
 	}
-	return 0;
+	return ret;
 }
 
 static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
@@ -5204,8 +5205,17 @@  int read_extent_buffer_pages(struct extent_io_tree *tree,
 						      get_extent, &bio,
 						      mirror_num, &bio_flags,
 						      READ | REQ_META);
-			if (err)
+			if (err) {
 				ret = err;
+				/*
+				 * We use &bio in above __extent_read_full_page,
+				 * so we ensure that if it returns error, the
+				 * current page fails to add itself to bio.
+				 *
+				 * We must dec io_pages by ourselves.
+				 */
+				atomic_dec(&eb->io_pages);
+			}
 		} else {
 			unlock_page(page);
 		}