
[v3] Btrfs: fix eb memory leak due to readpage failure

Message ID 1468258747-19617-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Accepted

Commit Message

Liu Bo July 11, 2016, 5:39 p.m. UTC
eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to a bio,
bio_endio is called and readpage_io_failed_hook() later does the work.

When one of this eb's pages (it cannot be the first page) fails to be added
to the bio because merge_bio() fails, eb->io_pages is never decremented via
bio_endio, and the eb is eventually leaked.

This lets __do_readpage propagate errors to callers and adds the missing
'atomic_dec(&eb->io_pages)'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: - Move 'dec io_pages' to the caller so that we're consistent with
      write_one_eb()
v3: - Bail out once we fail to read a page and do the cleanup work
      for eb->io_pages

 fs/btrfs/extent_io.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
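
To make the accounting problem described in the commit message concrete, here is a minimal userspace sketch. It is not kernel code: `struct eb_model` and `simulate_read()` are hypothetical names, and the model assumes that each page successfully added to a bio drops one count on completion, while a page that never reaches a bio drops nothing unless the caller does it explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct extent_buffer: only the counter matters. */
struct eb_model {
	atomic_int io_pages;
};

/*
 * Simulate reading num_pages pages where page fail_idx (-1 for none)
 * fails to get onto a bio. Returns the io_pages value left after all
 * submitted pages complete; nonzero models the leaked extent buffer.
 */
static int simulate_read(int num_pages, int fail_idx, bool with_fix)
{
	struct eb_model eb;
	int submitted = 0;
	int ret = 0;

	assert(num_pages > 0);
	atomic_init(&eb.io_pages, num_pages);

	for (int i = 0; i < num_pages; i++) {
		if (with_fix && ret) {
			/* v3 behaviour: skip remaining pages, drop their count. */
			atomic_fetch_sub(&eb.io_pages, 1);
			continue;
		}
		if (i == fail_idx) {
			ret = -5; /* stand-in for -EIO */
			if (with_fix)
				atomic_fetch_sub(&eb.io_pages, 1);
			continue; /* this page never reaches a bio */
		}
		submitted++;
	}

	/* Completion path: every submitted page drops one count. */
	for (int i = 0; i < submitted; i++)
		atomic_fetch_sub(&eb.io_pages, 1);

	return atomic_load(&eb.io_pages);
}
```

In this model, `simulate_read(4, 2, false)` leaves one count behind, whereas the same call with `with_fix` set to `true` brings the counter back to zero.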

Comments

Chris Mason July 11, 2016, 6:27 p.m. UTC | #1
On 07/11/2016 01:39 PM, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
>
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
>
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
>
> This lets __do_readpage propagate errors to callers and adds the
>  'atomic_dec(&eb->io_pages)'.

Thanks for looking at this Liu, how is it currently being tested?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo July 11, 2016, 10:48 p.m. UTC | #2
On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:
> 
> 
> On 07/11/2016 01:39 PM, Liu Bo wrote:
> > eb->io_pages is set in read_extent_buffer_pages().
> > 
> > In case of readpage failure, for pages that have been added to bio,
> > it calls bio_endio and later readpage_io_failed_hook() does the work.
> > 
> > When this eb's page (couldn't be the 1st page) fails to add itself to bio
> > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> >  and ends up with a memory leak eventually.
> > 
> > This lets __do_readpage propagate errors to callers and adds the
> >  'atomic_dec(&eb->io_pages)'.
> 
> Thanks for looking at this Liu, how is it currently being tested?

I have a btrfs disk image which was corrupted by btrfs-corrupt-block
tool, in that image, the chunk tree's content has been removed while the
chunk node can be read successfully, so we'd get -EIO when
trying to read tree root's node since __btrfs_map_block() would fail to
find the right item in chunk mapping_tree.  Thus, we can test our error
handling path in read_extent_buffer_pages().

Thanks,

-liubo

> 
> -chris
Chris Mason July 11, 2016, 10:54 p.m. UTC | #3
On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote:
>On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:
>>
>>
>> On 07/11/2016 01:39 PM, Liu Bo wrote:
>> > eb->io_pages is set in read_extent_buffer_pages().
>> >
>> > In case of readpage failure, for pages that have been added to bio,
>> > it calls bio_endio and later readpage_io_failed_hook() does the work.
>> >
>> > When this eb's page (couldn't be the 1st page) fails to add itself to bio
>> > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>> >  and ends up with a memory leak eventually.
>> >
>> > This lets __do_readpage propagate errors to callers and adds the
>> >  'atomic_dec(&eb->io_pages)'.
>>
>> Thanks for looking at this Liu, how is it currently being tested?
>
>I have a btrfs disk image which was corrupted by btrfs-corrupt-block
>tool, in that image, the chunk tree's content has been removed while the
>chunk node can be read successfully, so we'd get -EIO when
>trying to read tree root's node since __btrfs_map_block() would fail to
>find the right item in chunk mapping_tree.  Thus, we can test our error
>handling path in read_extent_buffer_pages().

Fantastic.  Can you please make this an xfstest, maybe along with dm-flakey
as the second phase?

-chris

Liu Bo July 11, 2016, 11:04 p.m. UTC | #4
On Mon, Jul 11, 2016 at 06:54:02PM -0400, Chris Mason wrote:
> On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote:
> > On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote:
> > > 
> > > 
> > > On 07/11/2016 01:39 PM, Liu Bo wrote:
> > > > eb->io_pages is set in read_extent_buffer_pages().
> > > >
> > > > In case of readpage failure, for pages that have been added to bio,
> > > > it calls bio_endio and later readpage_io_failed_hook() does the work.
> > > >
> > > > When this eb's page (couldn't be the 1st page) fails to add itself to bio
> > > > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> > > >  and ends up with a memory leak eventually.
> > > >
> > > > This lets __do_readpage propagate errors to callers and adds the
> > > >  'atomic_dec(&eb->io_pages)'.
> > > 
> > > Thanks for looking at this Liu, how is it currently being tested?
> > 
> > I have a btrfs disk image which was corrupted by btrfs-corrupt-block
> > tool, in that image, the chunk tree's content has been removed while the
> > chunk node can be read successfully, so we'd get -EIO when
> > trying to read tree root's node since __btrfs_map_block() would fail to
> > find the right item in chunk mapping_tree.  Thus, we can test our error
> > handling path in read_extent_buffer_pages().
> 
> Fantastic.  Can you please make this an xfstest, maybe along with dm-flakey
> as the second phase?

Sure. This depends on a btrfs-corrupt-block patch that I haven't sent out
yet; I'll try to work out an xfstests case :)

Btw, I'm also planning to add this into our fuzz images of btrfs-progs.

Thanks,

-liubo
David Sterba July 12, 2016, 5:30 p.m. UTC | #5
On Mon, Jul 11, 2016 at 10:39:07AM -0700, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
> 
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
> 
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
> 
> This lets __do_readpage propagate errors to callers and adds the
>  'atomic_dec(&eb->io_pages)'.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

>  		if (!PageUptodate(page)) {
> +			if (ret) {
> +				atomic_dec(&eb->io_pages);
> +				unlock_page(page);
> +				continue;
> +			}

This changes the behaviour to "fail early", which could be a positive
change: once one read fails, a sequence of unreadable blocks is no longer
re-read one by one, each with its own timeouts and retries.
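
As a rough illustration of the "fail early" effect, the following userspace sketch counts how many page reads would be attempted with and without the bail-out. `count_read_attempts()` is a hypothetical helper, and the model assumes every page from `first_bad` onward is unreadable; the real cost of each attempt (timeouts, retries) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count how many page reads are attempted for an extent buffer of
 * num_pages pages when every page from first_bad onward is unreadable
 * (first_bad < 0 means all pages are readable).
 */
static int count_read_attempts(int num_pages, int first_bad, bool fail_early)
{
	int attempts = 0;
	int ret = 0;

	assert(num_pages > 0);

	for (int i = 0; i < num_pages; i++) {
		if (fail_early && ret)
			continue; /* an earlier page already failed: skip */
		attempts++;
		if (first_bad >= 0 && i >= first_bad)
			ret = -5; /* stand-in for -EIO */
	}
	return attempts;
}
```

With 16 pages and the first bad page at index 3, the model attempts all 16 reads without the bail-out and only 4 with it.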

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ac1a696..7303e5a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2878,6 +2878,7 @@  __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset,
  * into the tree that are removed when the IO is done (by the end_io
  * handlers)
  * XXX JDM: This needs looking at to ensure proper page locking
+ * return 0 on success, otherwise return error
  */
 static int __do_readpage(struct extent_io_tree *tree,
 			 struct page *page,
@@ -2899,7 +2900,7 @@  static int __do_readpage(struct extent_io_tree *tree,
 	sector_t sector;
 	struct extent_map *em;
 	struct block_device *bdev;
-	int ret;
+	int ret = 0;
 	int nr = 0;
 	size_t pg_offset = 0;
 	size_t iosize;
@@ -3080,6 +3081,7 @@  static int __do_readpage(struct extent_io_tree *tree,
 		} else {
 			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			goto out;
 		}
 		cur = cur + iosize;
 		pg_offset += iosize;
@@ -3090,7 +3092,7 @@  out:
 			SetPageUptodate(page);
 		unlock_page(page);
 	}
-	return 0;
+	return ret;
 }
 
 static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
@@ -5230,14 +5232,31 @@  int read_extent_buffer_pages(struct extent_io_tree *tree,
 	atomic_set(&eb->io_pages, num_reads);
 	for (i = start_i; i < num_pages; i++) {
 		page = eb->pages[i];
+
 		if (!PageUptodate(page)) {
+			if (ret) {
+				atomic_dec(&eb->io_pages);
+				unlock_page(page);
+				continue;
+			}
+
 			ClearPageError(page);
 			err = __extent_read_full_page(tree, page,
 						      get_extent, &bio,
 						      mirror_num, &bio_flags,
 						      READ | REQ_META);
-			if (err)
+			if (err) {
 				ret = err;
+				/*
+				 * We use &bio in above __extent_read_full_page,
+				 * so we ensure that if it returns error, the
+				 * current page fails to add itself to bio and
+				 * it's been unlocked.
+				 *
+				 * We must dec io_pages by ourselves.
+				 */
+				atomic_dec(&eb->io_pages);
+			}
 		} else {
 			unlock_page(page);
 		}