Message ID | 1464980918-8365-1-git-send-email-bo.li.liu@oracle.com (mailing list archive) |
---|---|
State | Superseded |
On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
>
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
>
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> and ends up with a memory leak eventually.
>
> This lets __do_readpage propagate errors to callers and adds the
> 'atomic_dec(&eb->io_pages)'.
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

I'm adding this to for-next, but a review is needed if this is supposed
to go to 4.7.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
>
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
>
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> and ends up with a memory leak eventually.
>
> This lets __do_readpage propagate errors to callers and adds the
> 'atomic_dec(&eb->io_pages)'.

I'm not sure, but could we lose some error values from __do_readpage?
I.e. return 0 even if there was an error in a page that's in the middle
(not the first, not the last).

The loop in __do_readpage iterates while (cur <= end), and ret is only
set by submit_extent_page, but the loop does not exit immediately. So
we can detect an error and set the page error state bit, but the next
iteration will overwrite ret with 0 (if that page submission was ok).

Then we still don't decrement io_pages as needed.
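The concern above can be reproduced outside the kernel. The following is a minimal user-space sketch, not btrfs code: `model_submit_fn`, `model_loop_last_wins`, `model_loop_first_error` and `model_fail_second` are hypothetical names, and the callback only stands in for submit_extent_page(). It assumes the loop body does nothing with `ret` except reassign it on each iteration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for submit_extent_page(): 0 or a negative errno. */
typedef int (*model_submit_fn)(size_t index);

/*
 * Pattern under review: ret is reassigned on every iteration and the loop
 * never exits early, so only the result of the last submission survives.
 */
static int model_loop_last_wins(size_t nr, model_submit_fn submit)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++)
		ret = submit(i);
	return ret;
}

/* One possible alternative: remember the first error and keep it. */
static int model_loop_first_error(size_t nr, model_submit_fn submit)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		int err = submit(i);

		if (err && !ret)
			ret = err;
	}
	return ret;
}

/* Test callback: only the second submission (index 1) fails. */
static int model_fail_second(size_t index)
{
	return index == 1 ? -EIO : 0;
}
```

With three iterations and `model_fail_second` as the callback, `model_loop_last_wins` reports success because the third submission overwrites the error, while `model_loop_first_error` still returns `-EIO`.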
On Fri, Jul 08, 2016 at 06:01:49PM +0200, David Sterba wrote:
> On Fri, Jun 03, 2016 at 12:08:38PM -0700, Liu Bo wrote:
> > eb->io_pages is set in read_extent_buffer_pages().
> >
> > In case of readpage failure, for pages that have been added to bio,
> > it calls bio_endio and later readpage_io_failed_hook() does the work.
> >
> > When this eb's page (couldn't be the 1st page) fails to add itself to bio
> > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
> > and ends up with a memory leak eventually.
> >
> > This lets __do_readpage propagate errors to callers and adds the
> > 'atomic_dec(&eb->io_pages)'.
>
> I'm not sure, but could we lose some error values from __do_readpage?
> I.e. return 0 even if there was an error in a page that's in the middle
> (not the first, not the last).
>
> The loop in __do_readpage iterates while (cur <= end), and ret is only
> set by submit_extent_page, but the loop does not exit immediately. So
> we can detect error, set page error state bit, but next loop will
> overwrite ret with 0 (if the page submission was ok).
>
> Then we still don't decrement the io_pages as needed.

Right, it still has that problem. The only way I can see is to break out
of the while (cur <= end) loop when submit_extent_page() fails and pass
the error up to the caller, so that the rest of the eb->io_pages cleanup
work can be done in read_extent_buffer_pages(), just like we do in
write_one_eb() (this was already suggested by Josef, but it seems I was
off the right track).

This also assumes that if one page fails in submit_extent_page(), the
remaining pages are likely to fail as well.

What do you think?

Thanks,

-liubo
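One way to read the proposal above is sketched below as a user-space model. It is an interpretation, not the eventual patch: `struct sketch_eb`, `sketch_read_eb_pages`, `sketch_end_io` and `sketch_fail_third` are hypothetical names, a plain `int` stands in for the kernel's atomic counter, and the cleanup (dropping the count for the failed page and every page after it in one step) is an assumption about what "the rest of the cleanup" would mean.

```c
#include <errno.h>
#include <stddef.h>

/* Toy extent buffer: io_pages counts pages whose read is still outstanding. */
struct sketch_eb {
	int io_pages;
};

/* Hypothetical stand-in for submitting one page: 0 or a negative errno. */
typedef int (*sketch_submit_fn)(int page_index);

/*
 * Submit pages in order and stop at the first failure. Returns the number
 * of pages actually submitted; *err receives 0 or the first error.
 */
static int sketch_read_eb_pages(struct sketch_eb *eb, int num_pages,
				sketch_submit_fn submit, int *err)
{
	int i;

	*err = 0;
	eb->io_pages = num_pages;
	for (i = 0; i < num_pages; i++) {
		*err = submit(i);
		if (*err) {
			/* Page i and all later pages will never see end-io. */
			eb->io_pages -= num_pages - i;
			break;
		}
	}
	return i;
}

/* Simulated end-io: one completion for each page that was submitted. */
static void sketch_end_io(struct sketch_eb *eb, int submitted)
{
	eb->io_pages -= submitted;
}

/* Test callback: the third page (index 2) fails to submit. */
static int sketch_fail_third(int page_index)
{
	return page_index == 2 ? -EIO : 0;
}
```

In this model, a four-page buffer whose third page fails ends up with two pages submitted and `io_pages == 2` right after the loop; once the simulated end-io runs for those two pages, the counter reaches zero and nothing is left dangling.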
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d247fc0..0309388 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2869,6 +2869,7 @@ __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset,
  * into the tree that are removed when the IO is done (by the end_io
  * handlers)
  * XXX JDM: This needs looking at to ensure proper page locking
+ * return 0 on success, otherwise return error
  */
 static int __do_readpage(struct extent_io_tree *tree,
 			 struct page *page,
@@ -2890,7 +2891,7 @@ static int __do_readpage(struct extent_io_tree *tree,
 	sector_t sector;
 	struct extent_map *em;
 	struct block_device *bdev;
-	int ret;
+	int ret = 0;
 	int nr = 0;
 	size_t pg_offset = 0;
 	size_t iosize;
@@ -3081,7 +3082,7 @@ out:
 		SetPageUptodate(page);
 		unlock_page(page);
 	}
-	return 0;
+	return ret;
 }
 
 static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
@@ -5204,8 +5205,17 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
 						      get_extent, &bio,
 						      mirror_num, &bio_flags,
 						      READ | REQ_META);
-			if (err)
+			if (err) {
 				ret = err;
+				/*
+				 * We use &bio in above __extent_read_full_page,
+				 * so we ensure that if it returns error, the
+				 * current page fails to add itself to bio.
+				 *
+				 * We must dec io_pages by ourselves.
+				 */
+				atomic_dec(&eb->io_pages);
+			}
 		} else {
 			unlock_page(page);
 		}
eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
and ends up with a memory leak eventually.

This lets __do_readpage propagate errors to callers and adds the
'atomic_dec(&eb->io_pages)'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

---
v2: - Move 'dec io_pages' to the caller so that we're consistent with
      write_one_eb()

 fs/btrfs/extent_io.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
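The accounting problem described in the commit message can be modelled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel code path: `struct toy_eb` and `toy_read_pages` are hypothetical names, a plain `int` replaces the atomic counter, end-io completion is simulated inline, and exactly one page (`bad`) is assumed to fail before it joins a bio.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy extent buffer: io_pages counts pages whose read has not finished. */
struct toy_eb {
	int io_pages;
};

/*
 * Toy read loop. Page `bad` fails before it is added to a bio, so no end-io
 * callback will ever run for it. With dec_on_error set, the caller drops
 * that page's count itself, as the v2 patch does with atomic_dec().
 * Returns 0 on success or the last error seen.
 */
static int toy_read_pages(struct toy_eb *eb, int num_pages, int bad,
			  bool dec_on_error)
{
	int ret = 0;

	eb->io_pages = num_pages;
	for (int i = 0; i < num_pages; i++) {
		if (i == bad) {
			ret = -EIO;
			if (dec_on_error)
				eb->io_pages--;
			continue;
		}
		/* Submitted page: simulate its end-io completion. */
		eb->io_pages--;
	}
	return ret;
}
```

Calling `toy_read_pages(&eb, 4, 2, false)` leaves `eb.io_pages` at 1, which models a counter that never reaches zero; with `dec_on_error` set to `true` the same call leaves it at 0.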