diff mbox

[v2] xfs: clear _XBF_PAGES from buffers when readahead page allocation fails

Message ID 20170126035330.GU9134@birch.djwong.org (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Darrick J. Wong Jan. 26, 2017, 3:53 a.m. UTC
If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails.  For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own.  It then double-frees
the b_pages pages.

This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering.  To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
availalble memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system.  The "check summary"
phase of xfs_scrub also works for this purpose.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf.c |    1 +
 1 file changed, 1 insertion(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen Jan. 26, 2017, 4:14 a.m. UTC | #1
On 1/25/17 9:53 PM, Darrick J. Wong wrote:
> If we try to allocate memory pages to back an xfs_buf that we're trying
> to read, it's possible that we'll be so short on memory that the page
> allocation fails.  For a blocking read we'll just wait, but for
> readahead we simply dump all the pages we've collected so far.
> 
> Unfortunately, after dumping the pages we neglect to clear the
> _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> thinks that b_pages still points to pages we own.  It then double-frees
> the b_pages pages.
> 
> This results in screaming about negative page refcounts from the memory
> manager, which xfs oughtn't be triggering.  To reproduce this case,
> mount a filesystem where the size of the inodes far outweighs the
> availalble memory (a ~500M inode filesystem on a VM with 300MB memory
> did the trick here) and run bulkstat in parallel with other memory
> eating processes to put a huge load on the system.  The "check summary"
> phase of xfs_scrub also works for this purpose.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks for the updated changelog, looks good.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
>  fs/xfs/xfs_buf.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 7f0a01f..ac3b4db 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -422,6 +422,7 @@ xfs_buf_allocate_memory(
>  out_free_pages:
>  	for (i = 0; i < bp->b_page_count; i++)
>  		__free_page(bp->b_pages[i]);
> +	bp->b_flags &= ~_XBF_PAGES;
>  	return error;
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 28, 2017, 10:55 p.m. UTC | #2
On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> If we try to allocate memory pages to back an xfs_buf that we're trying
> to read, it's possible that we'll be so short on memory that the page
> allocation fails.  For a blocking read we'll just wait, but for
> readahead we simply dump all the pages we've collected so far.
> 
> Unfortunately, after dumping the pages we neglect to clear the
> _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> thinks that b_pages still points to pages we own.  It then double-frees
> the b_pages pages.
> 
> This results in screaming about negative page refcounts from the memory
> manager, which xfs oughtn't be triggering.  To reproduce this case,
> mount a filesystem where the size of the inodes far outweighs the
> availalble memory (a ~500M inode filesystem on a VM with 300MB memory
> did the trick here) and run bulkstat in parallel with other memory
> eating processes to put a huge load on the system.  The "check summary"
> phase of xfs_scrub also works for this purpose.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Can you rename the patch "prevent double free of buffer pages on
readahead failure"? That way people looking for commits to backport
are going to see it clearly and easily....

And I think it also needs a stable cc....

Cheers,

Dave.
Darrick J. Wong Jan. 30, 2017, 11:35 p.m. UTC | #3
On Sun, Jan 29, 2017 at 09:55:17AM +1100, Dave Chinner wrote:
> On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> > If we try to allocate memory pages to back an xfs_buf that we're trying
> > to read, it's possible that we'll be so short on memory that the page
> > allocation fails.  For a blocking read we'll just wait, but for
> > readahead we simply dump all the pages we've collected so far.
> > 
> > Unfortunately, after dumping the pages we neglect to clear the
> > _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> > thinks that b_pages still points to pages we own.  It then double-frees
> > the b_pages pages.
> > 
> > This results in screaming about negative page refcounts from the memory
> > manager, which xfs oughtn't be triggering.  To reproduce this case,
> > mount a filesystem where the size of the inodes far outweighs the
> > availalble memory (a ~500M inode filesystem on a VM with 300MB memory
> > did the trick here) and run bulkstat in parallel with other memory
> > eating processes to put a huge load on the system.  The "check summary"
> > phase of xfs_scrub also works for this purpose.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Can you rename the patch "prevent double free of buffer pages on
> readahead failure"? That way people looking for commits to backport
> are going to see it clearly and easily....

Uh... this patch was already in Linus' tree by the time I received this reply,
so I don't think I can change it.

> And I think it also needs a stable cc....

Will send it to the stable list.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Jan. 30, 2017, 11:44 p.m. UTC | #4
[add hch to the thread]

On Mon, Jan 30, 2017 at 03:35:57PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 29, 2017 at 09:55:17AM +1100, Dave Chinner wrote:
> > On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> > > If we try to allocate memory pages to back an xfs_buf that we're trying
> > > to read, it's possible that we'll be so short on memory that the page
> > > allocation fails.  For a blocking read we'll just wait, but for
> > > readahead we simply dump all the pages we've collected so far.
> > > 
> > > Unfortunately, after dumping the pages we neglect to clear the
> > > _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> > > thinks that b_pages still points to pages we own.  It then double-frees
> > > the b_pages pages.
> > > 
> > > This results in screaming about negative page refcounts from the memory
> > > manager, which xfs oughtn't be triggering.  To reproduce this case,
> > > mount a filesystem where the size of the inodes far outweighs the
> > > availalble memory (a ~500M inode filesystem on a VM with 300MB memory
> > > did the trick here) and run bulkstat in parallel with other memory
> > > eating processes to put a huge load on the system.  The "check summary"
> > > phase of xfs_scrub also works for this purpose.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Can you rename the patch "prevent double free of buffer pages on
> > readahead failure"? That way people looking for commits to backport
> > are going to see it clearly and easily....
> 
> Uh... this patch was already in Linus' tree by the time I received this reply,
> so I don't think I can change it.
> 
> > And I think it also needs a stable cc....
> 
> Will send it to the stable list.

Actually -- Christoph, are you planning to send another batch of XFS
fixes for 4.9?

--D

> 
> --D
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Jan. 31, 2017, 7:31 a.m. UTC | #5
On Mon, Jan 30, 2017 at 03:44:15PM -0800, Darrick J. Wong wrote:
> Actually -- Christoph, are you planning to send another batch of XFS
> fixes for 4.9?

Yes, it is on my todo list for this week.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Jan. 31, 2017, 7:41 a.m. UTC | #6
On Mon, Jan 30, 2017 at 11:31:47PM -0800, Christoph Hellwig wrote:
> On Mon, Jan 30, 2017 at 03:44:15PM -0800, Darrick J. Wong wrote:
> > Actually -- Christoph, are you planning to send another batch of XFS
> > fixes for 4.9?
> 
> Yes, it is on my todo list for this week.

Ok, thanks!

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7f0a01f..ac3b4db 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -422,6 +422,7 @@  xfs_buf_allocate_memory(
 out_free_pages:
 	for (i = 0; i < bp->b_page_count; i++)
 		__free_page(bp->b_pages[i]);
+	bp->b_flags &= ~_XBF_PAGES;
 	return error;
 }