Message ID: 20170126035330.GU9134@birch.djwong.org (mailing list archive)
State: Superseded, archived
On 1/25/17 9:53 PM, Darrick J. Wong wrote:
> If we try to allocate memory pages to back an xfs_buf that we're trying
> to read, it's possible that we'll be so short on memory that the page
> allocation fails. For a blocking read we'll just wait, but for
> readahead we simply dump all the pages we've collected so far.
>
> Unfortunately, after dumping the pages we neglect to clear the
> _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> thinks that b_pages still points to pages we own. It then double-frees
> the b_pages pages.
>
> This results in screaming about negative page refcounts from the memory
> manager, which xfs oughtn't be triggering. To reproduce this case,
> mount a filesystem where the size of the inodes far outweighs the
> available memory (a ~500M inode filesystem on a VM with 300MB memory
> did the trick here) and run bulkstat in parallel with other memory
> eating processes to put a huge load on the system. The "check summary"
> phase of xfs_scrub also works for this purpose.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks for the updated changelog, looks good.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
>  fs/xfs/xfs_buf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 7f0a01f..ac3b4db 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -422,6 +422,7 @@ xfs_buf_allocate_memory(
>  out_free_pages:
> 	for (i = 0; i < bp->b_page_count; i++)
> 		__free_page(bp->b_pages[i]);
> +	bp->b_flags &= ~_XBF_PAGES;
> 	return error;
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> If we try to allocate memory pages to back an xfs_buf that we're trying
> to read, it's possible that we'll be so short on memory that the page
> allocation fails. For a blocking read we'll just wait, but for
> readahead we simply dump all the pages we've collected so far.
>
> Unfortunately, after dumping the pages we neglect to clear the
> _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> thinks that b_pages still points to pages we own. It then double-frees
> the b_pages pages.
>
> This results in screaming about negative page refcounts from the memory
> manager, which xfs oughtn't be triggering. To reproduce this case,
> mount a filesystem where the size of the inodes far outweighs the
> available memory (a ~500M inode filesystem on a VM with 300MB memory
> did the trick here) and run bulkstat in parallel with other memory
> eating processes to put a huge load on the system. The "check summary"
> phase of xfs_scrub also works for this purpose.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Can you rename the patch "prevent double free of buffer pages on
readahead failure"? That way people looking for commits to backport
are going to see it clearly and easily....

And I think it also needs a stable cc....

Cheers,

Dave.
On Sun, Jan 29, 2017 at 09:55:17AM +1100, Dave Chinner wrote:
> On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> > If we try to allocate memory pages to back an xfs_buf that we're trying
> > to read, it's possible that we'll be so short on memory that the page
> > allocation fails. For a blocking read we'll just wait, but for
> > readahead we simply dump all the pages we've collected so far.
> >
> > Unfortunately, after dumping the pages we neglect to clear the
> > _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> > thinks that b_pages still points to pages we own. It then double-frees
> > the b_pages pages.
> >
> > This results in screaming about negative page refcounts from the memory
> > manager, which xfs oughtn't be triggering. To reproduce this case,
> > mount a filesystem where the size of the inodes far outweighs the
> > available memory (a ~500M inode filesystem on a VM with 300MB memory
> > did the trick here) and run bulkstat in parallel with other memory
> > eating processes to put a huge load on the system. The "check summary"
> > phase of xfs_scrub also works for this purpose.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> Can you rename the patch "prevent double free of buffer pages on
> readahead failure"? That way people looking for commits to backport
> are going to see it clearly and easily....

Uh... this patch was already in Linus' tree by the time I received this
reply, so I don't think I can change it.

> And I think it also needs a stable cc....

Will send it to the stable list.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
[add hch to the thread]

On Mon, Jan 30, 2017 at 03:35:57PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 29, 2017 at 09:55:17AM +1100, Dave Chinner wrote:
> > On Wed, Jan 25, 2017 at 07:53:30PM -0800, Darrick J. Wong wrote:
> > > If we try to allocate memory pages to back an xfs_buf that we're trying
> > > to read, it's possible that we'll be so short on memory that the page
> > > allocation fails. For a blocking read we'll just wait, but for
> > > readahead we simply dump all the pages we've collected so far.
> > >
> > > Unfortunately, after dumping the pages we neglect to clear the
> > > _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
> > > thinks that b_pages still points to pages we own. It then double-frees
> > > the b_pages pages.
> > >
> > > This results in screaming about negative page refcounts from the memory
> > > manager, which xfs oughtn't be triggering. To reproduce this case,
> > > mount a filesystem where the size of the inodes far outweighs the
> > > available memory (a ~500M inode filesystem on a VM with 300MB memory
> > > did the trick here) and run bulkstat in parallel with other memory
> > > eating processes to put a huge load on the system. The "check summary"
> > > phase of xfs_scrub also works for this purpose.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Can you rename the patch "prevent double free of buffer pages on
> > readahead failure"? That way people looking for commits to backport
> > are going to see it clearly and easily....
>
> Uh... this patch was already in Linus' tree by the time I received this
> reply, so I don't think I can change it.
>
> > And I think it also needs a stable cc....
>
> Will send it to the stable list.

Actually -- Christoph, are you planning to send another batch of XFS
fixes for 4.9?

--D

> --D
>
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
On Mon, Jan 30, 2017 at 03:44:15PM -0800, Darrick J. Wong wrote: > Actually -- Christoph, are you planning to send another batch of XFS > fixes for 4.9? Yes, it is on my todo list for this week. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 30, 2017 at 11:31:47PM -0800, Christoph Hellwig wrote:
> On Mon, Jan 30, 2017 at 03:44:15PM -0800, Darrick J. Wong wrote:
> > Actually -- Christoph, are you planning to send another batch of XFS
> > fixes for 4.9?
>
> Yes, it is on my todo list for this week.

Ok, thanks!

--D
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7f0a01f..ac3b4db 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -422,6 +422,7 @@ xfs_buf_allocate_memory(
 out_free_pages:
	for (i = 0; i < bp->b_page_count; i++)
		__free_page(bp->b_pages[i]);
+	bp->b_flags &= ~_XBF_PAGES;
	return error;
 }
If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails. For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own. It then double-frees
the b_pages pages.

This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering. To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
available memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system. The "check summary"
phase of xfs_scrub also works for this purpose.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf.c | 1 +
 1 file changed, 1 insertion(+)