Message ID | 20181107063127.3902-2-david@fromorbit.com (mailing list archive)
---|---
State | New, archived
Series | xfs: Block size > PAGE_SIZE support
[adding linux-mm to the CC list]

On Wed, Nov 07, 2018 at 05:31:12PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> ->writepage is only used in one place - single page writeback from
> memory reclaim. We only allow such writeback from kswapd, not from
> direct memory reclaim, and so it is rarely used. When it comes from
> kswapd, it is effectively random dirty page shoot-down, which is
> horrible for IO patterns. We will already have background writeback
> trying to clean all the dirty pages in memory as efficiently as
> possible, so having kswapd interrupt our well formed IO stream only
> slows things down. So get rid of xfs_vm_writepage() completely.

Interesting. IFF we can pull this off it would simplify a lot of
things, so I'm generally in favor of it.

->writepage callers in generic code are:

 (1) mm/vmscan.c:pageout() - this is the kswapd (or direct reclaim)
     you mention above. It basically does nothing in this case, which
     isn't great, but that's the whole point of this patch..

 (2) mm/migrate.c:writeout() - this is only called if no ->migratepage
     method is present, but we have one in XFS, so we should be ok.

Plus a few pieces of code that are just library functions like
generic_writepages and mpage_writepages.
On Fri, Nov 09, 2018 at 07:12:39AM -0800, Christoph Hellwig wrote:
> [adding linux-mm to the CC list]
>
> On Wed, Nov 07, 2018 at 05:31:12PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > ->writepage is only used in one place - single page writeback from
> > memory reclaim. We only allow such writeback from kswapd, not from
> > direct memory reclaim, and so it is rarely used. When it comes from
> > kswapd, it is effectively random dirty page shoot-down, which is
> > horrible for IO patterns. We will already have background writeback
> > trying to clean all the dirty pages in memory as efficiently as
> > possible, so having kswapd interrupt our well formed IO stream only
> > slows things down. So get rid of xfs_vm_writepage() completely.
>
> Interesting. IFF we can pull this off it would simplify a lot of
> things, so I'm generally in favor of it.

Over the past few days of hammering on this, the only thing I've
noticed is that page reclaim hangs up less, but it's also putting a
bit more pressure on the shrinkers. Filesystem intensive workloads
that drive the machine into reclaim via the page cache seem to hit
breakdown conditions slightly earlier, and the impact is that the
shrinkers are run harder. Mostly I see this as the XFS buffer cache
having a much harder time keeping a working set active.

However, while the workloads hit the working set cache, writeback
performance does seem to be slightly higher. It is, however, being
offset by the deeper lows that come from the cache being turned
over.

So there's a bit of rebalancing to be done here as a followup, but
I've been unable to drive the system into unexpected OOM kills or
other bad behaviour as a result of removing ->writepage.

Cheers,

Dave.
On Tue, Nov 13, 2018 at 08:08:39AM +1100, Dave Chinner wrote:
> On Fri, Nov 09, 2018 at 07:12:39AM -0800, Christoph Hellwig wrote:
> > [adding linux-mm to the CC list]
> >
> > On Wed, Nov 07, 2018 at 05:31:12PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > ->writepage is only used in one place - single page writeback from
> > > memory reclaim. We only allow such writeback from kswapd, not from
> > > direct memory reclaim, and so it is rarely used. When it comes from
> > > kswapd, it is effectively random dirty page shoot-down, which is
> > > horrible for IO patterns. We will already have background writeback
> > > trying to clean all the dirty pages in memory as efficiently as
> > > possible, so having kswapd interrupt our well formed IO stream only
> > > slows things down. So get rid of xfs_vm_writepage() completely.
> >
> > Interesting. IFF we can pull this off it would simplify a lot of
> > things, so I'm generally in favor of it.
>
> Over the past few days of hammering on this, the only thing I've
> noticed is that page reclaim hangs up less, but it's also putting a
> bit more pressure on the shrinkers. Filesystem intensive workloads
> that drive the machine into reclaim via the page cache seem to hit
> breakdown conditions slightly earlier, and the impact is that the
> shrinkers are run harder. Mostly I see this as the XFS buffer cache
> having a much harder time keeping a working set active.
>
> However, while the workloads hit the working set cache, writeback
> performance does seem to be slightly higher. It is, however, being
> offset by the deeper lows that come from the cache being turned
> over.
>
> So there's a bit of rebalancing to be done here as a followup, but
> I've been unable to drive the system into unexpected OOM kills or
> other bad behaviour as a result of removing ->writepage.

FWIW I've been running this patch in my development kernels as part of
exercising realtime reflink with rextsize > 1.

So far I haven't seen any particularly adverse effects.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..0e1342e359ca 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -911,22 +911,6 @@ xfs_do_writepage(
 	return 0;
 }
 
-STATIC int
-xfs_vm_writepage(
-	struct page		*page,
-	struct writeback_control *wbc)
-{
-	struct xfs_writepage_ctx wpc = {
-		.io_type = XFS_IO_HOLE,
-	};
-	int			ret;
-
-	ret = xfs_do_writepage(page, wbc, &wpc);
-	if (wpc.ioend)
-		ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
-	return ret;
-}
-
 STATIC int
 xfs_vm_writepages(
 	struct address_space	*mapping,
@@ -1019,7 +1003,6 @@ xfs_iomap_swapfile_activate(
 const struct address_space_operations xfs_address_space_operations = {
 	.readpage		= xfs_vm_readpage,
 	.readpages		= xfs_vm_readpages,
-	.writepage		= xfs_vm_writepage,
 	.writepages		= xfs_vm_writepages,
 	.set_page_dirty		= iomap_set_page_dirty,
 	.releasepage		= xfs_vm_releasepage,