
mm: Free unused swap cache page in write protection fault handler

Message ID 20210113024241.179113-1-ying.huang@intel.com (mailing list archive)
State New, archived
Series mm: Free unused swap cache page in write protection fault handler

Commit Message

Huang, Ying Jan. 13, 2021, 2:42 a.m. UTC
Commit 09854ba94c6a ("mm: do_wp_page() simplification") introduced the
following issue.

On a system with the following free memory before the test,

              total        used        free      shared  buff/cache   available
Mem:        1697300      160156     1459220        8648       77924     1419724
Swap:       1048572           0           0

The AnonPages field of /proc/meminfo is 11712 kB.  After running a
memory eater that triggers many swapins and write protection faults,
the free memory becomes,

              total        used        free      shared  buff/cache   available
Mem:        1697300      352620     1309004         624       35676     1252380
Swap:       1048572      216924      831648

Meanwhile, /proc/meminfo shows,

SwapCached:       198908 kB
AnonPages:          1956 kB

Then, with `swapoff -a`, the free memory becomes,

              total        used        free      shared  buff/cache   available
Mem:        1697300      161972     1488184        8648       47144     1433172
Swap:             0           0           0

That is, after the swapins and write protection faults, many unused
swap cache pages are left unfreed in the system.  Although later page
reclaim or swapoff will free these pages, it is still better to free
them right away.
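
For reference, a memory eater along the following lines can reproduce
the pattern.  This is an illustrative sketch only, not the exact program
used for the numbers above; the mapping size must exceed free RAM so
that pass 1 forces swapout before pass 2 touches the pages again.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	/* Must exceed free RAM so pass 1 pushes pages out to swap. */
	size_t size = argc > 1 ? strtoull(argv[1], NULL, 0) : (2UL << 30);
	long page_size = sysconf(_SC_PAGESIZE);
	volatile char sink;
	char *buf;
	size_t i;

	buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Pass 1: dirty every page.  Under memory pressure the earlier
	 * pages are pushed out to swap while the later ones are touched.
	 */
	for (i = 0; i < size; i += page_size)
		buf[i] = 1;

	/*
	 * Pass 2: read each page back (swapin, typically mapped
	 * read-only), then write it, hitting a write protection fault
	 * while the page is still in the swap cache.
	 */
	for (i = 0; i < size; i += page_size) {
		sink = buf[i];
		buf[i] = 2;
	}

	(void)sink;
	munmap(buf, size);
	return 0;
}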

So in this patch, at the end of wp_page_copy(), the old unused swap
cache page is freed if possible.  With that, after running the same
memory eater that triggers many swapins and write protection faults,
the free memory is,

              total        used        free      shared  buff/cache   available
Mem:        1697300      154020     1509400        1212       33880     1451524
Swap:       1048572       18432     1030140

Meanwhile, /proc/meminfo shows,

SwapCached:         1240 kB
AnonPages:          1904 kB

BTW: I think this should go to the stable trees after v5.9.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
---
 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Linus Torvalds Jan. 13, 2021, 2:47 a.m. UTC | #1
On Tue, Jan 12, 2021 at 6:43 PM Huang Ying <ying.huang@intel.com> wrote:
>
> So in this patch, at the end of wp_page_copy(), the old unused swap
> cache page is freed if possible.

I'd much rather free it later when needed, rather than when you're in
a COW section.

            Linus
huang ying Jan. 13, 2021, 3:08 a.m. UTC | #2
On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jan 12, 2021 at 6:43 PM Huang Ying <ying.huang@intel.com> wrote:
> >
> > So in this patch, at the end of wp_page_copy(), the old unused swap
> > cache page is freed if possible.
>
> I'd much rather free it later when needed, rather than when you're in
> a COW section.

Unused swap cache isn't like unused file cache: nobody can reuse these
pages directly before first freeing them.  Keeping them makes COW a
little faster, but I think the overhead of freeing them isn't high.
Meanwhile, keeping them in the system will confuse users (users expect
file cache to use free memory, but don't expect unused swap cache to use
much of it), make the swap space more fragmented, and add overall system
overhead (scanning the LRU lists, etc.).

Best Regards,
Huang, Ying
Matthew Wilcox Jan. 13, 2021, 3:11 a.m. UTC | #3
On Wed, Jan 13, 2021 at 11:08:56AM +0800, huang ying wrote:
> On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Tue, Jan 12, 2021 at 6:43 PM Huang Ying <ying.huang@intel.com> wrote:
> > >
> > > So in this patch, at the end of wp_page_copy(), the old unused swap
> > > cache page is freed if possible.
> >
> > I'd much rather free it later when needed, rather than when you're in
> > a COW section.
> 
> Unused swap cache isn't like unused file cache: nobody can reuse these
> pages directly before first freeing them.  Keeping them makes COW a
> little faster, but I think the overhead of freeing them isn't high.
> Meanwhile, keeping them in the system will confuse users (users expect
> file cache to use free memory, but don't expect unused swap cache to use
> much of it), make the swap space more fragmented, and add overall system
> overhead (scanning the LRU lists, etc.).

Couldn't we just move it to the tail of the LRU list so it's reclaimed
first?  Or is locking going to be a problem here?
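
One possible shape of that alternative, as a rough hypothetical sketch
only (deactivate_page() moves an active page to the inactive list rather
than to the LRU tail, so the suggestion as stated would need a new
helper; this is not what the patch in this thread does):

	/*
	 * Hypothetical sketch: instead of dropping the old page from the
	 * swap cache in wp_page_copy(), hint reclaim by moving it off the
	 * active list.  deactivate_page() batches pages in a per-CPU
	 * pagevec, so the LRU lock is not taken for every page, but it
	 * does not put the page at the tail of the inactive list.
	 */
	if (page_copied && PageSwapCache(old_page) && !page_mapped(old_page))
		deactivate_page(old_page);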
huang ying Jan. 13, 2021, 5:24 a.m. UTC | #4
On Wed, Jan 13, 2021 at 11:12 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 13, 2021 at 11:08:56AM +0800, huang ying wrote:
> > On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Tue, Jan 12, 2021 at 6:43 PM Huang Ying <ying.huang@intel.com> wrote:
> > > >
> > > > So in this patch, at the end of wp_page_copy(), the old unused swap
> > > > cache page is freed if possible.
> > >
> > > I'd much rather free it later when needed, rather than when you're in
> > > a COW section.
> >
> > Unused swap cache isn't like unused file cache: nobody can reuse these
> > pages directly before first freeing them.  Keeping them makes COW a
> > little faster, but I think the overhead of freeing them isn't high.
> > Meanwhile, keeping them in the system will confuse users (users expect
> > file cache to use free memory, but don't expect unused swap cache to use
> > much of it), make the swap space more fragmented, and add overall system
> > overhead (scanning the LRU lists, etc.).
>
> Couldn't we just move it to the tail of the LRU list so it's reclaimed
> first?  Or is locking going to be a problem here?

Yes, that's a way to reduce the disturbance to page reclaim.  As for
the LRU lock contention, would it be sufficient to use another pagevec?
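
A rough sketch of the "another pagevec" idea, modeled on the existing
per-CPU pagevecs in mm/swap.c.  lru_wp_move_tail_pvecs and
lru_wp_move_tail_drain() below are hypothetical names used only for
illustration:

/*
 * Hypothetical sketch only.  Old swap cache pages left over from COW
 * faults are collected in a per-CPU pagevec, so the LRU lock is taken
 * once per batch instead of once per page when moving them to the tail
 * of the inactive list.
 */
static DEFINE_PER_CPU(struct pagevec, lru_wp_move_tail_pvecs);

void wp_move_old_page_to_lru_tail(struct page *page)
{
	struct pagevec *pvec;

	if (!PageSwapCache(page) || page_mapped(page))
		return;

	pvec = &get_cpu_var(lru_wp_move_tail_pvecs);
	get_page(page);
	/*
	 * lru_wp_move_tail_drain() is a hypothetical helper: it would take
	 * the LRU lock once and move the whole batch to the tail of the
	 * inactive list, like the existing pagevec drain functions.
	 */
	if (!pagevec_add(pvec, page))
		lru_wp_move_tail_drain(pvec);
	put_cpu_var(lru_wp_move_tail_pvecs);
}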

Best Regards,
Huang, Ying
Linus Torvalds Jan. 13, 2021, 9:09 p.m. UTC | #5
On Tue, Jan 12, 2021 at 9:24 PM huang ying <huang.ying.caritas@gmail.com> wrote:
> >
> > Couldn't we just move it to the tail of the LRU list so it's reclaimed
> > first?  Or is locking going to be a problem here?
>
> Yes, that's a way to reduce the disturbance to page reclaim.  As for
> the LRU lock contention, would it be sufficient to use another pagevec?

I wonder if this is really worth it. I'd like to see numbers.

Because in probably 99%+ of all cases, that LRU dance is only going to
hurt and add extra locking overhead and dirty caches.

So I'd like to see some numbers that it actually helps measurably in
whatever paging-heavy case...

            Linus
Huang, Ying Jan. 15, 2021, 8:47 a.m. UTC | #6
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Jan 12, 2021 at 9:24 PM huang ying <huang.ying.caritas@gmail.com> wrote:
>> >
>> > Couldn't we just move it to the tail of the LRU list so it's reclaimed
>> > first?  Or is locking going to be a problem here?
>>
>> Yes, that's a way to reduce the disturbance to page reclaim.  As for
>> the LRU lock contention, would it be sufficient to use another pagevec?
>
> I wonder if this is really worth it. I'd like to see numbers.
>
> Because in probably 99%+ of all cases, that LRU dance is only going to
> hurt and add extra locking overhead and dirty caches.
>
> So I'd like to see some numbers that it actually helps measurably in
> whatever paging-heavy case...

OK.  I will start with a simpler version and only use a pagevec if
there's a measurable difference.

Best Regards,
Huang, Ying

Patch

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..2abaff1befcb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2963,6 +2963,11 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				munlock_vma_page(old_page);
 			unlock_page(old_page);
 		}
+		if (page_copied && PageSwapCache(old_page) &&
+		    !page_mapped(old_page) && trylock_page(old_page)) {
+			try_to_free_swap(old_page);
+			unlock_page(old_page);
+		}
 		put_page(old_page);
 	}
 	return page_copied ? VM_FAULT_WRITE : 0;
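
In non-diff form, the check added at the tail of wp_page_copy() reads as
follows (the comment here is explanatory only; the hunk above is the
authoritative change):

	/*
	 * We copied, so this PTE no longer maps old_page.  If nothing else
	 * maps it and it is still in the swap cache, try to drop it from
	 * the swap cache now (try_to_free_swap() also releases the swap
	 * slot when no other reference to it remains) rather than leaving
	 * the page for later reclaim or swapoff.  trylock_page() keeps
	 * this best effort: if the page lock is contended, just skip it.
	 */
	if (page_copied && PageSwapCache(old_page) &&
	    !page_mapped(old_page) && trylock_page(old_page)) {
		try_to_free_swap(old_page);
		unlock_page(old_page);
	}
	put_page(old_page);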