mm/vmscan: Prioritize anonymous executable pages like we do file-backed

Message ID 20200304203235.3623103-1-ajax@redhat.com (mailing list archive)
State New, archived

Commit Message

Adam Jackson March 4, 2020, 8:32 p.m. UTC
The page reclamation scanner tries to keep executable pages resident,
since taking a hard page fault to satisfy an icache miss is really not
great for interactivity.  Anonymous executable pages tend to contain
code that has been just-in-time compiled for performance reasons. By
requiring that executable pages be file-backed, we're putting possibly
the most performance-sensitive code at higher risk of eviction, which
seems backwards.

On an amd64 machine running Fedora 31, the firefox I happen to have
running requires about 89M of file-backed text and 12M of anonymous text
for 30 open tabs. The next largest process in terms of anonymous text is
gnome-shell, with 1M anonymous and 57M file-backed. No other process had
significant anonymous text, most had none. Penalizing those 13M
specifically when under memory pressure seems like an easy hazard to
avoid.

Signed-off-by: Adam Jackson <ajax@redhat.com>
---
 mm/vmscan.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)
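
Figures like the ones above could be gathered by walking /proc/<pid>/maps and sorting each executable mapping into file-backed or anonymous. The sketch below is one assumed way to do that; it is not how the numbers in this message were obtained. The helper name classify_maps_line() and the enum are hypothetical. It reports virtual mapping size, not resident size; resident figures would need the Rss fields from /proc/<pid>/smaps. Mappings with inode 0 and a bracketed name such as [vdso] are counted separately, and shmem or memfd mappings show an inode even though they are swap-backed, so treat the split as approximate.

```c
#include <stdio.h>
#include <string.h>

enum exec_kind { NOT_EXEC, EXEC_FILE, EXEC_ANON, EXEC_SPECIAL };

struct exec_map {
	enum exec_kind kind;
	unsigned long bytes;	/* virtual size of the mapping */
};

/* Hypothetical helper: classify one line of /proc/<pid>/maps. */
static struct exec_map classify_maps_line(const char *line)
{
	struct exec_map m = { NOT_EXEC, 0 };
	unsigned long start, end, inode;
	char perms[5] = "";
	char path[256] = "";
	int n;

	/* Fields: start-end perms offset dev inode [path] */
	n = sscanf(line, "%lx-%lx %4s %*s %*s %lu %255s",
		   &start, &end, perms, &inode, path);
	if (n < 4 || end < start)
		return m;

	/* perms looks like "r-xp"; index 2 is the execute bit. */
	if (perms[2] != 'x')
		return m;

	m.bytes = end - start;
	if (inode != 0)
		m.kind = EXEC_FILE;	/* backed by a file on some filesystem */
	else if (path[0] == '[')
		m.kind = EXEC_SPECIAL;	/* e.g. [vdso]; not JIT output */
	else
		m.kind = EXEC_ANON;	/* no inode, no name: anonymous text */
	return m;
}
```

A caller would read /proc/<pid>/maps line by line with fgets(), pass each line to the helper, and sum bytes per kind.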

Comments

Matthew Wilcox (Oracle) March 4, 2020, 8:38 p.m. UTC | #1
On Wed, Mar 04, 2020 at 03:32:35PM -0500, Adam Jackson wrote:
> -			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
> +			if ((vm_flags & VM_EXEC)) {

You can drop one set of parens here
Vlastimil Babka March 5, 2020, 12:47 p.m. UTC | #2
+ CC folks who focus on reclaim

On 3/4/20 9:32 PM, Adam Jackson wrote:
> The page reclamation scanner tries to keep executable pages resident,
> since taking a hard page fault to satisfy an icache miss is really not
> great for interactivity.  Anonymous executable pages tend to contain
> code that has been just-in-time compiled for performance reasons. By
> requiring that executable pages be file-backed, we're putting possibly
> the most performance-sensitive code at higher risk of eviction, which
> seems backwards.
> 
> On an amd64 machine running Fedora 31, the firefox I happen to have
> running requires about 89M of file-backed text and 12M of anonymous text
> for 30 open tabs. The next largest process in terms of anonymous text is
> gnome-shell, with 1M anonymous and 57M file-backed. No other process had
> significant anonymous text, most had none. Penalizing those 13M
> specifically when under memory pressure seems like an easy hazard to
> avoid.
> 
> Signed-off-by: Adam Jackson <ajax@redhat.com>
> ---
>  mm/vmscan.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ee4eecc7e1c2..9bfbc30d61d8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  				    &vm_flags)) {
>  			nr_rotated += hpage_nr_pages(page);
>  			/*
> -			 * Identify referenced, file-backed active pages and
> -			 * give them one more trip around the active list. So
> +			 * Identify referenced, executable active pages and
> +			 * give them one more trip around the active list, so
>  			 * that executable code get better chances to stay in
> -			 * memory under moderate memory pressure.  Anon pages
> -			 * are not likely to be evicted by use-once streaming
> -			 * IO, plus JVM can create lots of anon VM_EXEC pages,
> -			 * so we ignore them here.
> +			 * memory under moderate memory pressure.
>  			 */
> -			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
> +			if ((vm_flags & VM_EXEC)) {
>  				list_add(&page->lru, &l_active);
>  				continue;
>  			}
>
Michal Hocko March 5, 2020, 3:17 p.m. UTC | #3
On Wed 04-03-20 15:32:35, Adam Jackson wrote:
> The page reclamation scanner tries to keep executable pages resident,
> since taking a hard page fault to satisfy an icache miss is really not
> great for interactivity.  Anonymous executable pages tend to contain
> code that has been just-in-time compiled for performance reasons. By
> requiring that executable pages be file-backed, we're putting possibly
> the most performance-sensitive code at higher risk of eviction, which
> seems backwards.
> 
> On an amd64 machine running Fedora 31, the firefox I happen to have
> running requires about 89M of file-backed text and 12M of anonymous text
> for 30 open tabs. The next largest process in terms of anonymous text is
> gnome-shell, with 1M anonymous and 57M file-backed. No other process had
> significant anonymous text, most had none. Penalizing those 13M
> specifically when under memory pressure seems like an easy hazard to
> avoid.

Are you seeing an actual improvement from this change? IIRC the primary
motivation to make this heuristic page cache oriented is that it was
quite easy to evict file backed memory by streaming IO. This shouldn't
really be a major problem for the anonymous memory in most cases. A
heavy swapin/out workload is likely to suffer from not having data
available more than having the code evicted. But I might be wrong here
and getting some numbers would be really interesting.

> Signed-off-by: Adam Jackson <ajax@redhat.com>
> ---
>  mm/vmscan.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ee4eecc7e1c2..9bfbc30d61d8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  				    &vm_flags)) {
>  			nr_rotated += hpage_nr_pages(page);
>  			/*
> -			 * Identify referenced, file-backed active pages and
> -			 * give them one more trip around the active list. So
> +			 * Identify referenced, executable active pages and
> +			 * give them one more trip around the active list, so
>  			 * that executable code get better chances to stay in
> -			 * memory under moderate memory pressure.  Anon pages
> -			 * are not likely to be evicted by use-once streaming
> -			 * IO, plus JVM can create lots of anon VM_EXEC pages,
> -			 * so we ignore them here.
> +			 * memory under moderate memory pressure.
>  			 */
> -			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
> +			if ((vm_flags & VM_EXEC)) {
>  				list_add(&page->lru, &l_active);
>  				continue;
>  			}
> -- 
> 2.23.0
>
Adam Jackson March 5, 2020, 6:05 p.m. UTC | #4
On Thu, 2020-03-05 at 16:17 +0100, Michal Hocko wrote:
> On Wed 04-03-20 15:32:35, Adam Jackson wrote:
> > The page reclamation scanner tries to keep executable pages resident,
> > since taking a hard page fault to satisfy an icache miss is really not
> > great for interactivity.  Anonymous executable pages tend to contain
> > code that has been just-in-time compiled for performance reasons. By
> > requiring that executable pages be file-backed, we're putting possibly
> > the most performance-sensitive code at higher risk of eviction, which
> > seems backwards.
> > 
> > On an amd64 machine running Fedora 31, the firefox I happen to have
> > running requires about 89M of file-backed text and 12M of anonymous text
> > for 30 open tabs. The next largest process in terms of anonymous text is
> > gnome-shell, with 1M anonymous and 57M file-backed. No other process had
> > significant anonymous text, most had none. Penalizing those 13M
> > specifically when under memory pressure seems like an easy hazard to
> > avoid.
> 
> Are you seeing an actual improvement from this change? IIRC the primary
> motivation to make this heuristic page cache oriented is that it was
> quite easy to evict file backed memory by streaming IO. This shouldn't
> really be a major problem for the anonymous memory in most cases. A
> heavy swapin/out workload is likely to suffer from not having data
> available more than having the code evicted. But I might be wrong here
> and getting some numbers would be really interesting.

They would be! I confess I don't have any, I'll see if I can gather
some for you. The problem case for this patch is maybe as much about
streaming I/O as it's about just giant wads of dirty data. Given enough
pressure, eventually this loop will prefer to evict jitted code before
precompiled. I find that to be a difficult preference to justify. If
anything maybe you should evict the file-backed code first because you
can discard it instead of writing it out (assuming your swap and
executables are on comparably fast disk, etc).

That's my theory anyway. It's entirely possible I don't understand the
larger environment for this code, and like I say I don't have hard data
yet. For all I know big data-center-y java jobs might actually want the
existing behavior. But it seemed curious enough to warrant at least
sending the patch for feedback.

- ajax
Shakeel Butt March 5, 2020, 8:41 p.m. UTC | #5
On Thu, Mar 5, 2020 at 4:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> + CC folks who focus on reclaim
>
> On 3/4/20 9:32 PM, Adam Jackson wrote:
> > The page reclamation scanner tries to keep executable pages resident,
> > since taking a hard page fault to satisfy an icache miss is really not
> > great for interactivity.  Anonymous executable pages tend to contain
> > code that has been just-in-time compiled for performance reasons. By
> > requiring that executable pages be file-backed, we're putting possibly
> > the most performance-sensitive code at higher risk of eviction, which
> > seems backwards.
> >
> > On an amd64 machine running Fedora 31, the firefox I happen to have
> > running requires about 89M of file-backed text and 12M of anonymous text
> > for 30 open tabs. The next largest process in terms of anonymous text is
> > gnome-shell, with 1M anonymous and 57M file-backed. No other process had
> > significant anonymous text, most had none. Penalizing those 13M
> > specifically when under memory pressure seems like an easy hazard to
> > avoid.

Have you actually seen this issue (i.e. JIT code reclaimed and
thrashing) happening for real workloads?

> >
> > Signed-off-by: Adam Jackson <ajax@redhat.com>
> > ---
> >  mm/vmscan.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ee4eecc7e1c2..9bfbc30d61d8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
> >                                   &vm_flags)) {
> >                       nr_rotated += hpage_nr_pages(page);
> >                       /*
> > -                      * Identify referenced, file-backed active pages and
> > -                      * give them one more trip around the active list. So
> > +                      * Identify referenced, executable active pages and
> > +                      * give them one more trip around the active list, so
> >                        * that executable code get better chances to stay in
> > -                      * memory under moderate memory pressure.  Anon pages
> > -                      * are not likely to be evicted by use-once streaming
> > -                      * IO, plus JVM can create lots of anon VM_EXEC pages,
> > -                      * so we ignore them here.
> > +                      * memory under moderate memory pressure.
> >                        */
> > -                     if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {

Originally this heuristic was added to protect executable file pages
from the streaming workloads. Now we have workingset detection for
file pages and there is ongoing work for adding that support for anon
pages, I am wondering if this specific heuristic is still helpful.

> > +                     if ((vm_flags & VM_EXEC)) {
> >                               list_add(&page->lru, &l_active);
> >                               continue;
> >                       }
> >
>
Adam Jackson March 5, 2020, 10:58 p.m. UTC | #6
On Thu, 2020-03-05 at 12:41 -0800, Shakeel Butt wrote:

> Have you actually seen this issue (i.e. JIT code reclaimed and
> thrashing) happening for real workloads?

The mention of gnome-shell wasn't entirely an accident, it's largely
written in javascript which is jit-compiled. Under sufficiently dire
disk I/O conditions merely moving the cursor would lag. That ought to
be impossible, because nothing in the cursor update path ought to be
blocking on the disk and we believe the CPU scheduler to be competent.
Plenty of bugs are contributing here, but among them, yes, the (jitted
bits of the) cursor update code can get paged out while megabytes of
file-backed text stay resident. That effect is certainly drowned out by
the other bugs involved, but still.

> Originally this heuristic was added to protect executable file pages
> from the streaming workloads. Now we have workingset detection for
> file pages and there is ongoing work for adding that support for anon
> pages, I am wondering if this specific heuristic is still helpful.

Enh. I would tend to think that code is way more precious than data in
terms of staying resident. If the working set detection works well
enough to come to that conclusion on its own without explicitly knowing
about executable pages, that'd be awesome and I'd be entirely fine with
removing even more of this heuristic.

- ajax
Mel Gorman March 6, 2020, 9:22 a.m. UTC | #7
On Thu, Mar 05, 2020 at 05:58:59PM -0500, Adam Jackson wrote:
> > Originally this heuristic was added to protect executable file pages
> > from the streaming workloads. Now we have workingset detection for
> > file pages and there is ongoing work for adding that support for anon
> > pages, I am wondering if this specific heuristic is still helpful.
> 
> Enh. I would tend to think that code is way more precious than data in
> terms of staying resident. If the working set detection works well
> enough to come to that conclusion on its own without explicitly knowing
> about executable pages, that'd be awesome and I'd be entirely fine with
> removing even more of this heuristic.
> 

Given the increased use of JIT engines, the existence of working set
detection and the fact it may also work for anonymous pages soon, I
think it's worth at least *trying* to remove the heuristic in case it
stays around for years as magic code.

Creating an automated test case for this would be relatively tricky. Could
you put together a debugging patch that simply counts some events to put
into the changelog? The events (which could be vmstat) would be

o VM_EXEC pages encountered in reclaim
o Number of exec file-backed pages preserved
o Number of exec anon pages preserved
o Number of exec file-backed pages reclaimed
o Number of exec anon pages reclaimed

And show the figures before and after in the changelog running Firefox
with excessive IO in the background. It's a bit of legwork but it's to
preserve in the changelog that this problem definitely happens and the
patch has a positive impact. Some comment saying the cursor is not laggy
with the patch applied would also be a plus. The debugging patch would
not actually be merged.
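
As a rough illustration of the accounting described above, here is a small userspace model of the decision in shrink_active_list(). It is a sketch under stated assumptions, not the debugging patch itself: the struct, its field names and account_exec_page() are hypothetical, and a real patch would more likely add vmstat events inside mm/vmscan.c. The model only sees the active-list check, so it counts pages as "deactivated" where the list above says "reclaimed"; actual reclaim happens later on the inactive list and would need its own counters there.

```c
#include <stdbool.h>

/* Hypothetical counters loosely mirroring the event list above. */
struct exec_reclaim_stats {
	unsigned long exec_seen;		/* referenced VM_EXEC pages scanned */
	unsigned long exec_file_preserved;
	unsigned long exec_anon_preserved;
	unsigned long exec_file_deactivated;	/* stand-in for "reclaimed" */
	unsigned long exec_anon_deactivated;	/* stand-in for "reclaimed" */
};

/*
 * Model of the check under discussion. Returns true if the page gets
 * another trip around the active list.
 *
 * The kernel only learns vm_flags for pages that page_referenced()
 * reports as referenced, so unreferenced pages are not counted here.
 * 'patched' selects the proposed behaviour (no file-backed requirement).
 */
static bool account_exec_page(struct exec_reclaim_stats *s, bool referenced,
			      bool vm_exec, bool file_backed, bool patched)
{
	bool keep;

	if (!referenced || !vm_exec)
		return false;

	s->exec_seen++;
	keep = patched || file_backed;

	if (keep) {
		if (file_backed)
			s->exec_file_preserved++;
		else
			s->exec_anon_preserved++;
	} else {
		if (file_backed)
			s->exec_file_deactivated++;
		else
			s->exec_anon_deactivated++;
	}
	return keep;
}
```

Feeding the same page stream through the model with patched set to false and then true gives the before/after ratios for this check alone; it says nothing about LRU sizes or page ordering.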

With the figures, if there ever is a bug report that bisects to this patch,
it'll be known exactly what the impact and motivation was. That will at
least make people pause and think before blindly reverting it.

I'm guessing the impact is that the ratio of reclaimed/preserved for anon
pages is currently skewed and after the patch it's more in line with the
ratio for file-backed. It's a tough prediction as the size of the file
vs anon LRUs at the time of reclaim matter as well as the ordering of
pages in the LRU.

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ee4eecc7e1c2..9bfbc30d61d8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2095,15 +2095,12 @@  static void shrink_active_list(unsigned long nr_to_scan,
 				    &vm_flags)) {
 			nr_rotated += hpage_nr_pages(page);
 			/*
-			 * Identify referenced, file-backed active pages and
-			 * give them one more trip around the active list. So
+			 * Identify referenced, executable active pages and
+			 * give them one more trip around the active list, so
 			 * that executable code get better chances to stay in
-			 * memory under moderate memory pressure.  Anon pages
-			 * are not likely to be evicted by use-once streaming
-			 * IO, plus JVM can create lots of anon VM_EXEC pages,
-			 * so we ignore them here.
+			 * memory under moderate memory pressure.
 			 */
-			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
+			if ((vm_flags & VM_EXEC)) {
 				list_add(&page->lru, &l_active);
 				continue;
 			}