Message ID | 20200304203235.3623103-1-ajax@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/vmscan: Prioritize anonymous executable pages like we do file-backed | expand |
On Wed, Mar 04, 2020 at 03:32:35PM -0500, Adam Jackson wrote: > - if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) { > + if ((vm_flags & VM_EXEC)) { You can drop one set of parens here
+ CC folks who focus on reclaim On 3/4/20 9:32 PM, Adam Jackson wrote: > The page reclamation scanner tries to keep executable pages resident, > since taking a hard page fault to satisfy an icache miss is really not > great for interactivity. Anonymous executable pages tend to contain > code that has been just-in-time compiled for performance reasons. By > requiring that executable pages be file-backed, we're putting possibly > the most performance-sensitive code at higher risk of eviction, which > seems backwards. > > On an amd64 machine running Fedora 31, the firefox I happen to have > running requires about 89M of file-backed text and 12M of anonymous text > for 30 open tabs. The next largest process in terms of anonymous text is > gnome-shell, with 1M anonymous and 57M file-backed. No other process had > significant anonymous text, most had none. Penalizing those 13M > specifically when under memory pressure seems like an easy hazard to > avoid. > > Signed-off-by: Adam Jackson <ajax@redhat.com> > --- > mm/vmscan.c | 11 ++++------- > 1 file changed, 4 insertions(+), 7 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index ee4eecc7e1c2..9bfbc30d61d8 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan, > &vm_flags)) { > nr_rotated += hpage_nr_pages(page); > /* > - * Identify referenced, file-backed active pages and > - * give them one more trip around the active list. So > + * Identify referenced, executable active pages and > + * give them one more trip around the active list, so > * that executable code get better chances to stay in > - * memory under moderate memory pressure. Anon pages > - * are not likely to be evicted by use-once streaming > - * IO, plus JVM can create lots of anon VM_EXEC pages, > - * so we ignore them here. > + * memory under moderate memory pressure. > */ > - if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) { > + if ((vm_flags & VM_EXEC)) { > list_add(&page->lru, &l_active); > continue; > } >
On Wed 04-03-20 15:32:35, Adam Jackson wrote: > The page reclamation scanner tries to keep executable pages resident, > since taking a hard page fault to satisfy an icache miss is really not > great for interactivity. Anonymous executable pages tend to contain > code that has been just-in-time compiled for performance reasons. By > requiring that executable pages be file-backed, we're putting possibly > the most performance-sensitive code at higher risk of eviction, which > seems backwards. > > On an amd64 machine running Fedora 31, the firefox I happen to have > running requires about 89M of file-backed text and 12M of anonymous text > for 30 open tabs. The next largest process in terms of anonymous text is > gnome-shell, with 1M anonymous and 57M file-backed. No other process had > significant anonymous text, most had none. Penalizing those 13M > specifically when under memory pressure seems like an easy hazard to > avoid. Are you seeing an actual improvement from this change? IIRC the primary motivation to make this heuristic page cache oriented is that it was quite easy to evict file backed memory by streaming IO. This shouldn't really be a major problem for the anonymous memory in most cases. A heavy swapin/out workload is likely to suffer from not having data available more than having the code evicted. But I might be wrong here and getting some numbers would be really interesting. > Signed-off-by: Adam Jackson <ajax@redhat.com> > --- > mm/vmscan.c | 11 ++++------- > 1 file changed, 4 insertions(+), 7 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index ee4eecc7e1c2..9bfbc30d61d8 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan, > &vm_flags)) { > nr_rotated += hpage_nr_pages(page); > /* > - * Identify referenced, file-backed active pages and > - * give them one more trip around the active list. So > + * Identify referenced, executable active pages and > + * give them one more trip around the active list, so > * that executable code get better chances to stay in > - * memory under moderate memory pressure. Anon pages > - * are not likely to be evicted by use-once streaming > - * IO, plus JVM can create lots of anon VM_EXEC pages, > - * so we ignore them here. > + * memory under moderate memory pressure. > */ > - if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) { > + if ((vm_flags & VM_EXEC)) { > list_add(&page->lru, &l_active); > continue; > } > -- > 2.23.0 >
On Thu, 2020-03-05 at 16:17 +0100, Michal Hocko wrote: > On Wed 04-03-20 15:32:35, Adam Jackson wrote: > > The page reclamation scanner tries to keep executable pages resident, > > since taking a hard page fault to satisfy an icache miss is really not > > great for interactivity. Anonymous executable pages tend to contain > > code that has been just-in-time compiled for performance reasons. By > > requiring that executable pages be file-backed, we're putting possibly > > the most performance-sensitive code at higher risk of eviction, which > > seems backwards. > > > > On an amd64 machine running Fedora 31, the firefox I happen to have > > running requires about 89M of file-backed text and 12M of anonymous text > > for 30 open tabs. The next largest process in terms of anonymous text is > > gnome-shell, with 1M anonymous and 57M file-backed. No other process had > > significant anonymous text, most had none. Penalizing those 13M > > specifically when under memory pressure seems like an easy hazard to > > avoid. > > Are you seeing an actual improvement from this change? IIRC the primary > motivation to make this heuristic page cache oriented is that it was > quite easy to evict file backed memory by streaming IO. This shouldn't > really be a major problem for the anonymous memory in most cases. A > heavy swapin/out workload is likely to suffer from not having data > available more than having the code evicted. But I might be wrong here > and getting some numbers would be really interesting. They would be! I confess I don't have any, I'll see if I can gather some for you. The problem case for this patch is maybe as much about streaming I/O as it's about just giant wads of dirty data. Given enough pressure, eventually this loop will prefer to evict jitted code before precompiled. I find that to be a difficult preference to justify. If anything maybe you should evict the file-backed code first because you can discard it instead of writing it out (assuming your swap and executables are on comparably fast disk, etc). That's my theory anyway. It's entirely possible I don't understand the larger environment for this code, and like I say I don't have hard data yet. For all I know big data-center-y java jobs might actually want the existing behavior. But it seemed curious enough to warrant at least sending the patch for feedback. - ajax
On Thu, Mar 5, 2020 at 4:47 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > + CC folks who focus on reclaim > > On 3/4/20 9:32 PM, Adam Jackson wrote: > > The page reclamation scanner tries to keep executable pages resident, > > since taking a hard page fault to satisfy an icache miss is really not > > great for interactivity. Anonymous executable pages tend to contain > > code that has been just-in-time compiled for performance reasons. By > > requiring that executable pages be file-backed, we're putting possibly > > the most performance-sensitive code at higher risk of eviction, which > > seems backwards. > > > > On an amd64 machine running Fedora 31, the firefox I happen to have > > running requires about 89M of file-backed text and 12M of anonymous text > > for 30 open tabs. The next largest process in terms of anonymous text is > > gnome-shell, with 1M anonymous and 57M file-backed. No other process had > > significant anonymous text, most had none. Penalizing those 13M > > specifically when under memory pressure seems like an easy hazard to > > avoid. Have you actually seen this issue (i.e. JIT code reclaimed and thrashing) happening for real workloads? > > > > Signed-off-by: Adam Jackson <ajax@redhat.com> > > --- > > mm/vmscan.c | 11 ++++------- > > 1 file changed, 4 insertions(+), 7 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index ee4eecc7e1c2..9bfbc30d61d8 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan, > > &vm_flags)) { > > nr_rotated += hpage_nr_pages(page); > > /* > > - * Identify referenced, file-backed active pages and > > - * give them one more trip around the active list. So > > + * Identify referenced, executable active pages and > > + * give them one more trip around the active list, so > > * that executable code get better chances to stay in > > - * memory under moderate memory pressure. Anon pages > > - * are not likely to be evicted by use-once streaming > > - * IO, plus JVM can create lots of anon VM_EXEC pages, > > - * so we ignore them here. > > + * memory under moderate memory pressure. > > */ > > - if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) { Originally this heuristic was added to protect executable file pages from the streaming workloads. Now we have workingset detection for file pages and there is ongoing work for adding that support for anon pages, I am wondering if this specific heuristic is still helpful. > > + if ((vm_flags & VM_EXEC)) { > > list_add(&page->lru, &l_active); > > continue; > > } > > >
On Thu, 2020-03-05 at 12:41 -0800, Shakeel Butt wrote: > Have you actually seen this issue (i.e. JIT code reclaimed and > thrashing) happening for real workloads? The mention of gnome-shell wasn't entirely an accident, it's largely written in javascript which is jit-compiled. Under sufficiently dire disk I/O conditions merely moving the cursor would lag. That ought to be impossible, because nothing in the cursor update path ought to be blocking on the disk and we believe the CPU scheduler to be competent. Plenty of bugs are contributing here, but among them yes the (jitted bits of the) cursor update code can get paged out while megabytes of file-backed text stay resident. It's certainly drowned out by the contribution of the plenty of other bugs involved, but still. > Originally this heuristic was added to protect executable file pages > from the streaming workloads. Now we have workingset detection for > file pages and there is ongoing work for adding that support for anon > pages, I am wondering if this specific heuristic is still helpful. Enh. I would tend to think that code is way more precious than data in terms of staying resident. If the working set detection works well enough to come to that conclusion on its own without explictly knowing about executable pages, that'd be awesome and I'd be entirely fine with removing even more of this heuristic. - ajax
On Thu, Mar 05, 2020 at 05:58:59PM -0500, Adam Jackson wrote: > > Originally this heuristic was added to protect executable file pages > > from the streaming workloads. Now we have workingset detection for > > file pages and there is ongoing work for adding that support for anon > > pages, I am wondering if this specific heuristic is still helpful. > > Enh. I would tend to think that code is way more precious than data in > terms of staying resident. If the working set detection works well > enough to come to that conclusion on its own without explictly knowing > about executable pages, that'd be awesome and I'd be entirely fine with > removing even more of this heuristic. > Given the increased use of JIT engines, the existence of working set detection and the fact it may also work for anonymous pages soon, I think it's worth at least *trying* to remove the heuristic in case it stays around for years as magic code. Creating an automated test case for this would be relatively tricky. Could you put together a debugging patch that simply counts some events to put into the changelog? The events (which could be vmstat) would be o VM_EXEC pages encountered in reclaim o Number exec file-backed pages preserved o Number exec anon pages preserved o Number exec file-backed pages reclaimed o NUmber exec anon pages reclaimed And show the figures before and after in the changelog running Firefox with excessive IO in the background. It's a bit of legwork but it's to preserve in the changelog that this problem definitely happens and the patch has a positive impact. Some comment saying the cursor is not laggy with the patch applied would also be a plus. The debugging patch would not actually be merged. With the figures, if there ever is a bug report that bisects to this patch, it'll be known exactly what the impact and motivation was. That will at least make people pause and think before blindly reverting it. I'm guessing the impact is that the ratio of reclaimed/preserved for anon pages is currently skewed and after the patch it's more in line with the ratio for file-backed. It's a tough prediction as the size of the file vs anon LRUs at the time of reclaim matter as well as the ordering of pages in the LRU.
diff --git a/mm/vmscan.c b/mm/vmscan.c index ee4eecc7e1c2..9bfbc30d61d8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2095,15 +2095,12 @@ static void shrink_active_list(unsigned long nr_to_scan, &vm_flags)) { nr_rotated += hpage_nr_pages(page); /* - * Identify referenced, file-backed active pages and - * give them one more trip around the active list. So + * Identify referenced, executable active pages and + * give them one more trip around the active list, so * that executable code get better chances to stay in - * memory under moderate memory pressure. Anon pages - * are not likely to be evicted by use-once streaming - * IO, plus JVM can create lots of anon VM_EXEC pages, - * so we ignore them here. + * memory under moderate memory pressure. */ - if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) { + if ((vm_flags & VM_EXEC)) { list_add(&page->lru, &l_active); continue; }
The page reclamation scanner tries to keep executable pages resident, since taking a hard page fault to satisfy an icache miss is really not great for interactivity. Anonymous executable pages tend to contain code that has been just-in-time compiled for performance reasons. By requiring that executable pages be file-backed, we're putting possibly the most performance-sensitive code at higher risk of eviction, which seems backwards. On an amd64 machine running Fedora 31, the firefox I happen to have running requires about 89M of file-backed text and 12M of anonymous text for 30 open tabs. The next largest process in terms of anonymous text is gnome-shell, with 1M anonymous and 57M file-backed. No other process had significant anonymous text, most had none. Penalizing those 13M specifically when under memory pressure seems like an easy hazard to avoid. Signed-off-by: Adam Jackson <ajax@redhat.com> --- mm/vmscan.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)