[4/5] mm: fix check_move_unevictable_pages() on THP

Message ID alpine.LSU.2.11.2008301405000.5954@eggly.anvils (mailing list archive)
State New, archived
Series mm: fixes to past from future testing

Commit Message

Hugh Dickins Aug. 30, 2020, 9:08 p.m. UTC
check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec().  Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head
(since tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.
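
Condensed, the counting rule this patch establishes (a sketch of the loop
body only, not the literal function, which also juggles the per-node
lru_lock and the lruvec lookup):

	/* Sketch only: how pgscanned/pgrescued are counted per pagevec entry */
	for (i = 0; i < pvec->nr; i++) {
		struct page *page = pvec->pages[i];
		int nr_pages;

		if (PageTransTail(page))
			continue;		/* counted via its head page */

		nr_pages = thp_nr_pages(page);	/* e.g. 512 for a 2MB THP with 4K base pages */
		pgscanned += nr_pages;

		if (PageLRU(page) && PageUnevictable(page) && page_evictable(page)) {
			/* move page from the unevictable lru to an evictable one ... */
			pgrescued += nr_pages;
		}
	}

So for a fully rescued 2MB THP, scanned and rescued both go up by 512,
where before the fix they typically went up by 512 and 1 respectively.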

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Nothing here worth going to stable, since it's just a testing config
that is fixed, whose event numbers are not very important; but this
will be needed before Alex Shi's warning, and might as well go in now.

The callers of check_move_unevictable_pages() could be optimized,
to skip over tails: but Matthew Wilcox has other changes in flight
there, so let's skip the optimization for now.
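
For illustration, that caller-side optimization would amount to filtering
tails out of the pagevec before the call; a hypothetical helper (the name
and placement are made up here, not part of any current caller):

	/*
	 * Hypothetical sketch only: compact the pagevec so that
	 * check_move_unevictable_pages() never sees THP tail pages.
	 */
	static void pagevec_drop_trans_tails(struct pagevec *pvec)
	{
		unsigned int i, j = 0;

		for (i = 0; i < pvec->nr; i++) {
			struct page *page = pvec->pages[i];

			if (PageTransTail(page))
				continue;	/* its head already stands for it */
			pvec->pages[j++] = page;
		}
		pvec->nr = j;
	}

Whether that is worth doing can be decided once Matthew's changes land,
which is why it is left out here.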

 mm/vmscan.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Shakeel Butt Aug. 31, 2020, 2:44 p.m. UTC | #1
On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec().  Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Thanks for catching this.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Alex Shi Sept. 1, 2020, 2:04 a.m. UTC | #2
On 2020/8/31 5:08 AM, Hugh Dickins wrote:
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec().  Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
> 
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
> 
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
> 
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
> 
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
> 
>  mm/vmscan.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> --- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
>  	for (i = 0; i < pvec->nr; i++) {
>  		struct page *page = pvec->pages[i];
>  		struct pglist_data *pagepgdat = page_pgdat(page);
> +		int nr_pages;
> +
> +		if (PageTransTail(page))
> +			continue;
> +
> +		nr_pages = thp_nr_pages(page);
> +		pgscanned += nr_pages;
>  
> -		pgscanned++;
>  		if (pagepgdat != pgdat) {
>  			if (pgdat)
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
>  			ClearPageUnevictable(page);
>  			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
>  			add_page_to_lru_list(page, lruvec, lru);

So, we might randomly del or add a thp tail page onto the lru?
That's interesting to learn here. :)

Thanks
Alex

> -			pgrescued++;
> +			pgrescued += nr_pages;
>  		}
>  	}
>  
>
Hugh Dickins Sept. 1, 2020, 3:59 a.m. UTC | #3
On Tue, 1 Sep 2020, Alex Shi wrote:
> On 2020/8/31 5:08 AM, Hugh Dickins wrote:
> > --- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
> > +++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
> > @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
> >  	for (i = 0; i < pvec->nr; i++) {
> >  		struct page *page = pvec->pages[i];
> >  		struct pglist_data *pagepgdat = page_pgdat(page);
> > +		int nr_pages;
> > +
> > +		if (PageTransTail(page))
> > +			continue;
> > +
> > +		nr_pages = thp_nr_pages(page);
> > +		pgscanned += nr_pages;
> >  
> > -		pgscanned++;
> >  		if (pagepgdat != pgdat) {
> >  			if (pgdat)
> >  				spin_unlock_irq(&pgdat->lru_lock);
> > @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
> >  			ClearPageUnevictable(page);
> >  			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> >  			add_page_to_lru_list(page, lruvec, lru);
> 
> So, we might randomly del or add a thp tail page onto the lru?
> That's interesting to learn here. :)

No, it wasn't that interesting; I'd have been more concerned if it were
like that.  Amusing idea though: because the act of adding a thp tail
to the lru would clear the very bit that says it's a tail.

		if (!PageLRU(page) || !PageUnevictable(page))
			continue;

Let's see: PageLRU and PageUnevictable are both of the PF_HEAD type,
so permitted on tails, but always redirecting to the head: so typically
PageLRU(page) would be true, because the head is on an lru; but
PageUnevictable(page) would be false on a tail, because the head has
already been moved to an evictable lru by this same function.  So it
safely went the "continue" way, but without incrementing pgrescued.
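
To spell that out, a simplified sketch of the PF_HEAD flag policy
involved (roughly condensed; the real macros in include/linux/page-flags.h
also carry poison checks and are generated by the PAGEFLAG() machinery):

	/*
	 * A tail page stores a pointer to its head in page->compound_head,
	 * with bit 0 set as the tail marker.  That word is overlaid with
	 * lru.next in struct page, which is why actually adding a tail to
	 * an lru list would wipe the very bit saying it is a tail.
	 */
	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		if (head & 1)				/* tail */
			return (struct page *)(head - 1);
		return page;				/* head or base page */
	}

	/* PF_HEAD-policy flags test (and modify) the head's flags: */
	static inline int PageLRU(struct page *page)
	{
		return test_bit(PG_lru, &compound_head(page)->flags);
	}

	static inline int PageUnevictable(struct page *page)
	{
		return test_bit(PG_unevictable, &compound_head(page)->flags);
	}

So on a tail whose head was rescued earlier in the loop, PageLRU() still
answers true for the head, PageUnevictable() answers false, and the old
code took the "continue" without counting anything for that subpage.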

Hugh

> 
> Thanks
> Alex
> 
> > -			pgrescued++;
> > +			pgrescued += nr_pages;
> >  		}
> >  	}
> >
Yang Shi Sept. 1, 2020, 3:57 p.m. UTC | #4
On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec().  Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Yang Shi <shy828301@gmail.com>

> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
>
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
>
>  mm/vmscan.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> --- 5.9-rc2/mm/vmscan.c 2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c   2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
>         for (i = 0; i < pvec->nr; i++) {
>                 struct page *page = pvec->pages[i];
>                 struct pglist_data *pagepgdat = page_pgdat(page);
> +               int nr_pages;
> +
> +               if (PageTransTail(page))
> +                       continue;
> +
> +               nr_pages = thp_nr_pages(page);
> +               pgscanned += nr_pages;
>
> -               pgscanned++;
>                 if (pagepgdat != pgdat) {
>                         if (pgdat)
>                                 spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
>                         ClearPageUnevictable(page);
>                         del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
>                         add_page_to_lru_list(page, lruvec, lru);
> -                       pgrescued++;
> +                       pgrescued += nr_pages;
>                 }
>         }
>
>

Patch

--- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
+++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
@@ -4260,8 +4260,14 @@  void check_move_unevictable_pages(struct
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
 		struct pglist_data *pagepgdat = page_pgdat(page);
+		int nr_pages;
+
+		if (PageTransTail(page))
+			continue;
+
+		nr_pages = thp_nr_pages(page);
+		pgscanned += nr_pages;
 
-		pgscanned++;
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4280,7 +4286,7 @@  void check_move_unevictable_pages(struct
 			ClearPageUnevictable(page);
 			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
 			add_page_to_lru_list(page, lruvec, lru);
-			pgrescued++;
+			pgrescued += nr_pages;
 		}
 	}