[RESEND] mm/vmscan: shrink slab in node reclaim

Message ID 1564538401-21353-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived

Commit Message

Yafang Shao July 31, 2019, 2 a.m. UTC
In node reclaim, may_shrinkslab is 0 by default, so shrink_slab is
never performed there. However, shrink_slab should be performed when
the reclaimable slab exceeds the min slab limit.

If the reclaimable pagecache is below min_unmapped_pages while the
reclaimable slab is above min_slab_pages, we should shrink slab only;
otherwise min_unmapped_pages would be useless under this condition.
A new bit field, no_pagecache, is introduced in scan_control for this
purpose; it is 0 by default.
Once __node_reclaim() is called, either the reclaimable pagecache is
greater than min_unmapped_pages or the reclaimable slab is greater than
min_slab_pages; that is ensured in node_reclaim(). So we can remove
the if statement in __node_reclaim().
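To make the intended gating concrete, here is a minimal userspace model of the decision described above. The names (sc_model, node_reclaim_model) are illustrative only, not the kernel code; it is a sketch of the thresholds and of why __node_reclaim() needs no extra check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not kernel code) of the gating described above: node_reclaim()
 * bails out unless at least one threshold is exceeded, and records which
 * kind of reclaim is allowed, so __node_reclaim() needs no further check. */
struct sc_model {
	bool may_shrinkslab;  /* reclaimable slab      > min_slab_pages      */
	bool no_pagecache;    /* reclaimable pagecache <= min_unmapped_pages */
};

static bool node_reclaim_model(long pagecache, long min_unmapped,
			       long slab, long min_slab,
			       struct sc_model *sc)
{
	/* Neither limit exceeded: node_reclaim() returns early. */
	if (pagecache <= min_unmapped && slab <= min_slab)
		return false;
	sc->may_shrinkslab = slab > min_slab;
	sc->no_pagecache = pagecache <= min_unmapped;
	return true; /* __node_reclaim() would run with these settings */
}
```

With a large slab but little reclaimable pagecache, the model proceeds with no_pagecache set, i.e. it shrinks slab only, which is exactly the case the patch fixes.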

sc.reclaim_state.reclaimed_slab tells us how many pages were reclaimed
by shrink_slab.

This issue is easy to reproduce: first continuously cat random
non-existent files to produce more and more dentries, then read a big
file to fill the page cache. You will find that the dentries are never
shrunk in node reclaim (they can only be shrunk by kswapd once the
watermark is reached).
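As a sketch, the reproduction steps above could look like the script below. The file names and sizes are illustrative, and actually observing the leak requires watching the kernel's dentry slab counters (e.g. in /proc/slabinfo) on a machine with node reclaim enabled.

```shell
#!/bin/sh
# Each lookup of a non-existent name leaves a negative dentry behind,
# growing the reclaimable slab (the dentry cache).
produce_dentries() {
	i=0
	while [ "$i" -lt "$1" ]; do
		cat "/tmp/no-such-file.$i" 2>/dev/null
		i=$((i + 1))
	done
	echo "$i"
}

# Reading a large file fills the page cache; with the bug, node reclaim
# then never shrinks the dentries no matter how large the slab grows.
produce_pagecache() {
	dd if=/dev/zero of=/tmp/bigfile bs=1M count=4 2>/dev/null
	cat /tmp/bigfile > /dev/null
}
```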

Regarding vm.zone_reclaim_mode, we always set it to zero to disable
node reclaim, but some users may prefer to enable it when their
different workloads run on different nodes.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 mm/vmscan.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

Comments

Daniel Jordan Aug. 5, 2019, 9:44 p.m. UTC | #1
Hi Yafang,

On Tue, Jul 30, 2019 at 10:00:01PM -0400, Yafang Shao wrote:
> In the node reclaim, may_shrinkslab is 0 by default,
> hence shrink_slab will never be performed in it.
> While shrik_slab should be performed if the relcaimable slab is over
> min slab limit.

Nice catch, I think this needs

Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")

> If reclaimable pagecache is less than min_unmapped_pages while
> reclaimable slab is greater than min_slab_pages, we only shrink slab.
> Otherwise the min_unmapped_pages will be useless under this condition.
> A new bitmask no_pagecache is introduced in scan_control for this
> purpose, which is 0 by default.
> Once __node_reclaim() is called, either the reclaimable pagecache is
> greater than min_unmapped_pages or reclaimable slab is greater than
> min_slab_pages, that is ensured in function node_reclaim(). So wen can
> remove the if statement in __node_reclaim().

Why is the if statement there to begin with then, if the condition has
already been checked in node_reclaim?  Looks like it came in with
0ff38490c836 ("[PATCH] zone_reclaim: dynamic slab reclaim"), but it's not
obvious to me why.  Maybe Christoph remembers.

I found this part of the changelog kind of hard to parse.  How about this
instead of the above block?

    Add scan_control::no_pagecache so shrink_node can decide to reclaim page
    cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
    shrink_node will do at least one of the two because otherwise node_reclaim
    returns early.

Maybe start the next paragraph with

  __node_reclaim can detect when enough slab has been reclaimed because...

> sc.reclaim_state.reclaimed_slab will tell us how many pages are
> reclaimed in shrink slab.
...

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 47aa215..1e410ef 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -91,6 +91,9 @@ struct scan_control {
>  	/* e.g. boosted watermark reclaim leaves slabs alone */
>  	unsigned int may_shrinkslab:1;
>  
> +	/* in node relcaim mode, we may shrink slab only */

                   reclaim

> @@ -4268,6 +4273,10 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
>  		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
>  		.may_swap = 1,
> +		.may_shrinkslab = (node_page_state(pgdat, NR_SLAB_RECLAIMABLE) >
> +				   pgdat->min_slab_pages),
> +		.no_pagecache = !(node_pagecache_reclaimable(pgdat) >
> +				  pgdat->min_unmapped_pages),

It's less awkward to do away with the ! and invert the condition.
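Concretely, `!(x > y)` is the same as `x <= y`, and the inverted comparison avoids the negation. A tiny standalone check (the helper names are made up for illustration):

```c
#include <assert.h>

/* The two spellings are equivalent; the inverted comparison reads more
 * directly (illustrative helpers, not the kernel code). */
static int no_pagecache_negated(long reclaimable, long min_unmapped)
{
	return !(reclaimable > min_unmapped);
}

static int no_pagecache_inverted(long reclaimable, long min_unmapped)
{
	return reclaimable <= min_unmapped;
}
```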
Yafang Shao Aug. 6, 2019, 6:44 a.m. UTC | #2
On Tue, Aug 6, 2019 at 5:44 AM Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
>
> Hi Yafang,
>
> On Tue, Jul 30, 2019 at 10:00:01PM -0400, Yafang Shao wrote:
> > In the node reclaim, may_shrinkslab is 0 by default,
> > hence shrink_slab will never be performed in it.
> > While shrik_slab should be performed if the relcaimable slab is over
> > min slab limit.
>
> Nice catch, I think this needs
>
> Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
>

Thanks. I will add it.

> > If reclaimable pagecache is less than min_unmapped_pages while
> > reclaimable slab is greater than min_slab_pages, we only shrink slab.
> > Otherwise the min_unmapped_pages will be useless under this condition.
> > A new bitmask no_pagecache is introduced in scan_control for this
> > purpose, which is 0 by default.
> > Once __node_reclaim() is called, either the reclaimable pagecache is
> > greater than min_unmapped_pages or reclaimable slab is greater than
> > min_slab_pages, that is ensured in function node_reclaim(). So wen can
> > remove the if statement in __node_reclaim().
>
> Why is the if statement there to begin with then, if the condition has
> already been checked in node_reclaim?

In node_reclaim it is

if (condition_pagecache || condition_slab)
	will_do___node_reclaim();

After scan_control::no_pagecache is introduced, we don't need the if
statement in __node_reclaim() any more.

> Looks like it came in with
> 0ff38490c836 ("[PATCH] zone_reclaim: dynamic slab reclaim"), but it's not
> obvious to me why.  Maybe Christoph remembers.
>

> I found this part of the changelog kind of hard to parse.  How about this
> instead of the above block?
>
>     Add scan_control::no_pagecache so shrink_node can decide to reclaim page
>     cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
>     shrink_node will do at least one of the two because otherwise node_reclaim
>     returns early.
>
> Maybe start the next paragraph with
>
>   __node_reclaim can detect when enough slab has been reclaimed because...
>

That's better. I appreciate your improvement on the changelog. I will update it.

> > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > reclaimed in shrink slab.
> ...
>
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 47aa215..1e410ef 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -91,6 +91,9 @@ struct scan_control {
> >       /* e.g. boosted watermark reclaim leaves slabs alone */
> >       unsigned int may_shrinkslab:1;
> >
> > +     /* in node relcaim mode, we may shrink slab only */
>
>                    reclaim

Thanks. I will correct it.

>
> > @@ -4268,6 +4273,10 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >               .may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
> >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> >               .may_swap = 1,
> > +             .may_shrinkslab = (node_page_state(pgdat, NR_SLAB_RECLAIMABLE) >
> > +                                pgdat->min_slab_pages),
> > +             .no_pagecache = !(node_pagecache_reclaimable(pgdat) >
> > +                               pgdat->min_unmapped_pages),
>
> It's less awkward to do away with the ! and invert the condition.

Sure.

Thanks
Yafang

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 47aa215..1e410ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -91,6 +91,9 @@  struct scan_control {
 	/* e.g. boosted watermark reclaim leaves slabs alone */
 	unsigned int may_shrinkslab:1;
 
+	/* in node relcaim mode, we may shrink slab only */
+	unsigned int no_pagecache:1;
+
 	/*
 	 * Cgroups are not reclaimed below their configured memory.low,
 	 * unless we threaten to OOM. If any cgroups are skipped due to
@@ -2831,7 +2834,9 @@  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 			reclaimed = sc->nr_reclaimed;
 			scanned = sc->nr_scanned;
-			shrink_node_memcg(pgdat, memcg, sc);
+
+			if (!sc->no_pagecache)
+				shrink_node_memcg(pgdat, memcg, sc);
 
 			if (sc->may_shrinkslab) {
 				shrink_slab(sc->gfp_mask, pgdat->node_id,
@@ -4268,6 +4273,10 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
 		.may_swap = 1,
+		.may_shrinkslab = (node_page_state(pgdat, NR_SLAB_RECLAIMABLE) >
+				   pgdat->min_slab_pages),
+		.no_pagecache = !(node_pagecache_reclaimable(pgdat) >
+				  pgdat->min_unmapped_pages),
 		.reclaim_idx = gfp_zone(gfp_mask),
 	};
 
@@ -4285,15 +4294,13 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	p->flags |= PF_SWAPWRITE;
 	set_task_reclaim_state(p, &sc.reclaim_state);
 
-	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink node with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		do {
-			shrink_node(pgdat, &sc);
-		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
-	}
+	/*
+	 * Free memory by calling shrink node with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	do {
+		shrink_node(pgdat, &sc);
+	} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
 
 	set_task_reclaim_state(p, NULL);
 	current->flags &= ~PF_SWAPWRITE;