[v2] mm/vmscan: shrink slab in node reclaim

Message ID 1565075940-23121-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Series [v2] mm/vmscan: shrink slab in node reclaim

Commit Message

Yafang Shao Aug. 6, 2019, 7:19 a.m. UTC
In node reclaim, may_shrinkslab is 0 by default, hence shrink_slab
will never be performed there. However, shrink_slab should be
performed if the reclaimable slab is over the min slab limit.

Add scan_control::no_pagecache so shrink_node can decide to reclaim page
cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
shrink_node will do at least one of the two because otherwise node_reclaim
returns early.

__node_reclaim can detect when enough slab has been reclaimed because
sc.reclaim_state.reclaimed_slab tells us how many pages were reclaimed
by shrink_slab.
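
As a rough sketch (simplified, with trimmed-down structures; this is
not the exact mm/vmscan.c code), the accounting pattern is:

struct scan_control {
	unsigned long nr_reclaimed;	/* total pages reclaimed so far */
	/* ... */
};

struct reclaim_state {
	unsigned long reclaimed_slab;	/* pages freed by slab shrinkers */
};

/*
 * Slab shrinkers credit the pages they free to the task's
 * reclaim_state; folding that into nr_reclaimed lets the retry loop
 * in __node_reclaim() see slab-only progress as well.
 */
static void account_slab_reclaim(struct scan_control *sc,
				 struct reclaim_state *rs)
{
	sc->nr_reclaimed += rs->reclaimed_slab;
	rs->reclaimed_slab = 0;
}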

This issue is very easy to reproduce: first, continuously cat random
non-existent files to produce more and more negative dentries, then
read a big file to produce page cache. Finally you will find that the
dentries are never shrunk in node reclaim (they can only be shrunk by
kswapd until the watermark is reached).
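
For illustration only, a minimal userspace reproducer along these
lines could look like the sketch below (the paths, the loop count and
the big-file location are assumptions, not part of the original
report):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
	char path[64], buf[1 << 20];
	struct stat st;
	long i;
	int fd;

	/* Step 1: every failed lookup leaves a negative dentry behind. */
	for (i = 0; i < 100000000L; i++) {
		snprintf(path, sizeof(path), "/tmp/no-such-file-%ld", i);
		stat(path, &st);
	}

	/* Step 2: stream a big file to fill the page cache. */
	fd = open("/var/tmp/bigfile", O_RDONLY);
	if (fd < 0)
		return 1;
	while (read(fd, buf, sizeof(buf)) > 0)
		;
	close(fd);
	return 0;
}

With vm.zone_reclaim_mode enabled you can then watch the dentries stay
resident while node reclaim only targets the page cache.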

Regarding vm.zone_reclaim_mode, we always set it to zero to disable
node reclaim. Some users may prefer to enable it when their different
workloads run on different nodes.

[Daniel improved the changelog]

Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 mm/vmscan.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

Comments

Michal Hocko Aug. 6, 2019, 7:35 a.m. UTC | #1
On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> In the node reclaim, may_shrinkslab is 0 by default,
> hence shrink_slab will never be performed in it.
> While shrik_slab should be performed if the relcaimable slab is over
> min slab limit.
> 
> Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> shrink_node will do at least one of the two because otherwise node_reclaim
> returns early.
> 
> __node_reclaim can detect when enough slab has been reclaimed because
> sc.reclaim_state.reclaimed_slab will tell us how many pages are
> reclaimed in shrink slab.
> 
> This issue is very easy to produce, first you continuously cat a random
> non-exist file to produce more and more dentry, then you read big file
> to produce page cache. And finally you will find that the denty will
> never be shrunk in node reclaim (they can only be shrunk in kswapd until
> the watermark is reached).
> 
> Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> reclaim. Someone may prefer to enable it if their different workloads work
> on different nodes.

Considering that this is a long term behavior of a rarely used node
reclaim I would rather not touch it unless some _real_ workload suffers
from this behavior. Or is there any reason to fix this even though there
is no evidence of real workloads suffering from the current behavior?
Michal Hocko Aug. 6, 2019, 7:41 a.m. UTC | #2
On Tue 06-08-19 09:35:25, Michal Hocko wrote:
> On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> > In the node reclaim, may_shrinkslab is 0 by default,
> > hence shrink_slab will never be performed in it.
> > While shrik_slab should be performed if the relcaimable slab is over
> > min slab limit.
> > 
> > Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> > cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> > shrink_node will do at least one of the two because otherwise node_reclaim
> > returns early.
> > 
> > __node_reclaim can detect when enough slab has been reclaimed because
> > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > reclaimed in shrink slab.
> > 
> > This issue is very easy to produce, first you continuously cat a random
> > non-exist file to produce more and more dentry, then you read big file
> > to produce page cache. And finally you will find that the denty will
> > never be shrunk in node reclaim (they can only be shrunk in kswapd until
> > the watermark is reached).
> > 
> > Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> > reclaim. Someone may prefer to enable it if their different workloads work
> > on different nodes.
> 
> Considering that this is a long term behavior of a rarely used node
> reclaim I would rather not touch it unless some _real_ workload suffers
> from this behavior. Or is there any reason to fix this even though there
> is no evidence of real workloads suffering from the current behavior?

I have only now noticed that you have added
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")

could you be more specific about how that commit introduced a bug in
node reclaim? It introduced may_shrinkslab, but direct reclaim seems
to set it to 1.
Yafang Shao Aug. 6, 2019, 8:23 a.m. UTC | #3
On Tue, Aug 6, 2019 at 3:35 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> > In the node reclaim, may_shrinkslab is 0 by default,
> > hence shrink_slab will never be performed in it.
> > While shrik_slab should be performed if the relcaimable slab is over
> > min slab limit.
> >
> > Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> > cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> > shrink_node will do at least one of the two because otherwise node_reclaim
> > returns early.
> >
> > __node_reclaim can detect when enough slab has been reclaimed because
> > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > reclaimed in shrink slab.
> >
> > This issue is very easy to produce, first you continuously cat a random
> > non-exist file to produce more and more dentry, then you read big file
> > to produce page cache. And finally you will find that the denty will
> > never be shrunk in node reclaim (they can only be shrunk in kswapd until
> > the watermark is reached).
> >
> > Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> > reclaim. Someone may prefer to enable it if their different workloads work
> > on different nodes.
>
> Considering that this is a long term behavior of a rarely used node
> reclaim I would rather not touch it unless some _real_ workload suffers
> from this behavior. Or is there any reason to fix this even though there
> is no evidence of real workloads suffering from the current behavior?
> --

When we do performance tuning on some workloads (especially if the
workload is NUMA sensitive), we sometimes enable it on our test
environment and then run benchmarks to decide whether or not to apply
it on the production environment. Although the results were not as
good as expected, it really is a performance tuning knob.

Thanks
Yafang
Yafang Shao Aug. 6, 2019, 8:57 a.m. UTC | #4
On Tue, Aug 6, 2019 at 3:41 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 09:35:25, Michal Hocko wrote:
> > On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> > > In the node reclaim, may_shrinkslab is 0 by default,
> > > hence shrink_slab will never be performed in it.
> > > While shrik_slab should be performed if the relcaimable slab is over
> > > min slab limit.
> > >
> > > Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> > > cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> > > shrink_node will do at least one of the two because otherwise node_reclaim
> > > returns early.
> > >
> > > __node_reclaim can detect when enough slab has been reclaimed because
> > > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > > reclaimed in shrink slab.
> > >
> > > This issue is very easy to produce, first you continuously cat a random
> > > non-exist file to produce more and more dentry, then you read big file
> > > to produce page cache. And finally you will find that the denty will
> > > never be shrunk in node reclaim (they can only be shrunk in kswapd until
> > > the watermark is reached).
> > >
> > > Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> > > reclaim. Someone may prefer to enable it if their different workloads work
> > > on different nodes.
> >
> > Considering that this is a long term behavior of a rarely used node
> > reclaim I would rather not touch it unless some _real_ workload suffers
> > from this behavior. Or is there any reason to fix this even though there
> > is no evidence of real workloads suffering from the current behavior?
>
> I have only now noticed that you have added
> Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
>
> could you be more specific how that commit introduced a bug in the node
> reclaim? It has introduced may_shrink_slab but the direct reclaim seems
> to set it to 1.

As you said, the direct reclaim path sets it to 1, but
__node_reclaim() forgot to set may_shrinkslab.

Thanks
Yafang
Michal Hocko Aug. 6, 2019, 9:05 a.m. UTC | #5
On Tue 06-08-19 16:57:22, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 3:41 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 06-08-19 09:35:25, Michal Hocko wrote:
> > > On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> > > > In the node reclaim, may_shrinkslab is 0 by default,
> > > > hence shrink_slab will never be performed in it.
> > > > While shrik_slab should be performed if the relcaimable slab is over
> > > > min slab limit.
> > > >
> > > > Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> > > > cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> > > > shrink_node will do at least one of the two because otherwise node_reclaim
> > > > returns early.
> > > >
> > > > __node_reclaim can detect when enough slab has been reclaimed because
> > > > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > > > reclaimed in shrink slab.
> > > >
> > > > This issue is very easy to produce, first you continuously cat a random
> > > > non-exist file to produce more and more dentry, then you read big file
> > > > to produce page cache. And finally you will find that the denty will
> > > > never be shrunk in node reclaim (they can only be shrunk in kswapd until
> > > > the watermark is reached).
> > > >
> > > > Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> > > > reclaim. Someone may prefer to enable it if their different workloads work
> > > > on different nodes.
> > >
> > > Considering that this is a long term behavior of a rarely used node
> > > reclaim I would rather not touch it unless some _real_ workload suffers
> > > from this behavior. Or is there any reason to fix this even though there
> > > is no evidence of real workloads suffering from the current behavior?
> >
> > I have only now noticed that you have added
> > Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
> >
> > could you be more specific how that commit introduced a bug in the node
> > reclaim? It has introduced may_shrink_slab but the direct reclaim seems
> > to set it to 1.
> 
> As you said, the direct reclaim path set it to 1, but the
> __node_reclaim() forgot to process may_shrink_slab.

OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
get back to the original behavior by setting may_shrink_slab in that
path as well?
Yafang Shao Aug. 6, 2019, 9:15 a.m. UTC | #6
On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 16:57:22, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 3:41 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 06-08-19 09:35:25, Michal Hocko wrote:
> > > > On Tue 06-08-19 03:19:00, Yafang Shao wrote:
> > > > > In the node reclaim, may_shrinkslab is 0 by default,
> > > > > hence shrink_slab will never be performed in it.
> > > > > While shrik_slab should be performed if the relcaimable slab is over
> > > > > min slab limit.
> > > > >
> > > > > Add scan_control::no_pagecache so shrink_node can decide to reclaim page
> > > > > cache, slab, or both as dictated by min_unmapped_pages and min_slab_pages.
> > > > > shrink_node will do at least one of the two because otherwise node_reclaim
> > > > > returns early.
> > > > >
> > > > > __node_reclaim can detect when enough slab has been reclaimed because
> > > > > sc.reclaim_state.reclaimed_slab will tell us how many pages are
> > > > > reclaimed in shrink slab.
> > > > >
> > > > > This issue is very easy to produce, first you continuously cat a random
> > > > > non-exist file to produce more and more dentry, then you read big file
> > > > > to produce page cache. And finally you will find that the denty will
> > > > > never be shrunk in node reclaim (they can only be shrunk in kswapd until
> > > > > the watermark is reached).
> > > > >
> > > > > Regarding vm.zone_reclaim_mode, we always set it to zero to disable node
> > > > > reclaim. Someone may prefer to enable it if their different workloads work
> > > > > on different nodes.
> > > >
> > > > Considering that this is a long term behavior of a rarely used node
> > > > reclaim I would rather not touch it unless some _real_ workload suffers
> > > > from this behavior. Or is there any reason to fix this even though there
> > > > is no evidence of real workloads suffering from the current behavior?
> > >
> > > I have only now noticed that you have added
> > > Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
> > >
> > > could you be more specific how that commit introduced a bug in the node
> > > reclaim? It has introduced may_shrink_slab but the direct reclaim seems
> > > to set it to 1.
> >
> > As you said, the direct reclaim path set it to 1, but the
> > __node_reclaim() forgot to process may_shrink_slab.
>
> OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> get back to the original behavior by setting may_shrink_slab in that
> path as well?

You mean do it as commit 0ff38490c836 did before?
I haven't checked in which commit shrink_slab() was removed from
__node_reclaim().
But I think introducing a flag makes it more robust; otherwise we
have to modify shrink_node() and there will be more changes.

Thanks
Yafang
Michal Hocko Aug. 6, 2019, 9:25 a.m. UTC | #7
On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > > As you said, the direct reclaim path set it to 1, but the
> > > __node_reclaim() forgot to process may_shrink_slab.
> >
> > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > get back to the original behavior by setting may_shrink_slab in that
> > path as well?
> 
> You mean do it as the commit 0ff38490c836 did  before ?
> I haven't check in which commit the shrink_slab() is removed from

What I've had in mind was essentially this:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7889f583ced9..8011288a80e2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
 		.may_swap = 1,
 		.reclaim_idx = gfp_zone(gfp_mask),
+		.may_shrinkslab = 1,
 	};
 
 	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,

shrink_node path already does shrink slab when the flag allows that. In
other words this gets us back to before 1c30844d2dfe, which clearly
changed the long term node reclaim behavior just recently.
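
For reference, the existing gate in shrink_node() is essentially this
(a simplified sketch in kernel context, not the exact code):

	if (sc->may_shrinkslab) {
		shrink_slab(sc->gfp_mask, pgdat->node_id,
			    memcg, sc->priority);
	}

so setting the flag in __node_reclaim() is enough to re-enable slab
reclaim there.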
Yafang Shao Aug. 6, 2019, 9:32 a.m. UTC | #8
On Tue, Aug 6, 2019 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > > As you said, the direct reclaim path set it to 1, but the
> > > > __node_reclaim() forgot to process may_shrink_slab.
> > >
> > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > get back to the original behavior by setting may_shrink_slab in that
> > > path as well?
> >
> > You mean do it as the commit 0ff38490c836 did  before ?
> > I haven't check in which commit the shrink_slab() is removed from
>
> What I've had in mind was essentially this:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7889f583ced9..8011288a80e2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>                 .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
>                 .may_swap = 1,
>                 .reclaim_idx = gfp_zone(gfp_mask),
> +               .may_shrinkslab = 1;
>         };
>
>         trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
>
> shrink_node path already does shrink slab when the flag allows that. In
> other words get us back to before 1c30844d2dfe because that has clearly
> changed the long term node reclaim behavior just recently.
> --

If we do it like this, then vm.min_slab_ratio will not take effect if
there is enough reclaimable page cache.
It seems there were bugs in the original behavior as well.

Thanks
Yafang
Mel Gorman Aug. 6, 2019, 9:50 a.m. UTC | #9
On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > > As you said, the direct reclaim path set it to 1, but the
> > > > __node_reclaim() forgot to process may_shrink_slab.
> > >
> > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > get back to the original behavior by setting may_shrink_slab in that
> > > path as well?
> > 
> > You mean do it as the commit 0ff38490c836 did  before ?
> > I haven't check in which commit the shrink_slab() is removed from
> 
> What I've had in mind was essentially this:
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7889f583ced9..8011288a80e2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
>  		.may_swap = 1,
>  		.reclaim_idx = gfp_zone(gfp_mask),
> +		.may_shrinkslab = 1;
>  	};
>  
>  	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> 
> shrink_node path already does shrink slab when the flag allows that. In
> other words get us back to before 1c30844d2dfe because that has clearly
> changed the long term node reclaim behavior just recently.

I'd be fine with this change. It was not intentional to significantly
change the behaviour of node reclaim in that patch.
Yafang Shao Aug. 6, 2019, 9:54 a.m. UTC | #10
On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > >
> > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > get back to the original behavior by setting may_shrink_slab in that
> > > > path as well?
> > >
> > > You mean do it as the commit 0ff38490c836 did  before ?
> > > I haven't check in which commit the shrink_slab() is removed from
> >
> > What I've had in mind was essentially this:
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 7889f583ced9..8011288a80e2 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> >               .may_swap = 1,
> >               .reclaim_idx = gfp_zone(gfp_mask),
> > +             .may_shrinkslab = 1;
> >       };
> >
> >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> >
> > shrink_node path already does shrink slab when the flag allows that. In
> > other words get us back to before 1c30844d2dfe because that has clearly
> > changed the long term node reclaim behavior just recently.
>
> I'd be fine with this change. It was not intentional to significantly
> change the behaviour of node reclaim in that patch.
>

But if we do it like this, there will be a bug in the
vm.min_slab_ratio knob, right?

Thanks
Yafang
Michal Hocko Aug. 6, 2019, 10:28 a.m. UTC | #11
On Tue 06-08-19 17:54:02, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > >
> > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > path as well?
> > > >
> > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > I haven't check in which commit the shrink_slab() is removed from
> > >
> > > What I've had in mind was essentially this:
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 7889f583ced9..8011288a80e2 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > >               .may_swap = 1,
> > >               .reclaim_idx = gfp_zone(gfp_mask),
> > > +             .may_shrinkslab = 1;
> > >       };
> > >
> > >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > >
> > > shrink_node path already does shrink slab when the flag allows that. In
> > > other words get us back to before 1c30844d2dfe because that has clearly
> > > changed the long term node reclaim behavior just recently.
> >
> > I'd be fine with this change. It was not intentional to significantly
> > change the behaviour of node reclaim in that patch.
> >
> 
> But if we do it like this, there will be bug in the knob vm.min_slab_ratio.
> Right ?

Yes, and the answer for that is a question: why do we even care? Which
real life workload suffers from the min_slab_ratio misbehavior?
Also it is much preferable to fix an obvious bug/omission, which the
lack of may_shrinkslab in node reclaim seems to be, than to do a
larger rewrite with harder to see changes.

Really, I wouldn't be opposing normally, but node_reclaim is an odd
ball that is rarely used, and changing its behavior based on some
trivial testing doesn't sound very convincing to me.
Yafang Shao Aug. 6, 2019, 10:59 a.m. UTC | #12
On Tue, Aug 6, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 17:54:02, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > [...]
> > > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > > >
> > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > path as well?
> > > > >
> > > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > > I haven't check in which commit the shrink_slab() is removed from
> > > >
> > > > What I've had in mind was essentially this:
> > > >
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 7889f583ced9..8011288a80e2 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > >               .may_swap = 1,
> > > >               .reclaim_idx = gfp_zone(gfp_mask),
> > > > +             .may_shrinkslab = 1;
> > > >       };
> > > >
> > > >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > >
> > > > shrink_node path already does shrink slab when the flag allows that. In
> > > > other words get us back to before 1c30844d2dfe because that has clearly
> > > > changed the long term node reclaim behavior just recently.
> > >
> > > I'd be fine with this change. It was not intentional to significantly
> > > change the behaviour of node reclaim in that patch.
> > >
> >
> > But if we do it like this, there will be bug in the knob vm.min_slab_ratio.
> > Right ?
>
> Yes, and the answer for that is a question why do we even care? Which
> real life workload does suffer from the of min_slab_ratio misbehavior.
> Also it is much more preferred to fix an obvious bug/omission which
> lack of may_shrinkslab in node reclaim seem to be than a larger rewrite
> with a harder to see changes.
>

Fixing the bug in min_slab_ratio doesn't require much change: it just
introduces a new bit in scan_control, which doesn't require more
space, and an if-branch in shrink_node(), which doesn't take many CPU
cycles either. It won't take much maintenance either, as no_pagecache
is 0 by default, so we don't need to worry about forgetting it.

> Really, I wouldn't be opposing normally but node_reclaim is an odd ball
> rarely used and changing its behavior based on some trivial testing
> doesn't sound very convincing to me.
>

Well, I won't insist if Andrew prefers your change.

Thanks
Yafang
Michal Hocko Aug. 6, 2019, 11:09 a.m. UTC | #13
On Tue 06-08-19 18:59:52, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 06-08-19 17:54:02, Yafang Shao wrote:
> > > On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> > > >
> > > > On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > [...]
> > > > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > > > >
> > > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > > path as well?
> > > > > >
> > > > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > > > I haven't check in which commit the shrink_slab() is removed from
> > > > >
> > > > > What I've had in mind was essentially this:
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 7889f583ced9..8011288a80e2 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > > >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > > >               .may_swap = 1,
> > > > >               .reclaim_idx = gfp_zone(gfp_mask),
> > > > > +             .may_shrinkslab = 1;
> > > > >       };
> > > > >
> > > > >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > > >
> > > > > shrink_node path already does shrink slab when the flag allows that. In
> > > > > other words get us back to before 1c30844d2dfe because that has clearly
> > > > > changed the long term node reclaim behavior just recently.
> > > >
> > > > I'd be fine with this change. It was not intentional to significantly
> > > > change the behaviour of node reclaim in that patch.
> > > >
> > >
> > > But if we do it like this, there will be bug in the knob vm.min_slab_ratio.
> > > Right ?
> >
> > Yes, and the answer for that is a question why do we even care? Which
> > real life workload does suffer from the of min_slab_ratio misbehavior.
> > Also it is much more preferred to fix an obvious bug/omission which
> > lack of may_shrinkslab in node reclaim seem to be than a larger rewrite
> > with a harder to see changes.
> >
> 
> Fixing the bug in min_slab_ratio doesn't  require much change, as it
> just introduce a new bit in scan_control which doesn't require more
> space
> and a if-branch in shrink_node() which doesn't take much cpu cycles
> neither, and it will not take much maintaince neither as no_pagecache
> is 0 by default and then we don't need to worry about what if we
> forget it.

You are still missing my point, I am afraid. I am not saying your change
is wrong or complex. I am saying that there is an established behavior
(even when wrong) that node-reclaim dependent loads might rely on.
Your results don't really suggest you have done much testing beyond
the targeted case, which is quite artificial to say the least.

Maybe there are workloads which do depend on proper min_slab_ratio
behavior, but it would be much preferable to hear from them rather
than change the behavior based on code inspection and a
microbenchmark.

Is my thinking more clear now?
Mel Gorman Aug. 6, 2019, 11:14 a.m. UTC | #14
On Tue, Aug 06, 2019 at 05:32:54PM +0800, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > >
> > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > get back to the original behavior by setting may_shrink_slab in that
> > > > path as well?
> > >
> > > You mean do it as the commit 0ff38490c836 did  before ?
> > > I haven't check in which commit the shrink_slab() is removed from
> >
> > What I've had in mind was essentially this:
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 7889f583ced9..8011288a80e2 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >                 .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> >                 .may_swap = 1,
> >                 .reclaim_idx = gfp_zone(gfp_mask),
> > +               .may_shrinkslab = 1;
> >         };
> >
> >         trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> >
> > shrink_node path already does shrink slab when the flag allows that. In
> > other words get us back to before 1c30844d2dfe because that has clearly
> > changed the long term node reclaim behavior just recently.
> > --
> 
> If we do it like this, then vm.min_slab_ratio will not take effect if
> there're enough relcaimable page cache.
> Seems there're bugs in the original behavior as well.
> 

Typically that would be done as a separate patch with a standalone
justification for it. The first patch should simply restore expected
behaviour with a Fixes: tag noting that the change in behaviour was
unintentional.
Yafang Shao Aug. 6, 2019, 11:34 a.m. UTC | #15
On Tue, Aug 6, 2019 at 7:09 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 06-08-19 18:59:52, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 06-08-19 17:54:02, Yafang Shao wrote:
> > > > On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> > > > >
> > > > > On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > > > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > [...]
> > > > > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > > > > >
> > > > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > > > path as well?
> > > > > > >
> > > > > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > > > > I haven't check in which commit the shrink_slab() is removed from
> > > > > >
> > > > > > What I've had in mind was essentially this:
> > > > > >
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 7889f583ced9..8011288a80e2 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > > > >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > > > >               .may_swap = 1,
> > > > > >               .reclaim_idx = gfp_zone(gfp_mask),
> > > > > > +             .may_shrinkslab = 1;
> > > > > >       };
> > > > > >
> > > > > >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > > > >
> > > > > > shrink_node path already does shrink slab when the flag allows that. In
> > > > > > other words get us back to before 1c30844d2dfe because that has clearly
> > > > > > changed the long term node reclaim behavior just recently.
> > > > >
> > > > > I'd be fine with this change. It was not intentional to significantly
> > > > > change the behaviour of node reclaim in that patch.
> > > > >
> > > >
> > > > But if we do it like this, there will be bug in the knob vm.min_slab_ratio.
> > > > Right ?
> > >
> > > Yes, and the answer for that is a question why do we even care? Which
> > > real life workload does suffer from the of min_slab_ratio misbehavior.
> > > Also it is much more preferred to fix an obvious bug/omission which
> > > lack of may_shrinkslab in node reclaim seem to be than a larger rewrite
> > > with a harder to see changes.
> > >
> >
> > Fixing the bug in min_slab_ratio doesn't  require much change, as it
> > just introduce a new bit in scan_control which doesn't require more
> > space
> > and a if-branch in shrink_node() which doesn't take much cpu cycles
> > neither, and it will not take much maintaince neither as no_pagecache
> > is 0 by default and then we don't need to worry about what if we
> > forget it.
>
> You are still missing my point, I am afraid. I am not saying your change
> is wrong or complex. I am saying that there is an established behavior
> (even when wrong) that node-reclaim dependent loads might depend on.
> Your testing doesn't really suggest you have done much testing beyond
> the targeted one which is quite artificial to say the least.
>
> Maybe there are workloads which do depend on proper min_slab_ratio
> behavior but it would be much more preferable to hear from them rather
> than change the behavior based on the code inspection and a
> microbenchmark.
>
> Is my thinking more clear now?
>

Thanks for your clarification.
I get your point now.

Thanks
Yafang
Yafang Shao Aug. 6, 2019, 11:35 a.m. UTC | #16
On Tue, Aug 6, 2019 at 7:15 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Tue, Aug 06, 2019 at 05:32:54PM +0800, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > >
> > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > path as well?
> > > >
> > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > I haven't check in which commit the shrink_slab() is removed from
> > >
> > > What I've had in mind was essentially this:
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 7889f583ced9..8011288a80e2 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > >                 .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > >                 .may_swap = 1,
> > >                 .reclaim_idx = gfp_zone(gfp_mask),
> > > +               .may_shrinkslab = 1;
> > >         };
> > >
> > >         trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > >
> > > shrink_node path already does shrink slab when the flag allows that. In
> > > other words get us back to before 1c30844d2dfe because that has clearly
> > > changed the long term node reclaim behavior just recently.
> > > --
> >
> > If we do it like this, then vm.min_slab_ratio will not take effect if
> > there're enough relcaimable page cache.
> > Seems there're bugs in the original behavior as well.
> >
>
> Typically that would be done as a separate patch with a standalone
> justification for it. The first patch should simply restore expected
> behaviour with a Fixes: tag noting that the change in behaviour was
> unintentional.
>

Sure, I will do it.

Thanks
Yafang
Michal Hocko Aug. 6, 2019, 11:58 a.m. UTC | #17
On Tue 06-08-19 18:59:52, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > Really, I wouldn't be opposing normally but node_reclaim is an odd ball
> > rarely used and changing its behavior based on some trivial testing
> > doesn't sound very convincing to me.
> >
> 
> Well.  I'm not insist if Andrew prefer your change.

Btw. feel free to use the diff I've sent for the real patch with the
changelog. Thanks!
Daniel Jordan Aug. 6, 2019, 3:29 p.m. UTC | #18
On Tue, Aug 06, 2019 at 04:23:29PM +0800, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 3:35 PM Michal Hocko <mhocko@kernel.org> wrote:
> > Considering that this is a long term behavior of a rarely used node
> > reclaim I would rather not touch it unless some _real_ workload suffers
> > from this behavior. Or is there any reason to fix this even though there
> > is no evidence of real workloads suffering from the current behavior?
> > --
> 
> When we do performance tuning on some workloads(especially if this
> workload is NUMA sensitive), sometimes we may enable it on our test
> environment and then do some benchmark to  dicide whether or not
> applying it on the production envrioment. Although the result is not
> good enough as expected, it is really a performance tuning knob.

So am I understanding correctly that you sometimes enable node reclaim in
production workloads when you find the numbers justify it?  If so, which ones?
Daniel Jordan Aug. 6, 2019, 3:59 p.m. UTC | #19
On Tue, Aug 06, 2019 at 07:35:29PM +0800, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 7:15 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Tue, Aug 06, 2019 at 05:32:54PM +0800, Yafang Shao wrote:
> > > On Tue, Aug 6, 2019 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > [...]
> > > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > > >
> > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > path as well?
> > > > >
> > > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > > I haven't check in which commit the shrink_slab() is removed from
> > > >
> > > > What I've had in mind was essentially this:
> > > >
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 7889f583ced9..8011288a80e2 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > >                 .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > >                 .may_swap = 1,
> > > >                 .reclaim_idx = gfp_zone(gfp_mask),
> > > > +               .may_shrinkslab = 1;
> > > >         };
> > > >
> > > >         trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > >
> > > > shrink_node path already does shrink slab when the flag allows that. In
> > > > other words get us back to before 1c30844d2dfe because that has clearly
> > > > changed the long term node reclaim behavior just recently.
> > > > --
> > >
> > > If we do it like this, then vm.min_slab_ratio will not take effect if
> > > there're enough relcaimable page cache.
> > > Seems there're bugs in the original behavior as well.
> > >
> >
> > Typically that would be done as a separate patch with a standalone
> > justification for it. The first patch should simply restore expected
> > behaviour with a Fixes: tag noting that the change in behaviour was
> > unintentional.
> >
> 
> Sure, I will do it.

Do you plan to send the second patch?  If not I think we should at least update
the documentation for the admittedly obscure vm.min_slab_ratio to reflect its
effect on node reclaim, which is currently none.
Yafang Shao Aug. 7, 2019, 1 a.m. UTC | #20
On Tue, Aug 6, 2019 at 11:29 PM Daniel Jordan
<daniel.m.jordan@oracle.com> wrote:
>
> On Tue, Aug 06, 2019 at 04:23:29PM +0800, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 3:35 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > Considering that this is a long term behavior of a rarely used node
> > > reclaim I would rather not touch it unless some _real_ workload suffers
> > > from this behavior. Or is there any reason to fix this even though there
> > > is no evidence of real workloads suffering from the current behavior?
> > > --
> >
> > When we do performance tuning on some workloads(especially if this
> > workload is NUMA sensitive), sometimes we may enable it on our test
> > environment and then do some benchmark to  dicide whether or not
> > applying it on the production envrioment. Although the result is not
> > good enough as expected, it is really a performance tuning knob.
>
> So am I understanding correctly that you sometimes enable node reclaim in
> production workloads when you find the numbers justify it?  If so, which ones?

We used to enable it on the production environment, but it caused
some latency spikes (because of memory pressure),
so we had to disable it.
BTW, do you plan to enable it for your workload?

Thanks
Yafang
Yafang Shao Aug. 7, 2019, 1:03 a.m. UTC | #21
On Tue, Aug 6, 2019 at 11:59 PM Daniel Jordan
<daniel.m.jordan@oracle.com> wrote:
>
> On Tue, Aug 06, 2019 at 07:35:29PM +0800, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 7:15 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Tue, Aug 06, 2019 at 05:32:54PM +0800, Yafang Shao wrote:
> > > > On Tue, Aug 6, 2019 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > [...]
> > > > > > > > As you said, the direct reclaim path set it to 1, but the
> > > > > > > > __node_reclaim() forgot to process may_shrink_slab.
> > > > > > >
> > > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > > path as well?
> > > > > >
> > > > > > You mean do it as the commit 0ff38490c836 did  before ?
> > > > > > I haven't check in which commit the shrink_slab() is removed from
> > > > >
> > > > > What I've had in mind was essentially this:
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 7889f583ced9..8011288a80e2 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > > >                 .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > > >                 .may_swap = 1,
> > > > >                 .reclaim_idx = gfp_zone(gfp_mask),
> > > > > +               .may_shrinkslab = 1;
> > > > >         };
> > > > >
> > > > >         trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > > >
> > > > > shrink_node path already does shrink slab when the flag allows that. In
> > > > > other words get us back to before 1c30844d2dfe because that has clearly
> > > > > changed the long term node reclaim behavior just recently.
> > > > > --
> > > >
> > > > If we do it like this, then vm.min_slab_ratio will not take effect if
> > > > there're enough relcaimable page cache.
> > > > Seems there're bugs in the original behavior as well.
> > > >
> > >
> > > Typically that would be done as a separate patch with a standalone
> > > justification for it. The first patch should simply restore expected
> > > behaviour with a Fixes: tag noting that the change in behaviour was
> > > unintentional.
> > >
> >
> > Sure, I will do it.
>
> Do you plan to send the second patch?  If not I think we should at least update
> the documentation for the admittedly obscure vm.min_slab_ratio to reflect its
> effect on node reclaim, which is currently none.

I don't have an explicit plan for when to post the second patch
because I'm not sure when it will be ready.
If your workload depends on vm.min_slab_ratio, you could post a fix
for it if you would like to; I would appreciate it.

I don't think it is a good idea to document it, because this is not a
limitation but really an issue.

Thanks
Yafang
Daniel Jordan Aug. 7, 2019, 3:03 p.m. UTC | #22
On 8/6/19 9:00 PM, Yafang Shao wrote:
> We used to enable it on production environment, but it caused some
> latency spike(because of memory pressure),
> so we have to disable it now.
> BTW, do you plan to enable it for your workload ?

No, I haven't experimented with it.
Daniel Jordan Aug. 7, 2019, 3:03 p.m. UTC | #23
On 8/6/19 9:03 PM, Yafang Shao wrote:
> On Tue, Aug 6, 2019 at 11:59 PM Daniel Jordan
>> Do you plan to send the second patch?  If not I think we should at least update
>> the documentation for the admittedly obscure vm.min_slab_ratio to reflect its
>> effect on node reclaim, which is currently none.
> 
> I don't have a explicit plan when to post the second patch because I'm
> not sure when it will be ready.
> If your workload depends on vm.min_slab_ratio, you could post a fix
> for it if you would like to. I will appreciate it.

We have no workloads that depend on this, so I'll leave it to you to post.

> I don't think it is a good idea to document it, because this is not a
> limitation, while it is really a issue.

Sure, if it's going to be fixed, no point in doing this.

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 47aa215..7e2a8ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -91,6 +91,9 @@  struct scan_control {
 	/* e.g. boosted watermark reclaim leaves slabs alone */
 	unsigned int may_shrinkslab:1;
 
+	/* In node reclaim mode, we may shrink slab only */
+	unsigned int no_pagecache:1;
+
 	/*
 	 * Cgroups are not reclaimed below their configured memory.low,
 	 * unless we threaten to OOM. If any cgroups are skipped due to
@@ -2831,7 +2834,9 @@  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 			reclaimed = sc->nr_reclaimed;
 			scanned = sc->nr_scanned;
-			shrink_node_memcg(pgdat, memcg, sc);
+
+			if (!sc->no_pagecache)
+				shrink_node_memcg(pgdat, memcg, sc);
 
 			if (sc->may_shrinkslab) {
 				shrink_slab(sc->gfp_mask, pgdat->node_id,
@@ -4268,6 +4273,10 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
 		.may_swap = 1,
+		.may_shrinkslab = (node_page_state(pgdat, NR_SLAB_RECLAIMABLE) >
+				   pgdat->min_slab_pages),
+		.no_pagecache = (node_pagecache_reclaimable(pgdat) <=
+				  pgdat->min_unmapped_pages),
 		.reclaim_idx = gfp_zone(gfp_mask),
 	};
 
@@ -4285,15 +4294,13 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	p->flags |= PF_SWAPWRITE;
 	set_task_reclaim_state(p, &sc.reclaim_state);
 
-	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink node with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		do {
-			shrink_node(pgdat, &sc);
-		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
-	}
+	/*
+	 * Free memory by calling shrink node with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	do {
+		shrink_node(pgdat, &sc);
+	} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
 
 	set_task_reclaim_state(p, NULL);
 	current->flags &= ~PF_SWAPWRITE;