Message ID | 20210203172042.800474-12-shy828301@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Make shrinker's nr_deferred memcg aware |
On 03.02.2021 20:20, Yang Shi wrote:
> The number of deferred objects might get wound up to an absurd number, and that
> results in slab objects being clamped. This is undesirable for sustaining the
> working set.
>
> So shrink deferred objects in proportion to priority and cap nr_deferred at
> twice the number of cache items.
>
> The idea is borrowed from Dave Chinner's patch:
> https://lore.kernel.org/linux-xfs/20191031234618.15403-13-david@fromorbit.com/
>
> Tested with a kernel build and a VFS-metadata-heavy workload in our production
> environment; no regression has been spotted so far.
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

I have been away from these do_shrink_slab() magic formulas and the recent
changes for some time, so I hope somebody else who has been following this can
review it.

Thanks
On Thu, Feb 4, 2021 at 2:23 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 03.02.2021 20:20, Yang Shi wrote:
> > The number of deferred objects might get wound up to an absurd number, and that
> > results in slab objects being clamped. This is undesirable for sustaining the
> > working set.
> >
> > So shrink deferred objects in proportion to priority and cap nr_deferred at
> > twice the number of cache items.
> >
> > The idea is borrowed from Dave Chinner's patch:
> > https://lore.kernel.org/linux-xfs/20191031234618.15403-13-david@fromorbit.com/
> >
> > Tested with a kernel build and a VFS-metadata-heavy workload in our production
> > environment; no regression has been spotted so far.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
>
> I have been away from these do_shrink_slab() magic formulas and the recent
> changes for some time, so I hope somebody else who has been following this can
> review it.

Yes, I agree it is intimidating. The patch has been tested in our test and
production environments for a couple of months, and so far no regression has
been spotted. Of course, that doesn't mean it will not cause regressions for
other workloads. My plan is to let it stay in -mm and then linux-next for a
while for broader testing. The first 10 patches could go to Linus's tree
separately.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 574d920c4cab..d0a86170854b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -649,7 +649,6 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 */
 	nr = count_nr_deferred(shrinker, shrinkctl);
 
-	total_scan = nr;
 	if (shrinker->seeks) {
 		delta = freeable >> priority;
 		delta *= 4;
@@ -663,37 +662,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		delta = freeable / 2;
 	}
 
+	total_scan = nr >> priority;
 	total_scan += delta;
-	if (total_scan < 0) {
-		pr_err("shrink_slab: %pS negative objects to delete nr=%ld\n",
-		       shrinker->scan_objects, total_scan);
-		total_scan = freeable;
-		next_deferred = nr;
-	} else
-		next_deferred = total_scan;
-
-	/*
-	 * We need to avoid excessive windup on filesystem shrinkers
-	 * due to large numbers of GFP_NOFS allocations causing the
-	 * shrinkers to return -1 all the time. This results in a large
-	 * nr being built up so when a shrink that can do some work
-	 * comes along it empties the entire cache due to nr >>>
-	 * freeable. This is bad for sustaining a working set in
-	 * memory.
-	 *
-	 * Hence only allow the shrinker to scan the entire cache when
-	 * a large delta change is calculated directly.
-	 */
-	if (delta < freeable / 4)
-		total_scan = min(total_scan, freeable / 2);
-
-	/*
-	 * Avoid risking looping forever due to too large nr value:
-	 * never try to free more than twice the estimate number of
-	 * freeable entries.
-	 */
-	if (total_scan > freeable * 2)
-		total_scan = freeable * 2;
+	total_scan = min(total_scan, (2 * freeable));
 
 	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
 				   freeable, delta, total_scan, priority);
@@ -732,10 +703,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		cond_resched();
 	}
 
-	if (next_deferred >= scanned)
-		next_deferred -= scanned;
-	else
-		next_deferred = 0;
+	next_deferred = max_t(long, (nr - scanned), 0) + total_scan;
+	next_deferred = min(next_deferred, (2 * freeable));
+
 	/*
 	 * move the unused scan count back into the shrinker in a
 	 * manner that handles concurrent updates.
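The change to how total_scan is sized in the first two hunks can be modeled outside the kernel. What follows is a minimal user-space sketch, not kernel code: the names old_total_scan(), new_total_scan() and min_long() are hypothetical, delta is passed in directly rather than derived from shrinker->seeks, and kernel types are replaced by plain long values.

```c
#include <assert.h>

/* Hypothetical helper standing in for the kernel's min(). */
static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Before the patch (hypothetical model): all deferred work nr is added
 * to the new work delta, then clamped by two heuristics.
 */
static long old_total_scan(long nr, long delta, long freeable)
{
	long total_scan = nr + delta;

	/* The old code's sanity check for a negative count. */
	if (total_scan < 0)
		total_scan = freeable;

	/* Small delta: never scan more than half the cache. */
	if (delta < freeable / 4)
		total_scan = min_long(total_scan, freeable / 2);

	/* Never try to scan more than twice the freeable objects. */
	if (total_scan > freeable * 2)
		total_scan = freeable * 2;

	return total_scan;
}

/*
 * After the patch (hypothetical model): deferred work is scaled down by
 * priority, added to delta, and capped at twice the freeable objects.
 */
static long new_total_scan(long nr, long delta, long freeable, int priority)
{
	long total_scan;

	assert(priority >= 0 && priority < 64);	/* valid shift count */
	total_scan = nr >> priority;
	total_scan += delta;
	return min_long(total_scan, 2 * freeable);
}
```

For example, with freeable = 4096, a wound-up nr of 1000000 and delta = 2, the old clamp limits the scan to 2048 objects (half the cache), while the patched code scans 246 at priority 12. With delta = 8192 at priority 0, both versions end up at the 2 * freeable cap of 8192.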
The number of deferred objects might get wound up to an absurd number, and that
results in slab objects being clamped. This is undesirable for sustaining the
working set.

So shrink deferred objects in proportion to priority and cap nr_deferred at
twice the number of cache items.

The idea is borrowed from Dave Chinner's patch:
https://lore.kernel.org/linux-xfs/20191031234618.15403-13-david@fromorbit.com/

Tested with a kernel build and a VFS-metadata-heavy workload in our production
environment; no regression has been spotted so far.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/vmscan.c | 40 +++++-----------------------------------
 1 file changed, 5 insertions(+), 35 deletions(-)
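To see what capping nr_deferred changes, the windup described above can be simulated on a shrinker that never makes progress. This is a minimal user-space sketch under stated assumptions, not kernel code: all function names are hypothetical, delta is supplied directly, the old code's negative-count branch is omitted, and the hand-off of nr_deferred between calls (count_nr_deferred() and the concurrent-update logic) is reduced to passing nr from one call to the next.

```c
#include <assert.h>

/* Hypothetical helpers standing in for the kernel's min() and max_t(long, ...). */
static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }

/*
 * Old accounting (hypothetical model): next_deferred is nr + delta,
 * taken before any clamping, minus what was scanned. No upper bound.
 */
static long old_next_deferred(long nr, long delta, long scanned)
{
	long next_deferred = nr + delta;

	return next_deferred >= scanned ? next_deferred - scanned : 0;
}

/*
 * Patched accounting (hypothetical model): unscanned old work plus the
 * total_scan left over by the scan loop, capped at 2 * freeable.
 */
static long new_next_deferred(long nr, long delta, long freeable,
			      int priority, long scanned)
{
	long total_scan = min_l((nr >> priority) + delta, 2 * freeable);
	long next_deferred;

	assert(scanned >= 0 && scanned <= total_scan);
	total_scan -= scanned;	/* what the scan loop left over */
	next_deferred = max_l(nr - scanned, 0) + total_scan;
	return min_l(next_deferred, 2 * freeable);
}

/*
 * Repeated calls on a shrinker that scans nothing (scanned == 0), e.g.
 * one that always bails out under GFP_NOFS. Returns the final nr_deferred.
 */
static long simulate_stalled(int calls, long delta, long freeable,
			     int priority, int patched)
{
	long nr = 0;

	for (int i = 0; i < calls; i++)
		nr = patched ? new_next_deferred(nr, delta, freeable, priority, 0)
			     : old_next_deferred(nr, delta, 0);
	return nr;
}
```

With freeable = 4096, delta = 1024 and priority 12, 100 stalled calls leave the old accounting at 102400 deferred objects (growing by delta on every call), while the patched accounting reaches 8192 (2 * freeable) on the eighth call and stays there.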