[v2] mm: slowly shrink slabs with a relatively small number of objects

Message ID 20180904224707.10356-1-guro@fb.com (mailing list archive)
State New, archived
Series [v2] mm: slowly shrink slabs with a relatively small number of objects

Commit Message

Roman Gushchin Sept. 4, 2018, 10:47 p.m. UTC
Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
changed how the target slab pressure is calculated and made it
priority-based:

    delta = freeable >> priority;
    delta *= 4;
    do_div(delta, shrinker->seeks);

The problem is that at the default priority (which is 12) no pressure
is applied at all if the number of potentially reclaimable objects
is less than 4096 (1 << 12).
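
For example (a sketch of the arithmetic, assuming a shrinker with the
default DEFAULT_SEEKS == 2 and a cache with 4000 freeable objects):

    delta = 4000 >> 12;     /* = 0 */
    delta *= 4;             /* = 0 */
    do_div(delta, 2);       /* = 0, so nothing gets scanned */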

This causes the last objects in the slab caches of no longer used cgroups
to never get reclaimed, resulting in dead cgroups staying around forever.

Slab LRU lists are reparented on memcg offlining, but the corresponding
objects are still holding a reference to the dying cgroup.
If we don't scan them at all, the dying cgroup can't go away.
Most likely, the parent cgroup doesn't have any directly associated
objects, only leftover objects from dying child cgroups. So it can
easily hold references to hundreds of dying cgroups.

If there are no big spikes in memory pressure, and new memory cgroups
are created and destroyed periodically, this causes the number of
dying cgroups to grow steadily, resulting in a slow-ish and
hard-to-detect memory "leak". It's not a real leak, since the memory
can eventually be reclaimed, but in practice that may never happen.
I've seen hosts with a steadily climbing number of dying cgroups that
showed no signs of decline for months, even though the hosts were
loaded with a production workload.

This is an obvious waste of memory, and to prevent it, let's apply
some minimal pressure even to small shrinker lists. E.g. if there are
freeable objects, let's scan at least min(freeable, batch_size)
objects.
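
With the fix, the same cache gets some minimal pressure (a sketch,
assuming the default SHRINK_BATCH of 128 as batch_size):

    /* max(0, min(4000, 128)) == 128 objects per scan pass */
    delta = max(delta, min(freeable, batch_size));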

This fix significantly improves the chances of a dying cgroup being
reclaimed, and together with some previous patches it stops the steady
growth in the number of dying cgroups on some of our hosts.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/vmscan.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Andrew Morton Sept. 5, 2018, 8:51 p.m. UTC | #1
On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:

> Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> changed how the target slab pressure is calculated and made it
> priority-based:
> 
>     delta = freeable >> priority;
>     delta *= 4;
>     do_div(delta, shrinker->seeks);
> 
> The problem is that at the default priority (which is 12) no pressure
> is applied at all if the number of potentially reclaimable objects
> is less than 4096 (1 << 12).
> 
> This causes the last objects in the slab caches of no longer used cgroups
> to never get reclaimed, resulting in dead cgroups staying around forever.

But this problem pertains to all types of objects, not just the cgroup
cache, yes?

> Slab LRU lists are reparented on memcg offlining, but the corresponding
> objects are still holding a reference to the dying cgroup.
> If we don't scan them at all, the dying cgroup can't go away.
> Most likely, the parent cgroup doesn't have any directly associated
> objects, only leftover objects from dying child cgroups. So it can
> easily hold references to hundreds of dying cgroups.
> 
> If there are no big spikes in memory pressure, and new memory cgroups
> are created and destroyed periodically, this causes the number of
> dying cgroups to grow steadily, resulting in a slow-ish and
> hard-to-detect memory "leak". It's not a real leak, since the memory
> can eventually be reclaimed, but in practice that may never happen.
> I've seen hosts with a steadily climbing number of dying cgroups that
> showed no signs of decline for months, even though the hosts were
> loaded with a production workload.
> 
> This is an obvious waste of memory, and to prevent it, let's apply
> some minimal pressure even to small shrinker lists. E.g. if there are
> freeable objects, let's scan at least min(freeable, batch_size)
> objects.
> 
> This fix significantly improves the chances of a dying cgroup being
> reclaimed, and together with some previous patches it stops the steady
> growth in the number of dying cgroups on some of our hosts.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  	delta = freeable >> priority;
>  	delta *= 4;
>  	do_div(delta, shrinker->seeks);
> +
> +	/*
> +	 * Make sure we apply some minimal pressure even on
> +	 * small cgroups. This is necessary because some of
> +	 * belonging objects can hold a reference to a dying
> +	 * child cgroup. If we don't scan them, the dying
> +	 * cgroup can't go away unless the memory pressure
> +	 * (and the scanning priority) raise significantly.
> +	 */
> +	delta = max(delta, min(freeable, batch_size));
> +

If so, I think the comment should be cast in more general terms.  Maybe
with a final sentence: "the cgroup cache is one such case".

Also, please use all 80 columns in block comments to save a few display
lines.

And `delta' has type ULL whereas the other two are longs.  We'll
presumably hit warnings here, preventable with max_t.
Roman Gushchin Sept. 5, 2018, 9:22 p.m. UTC | #2
On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
> 
> > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > changed how the target slab pressure is calculated and made it
> > priority-based:
> > 
> >     delta = freeable >> priority;
> >     delta *= 4;
> >     do_div(delta, shrinker->seeks);
> > 
> > The problem is that at the default priority (which is 12) no pressure
> > is applied at all if the number of potentially reclaimable objects
> > is less than 4096 (1 << 12).
> > 
> > This causes the last objects in the slab caches of no longer used cgroups
> > to never get reclaimed, resulting in dead cgroups staying around forever.
> 
> But this problem pertains to all types of objects, not just the cgroup
> cache, yes?

Well, of course, but there is a dramatic difference in size.

Most of these objects take a few hundred bytes (or less), while a
memcg can take a few hundred kilobytes on a modern multi-CPU machine,
mostly due to per-cpu stats and event counters.
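
(A purely illustrative estimate with made-up numbers: a hundred per-cpu
counters at 8 bytes each is ~800 bytes per CPU, so a machine with a
couple hundred CPUs pins well over a hundred kilobytes per dying memcg
in counters alone.)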

> 
> > Slab LRU lists are reparented on memcg offlining, but the corresponding
> > objects are still holding a reference to the dying cgroup.
> > If we don't scan them at all, the dying cgroup can't go away.
> > Most likely, the parent cgroup doesn't have any directly associated
> > objects, only leftover objects from dying child cgroups. So it can
> > easily hold references to hundreds of dying cgroups.
> > 
> > If there are no big spikes in memory pressure, and new memory cgroups
> > are created and destroyed periodically, this causes the number of
> > dying cgroups to grow steadily, resulting in a slow-ish and
> > hard-to-detect memory "leak". It's not a real leak, since the memory
> > can eventually be reclaimed, but in practice that may never happen.
> > I've seen hosts with a steadily climbing number of dying cgroups that
> > showed no signs of decline for months, even though the hosts were
> > loaded with a production workload.
> > 
> > This is an obvious waste of memory, and to prevent it, let's apply
> > some minimal pressure even to small shrinker lists. E.g. if there are
> > freeable objects, let's scan at least min(freeable, batch_size)
> > objects.
> > 
> > This fix significantly improves the chances of a dying cgroup being
> > reclaimed, and together with some previous patches it stops the steady
> > growth in the number of dying cgroups on some of our hosts.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  	delta = freeable >> priority;
> >  	delta *= 4;
> >  	do_div(delta, shrinker->seeks);
> > +
> > +	/*
> > +	 * Make sure we apply some minimal pressure even on
> > +	 * small cgroups. This is necessary because some of
> > +	 * belonging objects can hold a reference to a dying
> > +	 * child cgroup. If we don't scan them, the dying
> > +	 * cgroup can't go away unless the memory pressure
> > +	 * (and the scanning priority) raise significantly.
> > +	 */
> > +	delta = max(delta, min(freeable, batch_size));
> > +
> 
> If so, I think the comment should be cast in more general terms.  Maybe
> with a final sentence: "the cgroup cache is one such case".

So, I think we should keep the explicitly explained memcg refcounting
case, but I'll add a line about the more general case as well.
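
Something along these lines, as a sketch of the more general wording
(not necessarily the exact v3 text):

	/*
	 * Make sure we apply some minimal pressure even on small slab
	 * lists. Stale objects are not only consuming memory by themselves,
	 * but can also hold a reference to a dying cgroup, preventing it
	 * from being reclaimed. A dying cgroup with all corresponding
	 * structures like per-cpu stats and kmem caches can be really big,
	 * so it may lead to a significant waste of memory.
	 */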

> 
> Also, please use all 80 columns in block comments to save a few display
> lines.
> 
> And `delta' has type ULL whereas the other two are longs.  We'll
> presumably hit warnings here, preventable with max_t.
>

Let me fix this in v3.
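
Presumably something like this for the type mismatch (a sketch, not
necessarily the exact v3 hunk):

	delta = max_t(unsigned long long, delta, min(freeable, batch_size));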

Thank you!
Shakeel Butt Sept. 5, 2018, 9:35 p.m. UTC | #3
On Wed, Sep 5, 2018 at 2:23 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> > On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
> >
> > > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > > changed how the target slab pressure is calculated and made it
> > > priority-based:
> > >
> > >     delta = freeable >> priority;
> > >     delta *= 4;
> > >     do_div(delta, shrinker->seeks);
> > >
> > > The problem is that at the default priority (which is 12) no pressure
> > > is applied at all if the number of potentially reclaimable objects
> > > is less than 4096 (1 << 12).
> > >
> > > This causes the last objects in the slab caches of no longer used cgroups
> > > to never get reclaimed, resulting in dead cgroups staying around forever.
> >
> > But this problem pertains to all types of objects, not just the cgroup
> > cache, yes?
>
> Well, of course, but there is a dramatic difference in size.
>
> Most of these objects take a few hundred bytes (or less), while a
> memcg can take a few hundred kilobytes on a modern multi-CPU machine,
> mostly due to per-cpu stats and event counters.
>

Besides the memcg itself, all of its kmem caches, most of them empty,
are stuck in memory as well. For SLAB, even the memory overhead of an
empty kmem cache is not negligible.

Shakeel
Roman Gushchin Sept. 5, 2018, 9:47 p.m. UTC | #4
On Wed, Sep 05, 2018 at 02:35:29PM -0700, Shakeel Butt wrote:
> On Wed, Sep 5, 2018 at 2:23 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> > > On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
> > >
> > > > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > > > changed how the target slab pressure is calculated and made it
> > > > priority-based:
> > > >
> > > >     delta = freeable >> priority;
> > > >     delta *= 4;
> > > >     do_div(delta, shrinker->seeks);
> > > >
> > > > The problem is that at the default priority (which is 12) no pressure
> > > > is applied at all if the number of potentially reclaimable objects
> > > > is less than 4096 (1 << 12).
> > > >
> > > > This causes the last objects in the slab caches of no longer used cgroups
> > > > to never get reclaimed, resulting in dead cgroups staying around forever.
> > >
> > > But this problem pertains to all types of objects, not just the cgroup
> > > cache, yes?
> >
> > Well, of course, but there is a dramatic difference in size.
> >
> > Most of these objects take a few hundred bytes (or less), while a
> > memcg can take a few hundred kilobytes on a modern multi-CPU machine,
> > mostly due to per-cpu stats and event counters.
> >
> 
> Besides the memcg itself, all of its kmem caches, most of them empty,
> are stuck in memory as well. For SLAB, even the memory overhead of an
> empty kmem cache is not negligible.

Right!

I mean, the main part of the problem is not these 4k (mostly
VFS-cache related) objects themselves, but the objects referenced by them.

Thanks!
kernel test robot Sept. 6, 2018, 7:42 a.m. UTC | #5
Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc2 next-20180905]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roman-Gushchin/mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects/20180906-142351
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:18:0,
                    from ./arch/openrisc/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/vmscan.c:17:
   mm/vmscan.c: In function 'do_shrink_slab':
   include/linux/kernel.h:845:29: warning: comparison of distinct pointer types lacks a cast
      (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                ^
   include/linux/kernel.h:859:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:869:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), \
                           ^~~~~~~~~~
   include/linux/kernel.h:885:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
>> mm/vmscan.c:488:10: note: in expansion of macro 'max'
     delta = max(delta, min(freeable, batch_size));
             ^~~

vim +/max +488 mm/vmscan.c

   446	
   447	static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
   448					    struct shrinker *shrinker, int priority)
   449	{
   450		unsigned long freed = 0;
   451		unsigned long long delta;
   452		long total_scan;
   453		long freeable;
   454		long nr;
   455		long new_nr;
   456		int nid = shrinkctl->nid;
   457		long batch_size = shrinker->batch ? shrinker->batch
   458						  : SHRINK_BATCH;
   459		long scanned = 0, next_deferred;
   460	
   461		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
   462			nid = 0;
   463	
   464		freeable = shrinker->count_objects(shrinker, shrinkctl);
   465		if (freeable == 0 || freeable == SHRINK_EMPTY)
   466			return freeable;
   467	
   468		/*
   469		 * copy the current shrinker scan count into a local variable
   470		 * and zero it so that other concurrent shrinker invocations
   471		 * don't also do this scanning work.
   472		 */
   473		nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
   474	
   475		total_scan = nr;
   476		delta = freeable >> priority;
   477		delta *= 4;
   478		do_div(delta, shrinker->seeks);
   479	
   480		/*
   481		 * Make sure we apply some minimal pressure even on
   482		 * small cgroups. This is necessary because some of
   483		 * belonging objects can hold a reference to a dying
   484		 * child cgroup. If we don't scan them, the dying
   485		 * cgroup can't go away unless the memory pressure
   486		 * (and the scanning priority) raise significantly.
   487		 */
 > 488		delta = max(delta, min(freeable, batch_size));
   489	
   490		total_scan += delta;
   491		if (total_scan < 0) {
   492			pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
   493			       shrinker->scan_objects, total_scan);
   494			total_scan = freeable;
   495			next_deferred = nr;
   496		} else
   497			next_deferred = total_scan;
   498	
   499		/*
   500		 * We need to avoid excessive windup on filesystem shrinkers
   501		 * due to large numbers of GFP_NOFS allocations causing the
   502		 * shrinkers to return -1 all the time. This results in a large
   503		 * nr being built up so when a shrink that can do some work
   504		 * comes along it empties the entire cache due to nr >>>
   505		 * freeable. This is bad for sustaining a working set in
   506		 * memory.
   507		 *
   508		 * Hence only allow the shrinker to scan the entire cache when
   509		 * a large delta change is calculated directly.
   510		 */
   511		if (delta < freeable / 4)
   512			total_scan = min(total_scan, freeable / 2);
   513	
   514		/*
   515		 * Avoid risking looping forever due to too large nr value:
   516		 * never try to free more than twice the estimate number of
   517		 * freeable entries.
   518		 */
   519		if (total_scan > freeable * 2)
   520			total_scan = freeable * 2;
   521	
   522		trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
   523					   freeable, delta, total_scan, priority);
   524	
   525		/*
   526		 * Normally, we should not scan less than batch_size objects in one
   527		 * pass to avoid too frequent shrinker calls, but if the slab has less
   528		 * than batch_size objects in total and we are really tight on memory,
   529		 * we will try to reclaim all available objects, otherwise we can end
   530		 * up failing allocations although there are plenty of reclaimable
   531		 * objects spread over several slabs with usage less than the
   532		 * batch_size.
   533		 *
   534		 * We detect the "tight on memory" situations by looking at the total
   535		 * number of objects we want to scan (total_scan). If it is greater
   536		 * than the total number of objects on slab (freeable), we must be
   537		 * scanning at high prio and therefore should try to reclaim as much as
   538		 * possible.
   539		 */
   540		while (total_scan >= batch_size ||
   541		       total_scan >= freeable) {
   542			unsigned long ret;
   543			unsigned long nr_to_scan = min(batch_size, total_scan);
   544	
   545			shrinkctl->nr_to_scan = nr_to_scan;
   546			shrinkctl->nr_scanned = nr_to_scan;
   547			ret = shrinker->scan_objects(shrinker, shrinkctl);
   548			if (ret == SHRINK_STOP)
   549				break;
   550			freed += ret;
   551	
   552			count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
   553			total_scan -= shrinkctl->nr_scanned;
   554			scanned += shrinkctl->nr_scanned;
   555	
   556			cond_resched();
   557		}
   558	
   559		if (next_deferred >= scanned)
   560			next_deferred -= scanned;
   561		else
   562			next_deferred = 0;
   563		/*
   564		 * move the unused scan count back into the shrinker in a
   565		 * manner that handles concurrent updates. If we exhausted the
   566		 * scan, there is no need to do an update.
   567		 */
   568		if (next_deferred > 0)
   569			new_nr = atomic_long_add_return(next_deferred,
   570							&shrinker->nr_deferred[nid]);
   571		else
   572			new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
   573	
   574		trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
   575		return freed;
   576	}
   577	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Roman Gushchin Sept. 6, 2018, 3:58 p.m. UTC | #6
On Thu, Sep 06, 2018 at 03:42:07PM +0800, kbuild test robot wrote:
> Hi Roman,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.19-rc2 next-20180905]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

Thanks for the report!

The issue has been fixed in v3, which I sent yesterday.

Thanks,
Roman
kernel test robot Sept. 6, 2018, 10:21 p.m. UTC | #7
Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc2 next-20180906]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roman-Gushchin/mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects/20180906-142351
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/trace/events/vmscan.h:79:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:79:1: sparse: incorrect type in argument 3 (different base types) @@    expected unsigned long [unsigned] flags @@    got resunsigned long [unsigned] flags @@
   include/trace/events/vmscan.h:79:1:    expected unsigned long [unsigned] flags
   include/trace/events/vmscan.h:79:1:    got restricted gfp_t [usertype] gfp_flags
   include/trace/events/vmscan.h:106:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:106:1: sparse: incorrect type in argument 3 (different base types) @@    expected unsigned long [unsigned] flags @@    got resunsigned long [unsigned] flags @@
   include/trace/events/vmscan.h:106:1:    expected unsigned long [unsigned] flags
   include/trace/events/vmscan.h:106:1:    got restricted gfp_t [usertype] gfp_flags
   include/trace/events/vmscan.h:196:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:196:1: sparse: too many warnings
>> mm/vmscan.c:488:17: sparse: incompatible types in comparison expression (different type sizes)
   In file included from include/asm-generic/bug.h:18:0,
                    from arch/x86/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/vmscan.c:17:
   mm/vmscan.c: In function 'do_shrink_slab':
   include/linux/kernel.h:845:29: warning: comparison of distinct pointer types lacks a cast
      (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                ^
   include/linux/kernel.h:859:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:869:24: note: in expansion of macro '__safe_cmp'
  __builtin_choose_expr(__safe_cmp(x, y), \
                        ^~~~~~~~~~
   include/linux/kernel.h:885:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
   mm/vmscan.c:488:10: note: in expansion of macro 'max'
     delta = max(delta, min(freeable, batch_size));
             ^~~

vim +488 mm/vmscan.c

   446	
   447	static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
   448					    struct shrinker *shrinker, int priority)
   449	{
   450		unsigned long freed = 0;
   451		unsigned long long delta;
   452		long total_scan;
   453		long freeable;
   454		long nr;
   455		long new_nr;
   456		int nid = shrinkctl->nid;
   457		long batch_size = shrinker->batch ? shrinker->batch
   458						  : SHRINK_BATCH;
   459		long scanned = 0, next_deferred;
   460	
   461		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
   462			nid = 0;
   463	
   464		freeable = shrinker->count_objects(shrinker, shrinkctl);
   465		if (freeable == 0 || freeable == SHRINK_EMPTY)
   466			return freeable;
   467	
   468		/*
   469		 * copy the current shrinker scan count into a local variable
   470		 * and zero it so that other concurrent shrinker invocations
   471		 * don't also do this scanning work.
   472		 */
   473		nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
   474	
   475		total_scan = nr;
   476		delta = freeable >> priority;
   477		delta *= 4;
   478		do_div(delta, shrinker->seeks);
   479	
   480		/*
   481		 * Make sure we apply some minimal pressure even on
   482		 * small cgroups. This is necessary because some of
   483		 * belonging objects can hold a reference to a dying
   484		 * child cgroup. If we don't scan them, the dying
   485		 * cgroup can't go away unless the memory pressure
   486		 * (and the scanning priority) raise significantly.
   487		 */
 > 488		delta = max(delta, min(freeable, batch_size));
   489	
   490		total_scan += delta;
   491		if (total_scan < 0) {
   492			pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
   493			       shrinker->scan_objects, total_scan);
   494			total_scan = freeable;
   495			next_deferred = nr;
   496		} else
   497			next_deferred = total_scan;
   498	
   499		/*
   500		 * We need to avoid excessive windup on filesystem shrinkers
   501		 * due to large numbers of GFP_NOFS allocations causing the
   502		 * shrinkers to return -1 all the time. This results in a large
   503		 * nr being built up so when a shrink that can do some work
   504		 * comes along it empties the entire cache due to nr >>>
   505		 * freeable. This is bad for sustaining a working set in
   506		 * memory.
   507		 *
   508		 * Hence only allow the shrinker to scan the entire cache when
   509		 * a large delta change is calculated directly.
   510		 */
   511		if (delta < freeable / 4)
   512			total_scan = min(total_scan, freeable / 2);
   513	
   514		/*
   515		 * Avoid risking looping forever due to too large nr value:
   516		 * never try to free more than twice the estimate number of
   517		 * freeable entries.
   518		 */
   519		if (total_scan > freeable * 2)
   520			total_scan = freeable * 2;
   521	
   522		trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
   523					   freeable, delta, total_scan, priority);
   524	
   525		/*
   526		 * Normally, we should not scan less than batch_size objects in one
   527		 * pass to avoid too frequent shrinker calls, but if the slab has less
   528		 * than batch_size objects in total and we are really tight on memory,
   529		 * we will try to reclaim all available objects, otherwise we can end
   530		 * up failing allocations although there are plenty of reclaimable
   531		 * objects spread over several slabs with usage less than the
   532		 * batch_size.
   533		 *
   534		 * We detect the "tight on memory" situations by looking at the total
   535		 * number of objects we want to scan (total_scan). If it is greater
   536		 * than the total number of objects on slab (freeable), we must be
   537		 * scanning at high prio and therefore should try to reclaim as much as
   538		 * possible.
   539		 */
   540		while (total_scan >= batch_size ||
   541		       total_scan >= freeable) {
   542			unsigned long ret;
   543			unsigned long nr_to_scan = min(batch_size, total_scan);
   544	
   545			shrinkctl->nr_to_scan = nr_to_scan;
   546			shrinkctl->nr_scanned = nr_to_scan;
   547			ret = shrinker->scan_objects(shrinker, shrinkctl);
   548			if (ret == SHRINK_STOP)
   549				break;
   550			freed += ret;
   551	
   552			count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
   553			total_scan -= shrinkctl->nr_scanned;
   554			scanned += shrinkctl->nr_scanned;
   555	
   556			cond_resched();
   557		}
   558	
   559		if (next_deferred >= scanned)
   560			next_deferred -= scanned;
   561		else
   562			next_deferred = 0;
   563		/*
   564		 * move the unused scan count back into the shrinker in a
   565		 * manner that handles concurrent updates. If we exhausted the
   566		 * scan, there is no need to do an update.
   567		 */
   568		if (next_deferred > 0)
   569			new_nr = atomic_long_add_return(next_deferred,
   570							&shrinker->nr_deferred[nid]);
   571		else
   572			new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
   573	
   574		trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
   575		return freed;
   576	}
   577	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa2c150ab7b9..8544f4c5cd4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -476,6 +476,17 @@  static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	delta = freeable >> priority;
 	delta *= 4;
 	do_div(delta, shrinker->seeks);
+
+	/*
+	 * Make sure we apply some minimal pressure even on
+	 * small cgroups. This is necessary because some of
+	 * belonging objects can hold a reference to a dying
+	 * child cgroup. If we don't scan them, the dying
+	 * cgroup can't go away unless the memory pressure
+	 * (and the scanning priority) raise significantly.
+	 */
+	delta = max(delta, min(freeable, batch_size));
+
 	total_scan += delta;
 	if (total_scan < 0) {
 		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",