Message ID | 20180904224707.10356-1-guro@fb.com (mailing list archive)
---|---
State | New, archived
Series | [v2] mm: slowly shrink slabs with a relatively small number of objects
On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:

> Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> changed the way how the target slab pressure is calculated and
> made it priority-based:
>
>     delta = freeable >> priority;
>     delta *= 4;
>     do_div(delta, shrinker->seeks);
>
> The problem is that on a default priority (which is 12) no pressure
> is applied at all, if the number of potentially reclaimable objects
> is less than 4096 (1<<12).
>
> This causes the last objects on slab caches of no longer used cgroups
> to never get reclaimed, resulting in dead cgroups staying around forever.

But this problem pertains to all types of objects, not just the cgroup
cache, yes?

> Slab LRU lists are reparented on memcg offlining, but corresponding
> objects are still holding a reference to the dying cgroup.
> If we don't scan them at all, the dying cgroup can't go away.
> Most likely, the parent cgroup hasn't any directly associated objects,
> only remaining objects from dying children cgroups. So it can easily
> hold a reference to hundreds of dying cgroups.
>
> If there are no big spikes in memory pressure, and new memory cgroups
> are created and destroyed periodically, this causes the number of
> dying cgroups grow steadily, causing a slow-ish and hard-to-detect
> memory "leak". It's not a real leak, as the memory can be eventually
> reclaimed, but it could not happen in a real life at all. I've seen
> hosts with a steadily climbing number of dying cgroups, which doesn't
> show any signs of a decline in months, despite the host is loaded
> with a production workload.
>
> It is an obvious waste of memory, and to prevent it, let's apply
> a minimal pressure even on small shrinker lists. E.g. if there are
> freeable objects, let's scan at least min(freeable, scan_batch)
> objects.
>
> This fix significantly improves a chance of a dying cgroup to be
> reclaimed, and together with some previous patches stops the steady
> growth of the dying cgroups number on some of our hosts.
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  	delta = freeable >> priority;
>  	delta *= 4;
>  	do_div(delta, shrinker->seeks);
> +
> +	/*
> +	 * Make sure we apply some minimal pressure even on
> +	 * small cgroups. This is necessary because some of
> +	 * belonging objects can hold a reference to a dying
> +	 * child cgroup. If we don't scan them, the dying
> +	 * cgroup can't go away unless the memory pressure
> +	 * (and the scanning priority) raise significantly.
> +	 */
> +	delta = max(delta, min(freeable, batch_size));
> +

If so I think the comment should be cast in more general terms.  Maybe
with a final sentence "the cgroup cache is one such case".

Also, please use all 80 columns in block comments to save a few display
lines.

And `delta' has type ULL whereas the other two are longs.  We'll
presumably hit warnings here, preventable with max_t.
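To make the quoted problem concrete: with the default reclaim priority of 12 and a shrinker using the common default of seeks == 2, the formula above yields zero pressure for any cache with fewer than 4096 freeable objects. A small user-space sketch of the arithmetic follows; the constants and the harness are illustrative stand-ins, not kernel code.

#include <stdio.h>

/* Illustrative stand-ins for DEF_PRIORITY and DEFAULT_SEEKS. */
#define DEFAULT_PRIORITY 12
#define SEEKS            2

int main(void)
{
	long freeable_values[] = { 50, 1000, 4095, 4096, 100000 };

	for (unsigned i = 0; i < sizeof(freeable_values) / sizeof(long); i++) {
		long freeable = freeable_values[i];
		unsigned long long delta;

		delta = freeable >> DEFAULT_PRIORITY;	/* delta = freeable >> priority */
		delta *= 4;
		delta /= SEEKS;				/* stands in for do_div(delta, shrinker->seeks) */

		printf("freeable=%6ld -> delta=%llu\n", freeable, delta);
	}
	return 0;
}

This prints delta == 0 for 50, 1000 and 4095 freeable objects and only starts producing pressure at 4096, which is exactly the behaviour the patch is trying to correct.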
On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
>
> > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > changed the way how the target slab pressure is calculated and
> > made it priority-based:
> >
> >     delta = freeable >> priority;
> >     delta *= 4;
> >     do_div(delta, shrinker->seeks);
> >
> > The problem is that on a default priority (which is 12) no pressure
> > is applied at all, if the number of potentially reclaimable objects
> > is less than 4096 (1<<12).
> >
> > This causes the last objects on slab caches of no longer used cgroups
> > to never get reclaimed, resulting in dead cgroups staying around forever.
>
> But this problem pertains to all types of objects, not just the cgroup
> cache, yes?

Well, of course, but there is a dramatic difference in size.

Most of these objects take a few hundred bytes (or less), while a memcg
can take a few hundred kilobytes on a modern multi-CPU machine, mostly
due to per-cpu stats and event counters.

> > Slab LRU lists are reparented on memcg offlining, but corresponding
> > objects are still holding a reference to the dying cgroup.
> > If we don't scan them at all, the dying cgroup can't go away.
> > Most likely, the parent cgroup hasn't any directly associated objects,
> > only remaining objects from dying children cgroups. So it can easily
> > hold a reference to hundreds of dying cgroups.
> >
> > If there are no big spikes in memory pressure, and new memory cgroups
> > are created and destroyed periodically, this causes the number of
> > dying cgroups grow steadily, causing a slow-ish and hard-to-detect
> > memory "leak". It's not a real leak, as the memory can be eventually
> > reclaimed, but it could not happen in a real life at all. I've seen
> > hosts with a steadily climbing number of dying cgroups, which doesn't
> > show any signs of a decline in months, despite the host is loaded
> > with a production workload.
> >
> > It is an obvious waste of memory, and to prevent it, let's apply
> > a minimal pressure even on small shrinker lists. E.g. if there are
> > freeable objects, let's scan at least min(freeable, scan_batch)
> > objects.
> >
> > This fix significantly improves a chance of a dying cgroup to be
> > reclaimed, and together with some previous patches stops the steady
> > growth of the dying cgroups number on some of our hosts.
> >
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  	delta = freeable >> priority;
> >  	delta *= 4;
> >  	do_div(delta, shrinker->seeks);
> > +
> > +	/*
> > +	 * Make sure we apply some minimal pressure even on
> > +	 * small cgroups. This is necessary because some of
> > +	 * belonging objects can hold a reference to a dying
> > +	 * child cgroup. If we don't scan them, the dying
> > +	 * cgroup can't go away unless the memory pressure
> > +	 * (and the scanning priority) raise significantly.
> > +	 */
> > +	delta = max(delta, min(freeable, batch_size));
> > +
>
> If so I think the comment should be cast in more general terms.  Maybe
> with a final sentence "the cgroup cache is one such case".

So I think we should keep the memcg refcounting case explicitly
explained, but I'll add a line about other cases as well.

> Also, please use all 80 columns in block comments to save a few display
> lines.
>
> And `delta' has type ULL whereas the other two are longs.  We'll
> presumably hit warnings here, preventable with max_t.

Let me fix this in v3.

Thank you!
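For reference, a minimal sketch of how the reworked hunk presumably looks once Andrew's max_t() suggestion is applied; the comment wording below is paraphrased and is not the actual v3 text.

	delta = freeable >> priority;
	delta *= 4;
	do_div(delta, shrinker->seeks);

	/*
	 * Apply some minimal pressure even on small caches: the remaining
	 * objects are not only consuming memory themselves, they can also
	 * pin other resources; a dying memory cgroup is one such case.
	 * max_t() evaluates both arguments as unsigned long long, avoiding
	 * the distinct-types warning that plain max() would emit, since
	 * delta is an unsigned long long while freeable and batch_size are
	 * longs.
	 */
	delta = max_t(unsigned long long, delta,
		      min(freeable, batch_size));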
On Wed, Sep 5, 2018 at 2:23 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> > On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
> >
> > > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > > changed the way how the target slab pressure is calculated and
> > > made it priority-based:
> > >
> > >     delta = freeable >> priority;
> > >     delta *= 4;
> > >     do_div(delta, shrinker->seeks);
> > >
> > > The problem is that on a default priority (which is 12) no pressure
> > > is applied at all, if the number of potentially reclaimable objects
> > > is less than 4096 (1<<12).
> > >
> > > This causes the last objects on slab caches of no longer used cgroups
> > > to never get reclaimed, resulting in dead cgroups staying around forever.
> >
> > But this problem pertains to all types of objects, not just the cgroup
> > cache, yes?
>
> Well, of course, but there is a dramatic difference in size.
>
> Most of these objects take a few hundred bytes (or less), while a memcg
> can take a few hundred kilobytes on a modern multi-CPU machine, mostly
> due to per-cpu stats and event counters.

Besides the memcg itself, all of its kmem caches, most of them empty,
are stuck in memory as well. For SLAB, even the memory overhead of an
empty kmem cache is not negligible.

Shakeel
On Wed, Sep 05, 2018 at 02:35:29PM -0700, Shakeel Butt wrote:
> On Wed, Sep 5, 2018 at 2:23 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Sep 05, 2018 at 01:51:52PM -0700, Andrew Morton wrote:
> > > On Tue, 4 Sep 2018 15:47:07 -0700 Roman Gushchin <guro@fb.com> wrote:
> > >
> > > > Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
> > > > changed the way how the target slab pressure is calculated and
> > > > made it priority-based:
> > > >
> > > >     delta = freeable >> priority;
> > > >     delta *= 4;
> > > >     do_div(delta, shrinker->seeks);
> > > >
> > > > The problem is that on a default priority (which is 12) no pressure
> > > > is applied at all, if the number of potentially reclaimable objects
> > > > is less than 4096 (1<<12).
> > > >
> > > > This causes the last objects on slab caches of no longer used cgroups
> > > > to never get reclaimed, resulting in dead cgroups staying around forever.
> > >
> > > But this problem pertains to all types of objects, not just the cgroup
> > > cache, yes?
> >
> > Well, of course, but there is a dramatic difference in size.
> >
> > Most of these objects take a few hundred bytes (or less), while a memcg
> > can take a few hundred kilobytes on a modern multi-CPU machine, mostly
> > due to per-cpu stats and event counters.
>
> Besides the memcg itself, all of its kmem caches, most of them empty,
> are stuck in memory as well. For SLAB, even the memory overhead of an
> empty kmem cache is not negligible.

Right! I mean the main part of the problem is not these 4k (mostly
vfs-cache related) objects themselves, but the objects which are
referenced by them.

Thanks!
Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc2 next-20180905]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roman-Gushchin/mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects/20180906-142351
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:18:0,
                    from ./arch/openrisc/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/vmscan.c:17:
   mm/vmscan.c: In function 'do_shrink_slab':
   include/linux/kernel.h:845:29: warning: comparison of distinct pointer types lacks a cast
       (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                  ^
   include/linux/kernel.h:859:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:869:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), \
                           ^~~~~~~~~~
   include/linux/kernel.h:885:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
   >> mm/vmscan.c:488:10: note: in expansion of macro 'max'
      delta = max(delta, min(freeable, batch_size));
              ^~~

vim +/max +488 mm/vmscan.c

   446	
   447	static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
   448					    struct shrinker *shrinker, int priority)
   449	{
   450		unsigned long freed = 0;
   451		unsigned long long delta;
   452		long total_scan;
   453		long freeable;
   454		long nr;
   455		long new_nr;
   456		int nid = shrinkctl->nid;
   457		long batch_size = shrinker->batch ? shrinker->batch
   458					  : SHRINK_BATCH;
   459		long scanned = 0, next_deferred;
   460	
   461		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
   462			nid = 0;
   463	
   464		freeable = shrinker->count_objects(shrinker, shrinkctl);
   465		if (freeable == 0 || freeable == SHRINK_EMPTY)
   466			return freeable;
   467	
   468		/*
   469		 * copy the current shrinker scan count into a local variable
   470		 * and zero it so that other concurrent shrinker invocations
   471		 * don't also do this scanning work.
   472		 */
   473		nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
   474	
   475		total_scan = nr;
   476		delta = freeable >> priority;
   477		delta *= 4;
   478		do_div(delta, shrinker->seeks);
   479	
   480		/*
   481		 * Make sure we apply some minimal pressure even on
   482		 * small cgroups. This is necessary because some of
   483		 * belonging objects can hold a reference to a dying
   484		 * child cgroup. If we don't scan them, the dying
   485		 * cgroup can't go away unless the memory pressure
   486		 * (and the scanning priority) raise significantly.
   487		 */
 > 488		delta = max(delta, min(freeable, batch_size));
   489	
   490		total_scan += delta;
   491		if (total_scan < 0) {
   492			pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
   493			       shrinker->scan_objects, total_scan);
   494			total_scan = freeable;
   495			next_deferred = nr;
   496		} else
   497			next_deferred = total_scan;
   498	
   499		/*
   500		 * We need to avoid excessive windup on filesystem shrinkers
   501		 * due to large numbers of GFP_NOFS allocations causing the
   502		 * shrinkers to return -1 all the time. This results in a large
   503		 * nr being built up so when a shrink that can do some work
   504		 * comes along it empties the entire cache due to nr >>>
   505		 * freeable. This is bad for sustaining a working set in
   506		 * memory.
   507		 *
   508		 * Hence only allow the shrinker to scan the entire cache when
   509		 * a large delta change is calculated directly.
   510		 */
   511		if (delta < freeable / 4)
   512			total_scan = min(total_scan, freeable / 2);
   513	
   514		/*
   515		 * Avoid risking looping forever due to too large nr value:
   516		 * never try to free more than twice the estimate number of
   517		 * freeable entries.
   518		 */
   519		if (total_scan > freeable * 2)
   520			total_scan = freeable * 2;
   521	
   522		trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
   523					   freeable, delta, total_scan, priority);
   524	
   525		/*
   526		 * Normally, we should not scan less than batch_size objects in one
   527		 * pass to avoid too frequent shrinker calls, but if the slab has less
   528		 * than batch_size objects in total and we are really tight on memory,
   529		 * we will try to reclaim all available objects, otherwise we can end
   530		 * up failing allocations although there are plenty of reclaimable
   531		 * objects spread over several slabs with usage less than the
   532		 * batch_size.
   533		 *
   534		 * We detect the "tight on memory" situations by looking at the total
   535		 * number of objects we want to scan (total_scan). If it is greater
   536		 * than the total number of objects on slab (freeable), we must be
   537		 * scanning at high prio and therefore should try to reclaim as much as
   538		 * possible.
   539		 */
   540		while (total_scan >= batch_size ||
   541		       total_scan >= freeable) {
   542			unsigned long ret;
   543			unsigned long nr_to_scan = min(batch_size, total_scan);
   544	
   545			shrinkctl->nr_to_scan = nr_to_scan;
   546			shrinkctl->nr_scanned = nr_to_scan;
   547			ret = shrinker->scan_objects(shrinker, shrinkctl);
   548			if (ret == SHRINK_STOP)
   549				break;
   550			freed += ret;
   551	
   552			count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
   553			total_scan -= shrinkctl->nr_scanned;
   554			scanned += shrinkctl->nr_scanned;
   555	
   556			cond_resched();
   557		}
   558	
   559		if (next_deferred >= scanned)
   560			next_deferred -= scanned;
   561		else
   562			next_deferred = 0;
   563		/*
   564		 * move the unused scan count back into the shrinker in a
   565		 * manner that handles concurrent updates. If we exhausted the
   566		 * scan, there is no need to do an update.
   567		 */
   568		if (next_deferred > 0)
   569			new_nr = atomic_long_add_return(next_deferred,
   570						&shrinker->nr_deferred[nid]);
   571		else
   572			new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
   573	
   574		trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
   575		return freed;
   576	}
   577	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
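The warning reported above comes from the type check built into the kernel's min()/max() macros: as the expansion shows, __typecheck() "compares" pointers to the two argument types, so mixing an unsigned long long with a long makes gcc complain that the pointer types are distinct. A stripped-down user-space illustration of the same mechanism follows; it is a simplified stand-in, not the real macro from include/linux/kernel.h.

#include <stdio.h>

/*
 * Simplified stand-in for the kernel's __typecheck(): the pointer comparison
 * exists only so the compiler warns when the two argument types differ.
 * max_t(type, x, y) sidesteps the check by casting both sides to 'type'.
 */
#define TYPECHECK(x, y) \
	(!!(sizeof((__typeof__(x) *)1 == (__typeof__(y) *)1)))

int main(void)
{
	unsigned long long delta = 0;
	long freeable = 50;

	/* Same types: no warning. */
	(void)TYPECHECK(delta, delta);

	/*
	 * Distinct types: gcc emits "comparison of distinct pointer types
	 * lacks a cast", just like it does for
	 * max(delta, min(freeable, batch_size)) in do_shrink_slab().
	 */
	(void)TYPECHECK(delta, freeable);

	printf("delta=%llu freeable=%ld\n", delta, freeable);
	return 0;
}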
On Thu, Sep 06, 2018 at 03:42:07PM +0800, kbuild test robot wrote:
> Hi Roman,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.19-rc2 next-20180905]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

Thanks for the report! The issue has been fixed in v3, which I sent
yesterday.

Thanks,
Roman
Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc2 next-20180906]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roman-Gushchin/mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects/20180906-142351
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

   include/trace/events/vmscan.h:79:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:79:1: sparse: incorrect type in argument 3 (different base types)
   include/trace/events/vmscan.h:79:1:    expected unsigned long [unsigned] flags
   include/trace/events/vmscan.h:79:1:    got restricted gfp_t [usertype] gfp_flags
   include/trace/events/vmscan.h:106:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:106:1: sparse: incorrect type in argument 3 (different base types)
   include/trace/events/vmscan.h:106:1:    expected unsigned long [unsigned] flags
   include/trace/events/vmscan.h:106:1:    got restricted gfp_t [usertype] gfp_flags
   include/trace/events/vmscan.h:196:1: sparse: cast from restricted gfp_t
   include/trace/events/vmscan.h:196:1: sparse: too many warnings
>> mm/vmscan.c:488:17: sparse: incompatible types in comparison expression (different type sizes)
   In file included from include/asm-generic/bug.h:18:0,
                    from arch/x86/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/vmscan.c:17:
   mm/vmscan.c: In function 'do_shrink_slab':
   include/linux/kernel.h:845:29: warning: comparison of distinct pointer types lacks a cast
       (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                  ^
   include/linux/kernel.h:859:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:869:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), \
                           ^~~~~~~~~~
   include/linux/kernel.h:885:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
   mm/vmscan.c:488:10: note: in expansion of macro 'max'
      delta = max(delta, min(freeable, batch_size));
              ^~~

vim +488 mm/vmscan.c

   446	
   447	static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
   448					    struct shrinker *shrinker, int priority)
   449	{
   450		unsigned long freed = 0;
   451		unsigned long long delta;
   452		long total_scan;
   453		long freeable;
   454		long nr;
   455		long new_nr;
   456		int nid = shrinkctl->nid;
   457		long batch_size = shrinker->batch ? shrinker->batch
   458					  : SHRINK_BATCH;
   459		long scanned = 0, next_deferred;
   460	
   461		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
   462			nid = 0;
   463	
   464		freeable = shrinker->count_objects(shrinker, shrinkctl);
   465		if (freeable == 0 || freeable == SHRINK_EMPTY)
   466			return freeable;
   467	
   468		/*
   469		 * copy the current shrinker scan count into a local variable
   470		 * and zero it so that other concurrent shrinker invocations
   471		 * don't also do this scanning work.
   472		 */
   473		nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
   474	
   475		total_scan = nr;
   476		delta = freeable >> priority;
   477		delta *= 4;
   478		do_div(delta, shrinker->seeks);
   479	
   480		/*
   481		 * Make sure we apply some minimal pressure even on
   482		 * small cgroups. This is necessary because some of
   483		 * belonging objects can hold a reference to a dying
   484		 * child cgroup. If we don't scan them, the dying
   485		 * cgroup can't go away unless the memory pressure
   486		 * (and the scanning priority) raise significantly.
   487		 */
 > 488		delta = max(delta, min(freeable, batch_size));
   489	
   490		total_scan += delta;
   491		if (total_scan < 0) {
   492			pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
   493			       shrinker->scan_objects, total_scan);
   494			total_scan = freeable;
   495			next_deferred = nr;
   496		} else
   497			next_deferred = total_scan;
   498	
   499		/*
   500		 * We need to avoid excessive windup on filesystem shrinkers
   501		 * due to large numbers of GFP_NOFS allocations causing the
   502		 * shrinkers to return -1 all the time. This results in a large
   503		 * nr being built up so when a shrink that can do some work
   504		 * comes along it empties the entire cache due to nr >>>
   505		 * freeable. This is bad for sustaining a working set in
   506		 * memory.
   507		 *
   508		 * Hence only allow the shrinker to scan the entire cache when
   509		 * a large delta change is calculated directly.
   510		 */
   511		if (delta < freeable / 4)
   512			total_scan = min(total_scan, freeable / 2);
   513	
   514		/*
   515		 * Avoid risking looping forever due to too large nr value:
   516		 * never try to free more than twice the estimate number of
   517		 * freeable entries.
   518		 */
   519		if (total_scan > freeable * 2)
   520			total_scan = freeable * 2;
   521	
   522		trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
   523					   freeable, delta, total_scan, priority);
   524	
   525		/*
   526		 * Normally, we should not scan less than batch_size objects in one
   527		 * pass to avoid too frequent shrinker calls, but if the slab has less
   528		 * than batch_size objects in total and we are really tight on memory,
   529		 * we will try to reclaim all available objects, otherwise we can end
   530		 * up failing allocations although there are plenty of reclaimable
   531		 * objects spread over several slabs with usage less than the
   532		 * batch_size.
   533		 *
   534		 * We detect the "tight on memory" situations by looking at the total
   535		 * number of objects we want to scan (total_scan). If it is greater
   536		 * than the total number of objects on slab (freeable), we must be
   537		 * scanning at high prio and therefore should try to reclaim as much as
   538		 * possible.
   539		 */
   540		while (total_scan >= batch_size ||
   541		       total_scan >= freeable) {
   542			unsigned long ret;
   543			unsigned long nr_to_scan = min(batch_size, total_scan);
   544	
   545			shrinkctl->nr_to_scan = nr_to_scan;
   546			shrinkctl->nr_scanned = nr_to_scan;
   547			ret = shrinker->scan_objects(shrinker, shrinkctl);
   548			if (ret == SHRINK_STOP)
   549				break;
   550			freed += ret;
   551	
   552			count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
   553			total_scan -= shrinkctl->nr_scanned;
   554			scanned += shrinkctl->nr_scanned;
   555	
   556			cond_resched();
   557		}
   558	
   559		if (next_deferred >= scanned)
   560			next_deferred -= scanned;
   561		else
   562			next_deferred = 0;
   563		/*
   564		 * move the unused scan count back into the shrinker in a
   565		 * manner that handles concurrent updates. If we exhausted the
   566		 * scan, there is no need to do an update.
   567		 */
   568		if (next_deferred > 0)
   569			new_nr = atomic_long_add_return(next_deferred,
   570						&shrinker->nr_deferred[nid]);
   571		else
   572			new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
   573	
   574		trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
   575		return freed;
   576	}
   577	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa2c150ab7b9..8544f4c5cd4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	delta = freeable >> priority;
 	delta *= 4;
 	do_div(delta, shrinker->seeks);
+
+	/*
+	 * Make sure we apply some minimal pressure even on
+	 * small cgroups. This is necessary because some of
+	 * belonging objects can hold a reference to a dying
+	 * child cgroup. If we don't scan them, the dying
+	 * cgroup can't go away unless the memory pressure
+	 * (and the scanning priority) raise significantly.
+	 */
+	delta = max(delta, min(freeable, batch_size));
+
 	total_scan += delta;
 	if (total_scan < 0) {
 		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
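For illustration, the same user-space arithmetic as earlier, now with the patch's clamp applied (assuming the default SHRINK_BATCH of 128 is used as batch_size when a shrinker does not set its own batch): small caches now see min(freeable, batch_size) instead of nothing, while caches large enough that the priority-based delta already exceeds a batch are unaffected.

#include <stdio.h>

/* Illustrative stand-ins for DEF_PRIORITY, DEFAULT_SEEKS and SHRINK_BATCH. */
#define DEFAULT_PRIORITY 12
#define SEEKS            2
#define BATCH_SIZE       128

/* Priority-based pressure with the minimal-pressure clamp from the hunk above. */
static unsigned long long patched_delta(long freeable)
{
	unsigned long long delta = freeable >> DEFAULT_PRIORITY;
	unsigned long long floor;

	delta *= 4;
	delta /= SEEKS;		/* stands in for do_div(delta, shrinker->seeks) */

	/* delta = max(delta, min(freeable, batch_size)) */
	floor = freeable < BATCH_SIZE ? freeable : BATCH_SIZE;
	return delta > floor ? delta : floor;
}

int main(void)
{
	long values[] = { 50, 4095, 1000000 };

	for (unsigned i = 0; i < sizeof(values) / sizeof(values[0]); i++)
		printf("freeable=%7ld -> delta=%llu\n",
		       values[i], patched_delta(values[i]));
	return 0;
}

A 50-object cache now gets all 50 objects scanned, a 4095-object cache gets a full batch of 128 per pass, and a 1,000,000-object cache still gets its priority-based pressure of 488, so the change only matters where the old delta was smaller than a batch.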