Message ID | 20240413015410.30951-1-lipeifeng@oppo.com (mailing list archive) |
---|---|
State | New |
Series | [v2] mm/shrinker: add SHRINKER_NO_DIRECT_RECLAIM |
On Sat, Apr 13, 2024 at 1:54 PM <lipeifeng@oppo.com> wrote:
>
> From: Peifeng Li <lipeifeng@oppo.com>
>
> In the case of insufficient memory, threads will be in direct_reclaim to
> reclaim memory, direct_reclaim will call shrink_slab to run sequentially
> each shrinker callback. If there is a lock-contention in the shrinker
> callback,such as spinlock,mutex_lock and so on, threads may be likely to
> be stuck in direct_reclaim for a long time, even if the memfree has reached
> the high watermarks of the zone, resulting in poor performance of threads.
>
> Example 1: shrinker callback may wait for spinlock
> static unsigned long mb_cache_shrink(struct mb_cache *cache,
>                                      unsigned long nr_to_scan)
> {
>         struct mb_cache_entry *entry;
>         unsigned long shrunk = 0;
>
>         spin_lock(&cache->c_list_lock);
>         while (nr_to_scan-- && !list_empty(&cache->c_list)) {
>                 entry = list_first_entry(&cache->c_list,
>                                          struct mb_cache_entry, e_list);
>                 if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
>                     atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
>                         clear_bit(MBE_REFERENCED_B, &entry->e_flags);
>                         list_move_tail(&entry->e_list, &cache->c_list);
>                         continue;
>                 }
>                 list_del_init(&entry->e_list);
>                 cache->c_entry_count--;
>                 spin_unlock(&cache->c_list_lock);
>                 __mb_cache_entry_free(cache, entry);
>                 shrunk++;
>                 cond_resched();
>                 spin_lock(&cache->c_list_lock);
>         }
>         spin_unlock(&cache->c_list_lock);
>
>         return shrunk;
> }
> Example 2: shrinker callback may wait for mutex lock
> static
> unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s,
>                 struct shrink_control *sc)
> {
>         struct kbase_context *kctx;
>         struct kbase_mem_phy_alloc *alloc;
>         struct kbase_mem_phy_alloc *tmp;
>         unsigned long freed = 0;
>
>         kctx = container_of(s, struct kbase_context, reclaim);
>
>         // MTK add to prevent false alarm
>         lockdep_off();
>
>         mutex_lock(&kctx->jit_evict_lock);
>
>         list_for_each_entry_safe(alloc, tmp, &kctx->evict_list, evict_node) {
>                 int err;
>
>                 err = kbase_mem_shrink_gpu_mapping(kctx, alloc->reg,
>                                 0, alloc->nents);
>                 if (err != 0) {
>                         freed = -1;
>                         goto out_unlock;
>                 }
>
>                 alloc->evicted = alloc->nents;
>
>                 kbase_free_phy_pages_helper(alloc, alloc->evicted);
>                 freed += alloc->evicted;
>                 list_del_init(&alloc->evict_node);
>
>                 kbase_jit_backing_lost(alloc->reg);
>
>                 if (freed > sc->nr_to_scan)
>                         break;
>         }
> out_unlock:
>         mutex_unlock(&kctx->jit_evict_lock);
>
>         // MTK add to prevent false alarm
>         lockdep_on();
>
>         return freed;
> }
>
> In mobile-phone,threads are likely to be stuck in shrinker callback during
> direct_reclaim, with example like the following:
> <...>-2806 [004] ..... 866458.339840: mm_shrink_slab_start:
>         dynamic_mem_shrink_scan+0x0/0xb8 ... priority 2
> <...>-2806 [004] ..... 866459.339933: mm_shrink_slab_end:
>         dynamic_mem_shrink_scan+0x0/0xb8 ...
>
> For the above reason, the patch introduces SHRINKER_NO_DIRECT_RECLAIM that
> allows driver to set shrinker callback not to be called in direct_reclaim
> unless sc->priority is 0.
>
> The reason why sc->priority=0 allows shrinker callback to be called in
> direct_reclaim is for two reasons:
> 1.Always call all shrinker callback in drop_slab that priority is 0.
> 2.sc->priority is 0 during direct_reclaim, allow direct_reclaim to call
> shrinker callback, to reclaim memory timely.
>
> Note:
> 1.shrinker_register() default not to set SHRINKER_NO_DIRECT_RECLAIM, to
> maintain the current behavior of the code.
> 2.Logic of kswapd and drop_slab to call shrinker callback isn't affected.
>
> Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
> ---
> -v2: fix the commit message
>  include/linux/shrinker.h |  5 +++++
>  mm/shrinker.c            | 38 +++++++++++++++++++++++++++++++++++---
>  2 files changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 1a00be90d93a..2d5a8b3a720b 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -130,6 +130,11 @@ struct shrinker {
>   * non-MEMCG_AWARE shrinker should not have this flag set.
>   */
>  #define SHRINKER_NONSLAB BIT(4)
> +/*
> + * Can shrinker callback be called in direct_relcaim unless
> + * sc->priority is 0?
> + */
> +#define SHRINKER_NO_DIRECT_RECLAIM BIT(5)
>

My point is, drivers won't voluntarily stay unreclaimed during direct
reclamation. Hence, this approach is unlikely to succeed. Those
drivers can't be trusted. Had they been aware of their slowness,
they wouldn't have written code in this manner.

Detecting problematic driver shrinkers and marking them as skipped
might prove challenging. I concur with Zhengqi; the priority should
be fixing the driver whose shrinker is slow. Do you have a list of
slow drivers?

>  __printf(2, 3)
>  struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..7a5dffd166cd 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -4,7 +4,7 @@
>  #include <linux/shrinker.h>
>  #include <linux/rculist.h>
>  #include <trace/events/vmscan.h>
> -
> +#include <linux/swap.h>
>  #include "internal.h"
>
>  LIST_HEAD(shrinker_list);
> @@ -544,7 +544,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>                          if (!memcg_kmem_online() &&
>                              !(shrinker->flags & SHRINKER_NONSLAB))
>                                  continue;
> -
> +                        /*
> +                         * SHRINKER_NO_DIRECT_RECLAIM, mean that shrinker callback
> +                         * should not be called in direct_reclaim unless priority
> +                         * is 0.
> +                         */
> +                        if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) &&
> +                                        !current_is_kswapd()) {
> +                                /*
> +                                 * 1.Always call shrinker callback in drop_slab that
> +                                 * priority is 0.
> +                                 * 2.sc->priority is 0 during direct_reclaim, allow
> +                                 * direct_reclaim to call shrinker callback, to reclaim
> +                                 * memory timely.
> +                                 */
> +                                if (priority)
> +                                        continue;
> +                        }
>                          ret = do_shrink_slab(&sc, shrinker, priority);
>                          if (ret == SHRINK_EMPTY) {
>                                  clear_bit(offset, unit->map);
> @@ -658,7 +674,23 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>                          continue;
>
>                  rcu_read_unlock();
> -
> +                /*
> +                 * SHRINKER_NO_DIRECT_RECLAIM, mean that shrinker callback
> +                 * should not be called in direct_reclaim unless priority
> +                 * is 0.
> +                 */
> +                if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) &&
> +                                !current_is_kswapd()) {
> +                        /*
> +                         * 1.Always call shrinker callback in drop_slab that
> +                         * priority is 0.
> +                         * 2.sc->priority is 0 during direct_reclaim, allow
> +                         * direct_reclaim to call shrinker callback, to reclaim
> +                         * memory timely.
> +                         */
> +                        if (priority)
> +                                continue;
> +                }
>                  ret = do_shrink_slab(&sc, shrinker, priority);
>                  if (ret == SHRINK_EMPTY)
>                          ret = 0;
> --
> 2.34.1

Thanks
Barry
On 2024/4/13 13:19, Barry Song wrote:
> On Sat, Apr 13, 2024 at 1:54 PM <lipeifeng@oppo.com> wrote:
>> From: Peifeng Li <lipeifeng@oppo.com>
>>
>> [...]
>>
>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>> [...]
>> +/*
>> + * Can shrinker callback be called in direct_relcaim unless
>> + * sc->priority is 0?
>> + */
>> +#define SHRINKER_NO_DIRECT_RECLAIM BIT(5)
>>
> My point is, drivers won't voluntarily stay unreclaimed during direct
> reclamation. Hence, this approach is unlikely to succeed. Those
> drivers can't be trusted. Had they been aware of their slowness,
> they wouldn't have written code in this manner.

Actually, we are hoping for a way to keep shrinker callbacks from
blocking. Many drivers cannot remove the locks from their shrinker
callbacks in a timely manner, so with this flag we could tell those
drivers to set it.

> Detecting problematic driver shrinkers and marking them as skipped
> might prove challenging. I concur with Zhengqi; the priority should
> be fixing the driver whose shrinker is slow. Do you have a list of
> slow drivers?

Most of the slow drivers have not been upstreamed, so we cannot
compile a list of them.

I am curious whether executing do_shrink_slab() asynchronously could be
accepted in Linux, or executing some of the shrinker callbacks
asynchronously?

In my view, if a driver can affect the kernel's memory-reclaim path,
the robustness of the kernel is greatly reduced.

>> [...]
> Thanks
> Barry
On Sat, Apr 13, 2024 at 5:42 PM 李培锋 <lipeifeng@oppo.com> wrote:
>
> On 2024/4/13 13:19, Barry Song wrote:
> > On Sat, Apr 13, 2024 at 1:54 PM <lipeifeng@oppo.com> wrote:
> >> From: Peifeng Li <lipeifeng@oppo.com>
> >>
> >> [...]
> >>
> >> For the above reason, the patch introduces SHRINKER_NO_DIRECT_RECLAIM that
> >> allows driver to set shrinker callback not to be called in direct_reclaim
> >> unless sc->priority is 0.
> >>
> >> The reason why sc->priority=0 allows shrinker callback to be called in
> >> direct_reclaim is for two reasons:
> >> 1.Always call all shrinker callback in drop_slab that priority is 0.
> >> 2.sc->priority is 0 during direct_reclaim, allow direct_reclaim to call
> >> shrinker callback, to reclaim memory timely.

We already provide current_is_kswapd() and shrink_control to drivers.
If you believe that sc->priority can help shrinker callbacks behave
differently, why not simply pass it along in shrink_control and let
drivers decide their preferred course of action? I don't find it
reasonable to reverse the approach. Allowing drivers to pass a flag to
the memory management core doesn't seem sensible.

> >> [...]
> >
> > My point is, drivers won't voluntarily stay unreclaimed during direct
> > reclamation. Hence, this approach is unlikely to succeed. Those
> > drivers can't be trusted. Had they been aware of their slowness,
> > they wouldn't have written code in this manner.
>
> Actually, we are hoping for a way to keep shrinker callbacks from
> blocking. Many drivers cannot remove the locks from their shrinker
> callbacks in a timely manner, so with this flag we could tell those
> drivers to set it.
>
> > Detecting problematic driver shrinkers and marking them as skipped
> > might prove challenging. I concur with Zhengqi; the priority should
> > be fixing the driver whose shrinker is slow. Do you have a list of
> > slow drivers?
>
> Most of the slow drivers have not been upstreamed, so we cannot
> compile a list of them.
>
> I am curious whether executing do_shrink_slab() asynchronously could be
> accepted in Linux, or executing some of the shrinker callbacks
> asynchronously?

That entirely hinges on the kind of data we have. As Zhengqi pointed
out, asynchronous slab shrinkers could also impede memory reclamation.
If the data eventually shows that this isn't an issue, asynchronous
slab shrinkers might turn out to be a workable solution.

> In my view, if a driver can affect the kernel's memory-reclaim path,
> the robustness of the kernel is greatly reduced.
>
> [...]
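Barry's suggestion above, letting the callback itself decide based on the reclaim context, can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct shrink_control_model`, its `priority` and `is_kswapd` fields, and `model_scan_objects()` are hypothetical names invented here. The real `struct shrink_control` is not claimed to carry a priority field, and `is_kswapd` merely stands in for `current_is_kswapd()`.

```c
#include <assert.h>
#include <stdbool.h>

#define SHRINK_STOP (~0UL)	/* same value the kernel uses for SHRINK_STOP */

/* Simplified, hypothetical stand-in for struct shrink_control. */
struct shrink_control_model {
	unsigned long nr_to_scan;
	int priority;		/* assumption: priority handed to the callback */
	bool is_kswapd;		/* stands in for current_is_kswapd() */
};

/*
 * Hypothetical driver callback: the driver, not the core, decides to back
 * off during direct reclaim unless reclaim has reached priority 0.
 */
static unsigned long model_scan_objects(const struct shrink_control_model *sc,
					unsigned long evictable)
{
	assert(sc->priority >= 0);	/* reclaim priority counts down to 0 */

	if (!sc->is_kswapd && sc->priority > 0)
		return SHRINK_STOP;	/* leave the work to a later context */

	/* Otherwise free at most nr_to_scan objects. */
	return evictable < sc->nr_to_scan ? evictable : sc->nr_to_scan;
}
```

The policy is the same one the patch encodes with a flag, but it lives in the driver, which matches the direction of control Barry argues for.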
On Sat, Apr 13, 2024 at 09:54:10AM +0800, lipeifeng@oppo.com wrote:
> From: Peifeng Li <lipeifeng@oppo.com>
>
> In the case of insufficient memory, threads will be in direct_reclaim to
> reclaim memory, direct_reclaim will call shrink_slab to run sequentially
> each shrinker callback. If there is a lock-contention in the shrinker
> callback,such as spinlock,mutex_lock and so on, threads may be likely to
> be stuck in direct_reclaim for a long time, even if the memfree has reached
> the high watermarks of the zone, resulting in poor performance of threads.

That's always been a problem. That's a shrinker implementation
problem, not a shrinker infrastructure problem.

> Example 1: shrinker callback may wait for spinlock
> static unsigned long mb_cache_shrink(struct mb_cache *cache,
>                                      unsigned long nr_to_scan)
> {
>         struct mb_cache_entry *entry;
>         unsigned long shrunk = 0;
>
>         spin_lock(&cache->c_list_lock);
>         while (nr_to_scan-- && !list_empty(&cache->c_list)) {
>                 entry = list_first_entry(&cache->c_list,
>                                          struct mb_cache_entry, e_list);
>                 if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
>                     atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
>                         clear_bit(MBE_REFERENCED_B, &entry->e_flags);
>                         list_move_tail(&entry->e_list, &cache->c_list);
>                         continue;
>                 }
>                 list_del_init(&entry->e_list);
>                 cache->c_entry_count--;
>                 spin_unlock(&cache->c_list_lock);
>                 __mb_cache_entry_free(cache, entry);
>                 shrunk++;
>                 cond_resched();
>                 spin_lock(&cache->c_list_lock);
>         }
>         spin_unlock(&cache->c_list_lock);
>
>         return shrunk;
> }

Yeah, we learnt a -long- time ago that using global locks in
shrinkers that have -unbounded concurrency- is a really bad idea.
This is just a poorly implemented shrinker because it doesn't take
into account memory reclaim concurrency.

This is, for example, why list_lru exists and is tightly tied into
the SHRINKER_NUMA_AWARE infrastructure - it gets rid of the need for
global locks in reclaim lists that shrinkers traverse.

> Example 2: shrinker callback may wait for mutex lock
> static
> unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s,
>                 struct shrink_control *sc)
> {
>         struct kbase_context *kctx;
>         struct kbase_mem_phy_alloc *alloc;
>         struct kbase_mem_phy_alloc *tmp;
>         unsigned long freed = 0;
>
>         kctx = container_of(s, struct kbase_context, reclaim);
>
>         // MTK add to prevent false alarm
>         lockdep_off();

That's just -broken-. If shrinkers are called from a context in which
they can't take locks because they might deadlock, then they must
either use trylocks and abort (i.e. SHRINK_STOP) or use context flags
provided by the allocation context (e.g. GFP_NOFS,
memalloc_nofs_save()) to tell reclaim that context specific subsystem
locks are held and the shrinker should not attempt to take them
and/or run in this context.

>         mutex_lock(&kctx->jit_evict_lock);

That's also wrong. Shrinkers must be non-blocking, otherwise they
cause memory reclaim latencies that will result in unpredictable
memory allocation latencies, and that makes anyone running
applications with latency specific SLAs very unhappy.

IOWs, this is a subsystem shrinker that is very poorly implemented
and needs to be fixed before we do anything else.

> In mobile-phone,threads are likely to be stuck in shrinker callback during
> direct_reclaim, with example like the following:
> <...>-2806 [004] ..... 866458.339840: mm_shrink_slab_start:
>         dynamic_mem_shrink_scan+0x0/0xb8 ... priority 2
> <...>-2806 [004] ..... 866459.339933: mm_shrink_slab_end:
>         dynamic_mem_shrink_scan+0x0/0xb8 ...

Yup, that's exactly the problem with blocking shrinkers - they can
screw the whole system over because it stops memory allocation in its
tracks. Shrinkers must be non-blocking.

> For the above reason, the patch introduces SHRINKER_NO_DIRECT_RECLAIM that
> allows driver to set shrinker callback not to be called in direct_reclaim
> unless sc->priority is 0.

No, that's fundamentally flawed, too.

Firstly, it doesn't avoid deadlocks, nor does it avoid lock
contention under heavy memory pressure - it just hides these problems
until we are critically low on memory. Which will happen much faster,
because we aren't reclaiming memory from caches that hold memory that
needs to be reclaimed. This isn't good.

Further, it bypasses the mechanism we use to defer the shrinker work
to a context where it can be executed safely (i.e. kswapd). Shrinkers
that cannot run in the current context are supposed to return
SHRINK_STOP to tell the shrink_slab infrastructure to accumulate the
work for the next context that can run the reclaim rather than
execute it. This allows kswapd to do the reclaim work instead of
direct reclaim. It also ensures that all the memory pressure being
applied to the shrinkers is actually actioned so we keep all the
caches and memory usage in relative balance.

IOWs, the choice of running the shrinker or not is controlled by two
things:

1. the shrinker implementation itself, and
2. the reclaim context flags provided by the allocation that needs
   reclaim to be performed.

Long story short: if a shrinker is causing direct reclaim problems
because of poor locking design, latency and/or context specific
deadlocks, then the subsystem and its shrinker need to be fixed. We
should not be skipping direct reclaim just because a shrinker is
really poorly implemented.

-Dave.
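The "use trylocks and abort (i.e. SHRINK_STOP)" pattern Dave describes can be illustrated with a minimal userspace model. It is a sketch only, not kernel code: `struct cache_model` and `model_scan()` are hypothetical names, and a pthread mutex stands in for a subsystem lock such as `jit_evict_lock`. In the kernel the equivalent call would be `mutex_trylock()`, and, per Dave's description, a `SHRINK_STOP` return lets the shrink_slab infrastructure accumulate the work for a later context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SHRINK_STOP (~0UL)	/* same value the kernel uses for SHRINK_STOP */

/* Hypothetical cache guarded by one lock (stand-in for a driver mutex). */
struct cache_model {
	pthread_mutex_t lock;
	unsigned long nr_objects;
};

/*
 * Non-blocking scan: if the lock is contended, give up immediately and
 * report SHRINK_STOP instead of sleeping inside reclaim.
 */
static unsigned long model_scan(struct cache_model *c, unsigned long nr_to_scan)
{
	unsigned long freed;

	assert(c != NULL);

	if (pthread_mutex_trylock(&c->lock) != 0)
		return SHRINK_STOP;	/* contended: do not block the caller */

	freed = c->nr_objects < nr_to_scan ? c->nr_objects : nr_to_scan;
	c->nr_objects -= freed;
	pthread_mutex_unlock(&c->lock);

	return freed;
}
```

The point of the design is that the caller never waits on the lock: a contended scan costs one failed trylock, not a sleep of unbounded length in the allocation path.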
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 1a00be90d93a..2d5a8b3a720b 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -130,6 +130,11 @@ struct shrinker {
  * non-MEMCG_AWARE shrinker should not have this flag set.
  */
 #define SHRINKER_NONSLAB BIT(4)
+/*
+ * Can shrinker callback be called in direct_relcaim unless
+ * sc->priority is 0?
+ */
+#define SHRINKER_NO_DIRECT_RECLAIM BIT(5)

 __printf(2, 3)
 struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dc5d2a6fcfc4..7a5dffd166cd 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -4,7 +4,7 @@
 #include <linux/shrinker.h>
 #include <linux/rculist.h>
 #include <trace/events/vmscan.h>
-
+#include <linux/swap.h>
 #include "internal.h"

 LIST_HEAD(shrinker_list);
@@ -544,7 +544,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
                         if (!memcg_kmem_online() &&
                             !(shrinker->flags & SHRINKER_NONSLAB))
                                 continue;
-
+                        /*
+                         * SHRINKER_NO_DIRECT_RECLAIM, mean that shrinker callback
+                         * should not be called in direct_reclaim unless priority
+                         * is 0.
+                         */
+                        if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) &&
+                                        !current_is_kswapd()) {
+                                /*
+                                 * 1.Always call shrinker callback in drop_slab that
+                                 * priority is 0.
+                                 * 2.sc->priority is 0 during direct_reclaim, allow
+                                 * direct_reclaim to call shrinker callback, to reclaim
+                                 * memory timely.
+                                 */
+                                if (priority)
+                                        continue;
+                        }
                         ret = do_shrink_slab(&sc, shrinker, priority);
                         if (ret == SHRINK_EMPTY) {
                                 clear_bit(offset, unit->map);
@@ -658,7 +674,23 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
                         continue;

                 rcu_read_unlock();
-
+                /*
+                 * SHRINKER_NO_DIRECT_RECLAIM, mean that shrinker callback
+                 * should not be called in direct_reclaim unless priority
+                 * is 0.
+                 */
+                if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) &&
+                                !current_is_kswapd()) {
+                        /*
+                         * 1.Always call shrinker callback in drop_slab that
+                         * priority is 0.
+                         * 2.sc->priority is 0 during direct_reclaim, allow
+                         * direct_reclaim to call shrinker callback, to reclaim
+                         * memory timely.
+                         */
+                        if (priority)
+                                continue;
+                }
                 ret = do_shrink_slab(&sc, shrinker, priority);
                 if (ret == SHRINK_EMPTY)
                         ret = 0;
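The condition both hunks add can be read as a small predicate. The following userspace model is only a sketch to make the truth table explicit: `skip_in_direct_reclaim()` is a hypothetical helper that does not exist in the patch, and its `is_kswapd` parameter stands in for `current_is_kswapd()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the bit the patch proposes in include/linux/shrinker.h. */
#define SHRINKER_NO_DIRECT_RECLAIM (1u << 5)

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * proposed check would skip a shrinker's callback.
 */
static bool skip_in_direct_reclaim(unsigned int flags, bool is_kswapd,
				   int priority)
{
	assert(priority >= 0);	/* reclaim priority counts down to 0 */

	if (!(flags & SHRINKER_NO_DIRECT_RECLAIM))
		return false;	/* unflagged shrinkers behave as before */
	if (is_kswapd)
		return false;	/* kswapd still runs the callback */

	return priority != 0;	/* priority 0 (e.g. drop_slab) still runs it */
}
```

Read this way, a flagged shrinker is skipped only for a non-kswapd caller at a nonzero priority, which is the case the review comments in this thread object to.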