Message ID | 20220919163929.351068-1-mlombard@redhat.com |
---|---|
State | New |
Series | [V3] mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context. |
On 9/19/22 18:39, Maurizio Lombardi wrote:
> Commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
> __free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
> [...]
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>

Thanks, I added a Fixes: tag and Cc: stable, as AFAICS the warnings are not under a debugging config option, and it could bite somebody near OOM.

Added to slab.git for-6.0/fixes and will include it in the pull request this week.
On Mon, Sep 19, 2022 at 06:39:29PM +0200, Maurizio Lombardi wrote:
> [...]
> @@ -2730,7 +2735,7 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
>  		INIT_WORK(&sfw->work, flush_cpu_slab);
>  		sfw->skip = false;
>  		sfw->s = s;
> -		schedule_work_on(cpu, &sfw->work);
> +		queue_work_on(cpu, flushwq, &sfw->work);

Hi. what happens here if flushwq failed?

I think avoiding BUG_ON() makes sense,
but shouldn't we have fallback method?

>  	}
>  
>  	for_each_online_cpu(cpu) {
> @@ -4858,6 +4863,8 @@ void __init kmem_cache_init(void)
>  
>  void __init kmem_cache_init_late(void)
>  {
> +	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
> +	WARN_ON(!flushwq);
>  }
> [...]
On 2022-09-20 16:46:41 [+0900], Hyeonggon Yoo wrote:
> > @@ -2730,7 +2735,7 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
> >  		INIT_WORK(&sfw->work, flush_cpu_slab);
> >  		sfw->skip = false;
> >  		sfw->s = s;
> > -		schedule_work_on(cpu, &sfw->work);
> > +		queue_work_on(cpu, flushwq, &sfw->work);
> 
> Hi. what happens here if flushwq failed?
> 
> I think avoiding BUG_ON() makes sense,
> but shouldn't we have fallback method?

You get an output to act on and fix. The point is that it shouldn't have
happened in the first place. With the BUG_ON() that early, chances are
that you never see anything but a blank screen. So with the WARN_ON() you
probably get to see the warning before you get here.

Sebastian
On Tue, Sep 20, 2022 at 09:56:59AM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-09-20 16:46:41 [+0900], Hyeonggon Yoo wrote:
> > Hi. what happens here if flushwq failed?
> [...]
> You get an output to act on and fix. [...]

Thank you for the kind explanation. Makes sense!
On Mon, Sep 19, 2022 at 06:39:29PM +0200, Maurizio Lombardi wrote:
> Commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
> __free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
> [...]
> -- 
> 2.31.1

Looks good to me.

Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
__free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
invocations to the global workqueue to avoid a problem related to
deactivate_slab()/__free_slab() being called from an IRQ context on
PREEMPT_RT kernels.

When the flush_all_cpus_locked() function is called from a task context,
a workqueue with the WQ_MEM_RECLAIM bit set may end up flushing the
global workqueue, which causes a dependency issue:

workqueue: WQ_MEM_RECLAIM nvme-delete-wq:nvme_delete_ctrl_work [nvme_core]
  is flushing !WQ_MEM_RECLAIM events:flush_cpu_slab
WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637
  check_flush_dependency+0x10a/0x120
Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
RIP: 0010:check_flush_dependency+0x10a/0x120
[ 453.262125] Call Trace:
__flush_work.isra.0+0xbf/0x220
? __queue_work+0x1dc/0x420
flush_all_cpus_locked+0xfb/0x120
__kmem_cache_shutdown+0x2b/0x320
kmem_cache_destroy+0x49/0x100
bioset_exit+0x143/0x190
blk_release_queue+0xb9/0x100
kobject_cleanup+0x37/0x130
nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc]
nvme_free_ctrl+0x1ac/0x2b0 [nvme_core]

Fix this bug by creating a workqueue for the flush operation with
the WQ_MEM_RECLAIM bit set.

v2: Create a workqueue with WQ_MEM_RECLAIM
instead of trying to revert the changes.

v3: Replace create_workqueue() with alloc_workqueue() and BUG_ON() with
WARN_ON().

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 mm/slub.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f5..016da09608fb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -310,6 +310,11 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
  */
 static nodemask_t slab_nodes;
 
+/*
+ * Workqueue used for flush_cpu_slab().
+ */
+static struct workqueue_struct *flushwq;
+
 /********************************************************************
  * Core slab cache functions
  *******************************************************************/
@@ -2730,7 +2735,7 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
 		INIT_WORK(&sfw->work, flush_cpu_slab);
 		sfw->skip = false;
 		sfw->s = s;
-		schedule_work_on(cpu, &sfw->work);
+		queue_work_on(cpu, flushwq, &sfw->work);
 	}
 
 	for_each_online_cpu(cpu) {
@@ -4858,6 +4863,8 @@ void __init kmem_cache_init(void)
 
 void __init kmem_cache_init_late(void)
 {
+	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
+	WARN_ON(!flushwq);
 }
 
 struct kmem_cache *