Message ID | 20180504104451.20278-1-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 05/04/2018 12:44 PM, Sebastian Andrzej Siewior wrote: > This patch reverts commit c7f26ccfb2c3 ("mm/vmstat.c: fix > vmstat_update() preemption BUG"). > Steven saw a "using smp_processor_id() in preemptible" message and > added a preempt_disable() section around it to keep it quiet. This is > not the right thing to do it does not fix the real problem. > > vmstat_update() is invoked by a kworker on a specific CPU. This worker > it bound to this CPU. The name of the worker was "kworker/1:1" so it > should have been a worker which was bound to CPU1. A worker which can > run on any CPU would have a `u' before the first digit. > > smp_processor_id() can be used in a preempt-enabled region as long as > the task is bound to a single CPU which is the case here. If it could > run on an arbitrary CPU then this is the problem we have an should seek > to resolve. > Not only this smp_processor_id() must not be migrated to another CPU but > also refresh_cpu_vm_stats() which might access wrong per-CPU variables. > Not to mention that other code relies on the fact that such a worker > runs on one specific CPU only. > > Therefore I revert that commit and we should look instead what broke the > affinity mask of the kworker. Yes. I think there's an important detail that should be perhaps included explicitly here. The check check_preemption_disabled() does include this test: /* * Kernel threads bound to a single CPU can safely use * smp_processor_id(): */ if (cpumask_equal(¤t->cpus_allowed, cpumask_of(this_cpu))) goto out; So indeed, if kworkers work as expected, there's no false positive bug. The report in commit c7f26ccfb2c3 included: BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:1/269 caller is vmstat_update+0x50/0xa0 CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted I.e. kworker/1 running on CPU 0. Because the BUG was reported, we know that the test above did not prevent it. That means either the kworker's cpumask was not restricted to CPU 1 (it included also cpu 0), or it was restricted, but the restriction was ignored, and it still executed on cpu 0. Note the report also said "Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can...". IIRC Tejun mentioned that during hotplug (or hotremove, actually?) the guarantees are off, but vmstat should not schedule work items on such cpus due to its hooks/checks. In any case I agree that the revert should be done immediately even before fixing the underlying bug. The preempt_disable/enable doesn't prevent the bug, it only prevents the debugging code from actually reporting it! Note that it's debugging code (CONFIG_DEBUG_PREEMPT) that production kernels most likely don't have enabled, so we are not even helping them not crash (while allowing possible data corruption). Acked-by: Vlastimil Babka <vbabka@suse.cz> > Cc: Steven J. Hill <steven.hill@cavium.com> > Cc: Tejun Heo <htejun@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > mm/vmstat.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 33581be705f0..40b2db6db6b1 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1839,11 +1839,9 @@ static void vmstat_update(struct work_struct *w) > * to occur in the future. Keep on running the > * update worker thread. > */ > - preempt_disable(); > queue_delayed_work_on(smp_processor_id(), mm_percpu_wq, > this_cpu_ptr(&vmstat_work), > round_jiffies_relative(sysctl_stat_interval)); > - preempt_enable(); > } > } > >
On Mon, 7 May 2018 09:31:05 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: > In any case I agree that the revert should be done immediately even > before fixing the underlying bug. The preempt_disable/enable doesn't > prevent the bug, it only prevents the debugging code from actually > reporting it! Note that it's debugging code (CONFIG_DEBUG_PREEMPT) that > production kernels most likely don't have enabled, so we are not even > helping them not crash (while allowing possible data corruption). Grumble. I don't see much benefit in emitting warnings into end-users' logs for bugs which we already know about. The only thing this buys us is that people will hassle us if we forget to fix the bug, and how pathetic is that? I mean, we may as well put printk("don't forget to fix the vmstat_update() bug!\n"); into start_kernel().
On 2018-05-08 16:02:57 [-0700], Andrew Morton wrote: > On Mon, 7 May 2018 09:31:05 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: > > > In any case I agree that the revert should be done immediately even > > before fixing the underlying bug. The preempt_disable/enable doesn't > > prevent the bug, it only prevents the debugging code from actually > > reporting it! Note that it's debugging code (CONFIG_DEBUG_PREEMPT) that > > production kernels most likely don't have enabled, so we are not even > > helping them not crash (while allowing possible data corruption). > > Grumble. > > I don't see much benefit in emitting warnings into end-users' logs for > bugs which we already know about. not end-users (not to mention that neither Debian Stretch nor F28 has preemption enabled in their kernels). And if so, they may provide additional information for someone to fix the bug in the end. I wasn't able to reproduce the bug but I don't have access to anything MIPSish where I can boot my own kernels. At least two people were looking at the code after I posted the revert and nobody spotted the bug. > The only thing this buys us is that people will hassle us if we forget > to fix the bug, and how pathetic is that? I mean, we may as well put > > printk("don't forget to fix the vmstat_update() bug!\n"); No that is different. That would be seen by everyone. The bug was only reported by Steven J. Hill which did not respond since. This message would also imply that we know how to fix the bug but didn't do it yet which is not the case. We seen that something was wrong but have no idea *how* it got there. The preempt_disable() was added by the end of v4.16. The smp_processor_id() in vmstat_update() was added in commit 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8") which was in v3.18-rc1. The hotplug rework took place in v4.10-rc1. And it took (counting from the hotplug rework) 6 kernel releases for someone to trigger that warning _if_ this was related to the hotplug rework. What we have *now* is way worse: We have a possible bug that triggered the warning. As we see in report the code in question was _already_ invoked on the wrong CPU. The preempt_disable() just silences the warning, hiding the real issue so nobody will do a thing about it since it will be never reported again (in a kernel with preemption and debug enabled). > into start_kernel(). Sebastian
On 05/10/2018 12:35 AM, Sebastian Andrzej Siewior wrote: > On 2018-05-08 16:02:57 [-0700], Andrew Morton wrote: >> On Mon, 7 May 2018 09:31:05 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: >> >>> In any case I agree that the revert should be done immediately even >>> before fixing the underlying bug. The preempt_disable/enable doesn't >>> prevent the bug, it only prevents the debugging code from actually >>> reporting it! Note that it's debugging code (CONFIG_DEBUG_PREEMPT) that >>> production kernels most likely don't have enabled, so we are not even >>> helping them not crash (while allowing possible data corruption). >> >> Grumble. >> >> I don't see much benefit in emitting warnings into end-users' logs for >> bugs which we already know about. > > not end-users (not to mention that neither Debian Stretch nor F28 has > preemption enabled in their kernels). And if so, they may provide > additional information for someone to fix the bug in the end. I wasn't Even if end-users have enabled preemption, they likely won't have enabled CONFIG_DEBUG_PREEMPT anyway. > able to reproduce the bug but I don't have access to anything MIPSish > where I can boot my own kernels. At least two people were looking at the > code after I posted the revert and nobody spotted the bug. > >> The only thing this buys us is that people will hassle us if we forget >> to fix the bug, and how pathetic is that? I mean, we may as well put >> >> printk("don't forget to fix the vmstat_update() bug!\n"); > > No that is different. That would be seen by everyone. The bug was only > reported by Steven J. Hill which did not respond since. This message > would also imply that we know how to fix the bug but didn't do it yet > which is not the case. We seen that something was wrong but have no idea > *how* it got there. > > The preempt_disable() was added by the end of v4.16. The > smp_processor_id() in vmstat_update() was added in commit 7cc36bbddde5 > ("vmstat: on-demand vmstat workers V8") which was in v3.18-rc1. The > hotplug rework took place in v4.10-rc1. And it took (counting from the > hotplug rework) 6 kernel releases for someone to trigger that warning > _if_ this was related to the hotplug rework. > > What we have *now* is way worse: We have a possible bug that triggered > the warning. As we see in report the code in question was _already_ > invoked on the wrong CPU. The preempt_disable() just silences the > warning, hiding the real issue so nobody will do a thing about it since > it will be never reported again (in a kernel with preemption and debug > enabled). Fully agree with everything you said! We could extend the warning to e.g. print affinity mask of the thread, and e.g. state of cpus that are subject to ongoing hotplug/hotremove. But maybe it's not so useful in general, as the common case is likely indeed a missing preempt_disable, and this is an exception? In any case, I would hope that Steven applies some patch locally and we get more details about what's going on at that MIPS machine.
On Thu, 10 May 2018, Vlastimil Babka wrote: > On 05/10/2018 12:35 AM, Sebastian Andrzej Siewior wrote: > >> The only thing this buys us is that people will hassle us if we forget > >> to fix the bug, and how pathetic is that? I mean, we may as well put > >> > >> printk("don't forget to fix the vmstat_update() bug!\n"); > > > > No that is different. That would be seen by everyone. The bug was only > > reported by Steven J. Hill which did not respond since. This message > > would also imply that we know how to fix the bug but didn't do it yet > > which is not the case. We seen that something was wrong but have no idea > > *how* it got there. > > > > The preempt_disable() was added by the end of v4.16. The > > smp_processor_id() in vmstat_update() was added in commit 7cc36bbddde5 > > ("vmstat: on-demand vmstat workers V8") which was in v3.18-rc1. The > > hotplug rework took place in v4.10-rc1. And it took (counting from the > > hotplug rework) 6 kernel releases for someone to trigger that warning > > _if_ this was related to the hotplug rework. > > > > What we have *now* is way worse: We have a possible bug that triggered > > the warning. As we see in report the code in question was _already_ > > invoked on the wrong CPU. The preempt_disable() just silences the > > warning, hiding the real issue so nobody will do a thing about it since > > it will be never reported again (in a kernel with preemption and debug > > enabled). > > Fully agree with everything you said! > > We could extend the warning to e.g. print affinity mask of the thread, > and e.g. state of cpus that are subject to ongoing hotplug/hotremove. > But maybe it's not so useful in general, as the common case is likely > indeed a missing preempt_disable, and this is an exception? In any case, > I would hope that Steven applies some patch locally and we get more > details about what's going on at that MIPS machine. So after this went completely silent and S.J. Hill seems to be lost in the intertubes, I spent quite some time staring at that code and the commit in question: "Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can cause vmstat_update() to report a BUG due to preemption not being disabled around smp_processor_id()." That changelog is pretty much useless as it just decscribes the symptom and the 'fix' follows suit by papering over that symptom. Plus it's even more obscure that only the queue_delayed_work_on() bit needs to be fixed because vmstat_update() does: if (refresh_cpu_vm_stats(true)) queue_delayed_work_on(smp_processor_id() ....); and refresh_cpu_vm_stats() is full of this_cpu_* accesses which all are equipped with __this_cpu_preempt_check() calls which depend on CONFIG_DEBUG_PREEMPT as well. So how does only the queue_delayed_work_on(smp_processor_id()) part trigger? That does not make any sense at all. Can we please revert that master piece of duct tape engineering and wait for someone to actually trigger the warning again? All we can hope it's someone who really sits down and does a proper analysis of the problem instead of throwing some half baken 'works for me' crap over the fence and then running as fast as it goes. As I was at it I stared at the hotplug code some more. vmstat_update() is a delayed work function which is scheduled on a particular per CPU mm_percpu_wq. In case of CPU hotplug the work (and the timer) is canceled _before_ the per CPU workqueues are unbound. It cannot be requeued because vmstat_shepherd() on some other CPU would be stuck in get_online_cpus() and by the time it's unstuck the CPU is gone from the online cpumask. So this looks all about correct, but there is a very subtle case where the above has a hole. That requires to execute the hotplug state control and not the full /sys/..../cpu$N/online mechanism, e.g. by doing: # echo $PERF_ONLINE_STATE > /sys/devices/system/cpu/cpu1/hotplug/target CPU0 CPU1 do_cpu_down(1, CPUHP_AP_PERF_ONLINE) write_lock_cpus() __cpu_down(1, CPUHP_AP_PERF_ONLINE) kick_cpu(1); wait_for_completion(); while (state > CPUHP_AP_PERF_ONLINE) invoke_shutdown_callback(state--); That invokes: vmstat_cpu_offline(); cancel_delayed_work(); ... workqueue_offline_cpu() unbind_workers(); ... complete(); write_unlock_cpus(); Note, after this returns, CPU1 is still in the online mask. So the next invocation of vmstat_sheperd() can queue the work on CPU1 again. If that happens then the work will run on an unbound work queue somewhere if I'm not completely misreading the workqueue code. Tejun ??? I'll try that tomorrow if nobody beats me to it, but that's the only way I found how the debug warning can trigger. That does not explain why it triggers only the smp_processor_id() thing and not the this_cpu_* check, but I don't trust that information in the changelog at all. Thanks, tglx
On Wed, 13 Jun 2018 23:46:45 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > Can we please revert that master piece of duct tape engineering and wait > for someone to actually trigger the warning again? OK.
On Thu, Jun 14, 2018 at 02:27:10PM -0700, Andrew Morton wrote: > On Wed, 13 Jun 2018 23:46:45 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > > > Can we please revert that master piece of duct tape engineering and wait > > for someone to actually trigger the warning again? > > OK. And while we're at it, can we revert it from stable as well. As this is just an overly aggressive pulling anything that looks like a fix into stable. -- Steve
diff --git a/mm/vmstat.c b/mm/vmstat.c index 33581be705f0..40b2db6db6b1 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1839,11 +1839,9 @@ static void vmstat_update(struct work_struct *w) * to occur in the future. Keep on running the * update worker thread. */ - preempt_disable(); queue_delayed_work_on(smp_processor_id(), mm_percpu_wq, this_cpu_ptr(&vmstat_work), round_jiffies_relative(sysctl_stat_interval)); - preempt_enable(); } }
This patch reverts commit c7f26ccfb2c3 ("mm/vmstat.c: fix vmstat_update() preemption BUG"). Steven saw a "using smp_processor_id() in preemptible" message and added a preempt_disable() section around it to keep it quiet. This is not the right thing to do it does not fix the real problem. vmstat_update() is invoked by a kworker on a specific CPU. This worker it bound to this CPU. The name of the worker was "kworker/1:1" so it should have been a worker which was bound to CPU1. A worker which can run on any CPU would have a `u' before the first digit. smp_processor_id() can be used in a preempt-enabled region as long as the task is bound to a single CPU which is the case here. If it could run on an arbitrary CPU then this is the problem we have an should seek to resolve. Not only this smp_processor_id() must not be migrated to another CPU but also refresh_cpu_vm_stats() which might access wrong per-CPU variables. Not to mention that other code relies on the fact that such a worker runs on one specific CPU only. Therefore I revert that commit and we should look instead what broke the affinity mask of the kworker. Cc: Steven J. Hill <steven.hill@cavium.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- mm/vmstat.c | 2 -- 1 file changed, 2 deletions(-)