Message ID | 20180117123444.18393-3-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 1/17/18 5:34 AM, Ming Lei wrote: > We know this WARN_ON is harmless and the stack trace isn't useful too, > so convert it to printk(), and avoid to confuse people. I disagree, it is useful to know the exact path it happened from, in case it's a valid warning. It could be an inline run and we screwed up the logic, or it could be from a workqueue and the reason would be entirely different.
On 01/17/2018 04:46 PM, Jens Axboe wrote: > On 1/17/18 5:34 AM, Ming Lei wrote: >> We know this WARN_ON is harmless and the stack trace isn't useful too, >> so convert it to printk(), and avoid to confuse people. > > I disagree, it is useful to know the exact path it happened from, > in case it's a valid warning. It could be an inline run and > we screwed up the logic, or it could be from a workqueue and > the reason would be entirely different. Then add a dump_stack or whatever, but WARN_ON does have fatal effects for some setups. If it can happen then WARN_ON is just wrong.
On 1/17/18 9:05 AM, Christian Borntraeger wrote: > > > On 01/17/2018 04:46 PM, Jens Axboe wrote: >> On 1/17/18 5:34 AM, Ming Lei wrote: >>> We know this WARN_ON is harmless and the stack trace isn't useful too, >>> so convert it to printk(), and avoid to confuse people. >> >> I disagree, it is useful to know the exact path it happened from, >> in case it's a valid warning. It could be an inline run and >> we screwed up the logic, or it could be from a workqueue and >> the reason would be entirely different. > > Then add a dump_stack or whatever, but WARN_ON does have fatal effects > for some setups. If it can happen then WARN_ON is just wrong. dump_stack() is fine - and the intent is for it to never happen, it would be nice to close those holes so we're only catching cases that are due to bad code.
On Wed, Jan 17, 2018 at 08:46:40AM -0700, Jens Axboe wrote: > On 1/17/18 5:34 AM, Ming Lei wrote: > > We know this WARN_ON is harmless and the stack trace isn't useful too, > > so convert it to printk(), and avoid to confuse people. > > I disagree, it is useful to know the exact path it happened from, > in case it's a valid warning. It could be an inline run and inline run is only possible in theory, since the window is too small. > we screwed up the logic, or it could be from a workqueue and > the reason would be entirely different. OK, if you don't agree, I will just add the comment on the races we found.
diff --git a/block/blk-mq.c b/block/blk-mq.c index dc4066d28323..6562360bf108 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1391,9 +1391,25 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) /* * We should be running this queue from one of the CPUs that * are mapped to it. + * + * There are at least two related races now between setting + * hctx->next_cpu from blk_mq_hctx_next_cpu() and running + * __blk_mq_run_hw_queue(): + * + * - hctx->next_cpu is found offline in blk_mq_hctx_next_cpu(), + * but later it becomes online, then this warning is harmless + * at all + * + * - hctx->next_cpu is found online in blk_mq_hctx_next_cpu(), + * but later it becomes offline, then the warning can't be + * triggered, and we depend on blk-mq timeout handler to + * handle dispatched requests to this hctx */ - WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && - cpu_online(hctx->next_cpu)); + if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && + cpu_online(hctx->next_cpu)) + printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", + raw_smp_processor_id(), + cpumask_empty(hctx->cpumask) ? "inactive": "active"); /* * We can't run the queue inline with ints disabled. Ensure that
We know this WARN_ON is harmless and the stack trace isn't useful too, so convert it to printk(), and avoid to confuse people. Also add comment about two releated races here. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-mq.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)