Message ID | 20170131092048.GB3687@linux-x5ow.site (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Tue, Jan 31, 2017 at 10:20 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote: > On Tue, Jan 31, 2017 at 09:55:52AM +0100, Dmitry Vyukov wrote: >> Hello, >> >> The following program triggers BUG in scsi_init_io: > > Well crashing a machine just because of an empty dma transfer is a bit harsh, > isn't it? > > From 86e6fa5f618fe588b98e923e032f33e075fcd4f4 Mon Sep 17 00:00:00 2001 > From: Johannes Thumshirn <jthumshirn@suse.de> > Date: Tue, 31 Jan 2017 10:16:00 +0100 > Subject: [PATCH] scsi: don't BUG_ON() empty DMA transfers > > Don't crash the machine just because of an empty transfer. Use WARN_ON() > combined with returning an error. > > Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> > --- > drivers/scsi/scsi_lib.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index e9e1e14..414588a 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1040,7 +1040,8 @@ int scsi_init_io(struct scsi_cmnd *cmd) > bool is_mq = (rq->mq_ctx != NULL); > int error; > > - BUG_ON(!blk_rq_nr_phys_segments(rq)); > + if (WARN_ON(!blk_rq_nr_phys_segments(rq))) > + return -EINVAL; Please-please-please, let's not use WARN for something that is not a kernel bug and is user-triggerable. This makes it impossible to automate kernel testing and requires hiring an army of people doing mechanical job of sorting out WARNING reports into kernel-bugs and non-kernel-bugs. If the message is absolutely necessary (while kernel does not generally explain every EINVAL on console), the following will do: if (!blk_rq_nr_phys_segments(rq)) { pr_err("you are doing something wrong\n"); return -EINVAL; } > error = scsi_init_sgtable(rq, &cmd->sdb); > if (error) > -- > 2.10.2 > > > -- > Johannes Thumshirn Storage > jthumshirn@suse.de +49 911 74053 689 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 31, 2017 at 10:50:49AM +0100, Dmitry Vyukov wrote: > On Tue, Jan 31, 2017 at 10:20 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote: > > On Tue, Jan 31, 2017 at 09:55:52AM +0100, Dmitry Vyukov wrote: [...] > Please-please-please, let's not use WARN for something that is not a > kernel bug and is user-triggerable. This makes it impossible to > automate kernel testing and requires hiring an army of people doing > mechanical job of sorting out WARNING reports into kernel-bugs and > non-kernel-bugs. > If the message is absolutely necessary (while kernel does not > generally explain every EINVAL on console), the following will do: > > if (!blk_rq_nr_phys_segments(rq)) { > pr_err("you are doing something wrong\n"); > return -EINVAL; > } Yes I understand that. OTOH having the WARN helps you finding the caller because of to the stack trace. But arguably that could be accomplished with function graph tracing as well. I'll re-send a v2 as a proper patch. Thanks, Johannes
On Tue, Jan 31, 2017 at 10:58 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote: > > [...] > >> Please-please-please, let's not use WARN for something that is not a >> kernel bug and is user-triggerable. This makes it impossible to >> automate kernel testing and requires hiring an army of people doing >> mechanical job of sorting out WARNING reports into kernel-bugs and >> non-kernel-bugs. >> If the message is absolutely necessary (while kernel does not >> generally explain every EINVAL on console), the following will do: >> >> if (!blk_rq_nr_phys_segments(rq)) { >> pr_err("you are doing something wrong\n"); >> return -EINVAL; >> } > > Yes I understand that. OTOH having the WARN helps you finding the caller because > of to the stack trace. But arguably that could be accomplished with function > graph tracing as well. I'll re-send a v2 as a proper patch. Thank you very much. Stack trace can be done with dump_stack() if necessary, e.g. pr_err("you are doing something wrong here:\n"); dump_stack(); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-01-31 at 10:50 +0100, Dmitry Vyukov wrote: > On Tue, Jan 31, 2017 at 10:20 AM, Johannes Thumshirn < > jthumshirn@suse.de> wrote: > > On Tue, Jan 31, 2017 at 09:55:52AM +0100, Dmitry Vyukov wrote: > > > Hello, > > > > > > The following program triggers BUG in scsi_init_io: > > > > Well crashing a machine just because of an empty dma transfer is a > > bit harsh, > > isn't it? > > > > From 86e6fa5f618fe588b98e923e032f33e075fcd4f4 Mon Sep 17 00:00:00 > > 2001 > > From: Johannes Thumshirn <jthumshirn@suse.de> > > Date: Tue, 31 Jan 2017 10:16:00 +0100 > > Subject: [PATCH] scsi: don't BUG_ON() empty DMA transfers > > > > Don't crash the machine just because of an empty transfer. Use > > WARN_ON() > > combined with returning an error. > > > > Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> > > --- > > drivers/scsi/scsi_lib.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index e9e1e14..414588a 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1040,7 +1040,8 @@ int scsi_init_io(struct scsi_cmnd *cmd) > > bool is_mq = (rq->mq_ctx != NULL); > > int error; > > > > - BUG_ON(!blk_rq_nr_phys_segments(rq)); > > + if (WARN_ON(!blk_rq_nr_phys_segments(rq))) > > + return -EINVAL; > > > Please-please-please, let's not use WARN for something that is not a > kernel bug and is user-triggerable. It is a kernel bug and it should not be user triggerable, so it should have a warn_on or bug_on. It means something called a data setup function with no data. There's actually a root cause that patches like this won't fix, can we find it? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 31, 2017 at 7:41 AM, James Bottomley <jejb@linux.vnet.ibm.com> wrote: > > It is a kernel bug and it should not be user triggerable, so it should > have a warn_on or bug_on. Hell NO. Christ, James, listen to yourself. What you are basically saying when you say it should be a BUG_ON() is "This shouldn't happen, but if it ever does happen, let's just turn our mistaken assumptions into a dead machine that is really hard to debug". Because a BUG_ON() effectively kills the machine if the call chain has some locks held. In the SCSI layer, that generally means that there will be no logged oops either, because any locks held likely just killed your filesystem or disk subsystem, so now that oops is basically not even likely to be reported by most normal users. So stop this "should have a bug_on". In fact, since this apparently _is_ easily user-triggerable, it damn well shouldn't have a warn_on either. At most, a WARN_ON_ONCE(), so that we might get reports of _what_ the bad call chain is, but we will never kill the machine and we will *not* give people the ability to randomly spam the system logs. BUG_ON() needs to die. People need to realize that it is a _problem_, and that it makes any bugs _worse_. Don't do it. The only valid reason for BUG_ON() is when some very core data structure is _so_ corrupt that you can't even continue, because you simply can't even return an error and there's no way for you to just say "log it once and continue". And by that I don't mean some random value you have in a request. I mean literally "this is a really core data structure, and I simply _cannot_ continue" (where that "cannot" is about an actual physical impossibility, not a "I could continue but I think this is serious"). Anything else is a "return error, possibly with a WARN_ON() to let people know that bad things are going on". Basically, BUG_ON() should be in core kernel code. Not in drivers. And even in core kernel code, it's likely wrong. Linus
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index e9e1e14..414588a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1040,7 +1040,8 @@ int scsi_init_io(struct scsi_cmnd *cmd) bool is_mq = (rq->mq_ctx != NULL); int error; - BUG_ON(!blk_rq_nr_phys_segments(rq)); + if (WARN_ON(!blk_rq_nr_phys_segments(rq))) + return -EINVAL; error = scsi_init_sgtable(rq, &cmd->sdb); if (error)