Message ID | 20200429074627.5955-6-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: fix blktrace debugfs use after free | expand |
On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote: > We use one blktrace per request_queue, that means one per the entire > disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, > or just two calls on /dev/vda. > > We check for concurrent setup only at the very end of the blktrace setup though. > > If we try to run two concurrent blktraces on the same block device the > second one will fail, and the first one seems to go on. However when > one tries to kill the first one one will see things like this: > > The kernel will show these: > > ``` > debugfs: File 'dropped' in directory 'nvme1n1' already present! > debugfs: File 'msg' in directory 'nvme1n1' already present! > debugfs: File 'trace0' in directory 'nvme1n1' already present! > `` > > And userspace just sees this error message for the second call: > > ``` > blktrace /dev/nvme1n1 > BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error > ``` > > The first userspace process #1 will also claim that the files > were taken underneath their nose as well. The files are taken > away form the first process given that when the second blktrace > fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. > This means that even if go-happy process #1 is waiting for blktrace > data, we *have* been asked to take teardown the blktrace. > > This can easily be reproduced with break-blktrace [0] run_0005.sh test. > > Just break out early if we know we're already going to fail, this will > prevent trying to create the files all over again, which we know still > exist. > > [0] https://github.com/mcgrof/break-blktrace > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > --- > kernel/trace/blktrace.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > index 5c52976bd762..383045f67cb8 100644 > --- a/kernel/trace/blktrace.c > +++ b/kernel/trace/blktrace.c > @@ -4,6 +4,8 @@ > * > */ > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > #include <linux/kernel.h> > #include <linux/blkdev.h> > #include <linux/blktrace_api.h> > @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, > */ > strreplace(buts->name, '/', '_'); > > + if (q->blk_trace) { > + pr_warn("Concurrent blktraces are not allowed\n"); > + return -EBUSY; You have access to a block device here, please use dev_warn() instead here for that, that makes it obvious as to what device a "concurrent blktrace" was attempted for. thanks, greg k-h
On Wed, Apr 29, 2020 at 11:49:37AM +0200, Greg KH wrote: > On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote: > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > > index 5c52976bd762..383045f67cb8 100644 > > --- a/kernel/trace/blktrace.c > > +++ b/kernel/trace/blktrace.c > > @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, > > */ > > strreplace(buts->name, '/', '_'); > > > > + if (q->blk_trace) { > > + pr_warn("Concurrent blktraces are not allowed\n"); > > + return -EBUSY; > > You have access to a block device here, please use dev_warn() instead > here for that, that makes it obvious as to what device a "concurrent > blktrace" was attempted for. The block device may be empty, one example is for scsi-generic, but I'll use buts->name. Luis
On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote: > > You have access to a block device here, please use dev_warn() instead > > here for that, that makes it obvious as to what device a "concurrent > > blktrace" was attempted for. > > The block device may be empty, one example is for scsi-generic, but I'll > use buts->name. Is blktrace on /dev/sg something we intentionally support, or just by some accident of history? Given all the pains it causes I'd be tempted to just remove the support and see if anyone screams.
On Fri, May 01, 2020 at 08:34:23AM -0700, Christoph Hellwig wrote: > On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote: > > > You have access to a block device here, please use dev_warn() instead > > > here for that, that makes it obvious as to what device a "concurrent > > > blktrace" was attempted for. > > > > The block device may be empty, one example is for scsi-generic, but I'll > > use buts->name. > > Is blktrace on /dev/sg something we intentionally support, or just by > some accident of history? Given all the pains it causes I'd be tempted > to just remove the support and see if anyone screams. From what I can tell I think it was a historic and brutal mistake. I am more than happy to remove it. Re-adding support would just be a symlink. Luis
On Fri, May 1, 2020 at 9:40 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > On Fri, May 01, 2020 at 08:34:23AM -0700, Christoph Hellwig wrote: > > On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote: > > > > You have access to a block device here, please use dev_warn() instead > > > > here for that, that makes it obvious as to what device a "concurrent > > > > blktrace" was attempted for. > > > > > > The block device may be empty, one example is for scsi-generic, but I'll > > > use buts->name. > > > > Is blktrace on /dev/sg something we intentionally support, or just by > > some accident of history? Given all the pains it causes I'd be tempted > > to just remove the support and see if anyone screams. > > From what I can tell I think it was a historic and brutal mistake. I am > more than happy to remove it. I take that back: commit 6da127ad0918f93ea93678dad62ce15ffed18797 Author: Christof Schmitt <christof.schmitt@de.ibm.com> Date: Fri Jan 11 10:09:43 2008 +0100 blktrace: Add blktrace ioctls to SCSI generic devices Since the SCSI layer uses the request queues from the block layer, blktrace can also be used to trace the requests to all SCSI devices (like SCSI tape drives), not only disks. The only missing part is the ioctl interface to start and stop tracing. This patch adds the SETUP, START, STOP and TEARDOWN ioctls from blktrace to the sg device files. With this change, blktrace can be used for SCSI devices like for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i - Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Christof, any thoughts on removing this support? Luis
On Fri, May 01, 2020 at 03:06:26PM +0000, Luis Chamberlain wrote: > On Wed, Apr 29, 2020 at 11:49:37AM +0200, Greg KH wrote: > > On Wed, Apr 29, 2020 at 07:46:26AM +0000, Luis Chamberlain wrote: > > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > > > index 5c52976bd762..383045f67cb8 100644 > > > --- a/kernel/trace/blktrace.c > > > +++ b/kernel/trace/blktrace.c > > > @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, > > > */ > > > strreplace(buts->name, '/', '_'); > > > > > > + if (q->blk_trace) { > > > + pr_warn("Concurrent blktraces are not allowed\n"); > > > + return -EBUSY; > > > > You have access to a block device here, please use dev_warn() instead > > here for that, that makes it obvious as to what device a "concurrent > > blktrace" was attempted for. > > The block device may be empty, one example is for scsi-generic, but I'll > use buts->name. That's fine, give us a chance to know what went wrong, your line as is does not do that :( thanks, greg k-h
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 5c52976bd762..383045f67cb8 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -4,6 +4,8 @@ * */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/blktrace_api.h> @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, */ strreplace(buts->name, '/', '_'); + if (q->blk_trace) { + pr_warn("Concurrent blktraces are not allowed\n"); + return -EBUSY; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM;
We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this: The kernel will show these: ``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! `` And userspace just sees this error message for the second call: ``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ``` The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we *have* been asked to take teardown the blktrace. This can easily be reproduced with break-blktrace [0] run_0005.sh test. Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist. [0] https://github.com/mcgrof/break-blktrace Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- kernel/trace/blktrace.c | 7 +++++++ 1 file changed, 7 insertions(+)