diff mbox series

[v4,4/5] blktrace: break out of blktrace setup on concurrent calls

Message ID 20200509031058.8239-5-mcgrof@kernel.org (mailing list archive)
State New, archived
Headers show
Series block: fix blktrace debugfs use after free | expand

Commit Message

Luis Chamberlain May 9, 2020, 3:10 a.m. UTC
We use one blktrace per request_queue, that means one per the entire
disk.  So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
or just two calls on /dev/vda.

We check for concurrent setup only at the very end of the blktrace setup though.

If we try to run two concurrent blktraces on the same block device the
second one will fail, and the first one seems to go on. However when
one tries to kill the first one one will see things like this:

The kernel will show these:

```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
``

And userspace just sees this error message for the second call:

```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```

The first userspace process #1 will also claim that the files
were taken underneath their nose as well. The files are taken
away form the first process given that when the second blktrace
fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
This means that even if go-happy process #1 is waiting for blktrace
data, we *have* been asked to take teardown the blktrace.

This can easily be reproduced with break-blktrace [0] run_0005.sh test.

Just break out early if we know we're already going to fail, this will
prevent trying to create the files all over again, which we know still
exist.

[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/trace/blktrace.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Bart Van Assche May 10, 2020, 1:09 a.m. UTC | #1
On 2020-05-08 20:10, Luis Chamberlain wrote:
> @@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
>  	 */
>  	strreplace(buts->name, '/', '_');
>  
> +	if (q->blk_trace) {
> +		pr_warn("Concurrent blktraces are not allowed on %s\n",
> +			buts->name);
> +		return -EBUSY;
> +	}
> +
>  	bt = kzalloc(sizeof(*bt), GFP_KERNEL);
>  	if (!bt)
>  		return -ENOMEM;

Is this really sufficient? Shouldn't concurrent do_blk_trace_setup()
calls that refer to the same request queue be serialized to really
prevent that debugfs attribute creation fails?

How about using the block device name instead of the partition name in
the error message since the concurrency context is the block device and
not the partition?

Thanks,

Bart.
Luis Chamberlain May 11, 2020, 1:39 p.m. UTC | #2
On Sat, May 09, 2020 at 06:09:38PM -0700, Bart Van Assche wrote:
> On 2020-05-08 20:10, Luis Chamberlain wrote:
> > @@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> >  	 */
> >  	strreplace(buts->name, '/', '_');
> >  
> > +	if (q->blk_trace) {
> > +		pr_warn("Concurrent blktraces are not allowed on %s\n",
> > +			buts->name);
> > +		return -EBUSY;
> > +	}
> > +
> >  	bt = kzalloc(sizeof(*bt), GFP_KERNEL);
> >  	if (!bt)
> >  		return -ENOMEM;
> 
> Is this really sufficient? Shouldn't concurrent do_blk_trace_setup()
> calls that refer to the same request queue be serialized to really
> prevent that debugfs attribute creation fails?

We'd have to add something like a linked list. Right now I'm just
clarifying things which were not clear before. What you describe is
a functional feature change. I'm just trying to fix a bug and clarify
limitations.

> How about using the block device name instead of the partition name in
> the error message since the concurrency context is the block device and
> not the partition?

blk device argument can be NULL here. sg-generic is one case.

 Luis
Luis Chamberlain May 16, 2020, 1:39 a.m. UTC | #3
On Mon, May 11, 2020 at 7:39 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Sat, May 09, 2020 at 06:09:38PM -0700, Bart Van Assche wrote:
> > How about using the block device name instead of the partition name in
> > the error message since the concurrency context is the block device and
> > not the partition?
>
> blk device argument can be NULL here. sg-generic is one case.

I'm going to add a comment about this, as it is easily forgotten.

  Luis
diff mbox series

Patch

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 6c10a1427de2..bd5ec2184d46 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -3,6 +3,9 @@ 
  * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
  *
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/blktrace_api.h>
@@ -493,6 +496,12 @@  static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	 */
 	strreplace(buts->name, '/', '_');
 
+	if (q->blk_trace) {
+		pr_warn("Concurrent blktraces are not allowed on %s\n",
+			buts->name);
+		return -EBUSY;
+	}
+
 	bt = kzalloc(sizeof(*bt), GFP_KERNEL);
 	if (!bt)
 		return -ENOMEM;