
[v2] blk-cgroup: remove entries in blkg_tree before queue release

Message ID b7809887-24e6-3ad7-e8bd-4fe7ea0927c0@wdc.com (mailing list archive)
State New, archived

Commit Message

Bart Van Assche April 11, 2018, 7:55 p.m. UTC
On 04/11/18 13:00, Alexandru Moise wrote:
> But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> the entry that we fail to remove at __blk_release_queue().

Hello Alex,

Had you considered something like the untested patch below?

Thanks,

Bart.

Comments

Tejun Heo April 11, 2018, 7:57 p.m. UTC | #1
On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> On 04/11/18 13:00, Alexandru Moise wrote:
> >But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> >an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> >the entry that we fail to remove at __blk_release_queue().
> 
> Hello Alex,
> 
> Had you considered something like the untested patch below?

But queue init shouldn't fail here, right?

Thanks.
Bart Van Assche April 11, 2018, 8 p.m. UTC | #2
On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > On 04/11/18 13:00, Alexandru Moise wrote:
> > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > the entry that we fail to remove at __blk_release_queue().
> > 
> > Hello Alex,
> > 
> > Had you considered something like the untested patch below?
> 
> But queue init shouldn't fail here, right?

Hello Tejun,

Your question is not entirely clear to me. Are you referring to the atomic
allocations in blkg_create() or are you perhaps referring to something else?

Bart.
Tejun Heo April 11, 2018, 8:02 p.m. UTC | #3
On Wed, Apr 11, 2018 at 08:00:29PM +0000, Bart Van Assche wrote:
> On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > > On 04/11/18 13:00, Alexandru Moise wrote:
> > > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > > the entry that we fail to remove at __blk_release_queue().
> > > 
> > > Hello Alex,
> > > 
> > > Had you considered something like the untested patch below?
> > 
> > But queue init shouldn't fail here, right?
> 
> Hello Tejun,
> 
> Your question is not entirely clear to me. Are you referring to the atomic
> allocations in blkg_create() or are you perhaps referring to something else?

Hmm.. maybe I'm confused but I thought that the fact that
blkcg_init_queue() fails itself is already a bug, which happens
because a previously destroyed queue left behind blkgs.

Thanks.
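Tejun's hypothesis can be illustrated with a toy model. This is a minimal sketch, assuming an id allocator that, like an IDA, hands out the lowest free id. toy_ida_get(), toy_init_queue(), toy_release_queue(), toy_reset() and the two arrays are hypothetical stand-ins, not kernel code. The model shows one thing: if release returns the id without removing the queue's tree entry, the next queue receives the same id, and its root-blkg insertion collides with the stale entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_IDS 8

/* Hypothetical stand-ins: toy_ida_used[] for blk_queue_ida and
 * toy_blkg_tree[] for one blkcg's blkg_tree, both indexed by q->id. */
static bool toy_ida_used[TOY_MAX_IDS];
static int toy_blkg_tree[TOY_MAX_IDS]; /* nonzero = populated slot */

/* Hand out the lowest free id, so a freed id is reused right away. */
static int toy_ida_get(void)
{
	for (int id = 0; id < TOY_MAX_IDS; id++) {
		if (!toy_ida_used[id]) {
			toy_ida_used[id] = true;
			return id;
		}
	}
	return -ENOSPC;
}

/* Simplified queue init: insert the root blkg at index q_id. */
static int toy_init_queue(int q_id)
{
	assert(q_id >= 0 && q_id < TOY_MAX_IDS);
	if (toy_blkg_tree[q_id])
		return -EEXIST; /* stale entry left by an earlier queue */
	toy_blkg_tree[q_id] = 1;
	return 0;
}

/* Simplified release: remove_blkgs selects whether the tree entry is
 * dropped before the id goes back to the allocator. */
static void toy_release_queue(int q_id, bool remove_blkgs)
{
	assert(q_id >= 0 && q_id < TOY_MAX_IDS);
	if (remove_blkgs)
		toy_blkg_tree[q_id] = 0;
	toy_ida_used[q_id] = false;
}

/* Clear both toy structures between scenarios. */
static void toy_reset(void)
{
	for (int i = 0; i < TOY_MAX_IDS; i++) {
		toy_ida_used[i] = false;
		toy_blkg_tree[i] = 0;
	}
}
```

With remove_blkgs set to false, a queue allocated after the release gets the same id, and toy_init_queue() fails with -EEXIST. With remove_blkgs set to true, the same sequence succeeds.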
Bart Van Assche April 11, 2018, 8:23 p.m. UTC | #4
On Wed, 2018-04-11 at 13:02 -0700, tj@kernel.org wrote:
> On Wed, Apr 11, 2018 at 08:00:29PM +0000, Bart Van Assche wrote:
> > On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> > > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > > > On 04/11/18 13:00, Alexandru Moise wrote:
> > > > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > > > the entry that we fail to remove at __blk_release_queue().
> > > > 
> > > > Hello Alex,
> > > > 
> > > > Had you considered something like the untested patch below?
> > > 
> > > But queue init shouldn't fail here, right?
> > 
> > Hello Tejun,
> > 
> > Your question is not entirely clear to me. Are you referring to the atomic
> > allocations in blkg_create() or are you perhaps referring to something else?
> 
> Hmm.. maybe I'm confused but I thought that the fact that
> blkcg_init_queue() fails itself is already a bug, which happens
> because a previously destroyed queue left behind blkgs.

Hello Tejun,

I had missed the start of this thread, so I was not aware of which problem Alex
was trying to solve. If I read the description of v1 of this patch correctly,
Alex thinks he ran into a scenario in which blk_queue_alloc_node() assigns a
q->id that is still in use by another request queue. That's weird. The following
code still occurs in __blk_release_queue():

	ida_simple_remove(&blk_queue_ida, q->id);

It's not clear to me how that remove call could happen *before* q->id is removed
from the blkcg radix tree.

Bart.
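Bart's argument relies on an ordering: the blkg tree entry for q->id must be gone before ida_simple_remove() lets the id be recycled. That ordering can be written down as a checked invariant. The sketch below is hypothetical. toy_id_release_checked() and toy_release_queue() are not kernel APIs, and the arrays stand in for blk_queue_ida and a blkcg's blkg_tree. It illustrates a debug check that refuses to recycle an id while a tree slot still references it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS 8

/* Hypothetical stand-ins for blk_queue_ida and a blkcg's blkg_tree. */
static bool toy_id_in_use[TOY_MAX_IDS];
static void *toy_tree[TOY_MAX_IDS];

/* Hypothetical checked variant of ida_simple_remove(): refuse to
 * recycle an id while a tree slot still points at a blkg for it. */
static int toy_id_release_checked(int id)
{
	assert(id >= 0 && id < TOY_MAX_IDS);
	if (toy_tree[id])
		return -EBUSY;
	toy_id_in_use[id] = false;
	return 0;
}

/* Release in the order named in the v2 subject line: drop the tree
 * entry first, then return the id to the allocator. */
static int toy_release_queue(int id)
{
	assert(id >= 0 && id < TOY_MAX_IDS);
	toy_tree[id] = NULL;
	return toy_id_release_checked(id);
}
```

Releasing the id directly while the slot is still populated returns -EBUSY and keeps the id reserved. toy_release_queue() clears the slot first, so the release succeeds.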
Alexandru Moise April 11, 2018, 9:23 p.m. UTC | #5
On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> On 04/11/18 13:00, Alexandru Moise wrote:
> > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > the entry that we fail to remove at __blk_release_queue().
> 
> Hello Alex,
> 
> Had you considered something like the untested patch below?
> 
> Thanks,
> 
> Bart.
> 
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1c16694ae145..f2ced19e74b8 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1191,14 +1191,17 @@ int blkcg_init_queue(struct request_queue *q)
>  	if (preloaded)
>  		radix_tree_preload_end();
> 
> -	if (IS_ERR(blkg))
> -		return PTR_ERR(blkg);
> +	if (IS_ERR(blkg)) {
> +		ret = PTR_ERR(blkg);
> +		goto destroy_all;
> +	}
> 
>  	q->root_blkg = blkg;
>  	q->root_rl.blkg = blkg;
> 
>  	ret = blk_throtl_init(q);
>  	if (ret) {
> +destroy_all:
>  		spin_lock_irq(q->queue_lock);
>  		blkg_destroy_all(q);
>  		spin_unlock_irq(q->queue_lock);
> 

Hi, I tested it, it doesn't solve the problem.
By the time you get here it's already too late, my patch
prevents this from failing in the first place.

I would have liked this more than my solution though.

../Alex
Bart Van Assche April 11, 2018, 9:28 p.m. UTC | #6
On Wed, 2018-04-11 at 23:23 +0200, Alexandru Moise wrote:
> Hi, I tested it, it doesn't solve the problem.
> By the time you get here it's already too late, my patch
> prevents this from failing in the first place.

Hello Alex,

If you can share the steps to follow to trigger the bug you reported then
I will have a closer look at this.

Thanks,

Bart.

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1c16694ae145..f2ced19e74b8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1191,14 +1191,17 @@ int blkcg_init_queue(struct request_queue *q)
 	if (preloaded)
 		radix_tree_preload_end();

-	if (IS_ERR(blkg))
-		return PTR_ERR(blkg);
+	if (IS_ERR(blkg)) {
+		ret = PTR_ERR(blkg);
+		goto destroy_all;
+	}

 	q->root_blkg = blkg;
 	q->root_rl.blkg = blkg;

 	ret = blk_throtl_init(q);
 	if (ret) {
+destroy_all:
 		spin_lock_irq(q->queue_lock);
 		blkg_destroy_all(q);
 		spin_unlock_irq(q->queue_lock);
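The patch jumps from the IS_ERR(blkg) check into the body of the later if (ret) block, so both failures share one cleanup path. The userspace sketch below models only that control flow and makes several assumptions. The toy_queue fields are hypothetical flags that record which path ran. The create_err parameter stands in for PTR_ERR() of a failed blkg_create(). throtl_err stands in for blk_throtl_init(). The flag updates stand in for blkg_destroy_all() under q->queue_lock. As Alex points out in the thread, this cleanup runs only after blkg_create() has already failed.

```c
#include <assert.h>
#include <errno.h> /* callers pass negative errno values such as -EEXIST */
#include <stdbool.h>

/* Hypothetical flags recording which path toy_init_queue() took. */
struct toy_queue {
	bool root_blkg_set; /* stands in for q->root_blkg being assigned */
	bool destroyed_all; /* set when the shared cleanup block ran */
};

static int toy_init_queue(struct toy_queue *q, int create_err, int throtl_err)
{
	int ret;

	assert(q);

	/* create_err models PTR_ERR() of a failed blkg_create(). */
	if (create_err) {
		ret = create_err;
		goto destroy_all;
	}

	q->root_blkg_set = true;

	ret = throtl_err; /* models ret = blk_throtl_init(q); */
	if (ret) {
destroy_all:
		/* Shared cleanup, modelling blkg_destroy_all(q) taken
		 * under q->queue_lock. */
		q->root_blkg_set = false;
		q->destroyed_all = true;
	}
	return ret;
}
```

Jumping into the body of an if statement is valid C. The only restriction is that a goto may not jump into the scope of an identifier with a variably modified type. That is what lets the two error paths share a single block.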