diff mbox series

blkcg: handle dying request_queue when associating a blkg

Message ID 20181211230308.66276-1-dennis@kernel.org (mailing list archive)
State New, archived
Headers show
Series blkcg: handle dying request_queue when associating a blkg | expand

Commit Message

Dennis Zhou Dec. 11, 2018, 11:03 p.m. UTC
Between v3 [1] and v4 [2] of the blkg association series, the
association point moved from generic_make_request_checks(), which is
called after the request enters the queue, to bio_set_dev(), which is when
the bio is formed before submit_bio(). When the request_queue goes away,
the blkgs supporting the request_queue are destroyed and then the
q->root_blkg is set to %NULL.

This patch adds a %NULL check to blkg_tryget_closest() to prevent the
NPE caused by the above. It also adds a guard to see if the
request_queue is dying when creating a blkg to prevent creating a blkg
for a dead request_queue.

[1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/

Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device")
Reported-and-tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 block/blk-cgroup.c         | 6 ++++++
 include/linux/blk-cgroup.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Dec. 11, 2018, 11:16 p.m. UTC | #1
On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 6bd0619a7d6e..c30661ddc873 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
>  	WARN_ON_ONCE(!rcu_read_lock_held());
>  	lockdep_assert_held(&q->queue_lock);
>  
> +	/* request_queue is dying, do not create/recreate a blkg */
> +	if (blk_queue_dying(q)) {
> +		ret = -ENODEV;
> +		goto err_free_blkg;
> +	}
> +
>  	/* blkg holds a reference to blkcg */
>  	if (!css_tryget_online(&blkcg->css)) {
>  		ret = -ENODEV;

What prevents that the queue state changes after blk_queue_dying() has returned
and before blkg_create() returns? Are you sure you don't need to protect this
code with a blk_queue_enter() / blk_queue_exit() pair?

Bart.
Dennis Zhou Dec. 12, 2018, 4:06 a.m. UTC | #2
Hi Bart,

On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 6bd0619a7d6e..c30661ddc873 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> >  	WARN_ON_ONCE(!rcu_read_lock_held());
> >  	lockdep_assert_held(&q->queue_lock);
> >  
> > +	/* request_queue is dying, do not create/recreate a blkg */
> > +	if (blk_queue_dying(q)) {
> > +		ret = -ENODEV;
> > +		goto err_free_blkg;
> > +	}
> > +
> >  	/* blkg holds a reference to blkcg */
> >  	if (!css_tryget_online(&blkcg->css)) {
> >  		ret = -ENODEV;
> 
> What prevents that the queue state changes after blk_queue_dying() has returned
> and before blkg_create() returns? Are you sure you don't need to protect this
> code with a blk_queue_enter() / blk_queue_exit() pair?
> 

Hmmm. So I think the idea is that we rely on normal shutdown as I don't
think there is anything wrong with creating a blkg on a dying
request_queue. When we are doing association, the request_queue should
be pinned by the open call. What we are racing against is when the
request_queue is shutting down, it goes around and destroys the blkgs.
For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
calling blk_exit_queue() which eventually calls blkcg_exit_queue().

The use of blk_queue_dying() is to determine whether blkg shutdown has
already started as if we create one after it has started, we may
incorrectly orphan a blkg and leak it. Both blkg creation and
destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
flag is set after we've checked it, it means blkg destruction hasn't
started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
set, then we have no guarantee of knowing what phase blkg destruction is
in leading to a potential leak.

Thanks,
Dennis
Bart Van Assche Dec. 12, 2018, 11:54 p.m. UTC | #3
On Tue, 2018-12-11 at 23:06 -0500, Dennis Zhou wrote:
> Hi Bart,
> 
> On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> > On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > > index 6bd0619a7d6e..c30661ddc873 100644
> > > --- a/block/blk-cgroup.c
> > > +++ b/block/blk-cgroup.c
> > > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > >  	WARN_ON_ONCE(!rcu_read_lock_held());
> > >  	lockdep_assert_held(&q->queue_lock);
> > >  
> > > +	/* request_queue is dying, do not create/recreate a blkg */
> > > +	if (blk_queue_dying(q)) {
> > > +		ret = -ENODEV;
> > > +		goto err_free_blkg;
> > > +	}
> > > +
> > >  	/* blkg holds a reference to blkcg */
> > >  	if (!css_tryget_online(&blkcg->css)) {
> > >  		ret = -ENODEV;
> > 
> > What prevents that the queue state changes after blk_queue_dying() has returned
> > and before blkg_create() returns? Are you sure you don't need to protect this
> > code with a blk_queue_enter() / blk_queue_exit() pair?
> > 
> 
> Hmmm. So I think the idea is that we rely on normal shutdown as I don't
> think there is anything wrong with creating a blkg on a dying
> request_queue. When we are doing association, the request_queue should
> be pinned by the open call. What we are racing against is when the
> request_queue is shutting down, it goes around and destroys the blkgs.
> For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
> calling blk_exit_queue() which eventually calls blkcg_exit_queue().
> 
> The use of blk_queue_dying() is to determine whether blkg shutdown has
> already started as if we create one after it has started, we may
> incorrectly orphan a blkg and leak it. Both blkg creation and
> destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
> flag is set after we've checked it, it means blkg destruction hasn't
> started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
> set, then we have no guarantee of knowing what phase blkg destruction is
> in leading to a potential leak.

Hi Dennis,

To answer my own question: since all queue flag manipulations are protected
by the queue lock and since blkg_create() is called with the queue lock held
the above code does not need any further protection. Hence feel free to add
the following:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Jens Axboe Dec. 13, 2018, 12:43 a.m. UTC | #4
On 12/11/18 4:03 PM, Dennis Zhou wrote:
> Between v3 [1] and v4 [2] of the blkg association series, the
> association point moved from generic_make_request_checks(), which is
> called after the request enters the queue, to bio_set_dev(), which is when
> the bio is formed before submit_bio(). When the request_queue goes away,
> the blkgs supporting the request_queue are destroyed and then the
> q->root_blkg is set to %NULL.
> 
> This patch adds a %NULL check to blkg_tryget_closest() to prevent the
> NPE caused by the above. It also adds a guard to see if the
> request_queue is dying when creating a blkg to prevent creating a blkg
> for a dead request_queue.

Applied, thanks Dennis.
Dennis Zhou Dec. 13, 2018, 3:47 p.m. UTC | #5
On Wed, Dec 12, 2018 at 03:54:52PM -0800, Bart Van Assche wrote:
> On Tue, 2018-12-11 at 23:06 -0500, Dennis Zhou wrote:
> > Hi Bart,
> > 
> > On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> > > On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > > > index 6bd0619a7d6e..c30661ddc873 100644
> > > > --- a/block/blk-cgroup.c
> > > > +++ b/block/blk-cgroup.c
> > > > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > > >  	WARN_ON_ONCE(!rcu_read_lock_held());
> > > >  	lockdep_assert_held(&q->queue_lock);
> > > >  
> > > > +	/* request_queue is dying, do not create/recreate a blkg */
> > > > +	if (blk_queue_dying(q)) {
> > > > +		ret = -ENODEV;
> > > > +		goto err_free_blkg;
> > > > +	}
> > > > +
> > > >  	/* blkg holds a reference to blkcg */
> > > >  	if (!css_tryget_online(&blkcg->css)) {
> > > >  		ret = -ENODEV;
> > > 
> > > What prevents that the queue state changes after blk_queue_dying() has returned
> > > and before blkg_create() returns? Are you sure you don't need to protect this
> > > code with a blk_queue_enter() / blk_queue_exit() pair?
> > > 
> > 
> > Hmmm. So I think the idea is that we rely on normal shutdown as I don't
> > think there is anything wrong with creating a blkg on a dying
> > request_queue. When we are doing association, the request_queue should
> > be pinned by the open call. What we are racing against is when the
> > request_queue is shutting down, it goes around and destroys the blkgs.
> > For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
> > calling blk_exit_queue() which eventually calls blkcg_exit_queue().
> > 
> > The use of blk_queue_dying() is to determine whether blkg shutdown has
> > already started as if we create one after it has started, we may
> > incorrectly orphan a blkg and leak it. Both blkg creation and
> > destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
> > flag is set after we've checked it, it means blkg destruction hasn't
> > started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
> > set, then we have no guarantee of knowing what phase blkg destruction is
> > in leading to a potential leak.
> 
> Hi Dennis,
> 
> To answer my own question: since all queue flag manipulations are protected
> by the queue lock and since blkg_create() is called with the queue lock held
> the above code does not need any further protection. Hence feel free to add
> the following:
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> 

It seems that Christoph in 57d74df90783 ("block: use atomic bitops for
->queue_flags") changed it so that flag manipulations no longer are
protected by the queue_lock in for-4.21/block. But I think my
explanation above suffices that we will always be able to clean up a
blkg created as long as the QUEUE_FLAG_DYING is not set.

Thanks for reviewing this!
Dennis
diff mbox series

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6bd0619a7d6e..c30661ddc873 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -202,6 +202,12 @@  static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(&q->queue_lock);
 
+	/* request_queue is dying, do not create/recreate a blkg */
+	if (blk_queue_dying(q)) {
+		ret = -ENODEV;
+		goto err_free_blkg;
+	}
+
 	/* blkg holds a reference to blkcg */
 	if (!css_tryget_online(&blkcg->css)) {
 		ret = -ENODEV;
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index bf13ecb0fe4f..f025fd1e22e6 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -511,7 +511,7 @@  static inline bool blkg_tryget(struct blkcg_gq *blkg)
  */
 static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
 {
-	while (!percpu_ref_tryget(&blkg->refcnt))
+	while (blkg && !percpu_ref_tryget(&blkg->refcnt))
 		blkg = blkg->parent;
 
 	return blkg;