
[-next,3/4] block/rq_qos: use a global mutex to protect rq_qos apis

Message ID 20230104085354.2343590-4-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Series block/rq_qos: protect rq_qos apis with global mutex

Commit Message

Yu Kuai Jan. 4, 2023, 8:53 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

This patch fixes the following problems:

1) rq_qos_add() and rq_qos_del() are protected, while rq_qos_exit() is
   not.
2) rq_qos_add() and blkcg_activate_policy() are not atomic; if
   rq_qos_exit() is done before blkcg_activate_policy(),
   a null-ptr-dereference can be triggered.

rq_qos_add(), rq_qos_del() and rq_qos_exit() are super cold paths, hence
fix the problems by using a global mutex to protect them.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-rq-qos.c | 50 ++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 17 deletions(-)

Comments

Tejun Heo Jan. 4, 2023, 9:39 p.m. UTC | #1
On Wed, Jan 04, 2023 at 04:53:53PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> This patch fixes the following problems:
> 
> 1) rq_qos_add() and rq_qos_del() are protected, while rq_qos_exit() is
>    not.

This part makes sense.

> 2) rq_qos_add() and blkcg_activate_policy() are not atomic; if
>    rq_qos_exit() is done before blkcg_activate_policy(),
>    a null-ptr-dereference can be triggered.

I'm not sure this part does. I think it'd be better to guarantee that device
destruction is blocked while these configuration operations are in progress
which can be built into blkg_conf helpers.

Thanks.
Tejun Heo Jan. 5, 2023, 12:28 a.m. UTC | #2
Hello, again.

On Wed, Jan 04, 2023 at 11:39:47AM -1000, Tejun Heo wrote:
> > 2) rq_qos_add() and blkcg_activate_policy() are not atomic; if
> >    rq_qos_exit() is done before blkcg_activate_policy(),
> >    a null-ptr-dereference can be triggered.
> 
> I'm not sure this part does. I think it'd be better to guarantee that device
> destruction is blocked while these configuration operations are in progress
> which can be built into blkg_conf helpers.

A bit more explanation:

Usually, this would be handled in the core - when a device goes away, its
sysfs files get shut down before stuff gets freed and the sysfs file removal
waits for in-flight operations to finish and prevents new ones from
starting, so we don't have to worry about in-flight config file operations
racing against device removal.

Here, the problem isn't solved by that because the config files live on
cgroupfs and their lifetimes are not coupled with the block devices'. So, we
need to synchronize manually. And, given that, the right place to do is the
blkg config helpers cuz they're the ones which establish the connection
between cgroup and block layer.

Can you please take a look at the following patchset I just posted:

  https://lkml.kernel.org/r/20230105002007.157497-1-tj@kernel.org

After that, all these configuration operations are wrapped between
blkg_conf_init() and blkg_conf_exit() which probably are the right place to
implement the synchronization.
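
For illustration, here is a rough sketch of what a configuration path
wrapped in those helpers could look like. example_policy_write() and
do_policy_setup() are made-up placeholder names, and the per-queue lock
mentioned in the comments is only an assumption about where the
synchronization could live, not the actual helpers' implementation:

/*
 * Sketch only -- not the real helpers.  Assumes kernel context
 * (block/blk-cgroup.h provides struct blkg_conf_ctx and the
 * blkg_conf_*() declarations from the patchset above).
 */
#include <linux/blkdev.h>
#include "blk-cgroup.h"

static int do_policy_setup(struct request_queue *q)
{
	/* e.g. rq_qos_add() + blkcg_activate_policy() for some policy */
	return 0;
}

static ssize_t example_policy_write(char *buf)
{
	struct blkg_conf_ctx ctx;
	ssize_t ret;

	blkg_conf_init(&ctx, buf);

	/* parses "MAJ:MIN ..." from buf and pins the block device */
	ret = blkg_conf_open_bdev(&ctx);
	if (ret)
		goto out;

	/*
	 * The idea: the span between blkg_conf_open_bdev() and
	 * blkg_conf_exit() would also exclude device teardown, e.g. by
	 * holding a per-queue mutex that rq_qos_exit() takes as well.
	 */
	ret = do_policy_setup(bdev_get_queue(ctx.bdev));
out:
	blkg_conf_exit(&ctx);	/* unpins the device, would drop the lock */
	return ret;
}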

Thanks.
Yu Kuai Jan. 5, 2023, 1:35 a.m. UTC | #3
Hi,

On 2023/01/05 8:28, Tejun Heo wrote:
> Hello, again.
> 
> On Wed, Jan 04, 2023 at 11:39:47AM -1000, Tejun Heo wrote:
>>> 2) rq_qos_add() and blkcg_activate_policy() are not atomic; if
>>>     rq_qos_exit() is done before blkcg_activate_policy(),
>>>     a null-ptr-dereference can be triggered.
>>
>> I'm not sure this part does. I think it'd be better to guarantee that device
>> destruction is blocked while these configuration operations are in progress
>> which can be built into blkg_conf helpers.
> 
> A bit more explanation:
> 
> Usually, this would be handled in the core - when a device goes away, its
> sysfs files get shut down before stuff gets freed and the sysfs file removal
> waits for in-flight operations to finish and prevents new ones from
> starting, so we don't have to worry about in-flight config file operations
> racing against device removal.
> 
> Here, the problem isn't solved by that because the config files live on
> cgroupfs and their lifetimes are not coupled with the block devices'. So, we
> need to synchronize manually. And, given that, the right place to do is the
> blkg config helpers cuz they're the ones which establish the connection
> between cgroup and block layer.

Thanks for the explanation, I agree with that.
> 
> Can you please take a look at the following patchset I just posted:
> 
>    https://lkml.kernel.org/r/20230105002007.157497-1-tj@kernel.org
> 
> After that, all these configuration operations are wrapped between
> blkg_conf_init() and blkg_conf_exit() which probably are the right place to
> implement the synchronization.

I see; blkg_conf_init() and blkg_conf_exit() look good, however there
are some details I want to confirm:

1) rq_qos_add() can be called from iocost/iolatency, where
blkg_conf_init() will be called first, while rq_qos_add() can also be
called from wbt, where there is no blkg_conf_init(). Hence it seems to
me we need two locks here, one to protect rq_qos apis; one to
synchronize policy configuration and device removal.

2) If you agree with 1), it seems better to use the other lock at the
device level, considering that there is no need to synchronize
configuration across different devices (rough sketch below).
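
For 2), a rough sketch of what I mean, assuming struct request_queue
gains a new mutex (the rq_qos_mutex field name is made up here); the
__rq_qos_add()/__rq_qos_del() split is the same as in this patch, only
the global lock is replaced by the per-queue one:

/* Sketch only: assumes a new "struct mutex rq_qos_mutex;" in
 * struct request_queue, initialized in blk_alloc_queue(). */
int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
	int ret;

	mutex_lock(&q->rq_qos_mutex);
	ret = __rq_qos_add(q, rqos);
	mutex_unlock(&q->rq_qos_mutex);

	return ret;
}

void rq_qos_exit(struct request_queue *q)
{
	mutex_lock(&q->rq_qos_mutex);
	while (q->rq_qos) {
		struct rq_qos *rqos = q->rq_qos;

		q->rq_qos = rqos->next;
		rqos->ops->exit(rqos);
	}
	mutex_unlock(&q->rq_qos_mutex);
}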

Thanks,
Kuai
> 
> Thanks.
>
Tejun Heo Jan. 5, 2023, 6:34 p.m. UTC | #4
Hello,

On Thu, Jan 05, 2023 at 09:35:21AM +0800, Yu Kuai wrote:
> > Can you please take a look at the following patchset I just posted:
> > 
> >    https://lkml.kernel.org/r/20230105002007.157497-1-tj@kernel.org
> > 
> > After that, all these configuration operations are wrapped between
> > blkg_conf_init() and blkg_conf_exit() which probably are the right place to
> > implement the synchronization.
> 
> I see; blkg_conf_init() and blkg_conf_exit() look good, however there
> are some details I want to confirm:
> 
> 1) rq_qos_add() can be called from iocost/iolatency, where
> blkg_conf_init() will be called first, while rq_qos_add() can also be
> called from wbt, where there is no blkg_conf_init(). Hence it seems to
> me we need two locks here, one to protect rq_qos apis; one to
> synchronize policy configuration and device removal.

wbt's lazy init is tied to one of the block device sysfs files, right? So,
it *should* already be protected against device removal.

Thanks.
Yu Kuai Jan. 6, 2023, 1:33 a.m. UTC | #5
Hi,

On 2023/01/06 2:34, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 05, 2023 at 09:35:21AM +0800, Yu Kuai wrote:
>>> Can you please take a look at the following patchset I just posted:
>>>
>>>     https://lkml.kernel.org/r/20230105002007.157497-1-tj@kernel.org
>>>
>>> After that, all these configuration operations are wrapped between
>>> blkg_conf_init() and blkg_conf_exit() which probably are the right place to
>>> implement the synchronization.
>>
>> I see; blkg_conf_init() and blkg_conf_exit() look good, however there
>> are some details I want to confirm:
>>
>> 1) rq_qos_add() can be called from iocost/iolatency, where
>> blkg_conf_init() will be called first, while rq_qos_add() can also be
>> called from wbt, where there is no blkg_conf_init(). Hence it seems to
>> me we need two locks here, one to protect rq_qos apis; one to
>> synchronize policy configuration and device removal.
> 
> wbt's lazy init is tied to one of the block device sysfs files, right? So,
> it *should* already be protected against device removal.

That doesn't seem to be true; I don't think q->sysfs_lock can protect
that, considering that queue_wb_lat_store() doesn't check whether
del_gendisk() has been called or not:

t1: wbt lazy init		t2: remove device
queue_attr_store
				del_gendisk
				blk_unregister_queue
				 mutex_lock(&q->sysfs_lock)
			         ...
				 mutex_unlock(&q->sysfs_lock);
				rq_qos_exit
  mutex_lock(&q->sysfs_lock);
   queue_wb_lat_store
   wbt_init
    rq_qos_add
  mutex_unlock(&q->sysfs_lock);

I tried to confirm that by adding the following delay:

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 93d9e9c9a6ea..101c33cb0a2b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -11,6 +11,7 @@
  #include <linux/blktrace_api.h>
  #include <linux/blk-mq.h>
  #include <linux/debugfs.h>
+#include <linux/delay.h>

  #include "blk.h"
  #include "blk-mq.h"
@@ -734,6 +735,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
         if (!entry->store)
                 return -EIO;

+       msleep(10000);
+
         mutex_lock(&q->sysfs_lock);
         res = entry->store(q, page, length);
         mutex_unlock(&q->sysfs_lock);

And then do the following test:

1) echo 10000 > /sys/block/sdb/queue/wbt_lat_usec &
2) echo 1 > /sys/block/sda/device/delete

Then, following bug is triggered:

[   51.923642] BUG: unable to handle page fault for address: ffffffffffffffed
[   51.924294] #PF: supervisor read access in kernel mode
[   51.924773] #PF: error_code(0x0000) - not-present page
[   51.925252] PGD 1820b067 P4D 1820b067 PUD 1820d067 PMD 0
[   51.925754] Oops: 0000 [#1] PREEMPT SMP
[   51.926123] CPU: 1 PID: 539 Comm: bash Tainted: G        W 6.2.0-rc1-next-202212267
[   51.927124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
[   51.928334] RIP: 0010:__rq_qos_issue+0x30/0x60
[   51.928761] Code: 48 89 f5 53 48 89 fb 48 83 05 eb eb c9 0b 01 eb 19 48 89 df 48 83 05 e6 e9
[   51.930426] RSP: 0018:ffffc90000c3b9b0 EFLAGS: 00010206
[   51.930905] RAX: 0000000000000000 RBX: ffffffffffffffed RCX: 0000000000000017
[   51.931554] RDX: 000007c329800000 RSI: ffff8881022c0380 RDI: ffffffffffffffed
[   51.932197] RBP: ffff8881022c0380 R08: 0000000c385056e3 R09: ffff8881022c05c8
[   51.932841] R10: 0000000000000000 R11: ffff888100a94000 R12: ffff888102145000
[   51.933488] R13: 0000000000000000 R14: ffff888100a94000 R15: ffff8881022c04a0
[   51.934140] FS:  00007fd23def9700(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   51.934856] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   51.935379] CR2: ffffffffffffffed CR3: 0000000106fff000 CR4: 00000000000006e0
[   51.936036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   51.936675] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   51.937315] Call Trace:
[   51.937545]  <TASK>
[   51.937749]  blk_mq_start_request+0x1d1/0x240
[   51.938151]  scsi_queue_rq+0x347/0x1190
[   51.938513]  blk_mq_dispatch_rq_list+0x366/0xef0
[   51.938938]  ? tick_nohz_tick_stopped+0x1a/0x40
[   51.939356]  ? __irq_work_queue_local+0x59/0xd0
[   51.939769]  ? __sbitmap_get_word+0x3b/0xb0
[   51.940153]  __blk_mq_sched_dispatch_requests+0xdd/0x210
[   51.940633]  blk_mq_sched_dispatch_requests+0x38/0xa0
[   51.941089]  __blk_mq_run_hw_queue+0xca/0x110
[   51.941483]  __blk_mq_delay_run_hw_queue+0x1fc/0x210
[   51.941931]  blk_mq_run_hw_queue+0x15c/0x1d0
[   51.942327]  blk_mq_sched_insert_request+0x9c/0x210
[   51.942769]  blk_execute_rq+0xec/0x290
[   51.943121]  __scsi_execute+0x131/0x310
[   51.943492]  sd_sync_cache+0xc6/0x280
[   51.943831]  sd_shutdown+0x7f/0x180
[   51.944155]  sd_remove+0x53/0x80
[   51.944457]  device_remove+0x80/0xa0
[   51.944785]  device_release_driver_internal+0x131/0x270
[   51.945257]  device_release_driver+0x16/0x20
[   51.945643]  bus_remove_device+0x135/0x200

Thanks,
Kuai
> 
> Thanks.
>
Tejun Heo Jan. 6, 2023, 6:23 p.m. UTC | #6
Hello,

On Fri, Jan 06, 2023 at 09:33:26AM +0800, Yu Kuai wrote:
> > wbt's lazy init is tied to one of the block device sysfs files, right? So,
> > it *should* already be protected against device removal.
> 
> That doesn't seem to be true; I don't think q->sysfs_lock can protect
> that, considering that queue_wb_lat_store() doesn't check whether
> del_gendisk() has been called or not:
> 
> t1: wbt lazy init		t2: remove device
> queue_attr_store
> 				del_gendisk
> 				blk_unregister_queue
> 				 mutex_lock(&q->sysfs_lock)
> 			         ...
> 				 mutex_unlock(&q->sysfs_lock);
> 				rq_qos_exit
>  mutex_lock(&q->sysfs_lock);
>   queue_wb_lat_store
>   wbt_init
>    rq_qos_add
>  mutex_unlock(&q->sysfs_lock);

So, it's not sysfs_lock but sysfs file deletion. When a kernfs file
(kernfs is what backs sysfs) is removed, it disables future operations
and drains all in-flight ones before returning, so if you remove the
interface files before cleaning up the object that they interact with,
you don't have to worry about racing against file operations, as none
can be in flight at that point.
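
For illustration, the expected ordering is roughly the following
(simplified sketch; example_teardown() is a placeholder and not the
actual del_gendisk()/blk_unregister_queue() code):

/* Simplified sketch of the expected teardown ordering. */
static void example_teardown(struct gendisk *disk)
{
	/*
	 * Remove the sysfs/kernfs attribute files first.  kobject_del()
	 * ends up in kernfs_remove(), which deactivates the nodes and
	 * waits for every in-flight ->show()/->store() to finish, so no
	 * attribute handler can still be running afterwards.
	 */
	kobject_del(&disk->queue->kobj);

	/* Only then tear down the state those handlers touch. */
	rq_qos_exit(disk->queue);
}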

> I tried to confirm that by adding the following delay:
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 93d9e9c9a6ea..101c33cb0a2b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -11,6 +11,7 @@
>  #include <linux/blktrace_api.h>
>  #include <linux/blk-mq.h>
>  #include <linux/debugfs.h>
> +#include <linux/delay.h>
> 
>  #include "blk.h"
>  #include "blk-mq.h"
> @@ -734,6 +735,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
>         if (!entry->store)
>                 return -EIO;
> 
> +       msleep(10000);
> +
>         mutex_lock(&q->sysfs_lock);
>         res = entry->store(q, page, length);
>         mutex_unlock(&q->sysfs_lock);
> 
> And then do the following test:
> 
> 1) echo 10000 > /sys/block/sdb/queue/wbt_lat_usec &
> 2) echo 1 > /sys/block/sda/device/delete
> 
> Then, following bug is triggered:
> 
> [   51.923642] BUG: unable to handle page fault for address: ffffffffffffffed
> [   51.924294] #PF: supervisor read access in kernel mode
> [   51.924773] #PF: error_code(0x0000) - not-present page
> [   51.925252] PGD 1820b067 P4D 1820b067 PUD 1820d067 PMD 0
> [   51.925754] Oops: 0000 [#1] PREEMPT SMP
> [   51.926123] CPU: 1 PID: 539 Comm: bash Tainted: G        W 6.2.0-rc1-next-202212267
> [   51.927124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
> [   51.928334] RIP: 0010:__rq_qos_issue+0x30/0x60

This indicates that we aren't getting the destruction order right. It could
be that there are other reasons why the ordering is like this and we might
have to synchronize separately.

Sorry that I've been asking you to go round and round but block device
add/remove paths have always been really tricky and we wanna avoid adding
more complications if at all possible. Can you see why the device is being
destroyed before the queue attr is removed?

Thanks.
Yu Kuai Jan. 9, 2023, 1:38 a.m. UTC | #7
Hi,

On 2023/01/07 2:23, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 06, 2023 at 09:33:26AM +0800, Yu Kuai wrote:
>>> wbt's lazy init is tied to one of the block device sysfs files, right? So,
>>> it *should* already be protected against device removal.
>>
>> That doesn't seem to be true; I don't think q->sysfs_lock can protect
>> that, considering that queue_wb_lat_store() doesn't check whether
>> del_gendisk() has been called or not:
>>
>> t1: wbt lazy init		t2: remove device
>> queue_attr_store
>> 				del_gendisk
>> 				blk_unregister_queue
>> 				 mutex_lock(&q->sysfs_lock)
>> 			         ...
>> 				 mutex_unlock(&q->sysfs_lock);
>> 				rq_qos_exit
>>   mutex_lock(&q->sysfs_lock);
>>    queue_wb_lat_store
>>    wbt_init
>>     rq_qos_add
>>   mutex_unlock(&q->sysfs_lock);
> 
> So, it's not sysfs_lock but sysfs file deletion. When a kernfs file
> (kernfs is what backs sysfs) is removed, it disables future operations
> and drains all in-flight ones before returning, so if you remove the
> interface files before cleaning up the object that they interact with,
> you don't have to worry about racing against file operations, as none
> can be in flight at that point.

Ok, thanks for the explanation; I'll look into this and try to find out
how it works.

> 
>> I tried to confirm that by adding the following delay:
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 93d9e9c9a6ea..101c33cb0a2b 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -11,6 +11,7 @@
>>   #include <linux/blktrace_api.h>
>>   #include <linux/blk-mq.h>
>>   #include <linux/debugfs.h>
>> +#include <linux/delay.h>
>>
>>   #include "blk.h"
>>   #include "blk-mq.h"
>> @@ -734,6 +735,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
>>          if (!entry->store)
>>                  return -EIO;
>>
>> +       msleep(10000);
>> +
>>          mutex_lock(&q->sysfs_lock);
>>          res = entry->store(q, page, length);
>>          mutex_unlock(&q->sysfs_lock);
>>
>> And then do the following test:
>>
>> 1) echo 10000 > /sys/block/sdb/queue/wbt_lat_usec &
>> 2) echo 1 > /sys/block/sda/device/delete
>>
>> Then, following bug is triggered:
>>
>> [   51.923642] BUG: unable to handle page fault for address: ffffffffffffffed
>> [   51.924294] #PF: supervisor read access in kernel mode
>> [   51.924773] #PF: error_code(0x0000) - not-present page
>> [   51.925252] PGD 1820b067 P4D 1820b067 PUD 1820d067 PMD 0
>> [   51.925754] Oops: 0000 [#1] PREEMPT SMP
>> [   51.926123] CPU: 1 PID: 539 Comm: bash Tainted: G        W 6.2.0-rc1-next-202212267
>> [   51.927124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
>> [   51.928334] RIP: 0010:__rq_qos_issue+0x30/0x60
> 
> This indicates that we aren't getting the destruction order right. It could
> be that there are other reasons why the ordering is like this and we might
> have to synchronize separately.
> 
> Sorry that I've been asking you to go round and round but block device
> add/remove paths have always been really tricky and we wanna avoid adding
> more complications if at all possible. Can you see why the device is being
> destroyed before the queue attr is removed?

Of course, I'm glad to help; I'll let you know if I make any progress.

Thanks,
Kuai
Yu Kuai Jan. 9, 2023, 6:37 a.m. UTC | #8
Hi,

On 2023/01/09 9:38, Yu Kuai wrote:
>> So, it's not sysfs_lock but sysfs file deletion. When a kernfs file
>> (kernfs is what backs sysfs) is removed, it disables future operations
>> and drains all in-flight ones before returning, so if you remove the
>> interface files before cleaning up the object that they interact with,
>> you don't have to worry about racing against file operations, as none
>> can be in flight at that point.
> 

I understand this now: kernfs_fop_write_iter() will grab
kernfs_node->active, and kobject_del() will wait for the active
references to be dropped in kernfs_drain().
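
Roughly, the protection looks like the following (a heavily simplified
sketch based on fs/kernfs/file.c and fs/kernfs/dir.c, not the literal
code; sketch_write() is a placeholder name):

/* Sketch of the kernfs active-reference handshake. */
static ssize_t sketch_write(struct kernfs_open_file *of, char *buf,
			    size_t len, loff_t off)
{
	ssize_t ret;

	/* fails once the node has been deactivated by removal */
	if (!kernfs_get_active(of->kn))
		return -ENODEV;

	/* the attribute's ->write() (e.g. sysfs ->store()) runs here */
	ret = of->kn->attr.ops->write(of, buf, len, off);

	kernfs_put_active(of->kn);
	return ret;
}

/*
 * On the removal side, kobject_del() -> kernfs_remove() deactivates the
 * node so new kernfs_get_active() calls fail, then kernfs_drain() sleeps
 * until every outstanding active reference is put.  Only after that does
 * the caller go on to free the objects the attributes operate on.
 */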

>> Sorry that I've been asking you to go round and round but block device
>> add/remove paths have always been really tricky and we wanna avoid adding
>> more complications if at all possible. Can you see why the device is 
>> being
>> destroyed before the queue attr is removed?
> 

Sorry, I actually tested with patch 4 applied, and this is a bug
introduced by patch 4, my apologies. It sets rqos to ERR_PTR() in
rq_qos_exit(), and the follow-up rq_qos_issue() only checks whether rqos
is NULL.

I'll wait for your patchset to be applied, and then send a new version.
Just one thing to confirm: do you think it's better to use a global
mutex rather than a disk-level mutex? I'm not sure, because a global
mutex means that cgroup configurations for different disks cannot run
concurrently.

Thanks,
Kuai

Patch

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 50544bfb12f1..5f7ccc249c11 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -2,6 +2,8 @@ 
 
 #include "blk-rq-qos.h"
 
+static DEFINE_MUTEX(rq_qos_lock);
+
 /*
  * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
  * false if 'v' + 1 would be bigger than 'below'.
@@ -286,23 +288,18 @@  void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 	finish_wait(&rqw->wait, &data.wq);
 }
 
-int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+static int __rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 {
 	/*
 	 * No IO can be in-flight when adding rqos, so freeze queue, which
 	 * is fine since we only support rq_qos for blk-mq queue.
-	 *
-	 * Reuse ->queue_lock for protecting against other concurrent
-	 * rq_qos adding/deleting
 	 */
 	blk_mq_freeze_queue(q);
 
-	spin_lock_irq(&q->queue_lock);
 	if (rq_qos_id(q, rqos->id))
 		goto ebusy;
 	rqos->next = q->rq_qos;
 	q->rq_qos = rqos;
-	spin_unlock_irq(&q->queue_lock);
 
 	blk_mq_unfreeze_queue(q);
 
@@ -314,29 +311,23 @@  int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 
 	return 0;
 ebusy:
-	spin_unlock_irq(&q->queue_lock);
 	blk_mq_unfreeze_queue(q);
 	return -EBUSY;
 }
 
-void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
+static void __rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
 {
 	struct rq_qos **cur;
 
-	/*
-	 * See comment in rq_qos_add() about freezing queue & using
-	 * ->queue_lock.
-	 */
+	/* See comment in __rq_qos_add() about freezing queue */
 	blk_mq_freeze_queue(q);
 
-	spin_lock_irq(&q->queue_lock);
 	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
 		if (*cur == rqos) {
 			*cur = rqos->next;
 			break;
 		}
 	}
-	spin_unlock_irq(&q->queue_lock);
 
 	blk_mq_unfreeze_queue(q);
 
@@ -345,13 +336,33 @@  void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
 	mutex_unlock(&q->debugfs_mutex);
 }
 
+int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+{
+	int ret;
+
+	mutex_lock(&rq_qos_lock);
+	ret = __rq_qos_add(q, rqos);
+	mutex_unlock(&rq_qos_lock);
+
+	return ret;
+}
+
+void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
+{
+	mutex_lock(&rq_qos_lock);
+	__rq_qos_del(q, rqos);
+	mutex_unlock(&rq_qos_lock);
+}
+
 void rq_qos_exit(struct request_queue *q)
 {
+	mutex_lock(&rq_qos_lock);
 	while (q->rq_qos) {
 		struct rq_qos *rqos = q->rq_qos;
 		q->rq_qos = rqos->next;
 		rqos->ops->exit(rqos);
 	}
+	mutex_unlock(&rq_qos_lock);
 }
 
 #ifdef CONFIG_BLK_CGROUP
@@ -364,15 +375,20 @@  int rq_qos_add_and_activate_policy(struct request_queue *q, struct rq_qos *rqos,
 	 * may get called before policy activation completion, can't assume that
 	 * the target bio has an pd associated and need to test for NULL.
 	 */
-	int ret = rq_qos_add(q, rqos);
+	int ret;
 
-	if (ret)
+	mutex_lock(&rq_qos_lock);
+	ret = __rq_qos_add(q, rqos);
+	if (ret) {
+		mutex_unlock(&rq_qos_lock);
 		return ret;
+	}
 
 	ret = blkcg_activate_policy(q, pol);
 	if (ret)
-		rq_qos_del(q, rqos);
+		__rq_qos_del(q, rqos);
 
+	mutex_unlock(&rq_qos_lock);
 	return ret;
 }
 #endif