
[01/13] blk: make blk-rq-qos support pluggable and modular policy

Message ID 20220110091046.17010-2-jianchao.wan9@gmail.com (mailing list archive)
State New, archived
Series blk: make blk-rq-qos policies pluggable and modular

Commit Message

Wang Jianchao Jan. 10, 2022, 9:10 a.m. UTC
From: Wang Jianchao <wangjianchao@kuaishou.com>

blk-rq-qos is a standalone framework outside of the io-schedulers that
can be used to control or observe IO progress in the block layer via
hooks. blk-rq-qos is a great design, but right now it is completely
fixed and built-in, which shuts out people who want to use it from an
external module.

This patch makes the blk-rq-qos policies pluggable and modular.
(1) Add code to maintain the rq_qos_ops. An rq-qos module needs to
    register itself with rq_qos_register(). The original enum
    rq_qos_id will be removed in a following patch; policies will
    instead use a dynamic id maintained by rq_qos_ida.
(2) Add an .init callback to rq_qos_ops. We use it to initialize the
    policy's resources.
(3) Add /sys/block/<dev>/queue/qos.
    Writing '+name' or '-name' enables or disables the corresponding
    blk-rq-qos policy, as the sketch below illustrates.
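
To make the intended usage concrete, here is a minimal sketch of an
out-of-tree policy built on these interfaces. The "noop" name and the
error handling are hypothetical; only rq_qos_register(),
rq_qos_unregister(), rq_qos_activate(), rq_qos_deactivate() and the
.init/.exit callbacks come from this patch (a real module would also
need the private block/blk-rq-qos.h header to be reachable):

	#include <linux/module.h>
	#include <linux/slab.h>
	#include "blk-rq-qos.h"

	static struct rq_qos_ops noop_qos_ops;

	static int noop_qos_init(struct request_queue *q)
	{
		struct rq_qos *rqos = kzalloc(sizeof(*rqos), GFP_KERNEL);

		if (!rqos)
			return -ENOMEM;
		/* link into q->rq_qos and create the debugfs entries */
		rq_qos_activate(q, rqos, &noop_qos_ops);
		return 0;
	}

	static void noop_qos_exit(struct rq_qos *rqos)
	{
		/* unlink from q->rq_qos, draining rq_qos_get() users */
		rq_qos_deactivate(rqos);
		kfree(rqos);
	}

	static struct rq_qos_ops noop_qos_ops = {
		.owner	= THIS_MODULE,
		.name	= "noop",
		.init	= noop_qos_init,
		.exit	= noop_qos_exit,
	};

	static int __init noop_qos_mod_init(void)
	{
		/* take a dynamic id from rq_qos_ida, join rq_qos_list */
		return rq_qos_register(&noop_qos_ops);
	}

	static void __exit noop_qos_mod_exit(void)
	{
		rq_qos_unregister(&noop_qos_ops);
	}

	module_init(noop_qos_mod_init);
	module_exit(noop_qos_mod_exit);
	MODULE_LICENSE("GPL");

Once the module is available, 'echo +noop > /sys/block/<dev>/queue/qos'
loads it if necessary (queue_qos_store() falls back to request_module()
for unknown names) and attaches it to the queue; 'echo -noop' detaches
it again.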

Because the rq-qos list can now be modified at any time, rq_qos_id(),
which has been renamed to rq_qos_by_id(), has to iterate the list
under sysfs_lock or queue_lock; this patch adapts the callers
accordingly. For more details, please refer to the comment above
rq_qos_get(). In addition, rq_qos_exit() is moved to
blk_cleanup_queue(). Except for these modifications, there is no
other functional change here. The following patches will adapt the
code of wbt, iolatency, iocost and ioprio to make them pluggable and
modular one by one.
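
The get/put pairing for case (4) then looks like the following
(condensed from the iocost hunks below, with error handling trimmed):

	rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
	if (!rqos) {
		/* not instantiated yet; initialize it and retry the lookup */
		ret = blk_iocost_init(bdev_get_queue(bdev));
		if (ret)
			goto err;
		rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
	}
	ioc = rqos_to_ioc(rqos);
	/* ... rqos->ref pins the policy across the sleeping section ... */
	rq_qos_put(rqos);	/* wakes rq_qos_deactivate() if it is draining */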

Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 block/blk-core.c       |   2 +
 block/blk-iocost.c     |  20 ++-
 block/blk-mq-debugfs.c |   4 +-
 block/blk-rq-qos.c     | 312 ++++++++++++++++++++++++++++++++++++++++-
 block/blk-rq-qos.h     |  55 +++++++-
 block/blk-sysfs.c      |   2 +
 block/blk-wbt.c        |   6 +-
 block/elevator.c       |   3 +
 block/genhd.c          |   3 -
 include/linux/blkdev.h |   4 +
 10 files changed, 394 insertions(+), 17 deletions(-)

Comments

kernel test robot Jan. 13, 2022, 1:49 a.m. UTC | #1
Hi Wang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tj-cgroup/for-next]
[also build test WARNING on v5.16]
[cannot apply to axboe-block/for-next next-20220112]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Wang-Jianchao/blk-make-blk-rq-qos-policies-pluggable-and-modular/20220110-171347
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config: arm-randconfig-r006-20220112 (https://download.01.org/0day-ci/archive/20220113/202201130903.7ZvBIOs4-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 244dd2913a43a200f5a6544d424cdc37b771028b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/8bef9fba59d8d47ecaebbeff3e62ee550d89b017
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wang-Jianchao/blk-make-blk-rq-qos-policies-pluggable-and-modular/20220110-171347
        git checkout 8bef9fba59d8d47ecaebbeff3e62ee550d89b017
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   block/blk-iocost.c:1244:6: warning: variable 'last_period' set but not used [-Wunused-but-set-variable]
           u64 last_period, cur_period;
               ^
>> block/blk-iocost.c:3348:7: warning: variable 'ioc' is uninitialized when used here [-Wuninitialized]
           if (!ioc) {
                ^~~
   block/blk-iocost.c:3337:17: note: initialize the variable 'ioc' to silence this warning
           struct ioc *ioc;
                          ^
                           = NULL
   2 warnings generated.


vim +/ioc +3348 block/blk-iocost.c

7caa47151ab2e64 Tejun Heo         2019-08-28  3331  
7caa47151ab2e64 Tejun Heo         2019-08-28  3332  static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
7caa47151ab2e64 Tejun Heo         2019-08-28  3333  				    size_t nbytes, loff_t off)
7caa47151ab2e64 Tejun Heo         2019-08-28  3334  {
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3335  	struct block_device *bdev;
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3336  	struct rq_qos *rqos;
7caa47151ab2e64 Tejun Heo         2019-08-28  3337  	struct ioc *ioc;
7caa47151ab2e64 Tejun Heo         2019-08-28  3338  	u64 u[NR_I_LCOEFS];
7caa47151ab2e64 Tejun Heo         2019-08-28  3339  	bool user;
7caa47151ab2e64 Tejun Heo         2019-08-28  3340  	char *p;
7caa47151ab2e64 Tejun Heo         2019-08-28  3341  	int ret;
7caa47151ab2e64 Tejun Heo         2019-08-28  3342  
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3343  	bdev = blkcg_conf_open_bdev(&input);
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3344  	if (IS_ERR(bdev))
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3345  		return PTR_ERR(bdev);
7caa47151ab2e64 Tejun Heo         2019-08-28  3346  
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3347  	rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
7caa47151ab2e64 Tejun Heo         2019-08-28 @3348  	if (!ioc) {
ed6cddefdfd361a Pavel Begunkov    2021-10-14  3349  		ret = blk_iocost_init(bdev_get_queue(bdev));
7caa47151ab2e64 Tejun Heo         2019-08-28  3350  		if (ret)
7caa47151ab2e64 Tejun Heo         2019-08-28  3351  			goto err;
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3352  		rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
7caa47151ab2e64 Tejun Heo         2019-08-28  3353  	}
7caa47151ab2e64 Tejun Heo         2019-08-28  3354  
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3355  	ioc = rqos_to_ioc(rqos);
7caa47151ab2e64 Tejun Heo         2019-08-28  3356  	spin_lock_irq(&ioc->lock);
7caa47151ab2e64 Tejun Heo         2019-08-28  3357  	memcpy(u, ioc->params.i_lcoefs, sizeof(u));
7caa47151ab2e64 Tejun Heo         2019-08-28  3358  	user = ioc->user_cost_model;
7caa47151ab2e64 Tejun Heo         2019-08-28  3359  	spin_unlock_irq(&ioc->lock);
7caa47151ab2e64 Tejun Heo         2019-08-28  3360  
7caa47151ab2e64 Tejun Heo         2019-08-28  3361  	while ((p = strsep(&input, " \t\n"))) {
7caa47151ab2e64 Tejun Heo         2019-08-28  3362  		substring_t args[MAX_OPT_ARGS];
7caa47151ab2e64 Tejun Heo         2019-08-28  3363  		char buf[32];
7caa47151ab2e64 Tejun Heo         2019-08-28  3364  		int tok;
7caa47151ab2e64 Tejun Heo         2019-08-28  3365  		u64 v;
7caa47151ab2e64 Tejun Heo         2019-08-28  3366  
7caa47151ab2e64 Tejun Heo         2019-08-28  3367  		if (!*p)
7caa47151ab2e64 Tejun Heo         2019-08-28  3368  			continue;
7caa47151ab2e64 Tejun Heo         2019-08-28  3369  
7caa47151ab2e64 Tejun Heo         2019-08-28  3370  		switch (match_token(p, cost_ctrl_tokens, args)) {
7caa47151ab2e64 Tejun Heo         2019-08-28  3371  		case COST_CTRL:
7caa47151ab2e64 Tejun Heo         2019-08-28  3372  			match_strlcpy(buf, &args[0], sizeof(buf));
7caa47151ab2e64 Tejun Heo         2019-08-28  3373  			if (!strcmp(buf, "auto"))
7caa47151ab2e64 Tejun Heo         2019-08-28  3374  				user = false;
7caa47151ab2e64 Tejun Heo         2019-08-28  3375  			else if (!strcmp(buf, "user"))
7caa47151ab2e64 Tejun Heo         2019-08-28  3376  				user = true;
7caa47151ab2e64 Tejun Heo         2019-08-28  3377  			else
7caa47151ab2e64 Tejun Heo         2019-08-28  3378  				goto einval;
7caa47151ab2e64 Tejun Heo         2019-08-28  3379  			continue;
7caa47151ab2e64 Tejun Heo         2019-08-28  3380  		case COST_MODEL:
7caa47151ab2e64 Tejun Heo         2019-08-28  3381  			match_strlcpy(buf, &args[0], sizeof(buf));
7caa47151ab2e64 Tejun Heo         2019-08-28  3382  			if (strcmp(buf, "linear"))
7caa47151ab2e64 Tejun Heo         2019-08-28  3383  				goto einval;
7caa47151ab2e64 Tejun Heo         2019-08-28  3384  			continue;
7caa47151ab2e64 Tejun Heo         2019-08-28  3385  		}
7caa47151ab2e64 Tejun Heo         2019-08-28  3386  
7caa47151ab2e64 Tejun Heo         2019-08-28  3387  		tok = match_token(p, i_lcoef_tokens, args);
7caa47151ab2e64 Tejun Heo         2019-08-28  3388  		if (tok == NR_I_LCOEFS)
7caa47151ab2e64 Tejun Heo         2019-08-28  3389  			goto einval;
7caa47151ab2e64 Tejun Heo         2019-08-28  3390  		if (match_u64(&args[0], &v))
7caa47151ab2e64 Tejun Heo         2019-08-28  3391  			goto einval;
7caa47151ab2e64 Tejun Heo         2019-08-28  3392  		u[tok] = v;
7caa47151ab2e64 Tejun Heo         2019-08-28  3393  		user = true;
7caa47151ab2e64 Tejun Heo         2019-08-28  3394  	}
7caa47151ab2e64 Tejun Heo         2019-08-28  3395  
7caa47151ab2e64 Tejun Heo         2019-08-28  3396  	spin_lock_irq(&ioc->lock);
7caa47151ab2e64 Tejun Heo         2019-08-28  3397  	if (user) {
7caa47151ab2e64 Tejun Heo         2019-08-28  3398  		memcpy(ioc->params.i_lcoefs, u, sizeof(u));
7caa47151ab2e64 Tejun Heo         2019-08-28  3399  		ioc->user_cost_model = true;
7caa47151ab2e64 Tejun Heo         2019-08-28  3400  	} else {
7caa47151ab2e64 Tejun Heo         2019-08-28  3401  		ioc->user_cost_model = false;
7caa47151ab2e64 Tejun Heo         2019-08-28  3402  	}
7caa47151ab2e64 Tejun Heo         2019-08-28  3403  	ioc_refresh_params(ioc, true);
7caa47151ab2e64 Tejun Heo         2019-08-28  3404  	spin_unlock_irq(&ioc->lock);
7caa47151ab2e64 Tejun Heo         2019-08-28  3405  
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3406  	rq_qos_put(rqos);
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3407  	blkdev_put_no_open(bdev);
7caa47151ab2e64 Tejun Heo         2019-08-28  3408  	return nbytes;
7caa47151ab2e64 Tejun Heo         2019-08-28  3409  
7caa47151ab2e64 Tejun Heo         2019-08-28  3410  einval:
7caa47151ab2e64 Tejun Heo         2019-08-28  3411  	ret = -EINVAL;
8bef9fba59d8d47 Wang Jianchao     2022-01-10  3412  	rq_qos_put(rqos);
7caa47151ab2e64 Tejun Heo         2019-08-28  3413  err:
22ae8ce8b89241c Christoph Hellwig 2020-11-26  3414  	blkdev_put_no_open(bdev);
7caa47151ab2e64 Tejun Heo         2019-08-28  3415  	return ret;
7caa47151ab2e64 Tejun Heo         2019-08-28  3416  }
7caa47151ab2e64 Tejun Heo         2019-08-28  3417  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Wang Jianchao Jan. 13, 2022, 3:52 a.m. UTC | #2
On 2022/1/13 9:49 AM, kernel test robot wrote:
> All warnings (new ones prefixed by >>):
> 
>    block/blk-iocost.c:1244:6: warning: variable 'last_period' set but not used [-Wunused-but-set-variable]
>            u64 last_period, cur_period;
>                ^
>>> block/blk-iocost.c:3348:7: warning: variable 'ioc' is uninitialized when used here [-Wuninitialized]
>            if (!ioc) {

Thanks so much.
I will fix this in the next patch version.

Jianchao
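
[For reference, the fix is presumably the same null test that the
ioc_qos_write() hunk in this patch already uses: check the rqos pointer
returned by rq_qos_get() instead of the still-uninitialized ioc:

	rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
	if (!rqos) {	/* was: if (!ioc), with 'ioc' not yet assigned */
		ret = blk_iocost_init(bdev_get_queue(bdev));
		...
]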

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..2847ab514c1f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -51,6 +51,7 @@ 
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
 #include "blk-throttle.h"
+#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -377,6 +378,7 @@  void blk_cleanup_queue(struct request_queue *q)
 	 * it is safe to free requests now.
 	 */
 	mutex_lock(&q->sysfs_lock);
+	rq_qos_exit(q);
 	if (q->elevator)
 		blk_mq_sched_free_rqs(q);
 	mutex_unlock(&q->sysfs_lock);
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 769b64394298..cfc0e305c32e 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -662,7 +662,7 @@  static struct ioc *rqos_to_ioc(struct rq_qos *rqos)
 
 static struct ioc *q_to_ioc(struct request_queue *q)
 {
-	return rqos_to_ioc(rq_qos_id(q, RQ_QOS_COST));
+	return rqos_to_ioc(rq_qos_by_id(q, RQ_QOS_COST));
 }
 
 static const char *q_name(struct request_queue *q)
@@ -3162,6 +3162,7 @@  static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 			     size_t nbytes, loff_t off)
 {
 	struct block_device *bdev;
+	struct rq_qos *rqos;
 	struct ioc *ioc;
 	u32 qos[NR_QOS_PARAMS];
 	bool enable, user;
@@ -3172,14 +3173,15 @@  static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
-	ioc = q_to_ioc(bdev_get_queue(bdev));
-	if (!ioc) {
+	rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
+	if (!rqos) {
 		ret = blk_iocost_init(bdev_get_queue(bdev));
 		if (ret)
 			goto err;
-		ioc = q_to_ioc(bdev_get_queue(bdev));
+		rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
 	}
 
+	ioc = rqos_to_ioc(rqos);
 	spin_lock_irq(&ioc->lock);
 	memcpy(qos, ioc->params.qos, sizeof(qos));
 	enable = ioc->enabled;
@@ -3272,10 +3274,12 @@  static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	ioc_refresh_params(ioc, true);
 	spin_unlock_irq(&ioc->lock);
 
+	rq_qos_put(rqos);
 	blkdev_put_no_open(bdev);
 	return nbytes;
 einval:
 	ret = -EINVAL;
+	rq_qos_put(rqos);
 err:
 	blkdev_put_no_open(bdev);
 	return ret;
@@ -3329,6 +3333,7 @@  static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 				    size_t nbytes, loff_t off)
 {
 	struct block_device *bdev;
+	struct rq_qos *rqos;
 	struct ioc *ioc;
 	u64 u[NR_I_LCOEFS];
 	bool user;
@@ -3339,14 +3344,15 @@  static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
-	ioc = q_to_ioc(bdev_get_queue(bdev));
+	rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
 	if (!ioc) {
 		ret = blk_iocost_init(bdev_get_queue(bdev));
 		if (ret)
 			goto err;
-		ioc = q_to_ioc(bdev_get_queue(bdev));
+		rqos = rq_qos_get(bdev_get_queue(bdev), RQ_QOS_COST);
 	}
 
+	ioc = rqos_to_ioc(rqos);
 	spin_lock_irq(&ioc->lock);
 	memcpy(u, ioc->params.i_lcoefs, sizeof(u));
 	user = ioc->user_cost_model;
@@ -3397,11 +3403,13 @@  static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	ioc_refresh_params(ioc, true);
 	spin_unlock_irq(&ioc->lock);
 
+	rq_qos_put(rqos);
 	blkdev_put_no_open(bdev);
 	return nbytes;
 
 einval:
 	ret = -EINVAL;
+	rq_qos_put(rqos);
 err:
 	blkdev_put_no_open(bdev);
 	return ret;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4f2cf8399f3d..e3e8d54c836f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -841,7 +841,9 @@  void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
 void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
 {
 	struct request_queue *q = rqos->q;
-	const char *dir_name = rq_qos_id_to_name(rqos->id);
+	const char *dir_name;
+
+	dir_name = rqos->ops->name ? rqos->ops->name : rq_qos_id_to_name(rqos->id);
 
 	if (rqos->debugfs_dir || !rqos->ops->debugfs_attrs)
 		return;
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index e83af7bc7591..a94ff872722b 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -2,6 +2,11 @@ 
 
 #include "blk-rq-qos.h"
 
+static DEFINE_IDA(rq_qos_ida);
+static int nr_rqos_blkcg_pols;
+static DEFINE_MUTEX(rq_qos_mutex);
+static LIST_HEAD(rq_qos_list);
+
 /*
  * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
  * false if 'v' + 1 would be bigger than 'below'.
@@ -294,11 +299,316 @@  void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 
 void rq_qos_exit(struct request_queue *q)
 {
-	blk_mq_debugfs_unregister_queue_rqos(q);
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock));
 
 	while (q->rq_qos) {
 		struct rq_qos *rqos = q->rq_qos;
 		q->rq_qos = rqos->next;
+		if (rqos->ops->owner)
+			module_put(rqos->ops->owner);
 		rqos->ops->exit(rqos);
 	}
+	blk_mq_debugfs_unregister_queue_rqos(q);
+}
+
+/*
+ * After the pluggable blk-qos, rqos's life cycle become complicated,
+ * qos switching path can add/delete rqos to/from request_queue
+ * under sysfs_lock and queue_lock. There are following places
+ * may access rqos through rq_qos_by_id() concurrently:
+ * (1) normal IO path, under q_usage_counter,
+ * (2) queue sysfs interfaces, under sysfs_lock,
+ * (3) blkg_create, the .pd_init_fn() may access rqos, under queue_lock,
+ * (4) cgroup file, such as ioc_cost_model_write,
+ *
+ * (1)(2)(3) are definitely safe. case (4) is tricky. rq_qos_get() is
+ * for the case.
+ */
+struct rq_qos *rq_qos_get(struct request_queue *q, int id)
+{
+	struct rq_qos *rqos;
+
+	spin_lock_irq(&q->queue_lock);
+	rqos = rq_qos_by_id(q, id);
+	if (rqos && rqos->dying)
+		rqos = NULL;
+	if (rqos)
+		refcount_inc(&rqos->ref);
+	spin_unlock_irq(&q->queue_lock);
+	return rqos;
+}
+EXPORT_SYMBOL_GPL(rq_qos_get);
+
+void rq_qos_put(struct rq_qos *rqos)
+{
+	struct request_queue *q = rqos->q;
+
+	spin_lock_irq(&q->queue_lock);
+	refcount_dec(&rqos->ref);
+	if (rqos->dying)
+		wake_up(&rqos->waitq);
+	spin_unlock_irq(&q->queue_lock);
+}
+EXPORT_SYMBOL_GPL(rq_qos_put);
+
+void rq_qos_activate(struct request_queue *q,
+		struct rq_qos *rqos, const struct rq_qos_ops *ops)
+{
+	struct rq_qos *pos;
+	bool rq_alloc_time = false;
+
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock));
+
+	rqos->dying = false;
+	refcount_set(&rqos->ref, 1);
+	init_waitqueue_head(&rqos->waitq);
+	rqos->id = ops->id;
+	rqos->ops = ops;
+	rqos->q = q;
+	rqos->next = NULL;
+
+	spin_lock_irq(&q->queue_lock);
+	pos = q->rq_qos;
+	if (pos) {
+		while (pos->next) {
+			if (pos->ops->flags & RQOS_FLAG_RQ_ALLOC_TIME)
+				rq_alloc_time = true;
+			pos = pos->next;
+		}
+		pos->next = rqos;
+	} else {
+		q->rq_qos = rqos;
+	}
+	if (ops->flags & RQOS_FLAG_RQ_ALLOC_TIME &&
+	    !rq_alloc_time)
+		blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, q);
+
+	spin_unlock_irq(&q->queue_lock);
+
+	if (rqos->ops->debugfs_attrs)
+		blk_mq_debugfs_register_rqos(rqos);
+}
+EXPORT_SYMBOL_GPL(rq_qos_activate);
+
+void rq_qos_deactivate(struct rq_qos *rqos)
+{
+	struct request_queue *q = rqos->q;
+	struct rq_qos **cur, *pos;
+	bool rq_alloc_time = false;
+
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock));
+
+	spin_lock_irq(&q->queue_lock);
+	rqos->dying = true;
+	/*
+	 * Drain all of the usage of get/put_rqos()
+	 */
+	wait_event_lock_irq(rqos->waitq,
+		refcount_read(&rqos->ref) == 1, q->queue_lock);
+	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
+		if (*cur == rqos) {
+			*cur = rqos->next;
+			break;
+		}
+	}
+
+	pos = q->rq_qos;
+	while (pos && pos->next) {
+		if (pos->ops->flags & RQOS_FLAG_RQ_ALLOC_TIME)
+			rq_alloc_time = true;
+		pos = pos->next;
+	}
+
+	if (rqos->ops->flags & RQOS_FLAG_RQ_ALLOC_TIME &&
+	    !rq_alloc_time)
+		blk_queue_flag_clear(QUEUE_FLAG_RQ_ALLOC_TIME, q);
+
+	spin_unlock_irq(&q->queue_lock);
+	blk_mq_debugfs_unregister_rqos(rqos);
+}
+EXPORT_SYMBOL_GPL(rq_qos_deactivate);
+
+static struct rq_qos_ops *rq_qos_find_by_name(const char *name)
+{
+	struct rq_qos_ops *pos;
+
+	list_for_each_entry(pos, &rq_qos_list, node) {
+		if (!strncmp(pos->name, name, strlen(pos->name)))
+			return pos;
+	}
+
+	return NULL;
+}
+
+int rq_qos_register(struct rq_qos_ops *ops)
+{
+	int ret, start;
+
+	mutex_lock(&rq_qos_mutex);
+
+	if (rq_qos_find_by_name(ops->name)) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+	if (ops->flags & RQOS_FLAG_CGRP_POL &&
+	    nr_rqos_blkcg_pols >= (BLKCG_MAX_POLS - BLKCG_NON_RQOS_POLS)) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	start = RQ_QOS_IOPRIO + 1;
+	ret = ida_simple_get(&rq_qos_ida, start, INT_MAX, GFP_KERNEL);
+	if (ret < 0)
+		goto out;
+
+	if (ops->flags & RQOS_FLAG_CGRP_POL)
+		nr_rqos_blkcg_pols++;
+
+	ops->id = ret;
+	ret = 0;
+	INIT_LIST_HEAD(&ops->node);
+	list_add_tail(&ops->node, &rq_qos_list);
+out:
+	mutex_unlock(&rq_qos_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rq_qos_register);
+
+void rq_qos_unregister(struct rq_qos_ops *ops)
+{
+	mutex_lock(&rq_qos_mutex);
+
+	if (ops->flags & RQOS_FLAG_CGRP_POL)
+		nr_rqos_blkcg_pols--;
+	list_del_init(&ops->node);
+	ida_simple_remove(&rq_qos_ida, ops->id);
+	mutex_unlock(&rq_qos_mutex);
+}
+EXPORT_SYMBOL_GPL(rq_qos_unregister);
+
+ssize_t queue_qos_show(struct request_queue *q, char *buf)
+{
+	struct rq_qos_ops *ops;
+	struct rq_qos *rqos;
+	int ret = 0;
+
+	mutex_lock(&rq_qos_mutex);
+	/*
+	 * Show the policies in the order of being invoked
+	 */
+	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (!rqos->ops->name)
+			continue;
+		ret += sprintf(buf + ret, "[%s] ", rqos->ops->name);
+	}
+	list_for_each_entry(ops, &rq_qos_list, node) {
+		if (!rq_qos_by_name(q, ops->name))
+			ret += sprintf(buf + ret, "%s ", ops->name);
+	}
+
+	ret--; /* overwrite the last space */
+	ret += sprintf(buf + ret, "\n");
+	mutex_unlock(&rq_qos_mutex);
+
+	return ret;
+}
+
+int rq_qos_switch(struct request_queue *q,
+		const struct rq_qos_ops *ops,
+		struct rq_qos *rqos)
+{
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock));
+
+	blk_mq_freeze_queue(q);
+	if (!rqos) {
+		ret = ops->init(q);
+	} else {
+		ops->exit(rqos);
+		ret = 0;
+	}
+	blk_mq_unfreeze_queue(q);
+
+	return ret;
+}
+
+ssize_t queue_qos_store(struct request_queue *q, const char *page,
+			  size_t count)
+{
+	const struct rq_qos_ops *ops;
+	struct rq_qos *rqos;
+	const char *qosname;
+	char *buf;
+	bool add;
+	int ret;
+
+	buf = kstrdup(page, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	buf = strim(buf);
+	if (buf[0] != '+' && buf[0] != '-') {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	add = buf[0] == '+';
+	qosname = buf + 1;
+
+	rqos = rq_qos_by_name(q, qosname);
+	if ((buf[0] == '+' && rqos)) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+	if ((buf[0] == '-' && !rqos)) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	mutex_lock(&rq_qos_mutex);
+	if (add) {
+		ops = rq_qos_find_by_name(qosname);
+		if (!ops) {
+			/*
+			 * module_init callback may request this mutex
+			 */
+			mutex_unlock(&rq_qos_mutex);
+			request_module("%s", qosname);
+			mutex_lock(&rq_qos_mutex);
+			ops = rq_qos_find_by_name(qosname);
+		}
+	} else {
+		ops = rqos->ops;
+	}
+
+	if (!ops) {
+		ret = -EINVAL;
+	} else if (ops->owner && !try_module_get(ops->owner)) {
+		ops = NULL;
+		ret = -EAGAIN;
+	}
+	mutex_unlock(&rq_qos_mutex);
+
+	if (!ops)
+		goto out;
+
+	if (add) {
+		ret = rq_qos_switch(q, ops, NULL);
+		if (!ret && ops->owner)
+			__module_get(ops->owner);
+	} else {
+		rq_qos_switch(q, ops, rqos);
+		ret = 0;
+		if (ops->owner)
+			module_put(ops->owner);
+	}
+
+	if (ops->owner)
+		module_put(ops->owner);
+out:
+	kfree(buf);
+	return ret ? ret : count;
 }
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 3cfbc8668cba..c2b9b41f8fd4 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -26,7 +26,10 @@  struct rq_wait {
 };
 
 struct rq_qos {
-	struct rq_qos_ops *ops;
+	refcount_t ref;
+	wait_queue_head_t waitq;
+	bool dying;
+	const struct rq_qos_ops *ops;
 	struct request_queue *q;
 	enum rq_qos_id id;
 	struct rq_qos *next;
@@ -35,7 +38,17 @@  struct rq_qos {
 #endif
 };
 
+enum {
+	RQOS_FLAG_CGRP_POL = 1 << 0,
+	RQOS_FLAG_RQ_ALLOC_TIME = 1 << 1
+};
+
 struct rq_qos_ops {
+	struct list_head node;
+	struct module *owner;
+	const char *name;
+	int flags;
+	int id;
 	void (*throttle)(struct rq_qos *, struct bio *);
 	void (*track)(struct rq_qos *, struct request *, struct bio *);
 	void (*merge)(struct rq_qos *, struct request *, struct bio *);
@@ -46,6 +59,7 @@  struct rq_qos_ops {
 	void (*cleanup)(struct rq_qos *, struct bio *);
 	void (*queue_depth_changed)(struct rq_qos *);
 	void (*exit)(struct rq_qos *);
+	int (*init)(struct request_queue *);
 	const struct blk_mq_debugfs_attr *debugfs_attrs;
 };
 
@@ -59,10 +73,12 @@  struct rq_depth {
 	unsigned int default_depth;
 };
 
-static inline struct rq_qos *rq_qos_id(struct request_queue *q,
-				       enum rq_qos_id id)
+static inline struct rq_qos *rq_qos_by_id(struct request_queue *q, int id)
 {
 	struct rq_qos *rqos;
+
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock) && !spin_is_locked(&q->queue_lock));
+
 	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
 		if (rqos->id == id)
 			break;
@@ -72,12 +88,12 @@  static inline struct rq_qos *rq_qos_id(struct request_queue *q,
 
 static inline struct rq_qos *wbt_rq_qos(struct request_queue *q)
 {
-	return rq_qos_id(q, RQ_QOS_WBT);
+	return rq_qos_by_id(q, RQ_QOS_WBT);
 }
 
 static inline struct rq_qos *blkcg_rq_qos(struct request_queue *q)
 {
-	return rq_qos_id(q, RQ_QOS_LATENCY);
+	return rq_qos_by_id(q, RQ_QOS_LATENCY);
 }
 
 static inline void rq_wait_init(struct rq_wait *rq_wait)
@@ -132,6 +148,35 @@  static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
 	blk_mq_debugfs_unregister_rqos(rqos);
 }
 
+int rq_qos_register(struct rq_qos_ops *ops);
+void rq_qos_unregister(struct rq_qos_ops *ops);
+void rq_qos_activate(struct request_queue *q,
+		struct rq_qos *rqos, const struct rq_qos_ops *ops);
+void rq_qos_deactivate(struct rq_qos *rqos);
+ssize_t queue_qos_show(struct request_queue *q, char *buf);
+ssize_t queue_qos_store(struct request_queue *q, const char *page,
+			  size_t count);
+struct rq_qos *rq_qos_get(struct request_queue *q, int id);
+void rq_qos_put(struct rq_qos *rqos);
+
+static inline struct rq_qos *rq_qos_by_name(struct request_queue *q,
+		const char *name)
+{
+	struct rq_qos *rqos;
+
+	WARN_ON(!mutex_is_locked(&q->sysfs_lock));
+
+	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (!rqos->ops->name)
+			continue;
+
+		if (!strncmp(rqos->ops->name, name,
+					strlen(rqos->ops->name)))
+			return rqos;
+	}
+	return NULL;
+}
+
 typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
 typedef void (cleanup_cb_t)(struct rq_wait *rqw, void *private_data);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cd75b0f73dc6..91f980985b1b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -573,6 +573,7 @@  QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
 QUEUE_RW_ENTRY(elv_iosched, "scheduler");
+QUEUE_RW_ENTRY(queue_qos, "qos");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
 QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size");
@@ -632,6 +633,7 @@  static struct attribute *queue_attrs[] = {
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
 	&elv_iosched_entry.attr,
+	&queue_qos_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 0c119be0e813..88265ae4fa41 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -628,9 +628,13 @@  static void wbt_requeue(struct rq_qos *rqos, struct request *rq)
 
 void wbt_set_write_cache(struct request_queue *q, bool write_cache_on)
 {
-	struct rq_qos *rqos = wbt_rq_qos(q);
+	struct rq_qos *rqos;
+
+	spin_lock_irq(&q->queue_lock);
+	rqos = wbt_rq_qos(q);
 	if (rqos)
 		RQWB(rqos)->wc = write_cache_on;
+	spin_unlock_irq(&q->queue_lock);
 }
 
 /*
diff --git a/block/elevator.c b/block/elevator.c
index 19a78d5516ba..fe664674c14d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -701,12 +701,15 @@  void elevator_init_mq(struct request_queue *q)
 	 * requests, then no need to quiesce queue which may add long boot
 	 * latency, especially when lots of disks are involved.
 	 */
+
+	mutex_lock(&q->sysfs_lock);
 	blk_mq_freeze_queue(q);
 	blk_mq_cancel_work_sync(q);
 
 	err = blk_mq_init_sched(q, e);
 
 	blk_mq_unfreeze_queue(q);
+	mutex_unlock(&q->sysfs_lock);
 
 	if (err) {
 		pr_warn("\"%s\" elevator initialization failed, "
diff --git a/block/genhd.c b/block/genhd.c
index 30362aeacac4..af2e8ebce46e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -27,7 +27,6 @@ 
 #include <linux/badblocks.h>
 
 #include "blk.h"
-#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -621,8 +620,6 @@  void del_gendisk(struct gendisk *disk)
 	device_del(disk_to_dev(disk));
 
 	blk_mq_freeze_queue_wait(q);
-
-	rq_qos_exit(q);
 	blk_sync_queue(q);
 	blk_flush_integrity();
 	/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bd4370baccca..e7dce2232814 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -43,6 +43,10 @@  struct blk_crypto_profile;
  * Defined here to simplify include dependency.
  */
 #define BLKCG_MAX_POLS		6
+/*
+ * Non blk-rq-qos blkcg policies include blk-throttle and bfq
+ */
+#define BLKCG_NON_RQOS_POLS		2
 
 static inline int blk_validate_block_size(unsigned int bsize)
 {