Message ID | 20221020035348.10163-3-lengchao@huawei.com (mailing list archive)
---|---
State | New, archived
Series | improve nvme quiesce time for large amount of namespaces
On 10/20/22 06:53, Chao Leng wrote:
> All controller namespaces share the same tagset, so we can use this
> interface, which does the optimal operation for parallel quiesce based
> on the tagset type (e.g. blocking tagsets and non-blocking tagsets).
>
> nvme connect_q should not be quiesced when quiescing the tagset, so set
> QUEUE_FLAG_SKIP_TAGSET_QUIESCE to skip it when initializing connect_q.
>
> Currently we use NVME_NS_STOPPED to ensure that quiescing and
> unquiescing are paired. If we use blk_mq_[un]quiesce_tagset,
> NVME_NS_STOPPED will be invalidated, so introduce NVME_CTRL_STOPPED to
> replace NVME_NS_STOPPED. In addition, we never really quiesce a single
> namespace. It is a better choice to move the flag from ns to ctrl.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> ---
>  drivers/nvme/host/core.c | 57 +++++++++++++++++++-----------------------------
>  drivers/nvme/host/nvme.h |  3 ++-
>  2 files changed, 25 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 059737c1a2c1..c7727d1f228e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4890,6 +4890,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>  			ret = PTR_ERR(ctrl->connect_q);
>  			goto out_free_tag_set;
>  		}
> +		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, ctrl->connect_q);
>  	}
>
>  	ctrl->tagset = set;
> @@ -5013,6 +5014,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>  	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
>  	spin_lock_init(&ctrl->lock);
>  	mutex_init(&ctrl->scan_lock);
> +	mutex_init(&ctrl->queue_state_lock);

Why is this lock needed?
On 2022/10/20 14:11, Sagi Grimberg wrote:
> On 10/20/22 06:53, Chao Leng wrote:
>> [...]
>> @@ -5013,6 +5014,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>>  	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
>>  	spin_lock_init(&ctrl->lock);
>>  	mutex_init(&ctrl->scan_lock);
>> +	mutex_init(&ctrl->queue_state_lock);
>
> Why is this lock needed?
It is used to protect the code paths that require the queues to be
fully quiesced.
The scenario (without the lock):

Thread A: calls nvme_stop_queues, sets NVME_CTRL_STOPPED, then quiesces
the tagset and waits for the grace period.
Thread B: calls nvme_stop_queues; because NVME_CTRL_STOPPED is already
set, it continues with work that requires the queues to be quiesced.
Since the grace period has not ended yet, this may misbehave.
Thread A: the grace period ends, and it continues.

So add a lock to ensure that all queues are quiesced after
NVME_CTRL_STOPPED is set. The old code enforced this by waiting for the
grace period:

	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
		blk_mq_quiesce_queue(ns->queue);
	else
		blk_mq_wait_quiesce_done(ns->queue);

But the old code was not absolutely safe either, for example in this
scenario:

Thread A: test_and_set_bit succeeds, but the thread is interrupted by a
hardware irq and loses its chance to run before quiescing the queue.
Thread B: test_and_set_bit fails, so it waits for the grace period and
then continues with work that requires the queue to be quiesced; because
the queue has not actually been quiesced yet, this may misbehave.
Thread A: gets the chance to run, calls blk_mq_quiesce_queue, and
continues.
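The race described above can be sketched in userspace with plain
pthreads. This is a minimal simulation, not the kernel code: all names
(`nvme_stop_queues_sim`, `quiesce_tagset_sim`, `run_demo`) are made up,
and `usleep` merely stands in for the RCU grace period. It shows why
holding the mutex across the quiesce guarantees that every caller, not
just the first one, returns with the tagset quiesced:

```c
/*
 * Userspace sketch of the pattern queue_state_lock enforces.
 * Illustrative only; these are not the kernel APIs.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static pthread_mutex_t queue_state_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int ctrl_stopped;     /* plays the role of NVME_CTRL_STOPPED */
static atomic_int tagset_quiesced;  /* set only after the grace period ends */

/* Stand-in for blk_mq_quiesce_tagset(): returns only after the
 * (simulated) grace period has elapsed. */
static void quiesce_tagset_sim(void)
{
	usleep(10000);                     /* simulated grace period */
	atomic_store(&tagset_quiesced, 1);
}

static void nvme_stop_queues_sim(void)
{
	pthread_mutex_lock(&queue_state_lock);
	if (!atomic_exchange(&ctrl_stopped, 1))
		quiesce_tagset_sim();
	/*
	 * Because the mutex is held until the quiesce finishes, a second
	 * caller cannot return while the grace period is still running.
	 */
	pthread_mutex_unlock(&queue_state_lock);
}

static void *stopper(void *arg)
{
	nvme_stop_queues_sim();
	/* The caller may now rely on the queues being quiesced. */
	return (void *)(long)atomic_load(&tagset_quiesced);
}

/* Run two concurrent callers; returns 1 iff both observed a quiesced
 * tagset when nvme_stop_queues_sim() returned. */
static int run_demo(void)
{
	pthread_t a, b;
	void *ra, *rb;

	pthread_create(&a, NULL, stopper, NULL);
	pthread_create(&b, NULL, stopper, NULL);
	pthread_join(a, &ra);
	pthread_join(b, &rb);
	return (long)ra && (long)rb;
}
```

Compile with -pthread. Dropping the mutex and relying on the flag alone
reintroduces the window: the second caller can see ctrl_stopped set while
tagset_quiesced is still 0.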
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 059737c1a2c1..c7727d1f228e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4890,6 +4890,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 			ret = PTR_ERR(ctrl->connect_q);
 			goto out_free_tag_set;
 		}
+		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, ctrl->connect_q);
 	}
 
 	ctrl->tagset = set;
@@ -5013,6 +5014,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
 	spin_lock_init(&ctrl->lock);
 	mutex_init(&ctrl->scan_lock);
+	mutex_init(&ctrl->queue_state_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
 	xa_init(&ctrl->cels);
 	init_rwsem(&ctrl->namespaces_rwsem);
@@ -5089,36 +5091,21 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
-static void nvme_start_ns_queue(struct nvme_ns *ns)
-{
-	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_unquiesce_queue(ns->queue);
-}
-
-static void nvme_stop_ns_queue(struct nvme_ns *ns)
-{
-	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_quiesce_queue(ns->queue);
-	else
-		blk_mq_wait_quiesce_done(ns->queue);
-}
-
 /*
  * Prepare a queue for teardown.
  *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex. This will end buffered writers dirtying pages that can't
- * be synced.
+ * The caller should unquiesce the queue to avoid blocking dispatch.
  */
 static void nvme_set_queue_dying(struct nvme_ns *ns)
 {
 	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
 		return;
 
+	/*
+	 * Mark the disk dead to prevent new opens, and set the capacity to 0
+	 * to end buffered writers dirtying pages that can't be synced.
+	 */
 	blk_mark_disk_dead(ns->disk);
-	nvme_start_ns_queue(ns);
-
 	set_capacity_and_notify(ns->disk, 0);
 }
@@ -5134,15 +5121,17 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		nvme_set_queue_dying(ns);
+
+	up_read(&ctrl->namespaces_rwsem);
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		nvme_start_admin_queue(ctrl);
 
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_set_queue_dying(ns);
-
-	up_read(&ctrl->namespaces_rwsem);
+	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		nvme_start_queues(ctrl);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
@@ -5196,23 +5185,23 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
+	mutex_lock(&ctrl->queue_state_lock);
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_stop_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_quiesce_tagset(ctrl->tagset);
+
+	mutex_unlock(&ctrl->queue_state_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
+	mutex_lock(&ctrl->queue_state_lock);
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_start_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_unquiesce_tagset(ctrl->tagset);
+
+	mutex_unlock(&ctrl->queue_state_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee6..23be055fb425 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -237,6 +237,7 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_FAILFAST_EXPIRED	= 0,
 	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
 	NVME_CTRL_STARTED_ONCE		= 2,
+	NVME_CTRL_STOPPED		= 3,
 };
 
 struct nvme_ctrl {
@@ -245,6 +246,7 @@ struct nvme_ctrl {
 	bool identified;
 	spinlock_t lock;
 	struct mutex scan_lock;
+	struct mutex queue_state_lock;
 	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
 	struct request_queue *connect_q;
@@ -487,7 +489,6 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
-#define NVME_NS_STOPPED		5
 	struct cdev cdev;
 	struct device cdev_device;