Message ID | a11bb09a-6207-f4a8-cfd4-4fe5627c1700@sandisk.com (mailing list archive) |
---|---|
State | Not Applicable |
On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 10/01/16 15:56, Ming Lei wrote:
>>
>> If we just call the rcu/srcu read lock (or the mutex) around .queue_rq(),
>> the above code needn't be duplicated any more.
>
> Hello Ming,
>
> Can you have a look at the attached patch? That patch uses an srcu read lock
> for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been set.

That is much cleaner now.

> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
> previous versions, this patch has been tested.

I think the QUEUE_FLAG_QUIESCING flag is still needed: we have to set that
flag to prevent new .queue_rq() calls from starting, and synchronize_srcu()
only waits for readers that are already running, not for readers that start
later (see the 'Update-Side Primitives' section of [1]).

[1] https://lwn.net/Articles/202847/
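To make the ordering concrete, here is a minimal sketch of the scheme Ming is describing; the two flag helpers are hypothetical, and the per-hctx queue_rq_srcu member comes from Bart's patch (shown at the end of this thread):

```c
/* Sketch only: the quiescing flag must keep new readers out *before*
 * synchronize_srcu() runs, because synchronize_srcu() waits only for
 * readers that already entered their read-side critical section. */
static bool example_dispatch(struct request_queue *q,
			     struct blk_mq_hw_ctx *hctx)
{
	int srcu_idx;
	bool dispatched = false;

	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
	if (!example_queue_quiescing(q)) {	/* hypothetical flag test */
		/* ... invoke .queue_rq() here ... */
		dispatched = true;
	}
	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
	return dispatched;
}

static void example_quiesce(struct request_queue *q,
			    struct blk_mq_hw_ctx *hctx)
{
	example_set_queue_quiescing(q);		/* hypothetical flag set */
	synchronize_srcu(&hctx->queue_rq_srcu);	/* waits for existing readers */
}
```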
On 10/04/16 21:32, Ming Lei wrote:
> On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
> <bart.vanassche@sandisk.com> wrote:
>> On 10/01/16 15:56, Ming Lei wrote:
>>>
>>> If we just call the rcu/srcu read lock (or the mutex) around .queue_rq(),
>>> the above code needn't be duplicated any more.
>>
>> Can you have a look at the attached patch? That patch uses an srcu read lock
>> for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been set.
>
> That is much cleaner now.
>
>> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
>> previous versions, this patch has been tested.
>
> I think the QUEUE_FLAG_QUIESCING flag is still needed: we have to set that
> flag to prevent new .queue_rq() calls from starting, and synchronize_srcu()
> only waits for readers that are already running, not for readers that start
> later (see the 'Update-Side Primitives' section of [1]).
>
> [1] https://lwn.net/Articles/202847/

Hello Ming,

How about using the existing flag BLK_MQ_S_STOPPED instead of introducing a
new QUEUE_FLAG_QUIESCING flag? From the comment above blk_mq_quiesce_queue()
in the patch that was attached to my previous e-mail: "Additionally, it is
not prevented that new queue_rq() calls occur unless the queue has been
stopped first."

Thanks,

Bart.
On Wed, Oct 5, 2016 at 10:46 PM, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 10/04/16 21:32, Ming Lei wrote:
>>
>> On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
>> <bart.vanassche@sandisk.com> wrote:
>>>
>>> On 10/01/16 15:56, Ming Lei wrote:
>>>>
>>>> If we just call the rcu/srcu read lock (or the mutex) around .queue_rq(),
>>>> the above code needn't be duplicated any more.
>>>
>>> Can you have a look at the attached patch? That patch uses an srcu read
>>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>>> been set.
>>
>> That is much cleaner now.
>>
>>> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
>>> previous versions, this patch has been tested.
>>
>> I think the QUEUE_FLAG_QUIESCING flag is still needed: we have to set that
>> flag to prevent new .queue_rq() calls from starting, and synchronize_srcu()
>> only waits for readers that are already running, not for readers that start
>> later (see the 'Update-Side Primitives' section of [1]).
>>
>> [1] https://lwn.net/Articles/202847/
>
> Hello Ming,
>
> How about using the existing flag BLK_MQ_S_STOPPED instead of introducing a
> new QUEUE_FLAG_QUIESCING flag? From the comment above blk_mq_quiesce_queue()

That looks fine, but the direct-issue path also has to stop dispatching
once a hw queue has become BLK_MQ_S_STOPPED.

> in the patch that was attached to my previous e-mail: "Additionally, it is
> not prevented that new queue_rq() calls occur unless the queue has been
> stopped first."
>
> Thanks,
>
> Bart.
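Combining the two points, the BLK_MQ_S_STOPPED-based quiesce being discussed might look roughly like the following sketch; it assumes, as in Bart's patch below, that every hctx carries a queue_rq_srcu and that both __blk_mq_run_hw_queue() and the direct-issue path test the flag before dispatching:

```c
#include <linux/blk-mq.h>
#include <linux/srcu.h>

/* Sketch: stop the hw queues so that no new .queue_rq() calls start,
 * then wait until the .queue_rq() calls already in flight finish. */
static void example_stop_and_quiesce(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;

	blk_mq_stop_hw_queues(q);	/* sets BLK_MQ_S_STOPPED per hctx */
	queue_for_each_hw_ctx(q, hctx, i)
		synchronize_srcu(&hctx->queue_rq_srcu);
}
```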
> Hello Ming,
>
> Can you have a look at the attached patch? That patch uses an srcu read
> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
> Just like previous versions, this patch has been tested.

Hey Bart,

Do we care about the synchronization of queue_rq and/or blk_mq_run_hw_queue
if the hctx is not stopped?

I'm wondering whether we can avoid introducing new barriers in the
submission path if they are not absolutely needed.
On 10/05/2016 11:14 AM, Sagi Grimberg wrote:
>> Hello Ming,
>>
>> Can you have a look at the attached patch? That patch uses an srcu read
>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
>> Just like previous versions, this patch has been tested.
>
> Hey Bart,
>
> Do we care about the synchronization of queue_rq and/or blk_mq_run_hw_queue
> if the hctx is not stopped?
>
> I'm wondering whether we can avoid introducing new barriers in the
> submission path if they are not absolutely needed.

Hello Sagi,

I'm not sure whether the new blk_mq_quiesce_queue() function is useful
without stopping all hardware contexts first. In other words, in my view
setting the BLK_MQ_S_STOPPED flag before calling blk_mq_quiesce_queue() is
sufficient, and I don't think that a new QUEUE_FLAG_QUIESCING flag is
necessary.

Bart.
>>> Hello Ming,
>>>
>>> Can you have a look at the attached patch? That patch uses an srcu read
>>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>>> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
>>> Just like previous versions, this patch has been tested.
>>
>> Hey Bart,
>>
>> Do we care about the synchronization of queue_rq and/or blk_mq_run_hw_queue
>> if the hctx is not stopped?
>>
>> I'm wondering whether we can avoid introducing new barriers in the
>> submission path if they are not absolutely needed.
>
> Hello Sagi,

Hey Bart,

> I'm not sure whether the new blk_mq_quiesce_queue() function is useful
> without stopping all hardware contexts first. In other words, in my view
> setting the BLK_MQ_S_STOPPED flag before calling blk_mq_quiesce_queue() is
> sufficient, and I don't think that a new QUEUE_FLAG_QUIESCING flag is
> necessary.

I was referring to whether we can take srcu in the submission path
conditional on the hctx being STOPPED?
On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
> I was referring to whether we can take srcu in the submission path
> conditional on the hctx being STOPPED?

Hello Sagi,

Regarding run-time overhead:
* rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
  translated into preempt_disable() with preemption enabled. The latter
  function modifies a per-cpu variable.
* Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
  safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
  compiler is told to reread the hctx flags (READ_ONCE()) and if the
  compiler and CPU are told not to reorder test_bit() with the memory
  accesses in (s)rcu_read_lock(). To avoid races, BLK_MQ_S_STOPPED will
  have to be tested a second time after the lock has been obtained,
  similar to the double-checked locking pattern.
* srcu_read_lock() reads a word from the srcu structure, disables
  preemption, calls __srcu_read_lock() and re-enables preemption. The
  latter function increments two CPU-local variables and triggers a
  memory barrier (smp_mb()).

Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
code more complicated. Going back to the implementation that calls
rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
that is easier to read and to verify.

If I overlooked something, please let me know.

Bart.
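Spelled out as code, the double-checked pattern Bart describes would look roughly like this (the function shape is illustrative; queue_rq_srcu is the per-hctx member from the patch under discussion):

```c
/* Sketch of the double-checked pattern: BLK_MQ_S_STOPPED must be
 * re-tested after srcu_read_lock(), because the unlocked fast-path
 * test can race with a concurrent stop + blk_mq_quiesce_queue(). */
static void example_run_hw_queue(struct blk_mq_hw_ctx *hctx)
{
	int srcu_idx;

	if (test_bit(BLK_MQ_S_STOPPED, &hctx->state))
		return;			/* unlocked fast-path test */

	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
	if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state)) {
		/* ... dispatch requests via .queue_rq() ... */
	}
	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
}
```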
On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to whether we can take srcu in the submission path
>> conditional on the hctx being STOPPED?
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the memory
>   accesses in (s)rcu_read_lock(). To avoid races, BLK_MQ_S_STOPPED will
>   have to be tested a second time after the lock has been obtained,
>   similar to the double-checked locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mb()).

We can use the srcu read lock for BLOCKING queues and the rcu read lock for
non-BLOCKING queues by putting *_read_lock() and *_read_unlock() into two
wrappers. That should minimize the cost of the srcu read lock & unlock, and
the code stays easy to read & verify.

> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei
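The wrapper pair Ming is proposing might look like the following sketch (names are hypothetical; queue_rq_srcu is the per-hctx member from Bart's patch):

```c
/* Sketch: SRCU for queues whose .queue_rq() may sleep, plain RCU for
 * everyone else, so non-blocking queues avoid the SRCU memory barrier. */
static inline int example_hctx_lock(struct blk_mq_hw_ctx *hctx)
{
	if (hctx->flags & BLK_MQ_F_BLOCKING)
		return srcu_read_lock(&hctx->queue_rq_srcu);
	rcu_read_lock();
	return 0;
}

static inline void example_hctx_unlock(struct blk_mq_hw_ctx *hctx,
				       int srcu_idx)
{
	if (hctx->flags & BLK_MQ_F_BLOCKING)
		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
	else
		rcu_read_unlock();
}
```

The quiesce side would then have to call both synchronize_srcu() and synchronize_rcu(), since either lock may be protecting a .queue_rq() call.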
On 10/05/2016 03:49 PM, Ming Lei wrote:
> We can use the srcu read lock for BLOCKING queues and the rcu read lock for
> non-BLOCKING queues by putting *_read_lock() and *_read_unlock() into two
> wrappers. That should minimize the cost of the srcu read lock & unlock, and
> the code stays easy to read & verify.

Hello Ming,

The lock checking algorithms in the sparse and smatch static checkers are
unable to deal with code of the type "if (condition) (un)lock()". So unless
someone has a better proposal, my preference is to use the approach from the
patch at the start of this e-mail thread.

Thanks,

Bart.
```diff
From 25f02ed7ab7b2308fd18b89d180c0c613e55d416 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Tue, 27 Sep 2016 10:52:36 -0700
Subject: [PATCH] blk-mq: Introduce blk_mq_quiesce_queue()

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations have
finished. This function does *not* wait until all outstanding requests
have finished (this means invocation of request.end_io()).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq.c         | 40 ++++++++++++++++++++++++++++++++++------
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8c45de..38ae685 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,23 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		synchronize_srcu(&hctx->queue_rq_srcu);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -789,11 +806,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(rq_list);
 	LIST_HEAD(driver_list);
 	struct list_head *dptr;
-	int queued;
+	int queued, srcu_idx;
 
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
+	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
 		cpu_online(hctx->next_cpu));
 
@@ -885,6 +904,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		 **/
 		blk_mq_run_hw_queue(hctx, true);
 	}
+
+	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
 }
 
 /*
@@ -1298,7 +1319,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
-	unsigned int request_count = 0;
+	unsigned int request_count = 0, srcu_idx;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1341,7 +1362,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	blk_mq_bio_to_request(rq, bio);
 
 	/*
-	 * We do limited pluging. If the bio can be merged, do that.
+	 * We do limited plugging. If the bio can be merged, do that.
 	 * Otherwise the existing request in the plug list will be
 	 * issued. So the plug list will have one request at most
 	 */
@@ -1361,9 +1382,12 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+
+		srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+		if (blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+			blk_mq_insert_request(old_rq, false, true, true);
+		srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+
 		goto done;
 	}
 
@@ -1659,6 +1683,8 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
+	cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
 	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
@@ -1741,6 +1767,8 @@ static int blk_mq_init_hctx(struct request_queue *q,
 			flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
+	init_srcu_struct(&hctx->queue_rq_srcu);
+
 	return 0;
 
 free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 368c460d..b2ccd3c 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -41,6 +42,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct srcu_struct	queue_rq_srcu;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *,
 			   unsigned long, gfp_t);
-- 
2.9.3
```
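To show how the new function is meant to be used, here is a hypothetical driver-side sketch combining it with queue stopping, as discussed earlier in the thread (function name and context are illustrative):

```c
/* Sketch: quiesce before reconfiguring, then resume dispatching.
 * Per the thread above, the queue has to be stopped first, because
 * blk_mq_quiesce_queue() does not prevent new queue_rq() calls. */
static void example_driver_reconfigure(struct request_queue *q)
{
	blk_mq_stop_hw_queues(q);	/* sets BLK_MQ_S_STOPPED */
	blk_mq_quiesce_queue(q);	/* waits for ongoing .queue_rq() */

	/* ... safely reconfigure or tear down driver resources ... */

	blk_mq_start_stopped_hw_queues(q, true);	/* resume dispatch */
}
```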