Message ID | 3a5750bd-c4e6-8f41-1dc0-803ba8da944b@codeaurora.org (mailing list archive)
---|---
State | New, archived
On 8/16/2016 11:23 AM, Khan, Imran wrote:
> On 8/5/2016 12:49 PM, Khan, Imran wrote:
>> On 8/1/2016 2:58 PM, Khan, Imran wrote:
>>> On 7/30/2016 7:54 AM, Akinobu Mita wrote:
>>>> 2016-07-28 22:18 GMT+09:00 Khan, Imran <kimran@codeaurora.org>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Recently we have observed some increased latency in the CPU hotplug
>>>>> path, on the CPU online side. For online latency we see that the block
>>>>> layer executes a notification handler for the CPU_UP_PREPARE event,
>>>>> and this handler in turn waits for an RCU grace period, resulting
>>>>> (sometimes) in an execution time of 15-20 ms for the handler. This
>>>>> behaviour was not there in the 3.18 kernel but is present in the 4.4
>>>>> kernel and was introduced by the following commit:
>>>>>
>>>>>
>>>>> commit 5778322e67ed34dc9f391a4a5cbcbb856071ceba
>>>>> Author: Akinobu Mita <akinobu.mita@gmail.com>
>>>>> Date:   Sun Sep 27 02:09:23 2015 +0900
>>>>>
>>>>>     blk-mq: avoid inserting requests before establishing new mapping
>>>>
>>>> ...
>>>>
>>>>> Upon reverting this commit I could see an improvement of 15-20 ms in
>>>>> online latency. So I am looking for some help in analyzing the effects
>>>>> of reverting it, or advice on whether some other approach to reducing
>>>>> the online latency should be taken.
>>>>
>>>> Can you observe the difference in online latency by removing the
>>>> get_online_cpus() and put_online_cpus() pair in
>>>> blk_mq_init_allocated_queue(), instead of fully reverting the commit?
>>>>
>>> Hi Akinobu,
>>> I tried your suggestion but could not achieve any improvement. Actually,
>>> the snippet that is causing the change in latency is the following one:
>>>
>>> 	list_for_each_entry(q, &all_q_list, all_q_node) {
>>> 		blk_mq_freeze_queue_wait(q);
>>>
>>> 		/*
>>> 		 * timeout handler can't touch hw queue during the
>>> 		 * reinitialization
>>> 		 */
>>> 		del_timer_sync(&q->timeout);
>>> 	}
>>>
>>> I understand that this now gets executed for CPU_UP_PREPARE as well,
>>> resulting in increased latency in the cpu online path. I am trying to
>>> reduce this latency while keeping the purpose of the commit intact. I
>>> would welcome further suggestions/feedback in this regard.
>>>
>> Hi Akinobu,
>>
>> I am not able to reduce the cpu online latency with this patch. Could you
>> please let me know what functionality would be broken if we leave this
>> patch out of our kernel? Also, if you have some other suggestions for
>> improving this patch, please let me know.
>>
> After moving the remapping of queues to the block layer's kworker I see
> that online latency has improved while offline latency remains the same.
> As the freezing of queues happens in the context of the block layer's
> worker, I think it would be better to do the remapping in the same context
> and then go ahead with freezing. In this regard I have made the following
> change:
>
> commit b2131b86eeef4c5b1f8adaf7a53606301aa6b624
> Author: Imran Khan <kimran@codeaurora.org>
> Date:   Fri Aug 12 19:59:47 2016 +0530
>
>     blk-mq: Move block queue remapping from cpu hotplug path
>
>     During a cpu hotplug, the hardware and software context mappings
>     need to be updated in order to take into account requests
>     submitted for the hotadded CPU. But if this remapping is done
>     in the hotplug notifier, it deteriorates the hotplug latency.
>     So move the block queue remapping to the block layer worker, which
>     results in significant improvements in hotplug latency.
>
>     Change-Id: I01ac83178ce95c3a4e3b7b1b286eda65ff34e8c4
>     Signed-off-by: Imran Khan <kimran@codeaurora.org>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6d6f8fe..06fcf89 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -22,7 +22,11 @@
>  #include <linux/sched/sysctl.h>
>  #include <linux/delay.h>
>  #include <linux/crash_dump.h>
> -
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
>  #include <trace/events/block.h>
>
>  #include <linux/blk-mq.h>
> @@ -32,10 +36,18 @@
>
>  static DEFINE_MUTEX(all_q_mutex);
>  static LIST_HEAD(all_q_list);
> -
>  static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
>
>  /*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumasks as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask online_new;
> +
> +static struct work_struct blk_mq_remap_work;
> +
> +/*
>   * Check if any of the ctx's have pending work in this hardware queue
>   */
>  static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> @@ -2125,14 +2137,7 @@ static void blk_mq_queue_reinit(struct request_queue *q,
>  static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
>  		unsigned long action, void *hcpu)
>  {
> -	struct request_queue *q;
>  	int cpu = (unsigned long)hcpu;
> -	/*
> -	 * New online cpumask which is going to be set in this hotplug event.
> -	 * Declare this cpumasks as global as cpu-hotplug operation is invoked
> -	 * one-by-one and dynamically allocating this could result in a failure.
> -	 */
> -	static struct cpumask online_new;
>
>  	/*
>  	 * Before hotadded cpu starts handling requests, new mappings must
> @@ -2155,43 +2160,17 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
>  	case CPU_DEAD:
>  	case CPU_UP_CANCELED:
>  		cpumask_copy(&online_new, cpu_online_mask);
> +		kblockd_schedule_work(&blk_mq_remap_work);
>  		break;
>  	case CPU_UP_PREPARE:
>  		cpumask_copy(&online_new, cpu_online_mask);
>  		cpumask_set_cpu(cpu, &online_new);
> +		kblockd_schedule_work(&blk_mq_remap_work);
>  		break;
>  	default:
>  		return NOTIFY_OK;
>  	}
>
> -	mutex_lock(&all_q_mutex);
> -
> -	/*
> -	 * We need to freeze and reinit all existing queues. Freezing
> -	 * involves synchronous wait for an RCU grace period and doing it
> -	 * one by one may take a long time. Start freezing all queues in
> -	 * one swoop and then wait for the completions so that freezing can
> -	 * take place in parallel.
> -	 */
> -	list_for_each_entry(q, &all_q_list, all_q_node)
> -		blk_mq_freeze_queue_start(q);
> -	list_for_each_entry(q, &all_q_list, all_q_node) {
> -		blk_mq_freeze_queue_wait(q);
> -
> -		/*
> -		 * timeout handler can't touch hw queue during the
> -		 * reinitialization
> -		 */
> -		del_timer_sync(&q->timeout);
> -	}
> -
> -	list_for_each_entry(q, &all_q_list, all_q_node)
> -		blk_mq_queue_reinit(q, &online_new);
> -
> -	list_for_each_entry(q, &all_q_list, all_q_node)
> -		blk_mq_unfreeze_queue(q);
> -
> -	mutex_unlock(&all_q_mutex);
>  	return NOTIFY_OK;
>  }
>
> @@ -2357,11 +2336,48 @@ void blk_mq_enable_hotplug(void)
>  	mutex_unlock(&all_q_mutex);
>  }
>
> +static void blk_remap_work(struct work_struct *work)
> +{
> +	struct request_queue *q;
> +
> +	/* Start the task of queue remapping */
> +	mutex_lock(&all_q_mutex);
> +
> +	/*
> +	 * We need to freeze and reinit all existing queues. Freezing
> +	 * involves synchronous wait for an RCU grace period and doing it
> +	 * one by one may take a long time. Start freezing all queues in
> +	 * one swoop and then wait for the completions so that freezing can
> +	 * take place in parallel.
> +	 */
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_freeze_queue_start(q);
> +	list_for_each_entry(q, &all_q_list, all_q_node) {
> +		blk_mq_freeze_queue_wait(q);
> +
> +		/*
> +		 * timeout handler can't touch hw queue during the
> +		 * reinitialization
> +		 */
> +		del_timer_sync(&q->timeout);
> +	}
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_queue_reinit(q, &online_new);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_unfreeze_queue(q);
> +
> +	mutex_unlock(&all_q_mutex);
> +}
> +
> +
>  static int __init blk_mq_init(void)
>  {
>  	blk_mq_cpu_init();
>
>  	hotcpu_notifier(blk_mq_queue_reinit_notify, 0);
> +	INIT_WORK(&blk_mq_remap_work, blk_remap_work);
>
>  	return 0;
>  }
>
> I have run overnight tests of fsstress and multiple dd commands along with
> cpu hotplugging. Could you please review this change and let me know if you
> see any issues with it, or whether I need to test some specific corner
> cases to validate it.
>
Could someone kindly review this change and provide feedback?
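For reference, the 15-20 ms online-latency figures discussed above can be reproduced from userspace by timing the sysfs write that onlines a CPU: since cpu_up()/cpu_down() run the hotplug notifier chain synchronously, the cost of the CPU_UP_PREPARE handler (blk_mq_queue_reinit_notify() here) shows up directly in the write latency. The program below is only a rough illustrative sketch, not part of the patch under review; it assumes the standard /sys/devices/system/cpu/cpuN/online interface, picks cpu1 arbitrarily, and must be run as root.

/* cpu_online_lat.c - hypothetical measurement helper, not part of the patch. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write "0" or "1" to a cpuN/online file and return the elapsed time in us. */
static long long timed_write(const char *path, const char *val)
{
	struct timespec t0, t1;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* Returns only after cpu_up()/cpu_down() and the notifier chain finish. */
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);
	return (t1.tv_sec - t0.tv_sec) * 1000000LL +
	       (t1.tv_nsec - t0.tv_nsec) / 1000;
}

int main(void)
{
	/* cpu1 is an arbitrary choice; any hotpluggable CPU works. */
	const char *path = "/sys/devices/system/cpu/cpu1/online";
	int i;

	for (i = 0; i < 10; i++) {
		long long offline_us = timed_write(path, "0");
		long long online_us = timed_write(path, "1");

		printf("iter %d: offline %lld us, online %lld us\n",
		       i, offline_us, online_us);
	}
	return 0;
}

Comparing the reported online times with and without the patch applied is one way to check the improvement described above.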