diff mbox series

[1/1] blk-mq: do not splice ctx->rq_lists[type] to hctx->dispatch if ctx is not mapped to hctx

Message ID 1554721973-32456-1-git-send-email-dongli.zhang@oracle.com (mailing list archive)
State New, archived
Headers show
Series [1/1] blk-mq: do not splice ctx->rq_lists[type] to hctx->dispatch if ctx is not mapped to hctx | expand

Commit Message

Dongli Zhang April 8, 2019, 11:12 a.m. UTC
When a cpu is offline, blk_mq_hctx_notify_dead() is called once for each
hctx for the offline cpu.

While blk_mq_hctx_notify_dead() is used to splice all ctx->rq_lists[type]
to hctx->dispatch, it never checks whether the ctx is already mapped to the
hctx.

For example, on a VM (with nvme) of 4 cpu, to offline cpu 2 out of the
4 cpu (0-3), blk_mq_hctx_notify_dead() is called once for each io queue
hctx:

1st: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 3
2nd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 2
3rd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 1
4th: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 0

Although blk_mq_ctx->cpu = 2 is only mapped to blk_mq_hw_ctx->queue_num = 2
in this case, its ctx->rq_lists[type] will however be moved to
blk_mq_hw_ctx->queue_num = 3 during the 1st call of
blk_mq_hctx_notify_dead().

This patch would return and go ahead to next call of
blk_mq_hctx_notify_dead() if ctx is not mapped to hctx.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Dongli Zhang April 10, 2019, 5:38 p.m. UTC | #1
Hi Jens,

On 04/08/2019 07:12 PM, Dongli Zhang wrote:
> When a cpu is offline, blk_mq_hctx_notify_dead() is called once for each
> hctx for the offline cpu.
> 
> While blk_mq_hctx_notify_dead() is used to splice all ctx->rq_lists[type]
> to hctx->dispatch, it never checks whether the ctx is already mapped to the
> hctx.
> 
> For example, on a VM (with nvme) of 4 cpu, to offline cpu 2 out of the
> 4 cpu (0-3), blk_mq_hctx_notify_dead() is called once for each io queue
> hctx:
> 
> 1st: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 3
> 2nd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 2
> 3rd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 1
> 4th: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 0
> 
> Although blk_mq_ctx->cpu = 2 is only mapped to blk_mq_hw_ctx->queue_num = 2
> in this case, its ctx->rq_lists[type] will however be moved to
> blk_mq_hw_ctx->queue_num = 3 during the 1st call of
> blk_mq_hctx_notify_dead().
> 
> This patch would return and go ahead to next call of
> blk_mq_hctx_notify_dead() if ctx is not mapped to hctx.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Would you consider this one?

In addition to Ming's Reviewed-by, there is another Reviewed-by from Keith as in
below link.

https://lore.kernel.org/linux-block/20190408153627.GF32498@localhost.localdomain/

Thank you very much!

Dongli Zhang

> ---
>  block/blk-mq.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a935483..9612746 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2219,6 +2219,10 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	enum hctx_type type;
>  
>  	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
> +
> +	if (!cpumask_test_cpu(cpu, hctx->cpumask))
> +		return 0;
> +
>  	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
>  	type = hctx->type;
>  
>
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a935483..9612746 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2219,6 +2219,10 @@  static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	enum hctx_type type;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+
+	if (!cpumask_test_cpu(cpu, hctx->cpumask))
+		return 0;
+
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 	type = hctx->type;