[0/2] blk-mq: fix handling cpu hotplug

Message ID 20200605114410.2416726-1-ming.lei@redhat.com (mailing list archive)
Series blk-mq: fix handling cpu hotplug

Message

Ming Lei June 5, 2020, 11:44 a.m. UTC
Hi Jens,

The 1st patch avoids failing driver tag allocation because of an inactive
hctx, which removes the hang risk during CPU hotplug.

The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
requests before one hctx becomes inactive.

Both patches fix bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
offline").

John has verified that the two patches fix his request timeout issue during
CPU hotplug.

Christoph Hellwig (1):
  blk-mq: split out a __blk_mq_get_driver_tag helper

Ming Lei (1):
  blk-mq: fix blk_mq_all_tag_iter

 block/blk-mq-tag.c | 39 ++++++++++++++++++++++++++++++++++++---
 block/blk-mq-tag.h |  8 ++++++++
 block/blk-mq.c     | 29 -----------------------------
 block/blk-mq.h     |  1 -
 4 files changed, 44 insertions(+), 33 deletions(-)

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Garry <john.garry@huawei.com>
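
For readers unfamiliar with the drain step mentioned above, here is a
minimal user-space sketch of the idea: walk every tag, count the in-flight
requests that belong to one hctx, and treat the hctx as drained only when
that count reaches zero. This is a toy model under stated assumptions, not
the kernel code from this series. Every name in it (toy_request, toy_tags,
toy_all_tag_iter, toy_hctx_busy, toy_hctx_drained, TOY_NR_TAGS) is
hypothetical and only loosely mirrors the role of blk_mq_all_tag_iter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_TAGS 8 /* hypothetical tag-set depth */

/* Hypothetical stand-in for a request; not the kernel's struct request. */
struct toy_request {
	bool in_flight; /* true while the request owns its driver tag */
	int hctx_idx;   /* hardware queue the request was issued on */
};

/* Hypothetical tag set: one request slot per driver tag. */
struct toy_tags {
	struct toy_request rqs[TOY_NR_TAGS];
};

/* Callback type: return false to stop the iteration early. */
typedef bool (*toy_iter_fn)(struct toy_request *rq, void *data);

/* Visit every in-flight request in the tag set, whatever its hctx. */
static void toy_all_tag_iter(struct toy_tags *tags, toy_iter_fn fn, void *data)
{
	for (size_t i = 0; i < TOY_NR_TAGS; i++) {
		struct toy_request *rq = &tags->rqs[i];

		if (rq->in_flight && !fn(rq, data))
			break;
	}
}

struct toy_count {
	int hctx_idx; /* queue we are counting for */
	int busy;     /* in-flight requests seen on that queue */
};

/* Count a request only if it belongs to the queue of interest. */
static bool toy_count_rq(struct toy_request *rq, void *data)
{
	struct toy_count *c = data;

	if (rq->hctx_idx == c->hctx_idx)
		c->busy++;
	return true; /* keep iterating over the remaining tags */
}

/* Number of in-flight requests currently owned by hctx_idx. */
static int toy_hctx_busy(struct toy_tags *tags, int hctx_idx)
{
	struct toy_count c = { .hctx_idx = hctx_idx, .busy = 0 };

	toy_all_tag_iter(tags, toy_count_rq, &c);
	return c.busy;
}

/* The queue is drained once no in-flight request references it. */
static bool toy_hctx_drained(struct toy_tags *tags, int hctx_idx)
{
	return toy_hctx_busy(tags, hctx_idx) == 0;
}

/* Mark a tag as carrying an in-flight request on hctx_idx. */
static void toy_start(struct toy_tags *tags, size_t tag, int hctx_idx)
{
	assert(tag < TOY_NR_TAGS);
	tags->rqs[tag].in_flight = true;
	tags->rqs[tag].hctx_idx = hctx_idx;
}

/* Complete the request on a tag, releasing the tag. */
static void toy_complete(struct toy_tags *tags, size_t tag)
{
	assert(tag < TOY_NR_TAGS);
	tags->rqs[tag].in_flight = false;
}
```

In this model, a hotplug handler would poll toy_hctx_drained() for the
queue whose CPUs are going offline. If the iterator skipped any in-flight
request, the check could report "drained" too early, which is the kind of
problem an incomplete tag iteration would cause.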

Comments

John Garry June 5, 2020, 11:49 a.m. UTC | #1
On 05/06/2020 12:44, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch avoids to fail driver tag allocation because of inactive
> hctx, so hang risk can be killed during cpu hotplug.
> 
> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
> requests before one hctx becomes inactive.
> 
> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline").
> 
> John has verified that the two can fix his request timeout issue during
> cpu hotplug.

But let me test it again in my afternoon. My test branch earlier had some
debug stuff.

Cheers

> 
> Christoph Hellwig (1):
>    blk-mq: split out a __blk_mq_get_driver_tag helper
> 
> Ming Lei (1):
>    blk-mq: fix blk_mq_all_tag_iter
> 
>   block/blk-mq-tag.c | 39 ++++++++++++++++++++++++++++++++++++---
>   block/blk-mq-tag.h |  8 ++++++++
>   block/blk-mq.c     | 29 -----------------------------
>   block/blk-mq.h     |  1 -
>   4 files changed, 44 insertions(+), 33 deletions(-)
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: John Garry <john.garry@huawei.com>
> 
>
John Garry June 5, 2020, 4:08 p.m. UTC | #2
On 05/06/2020 12:49, John Garry wrote:
> On 05/06/2020 12:44, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch avoids to fail driver tag allocation because of inactive
>> hctx, so hang risk can be killed during cpu hotplug.
>>
>> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
>> requests before one hctx becomes inactive.
>>
>> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
>> offline").
>>
>> John has verified that the two can fix his request timeout issue during
>> cpu hotplug.
> 
> But let me test it again my afternoon. My test branch earlier had some
> debug stuff.
> 

It looks OK, so the Tested-by stands.

BTW, do you agree that the following comment from 
__blk_mq_run_hw_queue() is now stale:

"... we depend on blk-mq timeout handler to handle dispatched requests 
to this hctx"

Thanks,
John
Jens Axboe June 7, 2020, 2:57 p.m. UTC | #3
On 6/5/20 5:44 AM, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch avoids to fail driver tag allocation because of inactive
> hctx, so hang risk can be killed during cpu hotplug.
> 
> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
> requests before one hctx becomes inactive.
> 
> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline").
> 
> John has verified that the two can fix his request timeout issue during
> cpu hotplug.

Thanks, applied.