[0/2] blk-mq: fix handling cpu hotplug

Message ID

20200605114410.2416726-1-ming.lei@redhat.com (mailing list archive)

Headers

Return-Path: <SRS0=tuS3=7S=vger.kernel.org=linux-block-owner@kernel.org>
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
        Dongli Zhang <dongli.zhang@oracle.com>,
        Hannes Reinecke <hare@suse.de>,
        Daniel Wagner <dwagner@suse.de>,
        Christoph Hellwig <hch@lst.de>,
        John Garry <john.garry@huawei.com>
Subject: [PATCH 0/2] blk-mq: fix handling cpu hotplug
Date: Fri,  5 Jun 2020 19:44:08 +0800
Message-Id: <20200605114410.2416726-1-ming.lei@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk

Series

blk-mq: fix handling cpu hotplug
  [0/2] blk-mq: fix handling cpu hotplug
  [1/2] blk-mq: split out a __blk_mq_get_driver_tag helper
  [2/2] blk-mq: fix blk_mq_all_tag_iter

Message

Ming Lei June 5, 2020, 11:44 a.m. UTC

Hi Jens,

The first patch stops driver tag allocation from failing because of an
inactive hctx, which removes the risk of an I/O hang during CPU hotplug.
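To make the hang risk concrete, here is a toy userspace model in C. It assumes (this is not taken from the patch) that before the fix the driver-tag path shared the normal allocator's "give up if the hctx is inactive" check, and that the new helper simply skips it. All names (struct toy_hctx, toy_get_driver_tag_old, toy_get_driver_tag_new) are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_TAGS 4

/* Hypothetical stand-in for a hardware queue and its driver tag bitmap. */
struct toy_hctx {
    bool inactive;              /* set once every CPU mapped here is offline */
    bool busy[TOY_NR_TAGS];     /* true if the driver tag is in use */
};

/* Take the first free driver tag, or return -1 if all are busy. */
static int toy_grab_free_tag(struct toy_hctx *hctx)
{
    for (int i = 0; i < TOY_NR_TAGS; i++) {
        if (!hctx->busy[i]) {
            hctx->busy[i] = true;
            return i;
        }
    }
    return -1;
}

/*
 * Assumed pre-fix behaviour: refuse driver tags on an inactive hctx.
 * A request already queued on this hctx can then never be dispatched,
 * so waiting for the hctx to drain never finishes.
 */
static int toy_get_driver_tag_old(struct toy_hctx *hctx)
{
    if (hctx->inactive)
        return -1;
    return toy_grab_free_tag(hctx);
}

/*
 * Assumed post-fix behaviour: a dedicated driver-tag helper with no
 * inactive check, so requests already on the hctx can still be
 * dispatched and completed while it drains.
 */
static int toy_get_driver_tag_new(struct toy_hctx *hctx)
{
    return toy_grab_free_tag(hctx);
}
```

In this model, once the hctx is marked inactive the old path always returns -1, while the new path keeps handing out free tags, so requests that were allocated before the CPUs went offline can still complete and the drain can finish.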

The second patch fixes blk_mq_all_tag_iter() so that all requests can be
drained before an hctx becomes inactive.
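A similar toy model can illustrate why the iterator matters for draining. The premise here is an assumption rather than something stated in this cover letter: the iterator looked requests up through a per-tag table that is only filled in once a request is in flight (loosely modeled on rqs[] in struct blk_mq_tags), whereas draining needs every allocated request, which is reachable through the preallocated table (loosely modeled on static_rqs[]). Everything named toy_* is hypothetical userspace code, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_TAGS 8

/* Hypothetical request: only records which hardware queue it belongs to. */
struct toy_rq {
    int hctx_id;
};

/* Hypothetical tag set with two ways of finding the request for a tag. */
struct toy_tags {
    bool allocated[TOY_NR_TAGS];           /* tag bitmap */
    struct toy_rq *rqs[TOY_NR_TAGS];       /* set only once the request is in flight */
    struct toy_rq static_rqs[TOY_NR_TAGS]; /* one preallocated request per tag */
};

static void toy_tags_init(struct toy_tags *tags)
{
    memset(tags, 0, sizeof(*tags));
}

/*
 * Count allocated requests that belong to hctx_id.  With use_static set
 * to false, requests are looked up through rqs[] and anything not yet
 * in flight is missed; with use_static set to true, every allocated
 * request is seen through static_rqs[].
 */
static int toy_count_requests(struct toy_tags *tags, int hctx_id,
                              bool use_static)
{
    int count = 0;

    for (int tag = 0; tag < TOY_NR_TAGS; tag++) {
        if (!tags->allocated[tag])
            continue;
        struct toy_rq *rq = use_static ? &tags->static_rqs[tag]
                                       : tags->rqs[tag];
        if (rq != NULL && rq->hctx_id == hctx_id)
            count++;
    }
    return count;
}
```

In this model, a request whose tag is allocated but which is not yet in flight is invisible through rqs[], so a drain loop built on that view would report the hctx as idle and let it become inactive too early. Looking requests up through static_rqs[] counts it correctly.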

Both patches fix commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in
a hctx are offline").

John has verified that the two patches fix his request timeout issue
during CPU hotplug.

Christoph Hellwig (1):
  blk-mq: split out a __blk_mq_get_driver_tag helper

Ming Lei (1):
  blk-mq: fix blk_mq_all_tag_iter

 block/blk-mq-tag.c | 39 ++++++++++++++++++++++++++++++++++++---
 block/blk-mq-tag.h |  8 ++++++++
 block/blk-mq.c     | 29 -----------------------------
 block/blk-mq.h     |  1 -
 4 files changed, 44 insertions(+), 33 deletions(-)

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Garry <john.garry@huawei.com>

Comments

John Garry June 5, 2020, 11:49 a.m. UTC | #1

On 05/06/2020 12:44, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch avoids to fail driver tag allocation because of inactive
> hctx, so hang risk can be killed during cpu hotplug.
> 
> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
> requests before one hctx becomes inactive.
> 
> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline").
> 
> John has verified that the two can fix his request timeout issue during
> cpu hotplug.

But let me test it again this afternoon; my earlier test branch had
some debug code in it.

Cheers

John Garry June 5, 2020, 4:08 p.m. UTC | #2

On 05/06/2020 12:49, John Garry wrote:
> On 05/06/2020 12:44, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch avoids to fail driver tag allocation because of inactive
>> hctx, so hang risk can be killed during cpu hotplug.
>>
>> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
>> requests before one hctx becomes inactive.
>>
>> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
>> offline").
>>
>> John has verified that the two can fix his request timeout issue during
>> cpu hotplug.
> 
> But let me test it again my afternoon. My test branch earlier had some
> debug stuff.
> 

It looks OK, so the Tested-by stands.

BTW, do you agree that the following comment from 
__blk_mq_run_hw_queue() is now stale:

"... we depend on blk-mq timeout handler to handle dispatched requests 
to this hctx"

Thanks,
John

Jens Axboe June 7, 2020, 2:57 p.m. UTC | #3

On 6/5/20 5:44 AM, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch avoids to fail driver tag allocation because of inactive
> hctx, so hang risk can be killed during cpu hotplug.
> 
> The 2nd patch fixes blk_mq_all_tag_iter so that we can drain all
> requests before one hctx becomes inactive.
> 
> Both fixes bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline").
> 
> John has verified that the two can fix his request timeout issue during
> cpu hotplug.

Thanks, applied.