
queue stall with blk-mq-sched

Message ID aa1c23e1-1beb-a26d-3e77-fe78aa281771@suse.de (mailing list archive)
State New, archived

Commit Message

Hannes Reinecke Jan. 24, 2017, 3:54 p.m. UTC
Hi Jens,

I'm trying to debug a queue stall with your blk-mq-sched branch; with my
latest mpt3sas patches fio stalls almost immediately after starting a
sequential read :-(

I've debugged things and came up with the attached patch; we need to
restart waiters with blk_mq_tag_idle() after completing a tag.
We're already calling blk_mq_tag_busy() when fetching a tag, so I think
calling blk_mq_tag_idle() is required when retiring a tag.

However, even with the attached patch I'm seeing some queue stalls;
looks like they're related to the 'stonewall' statement in fio.

Debugging continues.

Cheers,

Hannes

Comments

Jens Axboe Jan. 24, 2017, 4:03 p.m. UTC | #1
On 01/24/2017 08:54 AM, Hannes Reinecke wrote:
> Hi Jens,
> 
> I'm trying to debug a queue stall with your blk-mq-sched branch; with my
> latest mpt3sas patches fio stalls almost immediately after starting a
> sequential read :-(
> 
> I've debugged things and came up with the attached patch; we need to
> restart waiters with blk_mq_tag_idle() after completing a tag.
> We're already calling blk_mq_tag_busy() when fetching a tag, so I think
> calling blk_mq_tag_idle() is required when retiring a tag.

I'll take a look at this. It sounds like all your grief is related to
shared tag maps, which nothing I have here uses. I'll see
if we are leaking it; you should be able to check that by reading the
'active' file in the sysfs directory.
Jens Axboe Jan. 24, 2017, 4:09 p.m. UTC | #2
On 01/24/2017 08:54 AM, Hannes Reinecke wrote:
> Hi Jens,
> 
> I'm trying to debug a queue stall with your blk-mq-sched branch; with my
> latest mpt3sas patches fio stalls almost immediately after starting a
> sequential read :-(
> 
> I've debugged things and came up with the attached patch; we need to
> restart waiters with blk_mq_tag_idle() after completing a tag.
> We're already calling blk_mq_tag_busy() when fetching a tag, so I think
> calling blk_mq_tag_idle() is required when retiring a tag.

The patch isn't correct, the whole point of the un-idling is that it
ISN'T happening for every request completion. Otherwise you throw
away scalability. So a queue will go into active mode on the first
request, and idle when it's been idle for a bit. The active count
is used to divide up the tags.

So I'm assuming we're missing a queue run somewhere when we fail
getting a driver tag. The latter should only happen if the target
has IO in flight already, and the restart marking should take care
of it. Obviously there's a case where that is not true, since you
are seeing stalls.

> However, even with the attached patch I'm seeing some queue stalls;
> looks like they're related to the 'stonewall' statement in fio.

I think you are heading down the wrong path. Your patch will cause
the symptoms to be a bit different, but you'll still run into cases
where we fail giving out the tag and then stall.
Hannes Reinecke Jan. 24, 2017, 6:45 p.m. UTC | #3
On 01/24/2017 05:03 PM, Jens Axboe wrote:
> On 01/24/2017 08:54 AM, Hannes Reinecke wrote:
>> Hi Jens,
>>
>> I'm trying to debug a queue stall with your blk-mq-sched branch; with my
>> latest mpt3sas patches fio stalls almost immediately after starting a
>> sequential read :-(
>>
>> I've debugged things and came up with the attached patch; we need to
>> restart waiters with blk_mq_tag_idle() after completing a tag.
>> We're already calling blk_mq_tag_busy() when fetching a tag, so I think
>> calling blk_mq_tag_idle() is required when retiring a tag.
>
> I'll take a look at this. It sounds like all your grief is related to
> shared tag maps, which nothing I have here uses. I'll see
> if we are leaking it; you should be able to check that by reading the
> 'active' file in the sysfs directory.
>
Ah. That'll explain it.

Basically _all_ the HBAs I'm testing with have shared tag maps :-(

Cheers,

Hannes
Hannes Reinecke Jan. 24, 2017, 6:49 p.m. UTC | #4
On 01/24/2017 05:09 PM, Jens Axboe wrote:
> On 01/24/2017 08:54 AM, Hannes Reinecke wrote:
>> Hi Jens,
>>
>> I'm trying to debug a queue stall with your blk-mq-sched branch; with my
>> latest mpt3sas patches fio stalls almost immediately after starting a
>> sequential read :-(
>>
>> I've debugged things and came up with the attached patch; we need to
>> restart waiters with blk_mq_tag_idle() after completing a tag.
>> We're already calling blk_mq_tag_busy() when fetching a tag, so I think
>> calling blk_mq_tag_idle() is required when retiring a tag.
>
> The patch isn't correct, the whole point of the un-idling is that it
> ISN'T happening for every request completion. Otherwise you throw
> away scalability. So a queue will go into active mode on the first
> request, and idle when it's been idle for a bit. The active count
> is used to divide up the tags.
>
> So I'm assuming we're missing a queue run somewhere when we fail
> getting a driver tag. The latter should only happen if the target
> has IO in flight already, and the restart marking should take care
> of it. Obviously there's a case where that is not true, since you
> are seeing stalls.
>
But what is the point in the 'blk_mq_tag_busy()' thingie then?
When will it be reset?
The only instances I've seen are it being reset during
resize and teardown ... hence my patch.

>> However, even with the attached patch I'm seeing some queue stalls;
>> looks like they're related to the 'stonewall' statement in fio.
>
> I think you are heading down the wrong path. Your patch will cause
> the symptoms to be a bit different, but you'll still run into cases
> where we fail giving out the tag and then stall.
>
Hehe.
How did you know that?

That's indeed what I'm seeing.

Oh well, back to the drawing board...

Cheers,

Hannes

Patch

From 82b15ff40d71aed318f9946881825f9f03ef8f48 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Tue, 24 Jan 2017 14:43:09 +0100
Subject: [PATCH] block-mq: fixup queue stall

__blk_mq_alloc_request() calls blk_mq_tag_busy(), which might cause
the queue to become blocked. So we need to call blk_mq_tag_idle()
once the tag is finished to wake up all waiters on the queue.

Patch is relative to the blk-mq-sched branch
 
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-mq.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 739a292..d52bcb1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -333,10 +333,12 @@  void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 {
 	const int sched_tag = rq->internal_tag;
 	struct request_queue *q = rq->q;
+	bool unbusy = false;
 
-	if (rq->rq_flags & RQF_MQ_INFLIGHT)
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
 		atomic_dec(&hctx->nr_active);
-
+		unbusy = true;
+	}
 	wbt_done(q->rq_wb, &rq->issue_stat);
 	rq->rq_flags = 0;
 
@@ -346,6 +348,9 @@  void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
 		blk_mq_sched_completed_request(hctx, rq);
+	if (unbusy)
+		blk_mq_tag_idle(hctx);
+
 	blk_queue_exit(q);
 }
 
-- 
1.8.5.6