
blk-mq request allocation stalls [was: Re: [PATCH v3 0/8] dm: add request-based blk-mq support]

Message ID 54B043FC.8000902@kernel.dk (mailing list archive)
State Not Applicable, archived
Delegated to: Mike Snitzer

Commit Message

Jens Axboe Jan. 9, 2015, 9:11 p.m. UTC
On 01/09/2015 02:07 PM, Jens Axboe wrote:
> On 01/09/2015 12:49 PM, Mike Snitzer wrote:
>> On Wed, Jan 07 2015 at  3:40pm -0500,
>> Keith Busch <keith.busch@intel.com> wrote:
>>
>>> On Wed, 7 Jan 2015, Bart Van Assche wrote:
>>>> On 01/06/15 17:15, Jens Axboe wrote:
>>>>> blk-mq request allocation is pretty much as optimized/fast as it can be.
>>>>> The slowdown must be due to one of two reasons:
>>>>>
>>>>> - A bug related to running out of requests, perhaps a missing queue run
>>>>>   or something like that.
>>>>> - A smaller number of available requests, due to the requested queue depth.
>>>>>
>>>>> Looking at Bart's results, it looks like it's usually fast, but sometimes
>>>>> very slow. That would seem to indicate it's option #1 above that is the
>>>>> issue. Bart, since this seems to wait for quite a bit, would it be
>>>>> possible to cat the 'tags' file for that queue when it is stuck like that?
>>>>
>>>> Hello Jens,
>>>>
>>>> Thanks for the assistance. Is this the output you were looking for?
>>>
>>> I'm a little confused by the later comments given the below data. It says
>>> multipath_clone_and_map() is stuck at bt_get, but that doesn't block
>>> unless there are no tags available. The tags should be coming from one
>>> of dm-1's path queues, and I'm assuming these queues are provided by sdc
>>> and sdd. All their tags are free, so that looks like a missing wake_up
>>> when the queue idles.
>>
>> Like I said in an earlier email, I cannot reproduce Bart's hangs running
>> mkfs.xfs against a multipath device that is built on top of a virtio
>> device in a KVM guest.
>>
>> But I can hit __bt_get() failures on the virtio-blk device that I'm
>> using for the root device on this guest.  Bart I'd be interested to see
>> what you get when running the attached debug patch (likely will just
>> echo the same type of info you've already provided).
>>
>> There does appear to be something weird going on with bt_get().  With
>> the debug patch I'm seeing the following when I simply run "make install"
>> of the kernel (it'll run dracut to build the initramfs, etc):
>>
>> You'll note that in all instances where __bt_get() returns -1, nr_free
>> isn't 0.
>
> Yeah, that doesn't look good. Can you try with this patch? The second
> hunk is the interesting bit, the first is more of a cleanup.

Actually, try this one instead, it should be a bit more precise than the 
first.
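
For context on the structure being debugged: blk-mq keeps its tags in a
two-level bitmap, an array of per-word bitmaps that __bt_get() walks while
__bt_get_word() scans inside a single word, and it caches the last tag handed
out (the tag_cache/last_tag seen in the patch below) so the next search starts
in the same word. As Keith notes above, the allocator only blocks a caller in
bt_get() when no free bit can be found, which is why "all tags free, yet stuck
in bt_get" points at either a lost wake-up or a search that gives up too early.
The sketch below is a simplified model of that layout, not the exact kernel
definitions; the field names follow the identifiers visible in the diff and the
debug output (bm->word, bm->depth, bt->map_nr, bt->bits_per_word).

/* Simplified model of the two-level tag map under discussion. */
struct word_map {			/* stands in for struct blk_align_bitmap */
	unsigned long word;		/* one bit per tag in this word */
	unsigned int depth;		/* number of valid bits in 'word' */
};

struct tag_map {			/* stands in for struct blk_mq_bitmap_tags */
	unsigned int map_nr;		/* number of words (4 for the 128 tags below) */
	unsigned int bits_per_word;	/* log2(tags per word), 5 in the traces below */
	struct word_map *map;		/* the map_nr per-word bitmaps */
};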

Comments

Mike Snitzer Jan. 9, 2015, 9:40 p.m. UTC | #1
On Fri, Jan 09 2015 at  4:11pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> 
> Actually, try this one instead, it should be a bit more precise than
> the first.
> 

Thanks for the test patch.

I'm still seeing failures that look wrong (last_tag=127 could be an edge
condition that isn't handled properly?):

[   14.254632] __bt_get: values before for loop: last_tag=127, index=3
[   14.255841] __bt_get: values after  for loop: last_tag=64, index=2
[   14.257036]
[   14.257036] bt_get: __bt_get() returned -1
[   14.258051] queue_num=0, nr_tags=128, reserved_tags=0, bits_per_word=5
[   14.259246] nr_free=128, nr_reserved=0
[   14.259963] active_queues=0

[  213.115997] __bt_get: values before for loop: last_tag=127, index=3
[  213.117115] __bt_get: values after  for loop: last_tag=96, index=3
[  213.118200]
[  213.118200] bt_get: __bt_get() returned -1
[  213.121593] queue_num=0, nr_tags=128, reserved_tags=0, bits_per_word=5
[  213.123960] nr_free=128, nr_reserved=0
[  213.125880] active_queues=0

[  239.158079] __bt_get: values before for loop: last_tag=8, index=0
[  239.160363] __bt_get: values after  for loop: last_tag=0, index=0
[  239.162896]
[  239.162896] bt_get: __bt_get() returned -1
[  239.166284] queue_num=0, nr_tags=128, reserved_tags=0, bits_per_word=5
[  239.168623] nr_free=127, nr_reserved=0
[  239.170508] active_queues=0
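
For reference, the logged values map onto that layout as follows: with
bits_per_word=5 each word covers 1 << 5 = 32 tags, so nr_tags=128 spans
128 / 32 = 4 words, and the starting word is last_tag >> bits_per_word,
i.e. 127 >> 5 = 3 in the first two traces. Also note that nr_free is 128 in
the first two traces, so every tag was free at the moment __bt_get() still
returned -1. A minimal standalone sketch of that arithmetic (variable names
are illustrative, not kernel identifiers):

#include <stdio.h>

int main(void)
{
	unsigned int nr_tags = 128, bits_per_word = 5, last_tag = 127;

	unsigned int word_depth = 1u << bits_per_word;      /* 32 tags per word */
	unsigned int map_nr = nr_tags / word_depth;         /* 4 words */
	unsigned int index = last_tag >> bits_per_word;     /* starting word: 3 */
	unsigned int offset = last_tag & (word_depth - 1);  /* bit 31 inside word 3 */

	printf("map_nr=%u index=%u offset=%u\n", map_nr, index, offset);
	return 0;
}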


Patch

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 60c9d4a93fe4..2e38cd118c1d 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -143,7 +143,6 @@  static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 static int __bt_get_word(struct blk_align_bitmap *bm, unsigned int last_tag)
 {
 	int tag, org_last_tag, end;
-	bool wrap = last_tag != 0;
 
 	org_last_tag = last_tag;
 	end = bm->depth;
@@ -155,15 +154,16 @@  restart:
 			 * We started with an offset, start from 0 to
 			 * exhaust the map.
 			 */
-			if (wrap) {
-				wrap = false;
+			if (org_last_tag) {
 				end = org_last_tag;
-				last_tag = 0;
+				last_tag = org_last_tag = 0;
 				goto restart;
 			}
 			return -1;
 		}
 		last_tag = tag + 1;
+		if (last_tag >= bm->depth - 1)
+			last_tag = 0;
 	} while (test_and_set_bit(tag, &bm->word));
 
 	return tag;
@@ -199,9 +199,13 @@  static int __bt_get(struct blk_mq_hw_ctx *hctx, struct blk_mq_bitmap_tags *bt,
 			goto done;
 		}
 
-		last_tag = 0;
-		if (++index >= bt->map_nr)
+		index++;
+		last_tag = (index << bt->bits_per_word);
+
+		if (index >= bt->map_nr) {
 			index = 0;
+			last_tag = 0;
+		}
 	}
 
 	*tag_cache = 0;
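
To make the search order in the first hunk easier to follow, here is a minimal
single-threaded userspace model of the patched __bt_get_word(): scan
[last_tag, depth), and if the search started at a non-zero offset, wrap once
and scan [0, org_last_tag). The helpers (find_free_bit, get_word_tag) and the
fixed 32-bit depth are illustrative assumptions rather than kernel API, and the
test_and_set_bit() retry plus the last_tag advance are omitted because nothing
races in this model.

#include <stdio.h>

#define WORD_DEPTH 32	/* matches bits_per_word=5 in the traces above */

/* first clear bit in [start, end), or 'end' if there is none
 * (a stand-in for find_next_zero_bit()) */
static int find_free_bit(unsigned long word, int start, int end)
{
	for (int bit = start; bit < end; bit++)
		if (!(word & (1UL << bit)))
			return bit;
	return end;
}

/* single-threaded model of the patched __bt_get_word() search order */
static int get_word_tag(unsigned long *word, unsigned int last_tag)
{
	unsigned int org_last_tag = last_tag;
	unsigned int end = WORD_DEPTH;
	int tag;

restart:
	tag = find_free_bit(*word, last_tag, end);
	if (tag >= (int)end) {
		/* started at an offset: wrap once and scan [0, org_last_tag) */
		if (org_last_tag) {
			end = org_last_tag;
			last_tag = org_last_tag = 0;
			goto restart;
		}
		return -1;
	}
	*word |= 1UL << tag;	/* where the kernel would test_and_set_bit() */
	return tag;
}

int main(void)
{
	unsigned long word = 0;
	int bit;

	/* mark bits 28..31 busy, then start at offset 31, mirroring the
	 * last_tag=127 trace (bit 31 of word 3) */
	for (bit = 28; bit < 32; bit++)
		word |= 1UL << bit;

	printf("allocated tag %d\n", get_word_tag(&word, 31));	/* wraps, prints 0 */
	return 0;
}

The second hunk, read literally, changes the outer loop in __bt_get() so that
when it advances to the next word, last_tag tracks the first tag of that word
(index << bits_per_word) instead of being reset to 0, except when the loop
wraps back to word 0. As the reply above shows, the -1 returns with
nr_free > 0 were still reproducible with this test patch applied.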