
blk-mq: fix sbitmap ws_active for shared tags

Message ID: c0d3f0c8-a5d7-19dd-7b7f-1d4f860c425b@kernel.dk
State: New, archived
Series: blk-mq: fix sbitmap ws_active for shared tags

Commit Message

Jens Axboe March 25, 2019, 4:22 p.m. UTC
We now wrap sbitmap waitqueues in an active counter, so we can avoid
iterating wakeups unless we have waiters there. This works as long as
everyone that's manipulating the waitqueues uses the proper helpers. For
the tag wait case for shared tags, however, we add ourselves to the
waitqueue without incrementing/decrementing the ->ws_active count. This
means that wakeups can take a long time to happen.

Fix this by manually doing the inc/dec as needed for the wait queue
handling.

Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

Got a bug report about raid5 on 4 USB-3 attached drives being slow with
5.0 but working fine with the ->ws_active check bypassed. Waiting for
confirmation that this patch fixes it, but I think it will.
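
For context on the mechanism being fixed: since 5d2ee7122c73, the sbitmap
wakeup path bails out early when ->ws_active is zero, so a waiter that was
queued without bumping the counter may never be woken. A rough sketch of
that fast path, assuming the v5.0-era lib/sbitmap.c:

	/* Sketch of the check added by 5d2ee7122c73 (lib/sbitmap.c); the
	 * wake_index bookkeeping is simplified here. If a waiter was added
	 * without incrementing ->ws_active, this returns NULL and the
	 * wakeup is skipped entirely -- the stall described above.
	 */
	static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
	{
		int i, wake_index;

		if (!atomic_read(&sbq->ws_active))
			return NULL;

		wake_index = atomic_read(&sbq->wake_index);
		for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
			struct sbq_wait_state *ws = &sbq->ws[wake_index];

			if (waitqueue_active(&ws->wait))
				return ws;

			wake_index = sbq_index_inc(wake_index);
		}

		return NULL;
	}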

Comments

Bart Van Assche March 25, 2019, 4:39 p.m. UTC | #1
On Mon, 2019-03-25 at 10:22 -0600, Jens Axboe wrote:
> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> iterating wakeups unless we have waiters there. This works as long as
> everyone that's manipulating the waitqueues uses the proper helpers. For
> the tag wait case for shared tags, however, we add ourselves to the
> waitqueue without incrementing/decrementing the ->ws_active count. This
> means that wakeups can take a long time to happen.
> 
> Fix this by manually doing the inc/dec as needed for the wait queue
> handling.
> 
> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Hi Jens,

Since commit 5d2ee7122c73 went upstream in kernel v5.0, does this patch need
a "Cc: stable" tag?

Thanks,

Bart.
Jens Axboe March 25, 2019, 4:42 p.m. UTC | #2
On 3/25/19 10:39 AM, Bart Van Assche wrote:
> On Mon, 2019-03-25 at 10:22 -0600, Jens Axboe wrote:
>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>> iterating wakeups unless we have waiters there. This works as long as
>> everyone that's manipulating the waitqueues uses the proper helpers. For
>> the tag wait case for shared tags, however, we add ourselves to the
>> waitqueue without incrementing/decrementing the ->ws_active count. This
>> means that wakeups can take a long time to happen.
>>
>> Fix this by manually doing the inc/dec as needed for the wait queue
>> handling.
>>
>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> Hi Jens,
> 
> Since commit 5d2ee7122c73 went upstream in kernel v5.0, does this patch need
> a "Cc: stable" tag?

I guess it does, I'll add it.
Omar Sandoval March 25, 2019, 6:56 p.m. UTC | #3
On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> iterating wakeups unless we have waiters there. This works as long as
> everyone that's manipulating the waitqueues uses the proper helpers. For
> the tag wait case for shared tags, however, we add ourselves to the
> waitqueue without incrementing/decrementing the ->ws_active count. This
> means that wakeups can take a long time to happen.
> 
> Fix this by manually doing the inc/dec as needed for the wait queue
> handling.
> 
> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
for add/del wait queue handling")?
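
For reference, the 9f6b7ef6c3eb helpers pair the waitqueue add/del with the
->ws_active accounting through a struct sbq_wait that remembers its owning
queue; roughly (a sketch, assuming the lib/sbitmap.c implementation from
that commit):

	void sbitmap_add_wait_queue(struct sbitmap_queue *sbq,
				    struct sbq_wait_state *ws,
				    struct sbq_wait *sbq_wait)
	{
		/* Take a ws_active reference only once per sbq_wait. */
		if (!sbq_wait->sbq) {
			atomic_inc(&sbq->ws_active);
			sbq_wait->sbq = sbq;
		}
		add_wait_queue(&ws->wait, &sbq_wait->wait);
	}

	void sbitmap_del_wait_queue(struct sbq_wait *sbq_wait)
	{
		list_del_init(&sbq_wait->wait.entry);
		/* Drop the reference taken at add time, if any. */
		if (sbq_wait->sbq) {
			atomic_dec(&sbq_wait->sbq->ws_active);
			sbq_wait->sbq = NULL;
		}
	}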
Jens Axboe March 25, 2019, 6:58 p.m. UTC | #4
On 3/25/19 12:56 PM, Omar Sandoval wrote:
> On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>> iterating wakeups unless we have waiters there. This works as long as
>> everyone that's manipulating the waitqueues uses the proper helpers. For
>> the tag wait case for shared tags, however, we add ourselves to the
>> waitqueue without incrementing/decrementing the ->ws_active count. This
>> means that wakeups can take a long time to happen.
>>
>> Fix this by manually doing the inc/dec as needed for the wait queue
>> handling.
>>
>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
> for add/del wait queue handling")?

I don't think so without adding more, which seems kind of silly for this
very specialized use case of openly manipulating the wait queues. The
blk-mq setup there is very special cased.
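
The mismatch Jens is pointing at: those helpers operate on a struct
sbq_wait, while blk-mq embeds a bare wait_queue_entry_t in the hardware
context and manipulates it under its own dispatch_wait_lock; roughly (a
sketch of the relevant v5.0-era struct blk_mq_hw_ctx fields, from memory):

	struct blk_mq_hw_ctx {
		/* ... */
		/* plain wait entry, not a struct sbq_wait */
		wait_queue_entry_t	dispatch_wait;
		spinlock_t		dispatch_wait_lock;
		/* ... */
	};

Converting dispatch_wait to a struct sbq_wait (or adding sbq_wait-free
helper variants) would be the alternative; the patch instead does the
inc/dec by hand at the places the entry is added and removed.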
Omar Sandoval March 25, 2019, 7:04 p.m. UTC | #5
On Mon, Mar 25, 2019 at 12:58:47PM -0600, Jens Axboe wrote:
> On 3/25/19 12:56 PM, Omar Sandoval wrote:
> > On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
> >> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> >> iterating wakeups unless we have waiters there. This works as long as
> >> everyone that's manipulating the waitqueues uses the proper helpers. For
> >> the tag wait case for shared tags, however, we add ourselves to the
> >> waitqueue without incrementing/decrementing the ->ws_active count. This
> >> means that wakeups can take a long time to happen.
> >>
> >> Fix this by manually doing the inc/dec as needed for the wait queue
> >> handling.
> >>
> >> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
> > for add/del wait queue handling")?
> 
> I don't think so without adding more, which seems kind of silly for this
> very specialized use case of openly manipulating the wait queues. The
> blk-mq setup there is very special cased.

Yup, I see. Assuming it fixes the issue,

Reviewed-by: Omar Sandoval <osandov@fb.com>
Jens Axboe March 25, 2019, 7:05 p.m. UTC | #6
On 3/25/19 1:04 PM, Omar Sandoval wrote:
> On Mon, Mar 25, 2019 at 12:58:47PM -0600, Jens Axboe wrote:
>> On 3/25/19 12:56 PM, Omar Sandoval wrote:
>>> On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
>>>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>>>> iterating wakeups unless we have waiters there. This works as long as
>>>> everyone that's manipulating the waitqueues uses the proper helpers. For
>>>> the tag wait case for shared tags, however, we add ourselves to the
>>>> waitqueue without incrementing/decrementing the ->ws_active count. This
>>>> means that wakeups can take a long time to happen.
>>>>
>>>> Fix this by manually doing the inc/dec as needed for the wait queue
>>>> handling.
>>>>
>>>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
>>> for add/del wait queue handling")?
>>
>> I don't think so without adding more, which seems kind of silly for this
>> very specialized use case of openly manipulating the wait queues. The
>> blk-mq setup there is very special cased.
> 
> Yup, I see. Assuming it fixes the issue,
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Thanks - it does fix the issue; the original reporter has since tested and
confirmed that his abysmally slow raid5 now works at full speed again.

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 28080b0235f0..3ff3d7b49969 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1072,7 +1072,13 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 	hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
 
 	spin_lock(&hctx->dispatch_wait_lock);
-	list_del_init(&wait->entry);
+	if (!list_empty(&wait->entry)) {
+		struct sbitmap_queue *sbq;
+
+		list_del_init(&wait->entry);
+		sbq = &hctx->tags->bitmap_tags;
+		atomic_dec(&sbq->ws_active);
+	}
 	spin_unlock(&hctx->dispatch_wait_lock);
 
 	blk_mq_run_hw_queue(hctx, true);
@@ -1088,6 +1094,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
+	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -1110,7 +1117,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	if (!list_empty_careful(&wait->entry))
 		return false;
 
-	wq = &bt_wait_ptr(&hctx->tags->bitmap_tags, hctx)->wait;
+	wq = &bt_wait_ptr(sbq, hctx)->wait;
 
 	spin_lock_irq(&wq->lock);
 	spin_lock(&hctx->dispatch_wait_lock);
@@ -1120,6 +1127,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 		return false;
 	}
 
+	atomic_inc(&sbq->ws_active);
 	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 	__add_wait_queue(wq, wait);
 
@@ -1140,6 +1148,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	 * someone else gets the wakeup.
 	 */
 	list_del_init(&wait->entry);
+	atomic_dec(&sbq->ws_active);
 	spin_unlock(&hctx->dispatch_wait_lock);
 	spin_unlock_irq(&wq->lock);
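
Putting the first hunk in context, blk_mq_dispatch_wake after this patch
reads roughly as follows (reconstructed from the hunks above; the
surrounding unchanged lines, including the return value, are assumed from
the v5.0-era block/blk-mq.c):

	static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
					int flags, void *key)
	{
		struct blk_mq_hw_ctx *hctx;

		hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);

		spin_lock(&hctx->dispatch_wait_lock);
		if (!list_empty(&wait->entry)) {
			struct sbitmap_queue *sbq;

			/* Remove the waiter and drop its ->ws_active
			 * reference together, under dispatch_wait_lock,
			 * so the accounting cannot be done twice.
			 */
			list_del_init(&wait->entry);
			sbq = &hctx->tags->bitmap_tags;
			atomic_dec(&sbq->ws_active);
		}
		spin_unlock(&hctx->dispatch_wait_lock);

		blk_mq_run_hw_queue(hctx, true);
		return 1;
	}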