Message ID | c0d3f0c8-a5d7-19dd-7b7f-1d4f860c425b@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Series | blk-mq: fix sbitmap ws_active for shared tags |
On Mon, 2019-03-25 at 10:22 -0600, Jens Axboe wrote:
> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> iterating wakeups unless we have waiters there. This works as long as
> everyone that's manipulating the waitqueues use the proper helpers. For
> the tag wait case for shared tags, however, we add ourselves to the
> waitqueue without incrementing/decrementing the ->ws_active count. This
> means that wakeups can take a long time to happen.
>
> Fix this by manually doing the inc/dec as needed for the wait queue
> handling.
>
> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Hi Jens,

Since commit 5d2ee7122c73 went upstream in kernel v5.0, does this patch need
a "Cc: stable" tag?

Thanks,

Bart.

On 3/25/19 10:39 AM, Bart Van Assche wrote:
> On Mon, 2019-03-25 at 10:22 -0600, Jens Axboe wrote:
>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>> iterating wakeups unless we have waiters there. This works as long as
>> everyone that's manipulating the waitqueues use the proper helpers. For
>> the tag wait case for shared tags, however, we add ourselves to the
>> waitqueue without incrementing/decrementing the ->ws_active count. This
>> means that wakeups can take a long time to happen.
>>
>> Fix this by manually doing the inc/dec as needed for the wait queue
>> handling.
>>
>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> Hi Jens,
>
> Since commit 5d2ee7122c73 went upstream in kernel v5.0, does this patch need
> a "Cc: stable" tag?

I guess it does, I'll add it.

On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> iterating wakeups unless we have waiters there. This works as long as
> everyone that's manipulating the waitqueues use the proper helpers. For
> the tag wait case for shared tags, however, we add ourselves to the
> waitqueue without incrementing/decrementing the ->ws_active count. This
> means that wakeups can take a long time to happen.
>
> Fix this by manually doing the inc/dec as needed for the wait queue
> handling.
>
> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
for add/del wait queue handling")?

On 3/25/19 12:56 PM, Omar Sandoval wrote:
> On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>> iterating wakeups unless we have waiters there. This works as long as
>> everyone that's manipulating the waitqueues use the proper helpers. For
>> the tag wait case for shared tags, however, we add ourselves to the
>> waitqueue without incrementing/decrementing the ->ws_active count. This
>> means that wakeups can take a long time to happen.
>>
>> Fix this by manually doing the inc/dec as needed for the wait queue
>> handling.
>>
>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
> for add/del wait queue handling")?

I don't think so without adding more, which seems kind of silly for this
very specialized use case of openly manipulating the wait queues. The
blk-mq setup there is very special cased.

On Mon, Mar 25, 2019 at 12:58:47PM -0600, Jens Axboe wrote:
> On 3/25/19 12:56 PM, Omar Sandoval wrote:
> > On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
> >> We now wrap sbitmap waitqueues in an active counter, so we can avoid
> >> iterating wakeups unless we have waiters there. This works as long as
> >> everyone that's manipulating the waitqueues use the proper helpers. For
> >> the tag wait case for shared tags, however, we add ourselves to the
> >> waitqueue without incrementing/decrementing the ->ws_active count. This
> >> means that wakeups can take a long time to happen.
> >>
> >> Fix this by manually doing the inc/dec as needed for the wait queue
> >> handling.
> >>
> >> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
> > for add/del wait queue handling")?
>
> I don't think so without adding more, which seems kind of silly for this
> very specialized use case of openly manipulating the wait queues. The
> blk-mq setup there is very special cased.

Yup, I see. Assuming it fixes the issue,

Reviewed-by: Omar Sandoval <osandov@fb.com>

On 3/25/19 1:04 PM, Omar Sandoval wrote:
> On Mon, Mar 25, 2019 at 12:58:47PM -0600, Jens Axboe wrote:
>> On 3/25/19 12:56 PM, Omar Sandoval wrote:
>>> On Mon, Mar 25, 2019 at 10:22:50AM -0600, Jens Axboe wrote:
>>>> We now wrap sbitmap waitqueues in an active counter, so we can avoid
>>>> iterating wakeups unless we have waiters there. This works as long as
>>>> everyone that's manipulating the waitqueues use the proper helpers. For
>>>> the tag wait case for shared tags, however, we add ourselves to the
>>>> waitqueue without incrementing/decrementing the ->ws_active count. This
>>>> means that wakeups can take a long time to happen.
>>>>
>>>> Fix this by manually doing the inc/dec as needed for the wait queue
>>>> handling.
>>>>
>>>> Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> Can this use the helpers we added in 9f6b7ef6c3eb ("sbitmap: add helpers
>>> for add/del wait queue handling")?
>>
>> I don't think so without adding more, which seems kind of silly for this
>> very specialized use case of openly manipulating the wait queues. The
>> blk-mq setup there is very special cased.
>
> Yup, I see. Assuming it fixes the issue,
>
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Thanks - it does fix the issue, the original reporter has since tested
and confirmed that his abysmally slow raid5 now works at full speed
again.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 28080b0235f0..3ff3d7b49969 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1072,7 +1072,13 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 	hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
 
 	spin_lock(&hctx->dispatch_wait_lock);
-	list_del_init(&wait->entry);
+	if (!list_empty(&wait->entry)) {
+		struct sbitmap_queue *sbq;
+
+		list_del_init(&wait->entry);
+		sbq = &hctx->tags->bitmap_tags;
+		atomic_dec(&sbq->ws_active);
+	}
 	spin_unlock(&hctx->dispatch_wait_lock);
 
 	blk_mq_run_hw_queue(hctx, true);
@@ -1088,6 +1094,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
+	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -1110,7 +1117,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	if (!list_empty_careful(&wait->entry))
 		return false;
 
-	wq = &bt_wait_ptr(&hctx->tags->bitmap_tags, hctx)->wait;
+	wq = &bt_wait_ptr(sbq, hctx)->wait;
 
 	spin_lock_irq(&wq->lock);
 	spin_lock(&hctx->dispatch_wait_lock);
@@ -1120,6 +1127,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 		return false;
 	}
 
+	atomic_inc(&sbq->ws_active);
 	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 	__add_wait_queue(wq, wait);
 
@@ -1140,6 +1148,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	 * someone else gets the wakeup.
 	 */
 	list_del_init(&wait->entry);
+	atomic_dec(&sbq->ws_active);
 	spin_unlock(&hctx->dispatch_wait_lock);
 	spin_unlock_irq(&wq->lock);
 

We now wrap sbitmap waitqueues in an active counter, so we can avoid
iterating wakeups unless we have waiters there. This works as long as
everyone that's manipulating the waitqueues use the proper helpers. For
the tag wait case for shared tags, however, we add ourselves to the
waitqueue without incrementing/decrementing the ->ws_active count. This
means that wakeups can take a long time to happen.

Fix this by manually doing the inc/dec as needed for the wait queue
handling.

Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
Got a bug report on raid5 on 4 USB-3 attached drives which is slow with
5.0, but works fine with the ->ws_active check bypassed. Waiting for
confirmation that this patch fixes it, but I think it will.
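
For readers who want to see the invariant in isolation, here is a minimal user-space sketch of the idea described above, not the kernel code. Every `model_*` name is hypothetical and made up for illustration; only the `ws_active` field name is borrowed from the patch. It assumes a single thread and no locking, and it models a waiter being "on the list" with a plain flag. The point it tries to show: once the wake path skips iteration when the counter is zero, any path that queues a waiter without bumping the counter is invisible to wakeups, and removal should only decrement when the waiter is still queued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_MAX_WAITERS 8

struct model_waiter {
	bool queued;	/* stands in for !list_empty(&wait->entry) */
	bool woken;
};

struct model_queue {
	atomic_int ws_active;	/* number of accounted, queued waiters */
	struct model_waiter *waiters[MODEL_MAX_WAITERS];
	size_t nr;
};

/* Pre-fix behaviour: queue the waiter but leave the counter untouched. */
static void model_add_unaccounted(struct model_queue *q, struct model_waiter *w)
{
	assert(q->nr < MODEL_MAX_WAITERS);
	q->waiters[q->nr++] = w;
	w->queued = true;
}

/* Post-fix behaviour: bump the counter, then queue the waiter. */
static void model_add_accounted(struct model_queue *q, struct model_waiter *w)
{
	atomic_fetch_add(&q->ws_active, 1);
	model_add_unaccounted(q, w);
}

/*
 * Guarded removal: decrement only if the waiter is still queued, so that
 * a second removal of the same waiter cannot decrement twice.
 */
static void model_del(struct model_queue *q, struct model_waiter *w)
{
	if (!w->queued)
		return;
	w->queued = false;
	atomic_fetch_sub(&q->ws_active, 1);
}

/* Wake everything that is queued; returns the number of waiters woken. */
static int model_wake_all(struct model_queue *q)
{
	int woken = 0;

	/* Fast path: a zero counter is taken to mean "nobody is waiting". */
	if (atomic_load(&q->ws_active) == 0)
		return 0;

	for (size_t i = 0; i < q->nr; i++) {
		struct model_waiter *w = q->waiters[i];

		if (!w->queued)
			continue;
		w->woken = true;
		model_del(q, w);
		woken++;
	}
	return woken;
}
```

With a waiter added through `model_add_unaccounted()`, `model_wake_all()` returns without touching it; with `model_add_accounted()`, the same call wakes it and the counter returns to zero. In this toy model the unaccounted waiter is never woken, whereas the commit message above only says that wakeups can take a long time, so treat the sketch as an illustration of the accounting rule rather than of the exact kernel behaviour.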