Date: Wed, 23 May 2018 22:56:15 -0700
From: Omar Sandoval
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, stable@vger.kernel.org, Omar Sandoval
Subject: Re: [PATCH V3] blk-mq: avoid to starve tag allocation after allocation process migrates
Message-ID: <20180524055615.GF12533@vader>
References: <20180524030440.15744-1-ming.lei@redhat.com>
In-Reply-To: <20180524030440.15744-1-ming.lei@redhat.com>

On Thu, May 24, 2018 at 11:04:40AM +0800, Ming Lei wrote:
> When the allocation process is scheduled back and the mapped hw queue is
> changed, fake one extra wake up on previous queue for compensating wake up
> miss, so other allocations on the previous queue won't be starved.
>
> This patch fixes one request allocation hang issue, which can be
> triggered easily in case of very low nr_request.

Thanks, Ming, this looks better. One comment below.

> Cc:
> Cc: Omar Sandoval
> Signed-off-by: Ming Lei
> ---
> V3:
> 	- fix comments as suggested by Jens
> 	- remove the wrapper as suggested by Omar
> V2:
> 	- fix build failure
>
>
>  block/blk-mq-tag.c      | 12 ++++++++++++
>  include/linux/sbitmap.h |  7 +++++++
>  lib/sbitmap.c           | 22 ++++++++++++----------
>  3 files changed, 31 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 336dde07b230..a4e58fc28a06 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  	ws = bt_wait_ptr(bt, data->hctx);
>  	drop_ctx = data->ctx == NULL;
>  	do {
> +		struct sbitmap_queue *bt_prev;
> +
>  		/*
>  		 * We're out of tags on this hardware queue, kick any
>  		 * pending IO submits before going to sleep waiting for
> @@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  		if (data->ctx)
>  			blk_mq_put_ctx(data->ctx);
>
> +		bt_prev = bt;
>  		io_schedule();
>
>  		data->ctx = blk_mq_get_ctx(data->q);
> @@ -170,6 +173,15 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  			bt = &tags->bitmap_tags;
>
>  		finish_wait(&ws->wait, &wait);
> +
> +		/*
> +		 * If destination hw queue is changed, fake wake up on
> +		 * previous queue for compensating the wake up miss, so
> +		 * other allocations on previous queue won't be starved.
> +		 */
> +		if (bt != bt_prev)
> +			sbitmap_queue_wake_up(bt_prev);
> +
>  		ws = bt_wait_ptr(bt, data->hctx);
>  	} while (1);
>
> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> index 841585f6e5f2..bba9d80191b7 100644
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
>  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
>
>  /**
> + * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue
> + * on a &struct sbitmap_queue.
> + * @sbq: Bitmap queue to wake up.
> + */
> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
> +
> +/**
>   * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
>   * seq_file.
>   * @sbq: Bitmap queue to show.
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index e6a9c06ec70c..14e027a33ffa 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -335,8 +335,9 @@ void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth)
>  	if (sbq->wake_batch != wake_batch) {
>  		WRITE_ONCE(sbq->wake_batch, wake_batch);
>  		/*
> -		 * Pairs with the memory barrier in sbq_wake_up() to ensure that
> -		 * the batch size is updated before the wait counts.
> +		 * Pairs with the memory barrier in sbitmap_queue_wake_up()
> +		 * to ensure that the batch size is updated before the wait
> +		 * counts.
>  		 */
>  		smp_mb__before_atomic();
>  		for (i = 0; i < SBQ_WAIT_QUEUES; i++)
> @@ -425,7 +426,7 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
>  	return NULL;
>  }
>
> -static void sbq_wake_up(struct sbitmap_queue *sbq)
> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
>  {
>  	struct sbq_wait_state *ws;
>  	unsigned int wake_batch;

Just after this is:

	/*
	 * Pairs with the memory barrier in set_current_state() to ensure the
	 * proper ordering of clear_bit()/waitqueue_active() in the waker and
	 * test_and_set_bit_lock()/prepare_to_wait()/finish_wait() in the
	 * waiter. See the comment on waitqueue_active(). This is __after_atomic
	 * because we just did clear_bit_unlock() in the caller.
	 */
	smp_mb__after_atomic();

But there's no atomic operation before this in blk_mq_get_tag(). I don't
think this memory barrier is necessary for the blk_mq_get_tag() case, so
let's move it to sbitmap_queue_clear():

That comment also mentioned clear_bit(), but we do clear_bit_unlock() now,
so we can update that at the same time.

> @@ -454,23 +455,24 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
>  		 */
>  		smp_mb__before_atomic();
>  		/*
> -		 * If there are concurrent callers to sbq_wake_up(), the last
> -		 * one to decrement the wait count below zero will bump it back
> -		 * up. If there is a concurrent resize, the count reset will
> -		 * either cause the cmpxchg to fail or overwrite after the
> -		 * cmpxchg.
> +		 * If there are concurrent callers to sbitmap_queue_wake_up(),
> +		 * the last one to decrement the wait count below zero will
> +		 * bump it back up. If there is a concurrent resize, the count
> +		 * reset will either cause the cmpxchg to fail or overwrite
> +		 * after the cmpxchg.
>  		 */
>  		atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wait_cnt + wake_batch);
>  		sbq_index_atomic_inc(&sbq->wake_index);
>  		wake_up_nr(&ws->wait, wake_batch);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
>
>  void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
>  			 unsigned int cpu)
>  {
>  	sbitmap_clear_bit_unlock(&sbq->sb, nr);
> -	sbq_wake_up(sbq);
> +	sbitmap_queue_wake_up(sbq);
>  	if (likely(!sbq->round_robin && nr < sbq->sb.depth))
>  		*per_cpu_ptr(sbq->alloc_hint, cpu) = nr;
>  }
> @@ -482,7 +484,7 @@ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq)
>
>  	/*
>  	 * Pairs with the memory barrier in set_current_state() like in
> -	 * sbq_wake_up().
> +	 * sbitmap_queue_wake_up().
>  	 */
>  	smp_mb();
>  	wake_index = atomic_read(&sbq->wake_index);
> --
> 2.9.5
>

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 14e027a33ffa..537328a98c06 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -432,15 +432,6 @@ void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
 	unsigned int wake_batch;
 	int wait_cnt;
 
-	/*
-	 * Pairs with the memory barrier in set_current_state() to ensure the
-	 * proper ordering of clear_bit()/waitqueue_active() in the waker and
-	 * test_and_set_bit_lock()/prepare_to_wait()/finish_wait() in the
-	 * waiter. See the comment on waitqueue_active(). This is __after_atomic
-	 * because we just did clear_bit_unlock() in the caller.
-	 */
-	smp_mb__after_atomic();
-
 	ws = sbq_wake_ptr(sbq);
 	if (!ws)
 		return;
@@ -472,6 +463,13 @@ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 			 unsigned int cpu)
 {
 	sbitmap_clear_bit_unlock(&sbq->sb, nr);
+	/*
+	 * Pairs with the memory barrier in set_current_state() to ensure the
+	 * proper ordering of clear_bit_unlock()/waitqueue_active() in the waker
+	 * and test_and_set_bit_lock()/prepare_to_wait()/finish_wait() in the
+	 * waiter. See the comment on waitqueue_active().
+	 */
+	smp_mb__after_atomic();
 	sbitmap_queue_wake_up(sbq);
 	if (likely(!sbq->round_robin && nr < sbq->sb.depth))
 		*per_cpu_ptr(sbq->alloc_hint, cpu) = nr;
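
As an aside, the ordering requirement discussed above can be illustrated with a
small user-space analogue; this is only a sketch, and the flag names tag_free
and waiter_parked are invented for illustration, not kernel symbols:

/* Minimal C11 sketch of the lost-wakeup pattern the barrier prevents. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* tag_free plays the role of the bit cleared by sbitmap_clear_bit_unlock();
 * waiter_parked plays the role of the waitqueue becoming non-empty. */
static atomic_bool tag_free;
static atomic_bool waiter_parked;

/* Waker side: free the resource, then a full fence (the analogue of the
 * smp_mb__after_atomic() placed in sbitmap_queue_clear()), then check for
 * waiters to wake. */
static bool waker(void)
{
	atomic_store_explicit(&tag_free, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiter_parked, memory_order_relaxed);
}

/* Waiter side: announce that we are about to sleep (set_current_state()
 * implies a full barrier), then recheck the resource before sleeping. */
static bool waiter(void)
{
	atomic_store_explicit(&waiter_parked, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&tag_free, memory_order_relaxed);
}

int main(void)
{
	/* With both fences, "waker misses the waiter" and "waiter misses the
	 * tag" cannot both happen, so a wakeup is never lost; dropping either
	 * fence reintroduces the kind of hang discussed in this thread. */
	printf("waker sees waiter: %d, waiter sees tag: %d\n", waker(), waiter());
	return 0;
}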