
[RFC,v7,02/12] blk-mq: rename blk_mq_update_tag_set_depth()

Message ID 1591810159-240929-3-git-send-email-john.garry@huawei.com (mailing list archive)
State New, archived
Series blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs

Commit Message

John Garry June 10, 2020, 5:29 p.m. UTC
From: Hannes Reinecke <hare@suse.de>

The function does not set the depth, but rather transitions from
shared to non-shared queues and vice versa.
So rename it to blk_mq_update_tag_set_shared() to better reflect
its purpose.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c | 18 ++++++++++--------
 block/blk-mq.c     |  8 ++++----
 2 files changed, 14 insertions(+), 12 deletions(-)

Comments

Ming Lei June 11, 2020, 2:57 a.m. UTC | #1
On Thu, Jun 11, 2020 at 01:29:09AM +0800, John Garry wrote:
> From: Hannes Reinecke <hare@suse.de>
> 
> The function does not set the depth, but rather transitions from
> shared to non-shared queues and vice versa.
> So rename it to blk_mq_update_tag_set_shared() to better reflect
> its purpose.

It is fine to rename it for me, however:

This patch claims to rename blk_mq_update_tag_set_shared(), but also
change blk_mq_init_bitmap_tags's signature.

So suggest to split this patch into two or add comment log on changing
blk_mq_init_bitmap_tags().


Thanks, 
Ming
John Garry June 11, 2020, 8:26 a.m. UTC | #2
On 11/06/2020 03:57, Ming Lei wrote:
> On Thu, Jun 11, 2020 at 01:29:09AM +0800, John Garry wrote:
>> From: Hannes Reinecke <hare@suse.de>
>>
>> The function does not set the depth, but rather transitions from
>> shared to non-shared queues and vice versa.
>> So rename it to blk_mq_update_tag_set_shared() to better reflect
>> its purpose.
> 
> It is fine to rename it for me, however:
> 
> This patch claims to rename blk_mq_update_tag_set_shared(), but also
> change blk_mq_init_bitmap_tags's signature.

I was going to update the commit message here, but forgot again...

> 
> So suggest to split this patch into two or add comment log on changing
> blk_mq_init_bitmap_tags().

I think I'll just split into 2x commits.

Thanks,
John
John Garry June 23, 2020, 11:25 a.m. UTC | #3
On 11/06/2020 09:26, John Garry wrote:
> On 11/06/2020 03:57, Ming Lei wrote:
>> On Thu, Jun 11, 2020 at 01:29:09AM +0800, John Garry wrote:
>>> From: Hannes Reinecke <hare@suse.de>
>>>
>>> The function does not set the depth, but rather transitions from
>>> shared to non-shared queues and vice versa.
>>> So rename it to blk_mq_update_tag_set_shared() to better reflect
>>> its purpose.
>>
>> It is fine to rename it for me, however:
>>
>> This patch claims to rename blk_mq_update_tag_set_shared(), but also
>> change blk_mq_init_bitmap_tags's signature.
> 
> I was going to update the commit message here, but forgot again...
> 
>>
>> So suggest to split this patch into two or add comment log on changing
>> blk_mq_init_bitmap_tags().
> 
> I think I'll just split into 2x commits.

Hi Hannes,

Do you have any issue with splitting the undocumented changes into 
another patch as so:

-------------------->8---------------------

 From db3f8ec1295efbf53273ffc137d348857cbd411e Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Tue, 23 Jun 2020 12:07:33 +0100
Subject: [PATCH] blk-mq: Free tags in blk_mq_init_tags() upon error

Since the tags are allocated in blk_mq_init_tags(), it is better practice
to free them in that same function upon error, rather than in the callee
which inits the bitmap tags - blk_mq_init_bitmap_tags().

Signed-off-by: Hannes Reinecke <hare@suse.de>
[jpg: split from an earlier patch with a new commit message, minor mod 
to return NULL directly for error]
Signed-off-by: John Garry <john.garry@huawei.com>

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 1085dc9848f3..b8972014cd90 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -487,24 +487,22 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
  				       node);
  }

-static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-						   int node, int alloc_policy)
+static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
+				   int node, int alloc_policy)
  {
  	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
  	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;

  	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
-		goto free_tags;
+		return -ENOMEM;
  	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
  		     node))
  		goto free_bitmap_tags;

-	return tags;
+	return 0;
  free_bitmap_tags:
  	sbitmap_queue_free(&tags->bitmap_tags);
-free_tags:
-	kfree(tags);
-	return NULL;
+	return -ENOMEM;
  }

  struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
@@ -525,7 +523,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
  	tags->nr_tags = total_tags;
  	tags->nr_reserved_tags = reserved_tags;

-	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
+	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+		kfree(tags);
+		return NULL;
+	}
+	return tags;
  }

  void blk_mq_free_tags(struct blk_mq_tags *tags)

--------------------8<---------------------

Thanks
Hannes Reinecke June 23, 2020, 2:23 p.m. UTC | #4
On 6/23/20 1:25 PM, John Garry wrote:
> On 11/06/2020 09:26, John Garry wrote:
>> On 11/06/2020 03:57, Ming Lei wrote:
>>> On Thu, Jun 11, 2020 at 01:29:09AM +0800, John Garry wrote:
>>>> From: Hannes Reinecke <hare@suse.de>
>>>>
>>>> The function does not set the depth, but rather transitions from
>>>> shared to non-shared queues and vice versa.
>>>> So rename it to blk_mq_update_tag_set_shared() to better reflect
>>>> its purpose.
>>>
>>> It is fine to rename it for me, however:
>>>
>>> This patch claims to rename blk_mq_update_tag_set_shared(), but also
>>> change blk_mq_init_bitmap_tags's signature.
>>
>> I was going to update the commit message here, but forgot again...
>>
>>>
>>> So suggest to split this patch into two or add comment log on changing
>>> blk_mq_init_bitmap_tags().
>>
>> I think I'll just split into 2x commits.
> 
> Hi Hannes,
> 
> Do you have any issue with splitting the undocumented changes into 
> another patch as so:
> 
No, that's perfectly fine.

Kashyap, I've also attached an updated patch for the elevator_count 
patch; if you agree John can include it in the next version.

Cheers,

Hannes
Kashyap Desai June 24, 2020, 8:13 a.m. UTC | #5
>
> On 6/23/20 1:25 PM, John Garry wrote:
> > On 11/06/2020 09:26, John Garry wrote:
> >> On 11/06/2020 03:57, Ming Lei wrote:
> >>> On Thu, Jun 11, 2020 at 01:29:09AM +0800, John Garry wrote:
> >>>> From: Hannes Reinecke <hare@suse.de>
> >>>>
> >>>> The function does not set the depth, but rather transitions from
> >>>> shared to non-shared queues and vice versa.
> >>>> So rename it to blk_mq_update_tag_set_shared() to better reflect
> >>>> its purpose.
> >>>
> >>> It is fine to rename it for me, however:
> >>>
> >>> This patch claims to rename blk_mq_update_tag_set_shared(), but also
> >>> change blk_mq_init_bitmap_tags's signature.
> >>
> >> I was going to update the commit message here, but forgot again...
> >>
> >>>
> >>> So suggest to split this patch into two or add comment log on
> >>> changing blk_mq_init_bitmap_tags().
> >>
> >> I think I'll just split into 2x commits.
> >
> > Hi Hannes,
> >
> > Do you have any issue with splitting the undocumented changes into
> > another patch as so:
> >
> No, that's perfectly fine.
>
> Kashyap, I've also attached an updated patch for the elevator_count patch;
> if
> you agree John can include it in the next version.

Hannes - the patch looks good. The header does not include a problem
statement. How about adding the below to the header?

High CPU utilization in "native_queued_spin_lock_slowpath" due to lock
contention is possible with the mq-deadline and bfq io schedulers when
nr_hw_queues is more than one.
This is because the kblockd workqueue can submit IO from all online CPUs
(through blk_mq_run_hw_queues) even though only one hctx has pending
commands.
The elevator "has_work" callback for the mq-deadline and bfq schedulers
considers there to be pending work if there are any IOs on the request
queue; it does not account for the hctx context.
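
For illustration, a minimal sketch (not the attached patch itself; it
assumes the per-hctx "elevator_queued" counter that the patch adds, and the
exact checks may differ) of how mq-deadline's "has_work" callback can take
the hctx into account:

static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
{
	struct deadline_data *dd = hctx->queue->elevator->elevator_data;

	/*
	 * Bail out early when nothing has been queued on this hctx, so that
	 * blk_mq_run_hw_queues() called from other CPUs does not go on to
	 * run the queue (and contend on dd->lock) for an hctx with no
	 * pending commands.
	 */
	if (!atomic_read(&hctx->elevator_queued))
		return false;

	return !list_empty_careful(&dd->dispatch) ||
		!list_empty_careful(&dd->fifo_list[0]) ||
		!list_empty_careful(&dd->fifo_list[1]);
}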

I have not seen performance drop after this patch, but I will continue
further testing.

John - One more thing, I am working on megaraid_sas driver to provide both
host_tagset = 1 and 0 option through module parameter.

Kashyap

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke            Teamlead Storage & Networking
> hare@suse.de                               +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809
> (AG Nürnberg), Geschäftsführer: Felix Imendörffer
John Garry June 29, 2020, 4:18 p.m. UTC | #6
On 24/06/2020 09:13, Kashyap Desai wrote:
>>> Hi Hannes,
>>>
>>> Do you have any issue with splitting the undocumented changes into
>>> another patch as so:
>>>
>> No, that's perfectly fine.
>>
>> Kashyap, I've also attached an updated patch for the elevator_count patch;
>> if
>> you agree John can include it in the next version.

ok, but I need to check it myself.

> Hannes - the patch looks good. The header does not include a problem
> statement. How about adding the below to the header?
>
> High CPU utilization in "native_queued_spin_lock_slowpath" due to lock
> contention is possible with the mq-deadline and bfq io schedulers when
> nr_hw_queues is more than one.
> This is because the kblockd workqueue can submit IO from all online CPUs
> (through blk_mq_run_hw_queues) even though only one hctx has pending
> commands.
> The elevator "has_work" callback for the mq-deadline and bfq schedulers
> considers there to be pending work if there are any IOs on the request
> queue; it does not account for the hctx context.
> 
> I have not seen performance drop after this patch, but I will continue
> further testing.
> 
> John - One more thing, I am working on megaraid_sas driver to provide both
> host_tagset = 1 and 0 option through module parameter.

I was hoping that we wouldn't have this, and would have host_tagset = 1 
always. Or maybe host_tagset = 1 by default, and allow a module param to 
set it to 0. Your choice, though.
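
Something like the below is what I had in mind - a rough sketch only; the
parameter name and where it is hooked up are guesses, not the actual
megaraid_sas change, and it assumes the host_tagset flag this series adds
to the SCSI host:

static bool host_tagset_enable = true;
module_param(host_tagset_enable, bool, 0444);
MODULE_PARM_DESC(host_tagset_enable,
		 "Shared host-wide tag set for all hardware queues. Default: enabled (1)");

	/* in the probe path; only relevant when the driver exposes
	 * more than one hardware queue */
	shost->host_tagset = host_tagset_enable ? 1 : 0;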

Thanks,
John
Kashyap Desai Aug. 10, 2020, 4:51 p.m. UTC | #7
> > Kashyap, I've also attached an updated patch for the elevator_count
> > patch; if you agree John can include it in the next version.
>
> Hannes - the patch looks good. The header does not include a problem
> statement. How about adding the below to the header?
>
> High CPU utilization in "native_queued_spin_lock_slowpath" due to lock
> contention is possible with the mq-deadline and bfq io schedulers when
> nr_hw_queues is more than one.
> This is because the kblockd workqueue can submit IO from all online CPUs
> (through blk_mq_run_hw_queues) even though only one hctx has pending
> commands.
> The elevator "has_work" callback for the mq-deadline and bfq schedulers
> considers there to be pending work if there are any IOs on the request
> queue; it does not account for the hctx context.

Hannes/John - We need one more correction for below patch -

https://github.com/hisilicon/kernel-dev/commit/ff631eb80aa0449eaeb78a282fd7eff2a9e42f77

I noticed that the elevator_queued count goes negative, mainly because
there are some cases where IO was submitted from the dispatch queue (not
the scheduler queue) and the request still has the "RQF_ELVPRIV" flag set.
In such cases "dd_finish_request" is called without "dd_insert_request". I
think it is better to decrement the counter once the request is taken out
of the dispatch queue. (Ming proposed using the dispatch path for
decrementing the counter, but I had not accounted for that, assuming
RQF_ELVPRIV would be set only if IO is submitted from the scheduler queue.)

Below is additional change. Can you merge this ?

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 9d75374..bc413dd 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -385,6 +385,8 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)

        spin_lock(&dd->lock);
        rq = __dd_dispatch_request(dd);
+       if (rq)
+               atomic_dec(&rq->mq_hctx->elevator_queued);
        spin_unlock(&dd->lock);

        return rq;
@@ -574,7 +576,6 @@ static void dd_finish_request(struct request *rq)
                        blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
                spin_unlock_irqrestore(&dd->zone_lock, flags);
        }
-       atomic_dec(&rq->mq_hctx->elevator_queued);
 }

 static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
--
2.9.5

Kashyap
John Garry Aug. 11, 2020, 8:01 a.m. UTC | #8
On 10/08/2020 17:51, Kashyap Desai wrote:
>> tx context.
> Hannes/John - We need one more correction for below patch -
> 
> https://github.com/hisilicon/kernel-dev/commit/ff631eb80aa0449eaeb78a282fd7eff2a9e42f77
> 
> I noticed that the elevator_queued count goes negative, mainly because
> there are some cases where IO was submitted from the dispatch queue (not
> the scheduler queue) and the request still has the "RQF_ELVPRIV" flag set.
> In such cases "dd_finish_request" is called without "dd_insert_request". I
> think it is better to decrement the counter once the request is taken out
> of the dispatch queue. (Ming proposed using the dispatch path for
> decrementing the counter, but I had not accounted for that, assuming
> RQF_ELVPRIV would be set only if IO is submitted from the scheduler queue.)
> 
> Below is additional change. Can you merge this ?
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 9d75374..bc413dd 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -385,6 +385,8 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> 
>          spin_lock(&dd->lock);
>          rq = __dd_dispatch_request(dd);
> +       if (rq)
> +               atomic_dec(&rq->mq_hctx->elevator_queued);

Is there any reason why this operation could not be taken outside the 
spinlock? I assume raciness is not a problem with this patch...

>          spin_unlock(&dd->lock);
> 
>          return rq;
> @@ -574,7 +576,6 @@ static void dd_finish_request(struct request *rq)
>                          blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
>                  spin_unlock_irqrestore(&dd->zone_lock, flags);
>          }
> -       atomic_dec(&rq->mq_hctx->elevator_queued);
>   }
> 
>   static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
> --
> 2.9.5
> 
> Kashyap


btw, can you provide signed-off-by if you want credit upgraded to 
Co-developed-by?

Thanks,
john
Kashyap Desai Aug. 11, 2020, 4:34 p.m. UTC | #9
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index 9d75374..bc413dd 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -385,6 +385,8 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> >
> >          spin_lock(&dd->lock);
> >          rq = __dd_dispatch_request(dd);
> > +       if (rq)
> > +               atomic_dec(&rq->mq_hctx->elevator_queued);
>
> Is there any reason why this operation could not be taken outside the
> spinlock? I assume raciness is not a problem with this patch...

No issue if we want to move this outside spinlock.
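
i.e. a sketch of just that movement in dd_dispatch_request() (not a final
patch):

	spin_lock(&dd->lock);
	rq = __dd_dispatch_request(dd);
	spin_unlock(&dd->lock);

	/* the counter is atomic, so it does not need dd->lock; decrement
	 * after dropping the lock, as suggested */
	if (rq)
		atomic_dec(&rq->mq_hctx->elevator_queued);

	return rq;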

>
> >          spin_unlock(&dd->lock);
> >
> >          return rq;
> > @@ -574,7 +576,6 @@ static void dd_finish_request(struct request *rq)
> >                          blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
> >                  spin_unlock_irqrestore(&dd->zone_lock, flags);
> >          }
> > -       atomic_dec(&rq->mq_hctx->elevator_queued);
> >   }
> >
> >   static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
> > --
> > 2.9.5
> >
> > Kashyap
> > .#
>
>
> btw, can you provide signed-off-by if you want credit upgraded to Co-
> developed-by?

I will send you merged patch which you can push to your git repo.

Kashyap

>
> Thanks,
> john

Patch

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 85aa1690cbcf..bedddf168253 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -454,24 +454,22 @@  static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
 				       node);
 }
 
-static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-						   int node, int alloc_policy)
+static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
+				   int node, int alloc_policy)
 {
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
 	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
-		goto free_tags;
+		return -ENOMEM;
 	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
 		     node))
 		goto free_bitmap_tags;
 
-	return tags;
+	return 0;
 free_bitmap_tags:
 	sbitmap_queue_free(&tags->bitmap_tags);
-free_tags:
-	kfree(tags);
-	return NULL;
+	return -ENOMEM;
 }
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
@@ -492,7 +490,11 @@  struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
-	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
+	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+		kfree(tags);
+		tags = NULL;
+	}
+	return tags;
 }
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d255c485ca5f..c20d75c851f2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2802,8 +2802,8 @@  static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 	}
 }
 
-static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
-					bool shared)
+static void blk_mq_update_tag_set_shared(struct blk_mq_tag_set *set,
+					 bool shared)
 {
 	struct request_queue *q;
 
@@ -2826,7 +2826,7 @@  static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, false);
+		blk_mq_update_tag_set_shared(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
 	INIT_LIST_HEAD(&q->tag_set_list);
@@ -2844,7 +2844,7 @@  static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, true);
+		blk_mq_update_tag_set_shared(set, true);
 	}
 	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);