MD: make bio mergeable

Message ID 384a0e0c7d6f2700aadbcbdef003cece88fa7dd7.1461626533.git.shli@fb.com
State New

Commit Message

Shaohua Li April 25, 2016, 11:52 p.m. UTC
blk_queue_split marks a split bio as unmergeable, which makes sense for a
normal bio. But once the bio is dispatched to the underlying disk, the
blk_queue_split checks no longer apply, so the bio may become mergeable again.

In the reported bug, this causes a sharp drop in trim performance against raid0:
https://bugzilla.kernel.org/show_bug.cgi?id=117051

Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
Cc: stable@vger.kernel.org (v4.3+)
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/md.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jens Axboe April 26, 2016, 12:59 a.m. UTC | #1
On 04/25/2016 05:52 PM, Shaohua Li wrote:
> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> But if dispatching the bio to underlayer disk, the blk_queue_split
> checks are invalid, hence it's possible the bio becomes mergeable.
>
> In the reported bug, this bug causes trim against raid0 performance slash
> https://bugzilla.kernel.org/show_bug.cgi?id=117051

Good catch! Will apply for this series, thanks Shaohua.
Jens Axboe April 26, 2016, 1:15 a.m. UTC | #2
On 04/25/2016 06:59 PM, Jens Axboe wrote:
> On 04/25/2016 05:52 PM, Shaohua Li wrote:
>> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
>> But if dispatching the bio to underlayer disk, the blk_queue_split
>> checks are invalid, hence it's possible the bio becomes mergeable.
>>
>> In the reported bug, this bug causes trim against raid0 performance slash
>> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
> Good catch! Will apply for this series, thanks Shaohua.

Actually, let's let that go through the md tree instead. But you can add 
my Reviewed-by, and it'd be nice to get this into 4.6.
Ming Lei April 26, 2016, 9:56 a.m. UTC | #3
On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <shli@fb.com> wrote:
> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> But if dispatching the bio to underlayer disk, the blk_queue_split
> checks are invalid, hence it's possible the bio becomes mergeable.

If the bio from md is split and marked as NOMERGE, it means some
queue limits were reached. So it looks like the raid's queue limit is not set
big enough; could you find which limit causes the splitting and nomerge?

>
> In the reported bug, this bug causes trim against raid0 performance slash
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
> Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
> Cc: stable@vger.kernel.org (v4.3+)
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  drivers/md/md.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
>          * go away inside make_request
>          */
>         sectors = bio_sectors(bio);
> +       /* bio could be mergeable after passing to underlayer */
> +       bio->bi_rw &= ~REQ_NOMERGE;

IMO it isn't a good fix: either we need to set a correct queue limit, or
we simply don't set nomerge for all stackable block devices. But I prefer
the former a bit.

Thanks,

>         mddev->pers->make_request(mddev, bio);
>
>         cpu = part_stat_lock();
> --
> 2.8.0.rc2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe April 26, 2016, 2:21 p.m. UTC | #4
On 04/26/2016 03:56 AM, Ming Lei wrote:
> On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <shli@fb.com> wrote:
>> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
>> But if dispatching the bio to underlayer disk, the blk_queue_split
>> checks are invalid, hence it's possible the bio becomes mergeable.
>
> If the bio from md is splitted and marked as NOMERGE, it means some
> queue limits are reached. So looks the raid's queue limit is set as not
> big enough, could your find which limit causes the splitting and nomerge?

raid0 sets a limit of the stripe size for IO. Once the IO has passed md, 
there's no reason why we can't merge for the lower driver. This is 
(potentially) a huge performance issue on trim, since a lot of devices 
are trim ops / sec limited rather than throughput limited.
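
To put rough numbers on the ops/sec point, here is a minimal C sketch. It assumes every discard bio is cut at raid0 chunk boundaries and that none of the pieces are merged again below md. The helper names, the chunk size and the ops/sec rate are hypothetical inputs for illustration, not values taken from the bug report.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of pieces a discard range is cut into when no piece may cross a
 * chunk boundary. All values are in 512-byte sectors.
 */
static uint64_t pieces_for_range(uint64_t start, uint64_t len,
                                 uint64_t chunk_sectors)
{
    assert(chunk_sectors > 0);
    if (len == 0)
        return 0;

    uint64_t first_chunk = start / chunk_sectors;
    uint64_t last_chunk = (start + len - 1) / chunk_sectors;

    return last_chunk - first_chunk + 1;
}

/*
 * Time needed if the device completes a fixed number of discard requests
 * per second, regardless of how large each request is.
 */
static double seconds_at_rate(uint64_t requests, double ops_per_sec)
{
    assert(ops_per_sec > 0.0);
    return (double)requests / ops_per_sec;
}
```

With an assumed 512 KiB chunk (1024 sectors), a chunk-aligned 1 GiB discard (2097152 sectors) becomes pieces_for_range(0, 2097152, 1024) = 2048 separate requests if nothing is merged, so the request count grows with the trimmed range rather than with what the device could accept in one command.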
Ming Lei April 26, 2016, 3:17 p.m. UTC | #5
On Tue, Apr 26, 2016 at 10:21 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 04/26/2016 03:56 AM, Ming Lei wrote:
>>
>> On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <shli@fb.com> wrote:
>>>
>>> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
>>> But if dispatching the bio to underlayer disk, the blk_queue_split
>>> checks are invalid, hence it's possible the bio becomes mergeable.
>>
>>
>> If the bio from md is splitted and marked as NOMERGE, it means some
>> queue limits are reached. So looks the raid's queue limit is set as not
>> big enough, could your find which limit causes the splitting and nomerge?
>
>
> raid0 sets a limit of the stripe size for IO. Once the IO has passed md,
> there's no reason why we can't merge for the lower driver. This is
> (potentially) a huge performance issue on trim, since a lot of devices are
> trim ops / sec limited rather than throughput limited.

Just found that raid0 maps the chunk sectors into the max hw sectors of the
queue, and dm uses blk_stack_limits() to set up the limits.

So it looks like a raid-specific issue, and the fix is correct; sorry for the noise.

thanks,
Ming
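
The interaction between the queue limit and the NOMERGE flag can be sketched in user space as follows. This is only a simplified model under stated assumptions: struct model_bio, MODEL_NOMERGE and model_split() are hypothetical names, the flag bit is not the kernel's REQ_NOMERGE value, and the real blk_queue_split() checks more limits than a single maximum size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit for this model only. */
#define MODEL_NOMERGE (1u << 0)

struct model_bio {
    uint64_t sector;     /* start sector */
    uint32_t nr_sectors; /* length in sectors */
    uint32_t flags;
};

/*
 * If *bio is larger than max_sectors, cut off the front part into *front,
 * flag it as unmergeable, advance *bio to the remainder and return 1.
 * Return 0 if the bio already fits and nothing was changed.
 */
static int model_split(struct model_bio *bio, uint32_t max_sectors,
                       struct model_bio *front)
{
    assert(bio != NULL && front != NULL && max_sectors > 0);

    if (bio->nr_sectors <= max_sectors)
        return 0;

    front->sector = bio->sector;
    front->nr_sectors = max_sectors;
    front->flags = bio->flags | MODEL_NOMERGE;

    bio->sector += max_sectors;
    bio->nr_sectors -= max_sectors;
    return 1;
}
```

In this model a limit equal to the chunk size means every large discard ends up as a run of chunk-sized pieces that carry the flag, which matches the behaviour described in this thread.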
Holger Kiehl April 28, 2016, 8 p.m. UTC | #6
Hello,

On Mon, 25 Apr 2016, Shaohua Li wrote:

> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> But if dispatching the bio to underlayer disk, the blk_queue_split
> checks are invalid, hence it's possible the bio becomes mergeable.
> 
> In the reported bug, this bug causes trim against raid0 performance slash
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
> 
This patch makes a huge difference. On a system with two Samsung 850 Pro
in an MD Raid0 setup the time for fstrim went down from ~30min to 18sec!

However, on another system with two Intel P3700 1.6TB NVMe PCIe SSDs,
also set up as one big MD Raid0, the patch does not make any difference
at all. fstrim takes more than 4 hours!

Any idea what could be wrong?

Regards,
Holger


> Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
> Cc: stable@vger.kernel.org (v4.3+)
> Cc: Ming Lei <ming.lei@canonical.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  drivers/md/md.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
>  	 * go away inside make_request
>  	 */
>  	sectors = bio_sectors(bio);
> +	/* bio could be mergeable after passing to underlayer */
> +	bio->bi_rw &= ~REQ_NOMERGE;
>  	mddev->pers->make_request(mddev, bio);
>  
>  	cpu = part_stat_lock();
> -- 
> 2.8.0.rc2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Shaohua Li April 28, 2016, 9:19 p.m. UTC | #7
On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> Hello,
> 
> On Mon, 25 Apr 2016, Shaohua Li wrote:
> 
> > blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> > But if dispatching the bio to underlayer disk, the blk_queue_split
> > checks are invalid, hence it's possible the bio becomes mergeable.
> > 
> > In the reported bug, this bug causes trim against raid0 performance slash
> > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> > 
> This patch makes a huge difference. On a system with two Samsung 850 Pro
> in a MD Raid0 setup the time for fstrim went down from ~30min to 18sec!
> 
> However, on another system with two Intel P3700 1.6TB NVMe PCIe SSD's
> also setup as one big MD Raid0, the patch does not make any difference
> at all. fstrim takes more then 4 hours!

Does the raid0 span two partitions or two SSDs?

Can you post blktrace data in the bugzilla? I'll track the bug there.

Thanks,
Shaohua
Holger Kiehl April 29, 2016, 9:23 a.m. UTC | #8
On Thu, 28 Apr 2016, Shaohua Li wrote:

> On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> > Hello,
> > 
> > On Mon, 25 Apr 2016, Shaohua Li wrote:
> > 
> > > blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> > > But if dispatching the bio to underlayer disk, the blk_queue_split
> > > checks are invalid, hence it's possible the bio becomes mergeable.
> > > 
> > > In the reported bug, this bug causes trim against raid0 performance slash
> > > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> > > 
> > This patch makes a huge difference. On a system with two Samsung 850 Pro
> > in a MD Raid0 setup the time for fstrim went down from ~30min to 18sec!
> > 
> > However, on another system with two Intel P3700 1.6TB NVMe PCIe SSD's
> > also setup as one big MD Raid0, the patch does not make any difference
> > at all. fstrim takes more then 4 hours!
> 
> Does the raid0 cross two partitions or two SSD?
> 
Two SSDs. Where it works, on the two Samsung 850 Pro SATA SSDs, it was
via partitions.

> can you post blktrace data in the bugzilloa, I'll track the bug there.
> 
I ran blktrace on the two md raid0 devices /dev/nvme[01]n1 for 2 minutes
and attached the traces to bug 117051 as a tar.bz2 file:

   https://bugzilla.kernel.org/show_bug.cgi?id=117051

Please just ask if I have forgotten anything. And many thanks for looking
at this and all the good work!

Regards,
Holger

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 194580f..14d3b37 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -284,6 +284,8 @@  static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
 	 * go away inside make_request
 	 */
 	sectors = bio_sectors(bio);
+	/* bio could be mergeable after passing to underlayer */
+	bio->bi_rw &= ~REQ_NOMERGE;
 	mddev->pers->make_request(mddev, bio);
 
 	cpu = part_stat_lock();
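
The effect of clearing the flag can be illustrated with a small user-space model. This is a sketch under assumptions, not kernel code: struct model_bio, MODEL_NOMERGE and both helpers are hypothetical, the flag bit is not the kernel's REQ_NOMERGE value, and the real merge checks in the block layer consider more than contiguity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit for this model only. */
#define MODEL_NOMERGE (1u << 0)

struct model_bio {
    uint64_t sector;     /* start sector */
    uint32_t nr_sectors; /* length in sectors */
    uint32_t flags;
};

/* Mirrors the idea of the patch: drop the flag before passing the bio down. */
static void model_clear_nomerge(struct model_bio *bio)
{
    assert(bio != NULL);
    bio->flags &= ~MODEL_NOMERGE;
}

/*
 * Simplified back-merge check: next may be appended to prev only if
 * neither is flagged unmergeable and next starts where prev ends.
 */
static bool model_can_back_merge(const struct model_bio *prev,
                                 const struct model_bio *next)
{
    assert(prev != NULL && next != NULL);

    if ((prev->flags | next->flags) & MODEL_NOMERGE)
        return false;

    return prev->sector + prev->nr_sectors == next->sector;
}
```

Two contiguous pieces that still carry the flag fail the check; after model_clear_nomerge() on both, the same pair passes, which is the situation the lower device's queue sees once md has cleared REQ_NOMERGE.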