From patchwork Wed Jan 6 02:17:51 2016
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 7963951
Date: Wed, 6 Jan 2016 10:17:51 +0800
From: Ming Lei
To: Keith Busch
Cc: linux-nvme, linux-block@vger.kernel.org, Jens Axboe, Greg White
Subject: Re: [PATCH for-4.4] block: split bios to max possible length
In-Reply-To: <20160105150938.GA20832@localhost.localdomain>
References: <1451931895-17474-1-git-send-email-keith.busch@intel.com>
 <20160105150938.GA20832@localhost.localdomain>
X-Mailing-List: linux-block@vger.kernel.org

Hi Keith,

On Tue, Jan 5, 2016 at 11:09 PM, Keith Busch wrote:
> On Tue, Jan 05, 2016 at 12:54:53PM +0800, Ming Lei wrote:
>> On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch wrote:
>> > This allows bio splits in the middle of a vector to form the largest
>>
>> Wrt. the current block stack, one segment always holds one or more whole
>> bvecs, never part of a bvec, so we need to be careful with this handling.
>
> Hi Ming,
>
> Could you help me understand your concern here? If we split a vector
> somewhere in the middle, it becomes two different bvecs. The first is
> the last segment in the first bio, the second is the first segment in
> the split bio, right?

First, before bio splitting was introduced we never split a single bio
vector. Second, the current bio split code still does not support
splitting one bvec into two; it just makes the two bios share the
original bvec table. Please see bio_split(), which calls bio_clone_fast()
to do that, and note that the bvec table is immutable by that point.
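For anyone following along, here is a minimal sketch of the path being
described, assuming the 4.x-era bio API (the helper name is hypothetical
and error handling and integrity handling are trimmed); the point is only
the sharing behaviour:

#include <linux/bio.h>

/*
 * bio_clone_fast() does not copy the bvec table: the clone's bi_io_vec
 * points at the parent's vectors, so only the iterators of the two bios
 * differ and no single bvec is ever carved in two.
 */
static struct bio *split_by_sharing_bvecs(struct bio *bio, int sectors,
					  gfp_t gfp, struct bio_set *bs)
{
	struct bio *split;

	/* the clone shares bio->bi_io_vec with the parent */
	split = bio_clone_fast(bio, gfp, bs);
	if (!split)
		return NULL;

	/* the clone covers only the first 'sectors' worth of data */
	split->bi_iter.bi_size = sectors << 9;

	/* advance the parent's iterator past the split point */
	bio_advance(bio, split->bi_iter.bi_size);

	return split;
}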
> It's not necessarily a new segment if it is physically contiguous with
> the previous (if it exists at all), but duplicating the logic to coalesce
> addresses doesn't seem to be worth that optimization.

I understand your motivation for the two patches. Before bio splitting
was introduced we did not do sg merging for NVMe because of the
NO_SG_MERGE flag, which has been ignored since bio splitting came in.

So could you check whether NVMe performance is still good if NO_SG_MERGE
is honored again in blk_bio_segment_split()? The change should be simple,
like the attached patch.

>> > possible bio at the h/w's desired alignment, and guarantees the bio being
>> > split will have some data. Previously, if the first vector's length was
>> > greater than the allowable amount, the bio would split at a zero length
>> > and hit a kernel BUG.
>>
>> That was introduced by d3805611130a; a zero-length split could not happen
>> before that because queue_max_sectors() is at least one PAGE_SIZE.
>
> Can a bvec's length exceed a PAGE_SIZE? They point to pages, so I
> suppose not.

No, it can't, but blk_max_size_offset() may return less than PAGE_SIZE,
and then a zero-length split is triggered (see the sketch after the
attached patch).

> But it should be more efficient to split to the largest allowed by the
> hardware.

Previously Jens took the opposite approach and treated each bvec as one
segment, and he mentioned that performance increased.

> We can contrive a scenario where a bio would be split many
> times more than necessary without this patch. Let's say queue_max_sectors
> is a PAGE_SIZE, and we want to submit '2 * PAGE_SIZE' worth of data
> addressed in 3 bvecs. Previously that would split three times; now it
> will split only twice.

IMO, splitting is quite cheap; alternatively, we can still raise the
queue_max_sectors() limit to the value the hardware allows.

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e73846a..64fbbba 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,9 +79,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned front_seg_size = bio->bi_seg_front_size;
 	bool do_split = true;
 	struct bio *new = NULL;
+	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
 
 	bio_for_each_segment(bv, bio, iter) {
-		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
+		if (no_sg_merge)
+			goto new_segment;
+
+		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
 			goto split;
 
 		/*
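On the zero-length split mentioned above: when a driver sets
q->limits.chunk_sectors, the per-offset limit is only the room left before
the next chunk boundary, which can be as little as one sector. Below is a
rough, simplified illustration of the calculation blk_max_size_offset()
performs (the helper name is hypothetical):

#include <linux/blkdev.h>

/*
 * Room left before the next chunk boundary at 'offset'; chunk_sectors is
 * a power of two, so the remainder can be taken with a mask.
 */
static unsigned int max_sectors_at_offset(struct request_queue *q,
					  sector_t offset)
{
	if (!q->limits.chunk_sectors)
		return q->limits.max_sectors;

	return q->limits.chunk_sectors -
		(offset & (q->limits.chunk_sectors - 1));
}

If a bio begins one sector before a chunk boundary, the limit at that
offset is a single sector, so even the first 4K bvec already exceeds it;
that is how the old code could try to split at length zero.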