From patchwork Wed Jan 6 02:17:51 2016
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 7963951
Date: Wed, 6 Jan 2016 10:17:51 +0800
From: Ming Lei
To: Keith Busch
Cc: linux-nvme, linux-block@vger.kernel.org, Jens Axboe, Greg White
Subject: Re: [PATCH for-4.4] block: split bios to max possible length
In-Reply-To: <20160105150938.GA20832@localhost.localdomain>
References: <1451931895-17474-1-git-send-email-keith.busch@intel.com>
 <20160105150938.GA20832@localhost.localdomain>
X-Mailing-List: linux-block@vger.kernel.org

Hi Keith,

On Tue, Jan 5, 2016 at 11:09 PM, Keith Busch wrote:
> On Tue, Jan 05, 2016 at 12:54:53PM +0800, Ming Lei wrote:
>> On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch wrote:
>> > This allows bio splits in the middle of a vector to form the largest
>>
>> Wrt. the current block stack, one segment always holds one or more whole
>> bvecs, never part of a bvec, so we need to be careful with this handling.
>
> Hi Ming,
>
> Could you help me understand your concern here? If we split a vector
> somewhere in the middle, it becomes two different bvecs. The first is
> the last segment in the first bio, the second is the first segment in
> the split bio, right?

First, before bio splitting was introduced we never split a single bio
vector. Second, the current bio split code still does not support
splitting one bvec into two; it just makes the two bios share the
original bvec table. Please see bio_split(), which calls bio_clone_fast()
to do that, and note that the bvec table is immutable by that point.
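For anyone following along, here is a minimal sketch of the path being
described, assuming the 4.x-era bio API (the helper name is hypothetical
and error handling and integrity handling are trimmed); the point is only
the sharing behaviour:

#include <linux/bio.h>

/*
 * bio_clone_fast() does not copy the bvec table: the clone's bi_io_vec
 * points at the parent's vectors, so only the iterators of the two bios
 * differ and no single bvec is ever carved in two.
 */
static struct bio *split_by_sharing_bvecs(struct bio *bio, int sectors,
					  gfp_t gfp, struct bio_set *bs)
{
	struct bio *split;

	/* the clone shares bio->bi_io_vec with the parent */
	split = bio_clone_fast(bio, gfp, bs);
	if (!split)
		return NULL;

	/* the clone covers only the first 'sectors' worth of data */
	split->bi_iter.bi_size = sectors << 9;

	/* advance the parent's iterator past the split point */
	bio_advance(bio, split->bi_iter.bi_size);

	return split;
}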
> It's not necessarily a new segment if it is physically contiguous with
> the previous (if it exists at all), but duplicating the logic to coalesce
> addresses doesn't seem to be worth that optimization.

I understand your motivation for the two patches. Before bio splitting
was introduced we did not do sg merging for NVMe because of the
NO_SG_MERGE flag, which has been ignored since bio splitting came in.

So could you check whether NVMe performance is still good if NO_SG_MERGE
is honored again in blk_bio_segment_split()? The change should be simple,
like the attached patch.

>> > possible bio at the h/w's desired alignment, and guarantees the bio being
>> > split will have some data. Previously, if the first vector's length was
>> > greater than the allowable amount, the bio would split at a zero length
>> > and hit a kernel BUG.
>>
>> That was introduced by d3805611130a; a zero-length split could not happen
>> before that because queue_max_sectors() is at least one PAGE_SIZE.
>
> Can a bvec's length exceed a PAGE_SIZE? They point to pages, so I
> suppose not.

No, it can't, but blk_max_size_offset() may return less than PAGE_SIZE,
and then a zero-length split is triggered (see the sketch after the
attached patch).

> But it should be more efficient to split to the largest allowed by the
> hardware.

Previously Jens took the opposite approach and treated each bvec as one
segment, and he mentioned that performance increased.

> We can contrive a scenario where a bio would be split many
> times more than necessary without this patch. Let's say queue_max_sectors
> is a PAGE_SIZE, and we want to submit '2 * PAGE_SIZE' worth of data
> addressed in 3 bvecs. Previously that would split three times; now it
> will split only twice.

IMO, splitting is quite cheap; alternatively, we can still raise the
queue_max_sectors() limit to the value the hardware allows.

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e73846a..64fbbba 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,9 +79,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned front_seg_size = bio->bi_seg_front_size;
 	bool do_split = true;
 	struct bio *new = NULL;
+	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
 
 	bio_for_each_segment(bv, bio, iter) {
-		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
+		if (no_sg_merge)
+			goto new_segment;
+
+		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
 			goto split;
 
 		/*
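On the zero-length split mentioned above: when a driver sets
q->limits.chunk_sectors, the per-offset limit is only the room left before
the next chunk boundary, which can be as little as one sector. Below is a
rough, simplified illustration of the calculation blk_max_size_offset()
performs (the helper name is hypothetical):

#include <linux/blkdev.h>

/*
 * Room left before the next chunk boundary at 'offset'; chunk_sectors is
 * a power of two, so the remainder can be taken with a mask.
 */
static unsigned int max_sectors_at_offset(struct request_queue *q,
					  sector_t offset)
{
	if (!q->limits.chunk_sectors)
		return q->limits.max_sectors;

	return q->limits.chunk_sectors -
		(offset & (q->limits.chunk_sectors - 1));
}

If a bio begins one sector before a chunk boundary, the limit at that
offset is a single sector, so even the first 4K bvec already exceeds it;
that is how the old code could try to split at length zero.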