From patchwork Sun Feb 14 16:20:58 2016
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 8302311
In-Reply-To: <20160214152223.GA5323@infradead.org>
References: <20160214074119.GA24558@infradead.org>
	<56C04294.3090701@dev.mellanox.co.il>
	<56C05000.1040001@dev.mellanox.co.il>
	<56C066AF.6050901@dev.mellanox.co.il>
	<56C088EA.1050901@dev.mellanox.co.il>
	<20160214152223.GA5323@infradead.org>
Date: Mon, 15 Feb 2016 00:20:58 +0800
Subject: Re: 4.5-rc iser issues
From: Ming Lei
To: Christoph Hellwig
Cc: Sagi Grimberg, linux-rdma, Ming Lin-SSI,
	"linux-nvme@lists.infradead.org", "linux-block@vger.kernel.org",
	Jens Axboe
X-Mailing-List: linux-rdma@vger.kernel.org

Hi Sagi,

On Sun, Feb 14, 2016 at 11:22 PM, Christoph Hellwig wrote:
> Adding Ming to Cc.
>
> But I don't think simply not cloning the biovecs is the right thing
> to do in the end.  This must be something with the bvec iterators.

I agree with Christoph; there may be an issue somewhere else.

> From the log:
> iser: sg[0] dma_addr:0x85FC06000 off:0x0 sz:0x200 dma_len:0x200
> iser: sg[1] dma_addr:0x860334000 off:0x0 sz:0x200 dma_len:0x200 <-- gap
> iser: sg[2] dma_addr:0x860335000 off:0x0 sz:0x200 dma_len:0x200 <-- gap

The gaps above should not appear, because blk_bio_segment_split() splits
off a new bio whenever a gap is detected. Something like the following
can be added in the driver or in a prep_fn to check whether the bvecs of
the request are correct:

	rq_for_each_segment(bvec, sc->request, iter) {
		/* check if there is a gap between bvecs */
	}

I don't know how to use iser, but everything looks fine after I set the
virt boundary to 4095 for null_blk with the attached patch.

> Full quote for Ming:
>
> On Sun, Feb 14, 2016 at 04:02:18PM +0200, Sagi Grimberg wrote:
>> >>I'm bisecting now, there are a couple of patches from Ming in
>> >>the area of the bio splitting code...
>> >>
>> >>CC'ing Ming, linux-block and linux-nvme as iser is identical to nvme
>> >>wrt the virtual boundary, so I think nvme will break as well.

The bisected commit was merged in v4.3, and there seem to have been no
such reports from nvme.

>> >
>> >Bisection reveals that this one is the culprit:
>> >
>> >commit 52cc6eead9095e2faf2ec7afc013aa3af1f01ac5
>> >Author: Ming Lei
>> >Date:   Thu Sep 17 09:58:38 2015 -0600
>> >
>> >    block: blk-merge: fast-clone bio when splitting rw bios
>> >
>> >    biovecs has become immutable since v3.13, so it isn't necessary
>> >    to allocate biovecs for the new cloned bios, then we can save
>> >    one extra biovecs allocation/copy, and the allocation is often
>> >    not fixed-length and a bit more expensive.
>> >
>> >    For example, if the 'max_sectors_kb' of null blk's queue is set
>> >    as 16 (32 sectors) via sysfs just for making more splits, this
>> >    patch can increase throughput about ~70% in the sequential read
>> >    test over null_blk (direct io, bs: 1M).
>> >
>> >    Cc: Christoph Hellwig
>> >    Cc: Kent Overstreet
>> >    Cc: Ming Lin
>> >    Cc: Dongsu Park
>> >    Signed-off-by: Ming Lei
>> >
>> >    This fixes a performance regression introduced by commit 54efd50bfd,
>> >    and allows us to take full advantage of the fact that we have
>> >    immutable bio_vecs. Hand applied, as it rejected violently with
>> >    commit 5014c311baa2.
>> >
>> >    Signed-off-by: Jens Axboe
>> >--
>>
>> Looks like there is a problem with bio_clone_fast()
>>
>> This change makes the problem go away:
>> --
>> diff --git a/block/bio.c b/block/bio.c
>> index dbabd48..5e93733 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1791,7 +1791,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
>>       * Discards need a mutable bio_vec to accommodate the payload
>>       * required by the DSM TRIM and UNMAP commands.
>>       */
>> -    if (bio->bi_rw & REQ_DISCARD)
>> +    if (1 || bio->bi_rw & REQ_DISCARD)
>>              split = bio_clone_bioset(bio, gfp, bs);
>>      else
>>              split = bio_clone_fast(bio, gfp, bs);
>> --
>>
>> Any thoughts?

I don't think bio_clone_fast() is wrong: it reuses the bvec table of the
original bio, drivers are not allowed to modify that table, and they
should access bvecs only through the standard iterator helpers. Also,
bio_for_each_segment_all() can't be used to iterate over a cloned bio.

Thanks,

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 8ba1e97..5328f3c 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -10,6 +10,8 @@
 #include
 #include

+#define MAX_SG	128
+
 struct nullb_cmd {
 	struct list_head list;
 	struct llist_node ll_list;
@@ -19,6 +21,7 @@ struct nullb_cmd {
 	unsigned int tag;
 	struct nullb_queue *nq;
 	struct hrtimer timer;
+	struct scatterlist sg[MAX_SG];
 };

 struct nullb_queue {
@@ -351,6 +354,23 @@ static void null_request_fn(struct request_queue *q)
 	}
 }

+static int handle_sg(struct nullb_cmd *cmd)
+{
+	int num, i;
+	struct request *rq = cmd->rq;
+
+	num = blk_rq_map_sg(rq->q, rq, cmd->sg);
+	trace_printk("%s dump sg for %p(%d)\n", __func__, rq, rq->tag);
+	for (i = 0; i < num; i++) {
+		struct scatterlist *sg = &cmd->sg[i];
+		trace_printk("\t %4d: %u %u %llu\n",
+				i, sg->offset, sg->length,
+				(unsigned long long)sg->dma_address);
+	}
+
+	return 0;
+}
+
 static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
 			 const struct blk_mq_queue_data *bd)
 {
@@ -365,6 +385,8 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx,

 	blk_mq_start_request(bd->rq);

+	handle_sg(cmd);
+
 	null_handle_cmd(cmd);
 	return BLK_MQ_RQ_QUEUE_OK;
 }
@@ -707,6 +729,8 @@ static int null_add_dev(void)

 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, nullb->q);
 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, nullb->q);
+	blk_queue_virt_boundary(nullb->q, (1UL << 12) - 1);
+	blk_queue_max_segments(nullb->q, MAX_SG);

 	mutex_lock(&lock);
 	list_add_tail(&nullb->list, &nullb_list);