From patchwork Wed Jul 25 23:26:11 2012
X-Patchwork-Submitter: Kent Overstreet
X-Patchwork-Id: 1239991
Date: Wed, 25 Jul 2012 16:26:11 -0700
From: Kent Overstreet
To: Boaz Harrosh
Cc: axboe@kernel.dk, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	linux-bcache@vger.kernel.org, mpatocka@redhat.com, vgoyal@redhat.com,
	yehuda@hq.newdream.net, tj@kernel.org, sage@newdream.net, agk@redhat.com,
	drbd-dev@lists.linbit.com
Subject: Re: [dm-devel] [PATCH v4 08/12] block: Introduce new bio_split()
Message-ID: <20120725232611.GD8673@moria.home.lan>
References: <1343160689-12378-1-git-send-email-koverstreet@google.com>
	<1343160689-12378-9-git-send-email-koverstreet@google.com>
	<500FDEBC.9050607@panasas.com>
In-Reply-To: <500FDEBC.9050607@panasas.com>

On Wed, Jul 25, 2012 at 02:55:40PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > The new bio_split() can split arbitrary bios - it's not restricted to
> > single page bios, like the old bio_split() (previously renamed to
> > bio_pair_split()). It also has different semantics - it doesn't allocate
> > a struct bio_pair, leaving it up to the caller to handle completions.
> > 
> > Signed-off-by: Kent Overstreet
> > ---
> >  fs/bio.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 99 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 5d02aa5..a15e121 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> >  EXPORT_SYMBOL(bio_pair_split);
> >  
> >  /**
> > + * bio_split - split a bio
> > + * @bio: bio to split
> > + * @sectors: number of sectors to split from the front of @bio
> > + * @gfp: gfp mask
> > + * @bs: bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
> > + *
> > + * If bio_split() is running under generic_make_request(), it's not safe to
> > + * allocate more than one bio from the same bio set. Therefore, if it is running
> > + * under generic_make_request() it masks out __GFP_WAIT when doing the
> > + * allocation. The caller must check for failure if there's any possibility of
> > + * it being called from under generic_make_request(); it is then the caller's
> > + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > +		      gfp_t gfp, struct bio_set *bs)
> > +{
> > +	unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > +	struct bio_vec *bv;
> > +	struct bio *ret = NULL;
> > +
> > +	BUG_ON(sectors <= 0);
> > +
> > +	/*
> > +	 * If we're being called from underneath generic_make_request() and we
> > +	 * already allocated any bios from this bio set, we risk deadlock if we
> > +	 * use the mempool. So instead, we possibly fail and let the caller punt
> > +	 * to workqueue or somesuch and retry in a safe context.
> > +	 */
> > +	if (current->bio_list)
> > +		gfp &= ~__GFP_WAIT;
> 
> 
> NACK!
> 
> If as you said above in the comment:
> 	if there's any possibility of it being called from under generic_make_request();
> 	it is then the caller's responsibility to ...
> 
> So all the comment needs to say is:
> 	... caller's responsibility to not set __GFP_WAIT at gfp.
> 
> And drop this here. It is up to the caller to decide. If the caller wants he can do
> "if (current->bio_list)" by his own.
> 
> This is a general purpose utility you might not know it's context.
> for example with osdblk above will break.

Well, I'm highly, highly skeptical that using __GFP_WAIT under
generic_make_request() is ever a sane thing to do - it could certainly be
safe in specific circumstances, but it's just such a fragile thing to rely
on: you have to _never_ use the same bio pool more than once. And even then
I bet there are other subtle ways it could break. But you're not the first
to complain about it, and your point about existing code is compelling.

commit ea124f899af29887e24d07497442066572012e5b
Author: Kent Overstreet
Date:   Wed Jul 25 16:25:10 2012 -0700

    block: Introduce new bio_split()

    The new bio_split() can split arbitrary bios - it's not restricted to
    single page bios, like the old bio_split() (previously renamed to
    bio_pair_split()). It also has different semantics - it doesn't allocate
    a struct bio_pair, leaving it up to the caller to handle completions.

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
 EXPORT_SYMBOL(bio_pair_split);
 
 /**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundary; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return. And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset += nbytes;
+			bv->bv_len -= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev = bio->bi_bdev;
+	ret->bi_sector = bio->bi_sector;
+	ret->bi_size = sectors << 9;
+	ret->bi_rw = bio->bi_rw;
+	ret->bi_vcnt = vcnt;
+	ret->bi_max_vecs = vcnt;
+	ret->bi_end_io = bio->bi_end_io;
+	ret->bi_private = bio->bi_private;
+
+	bio->bi_sector += sectors;
+	bio->bi_size -= sectors << 9;
+	bio->bi_idx = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
  * bio_sector_offset - Find hardware sector offset in bio
  * @bio:	bio to inspect
  * @index:	bio_vec index
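
(Editor's illustration, not part of the patch: roughly the caller-side
pattern the BIG FAT WARNING above is asking for, sketched against the v4
bio_split() in this mail. The my_dev/my_split_work/my_handle_bio names and
the driver-private bio_set and workqueue are made up, and completion
handling for the split fragments - which the commit message leaves to the
caller - is omitted.)

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_dev {
	struct bio_set		*split_bioset;	/* private pool for splits */
	struct workqueue_struct	*split_wq;	/* safe context for retries */
	unsigned		max_sectors;
};

struct my_split_work {
	struct work_struct	work;
	struct my_dev		*dev;
	struct bio		*bio;
};

static void my_handle_bio(struct my_dev *dev, struct bio *bio, gfp_t gfp);

static void my_split_work_fn(struct work_struct *work)
{
	struct my_split_work *w = container_of(work, struct my_split_work, work);

	/* Not under generic_make_request() here, so blocking allocations are fine. */
	my_handle_bio(w->dev, w->bio, GFP_NOIO);
	kfree(w);
}

static void my_handle_bio(struct my_dev *dev, struct bio *bio, gfp_t gfp)
{
	struct bio *split;

	while (bio_sectors(bio) > dev->max_sectors) {
		/*
		 * Under generic_make_request() (current->bio_list != NULL) we
		 * must not sleep in the bioset mempool, so mask out __GFP_WAIT
		 * and accept that the split can fail.
		 */
		if (current->bio_list)
			gfp &= ~__GFP_WAIT;

		split = bio_split(bio, dev->max_sectors, gfp, dev->split_bioset);
		if (!split) {
			/* Allocation failed: punt the remainder to a workqueue. */
			struct my_split_work *w = kmalloc(sizeof(*w), GFP_ATOMIC);

			if (!w) {
				bio_endio(bio, -ENOMEM);
				return;
			}

			INIT_WORK(&w->work, my_split_work_fn);
			w->dev = dev;
			w->bio = bio;
			queue_work(dev->split_wq, &w->work);
			return;
		}

		/* NB: ->bi_end_io accounting for the fragments is omitted in this sketch. */
		generic_make_request(split);
	}

	generic_make_request(bio);
}

The point of masking __GFP_WAIT in the caller rather than inside
bio_split() is the one Boaz raises above: the caller knows its context,
the library function doesn't.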