From patchwork Wed Jul 25 23:26:11 2012
X-Patchwork-Submitter: Kent Overstreet
X-Patchwork-Id: 1239991
Date: Wed, 25 Jul 2012 16:26:11 -0700
From: Kent Overstreet
To: Boaz Harrosh
Cc: axboe@kernel.dk, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	linux-bcache@vger.kernel.org, mpatocka@redhat.com, vgoyal@redhat.com,
	yehuda@hq.newdream.net, tj@kernel.org, sage@newdream.net, agk@redhat.com,
	drbd-dev@lists.linbit.com
Subject: Re: [dm-devel] [PATCH v4 08/12] block: Introduce new bio_split()
Message-ID: <20120725232611.GD8673@moria.home.lan>
References: <1343160689-12378-1-git-send-email-koverstreet@google.com>
	<1343160689-12378-9-git-send-email-koverstreet@google.com>
	<500FDEBC.9050607@panasas.com>
In-Reply-To: <500FDEBC.9050607@panasas.com>

On Wed, Jul 25, 2012 at 02:55:40PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > The new bio_split() can split arbitrary bios - it's not restricted to
> > single page bios, like the old bio_split() (previously renamed to
> > bio_pair_split()). It also has different semantics - it doesn't allocate
> > a struct bio_pair, leaving it up to the caller to handle completions.
> > 
> > Signed-off-by: Kent Overstreet
> > ---
> >  fs/bio.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 99 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 5d02aa5..a15e121 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> >  EXPORT_SYMBOL(bio_pair_split);
> >  
> >  /**
> > + * bio_split - split a bio
> > + * @bio: bio to split
> > + * @sectors: number of sectors to split from the front of @bio
> > + * @gfp: gfp mask
> > + * @bs: bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
> > + *
> > + * If bio_split() is running under generic_make_request(), it's not safe to
> > + * allocate more than one bio from the same bio set. Therefore, if it is running
> > + * under generic_make_request() it masks out __GFP_WAIT when doing the
> > + * allocation. The caller must check for failure if there's any possibility of
> > + * it being called from under generic_make_request(); it is then the caller's
> > + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > +		      gfp_t gfp, struct bio_set *bs)
> > +{
> > +	unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > +	struct bio_vec *bv;
> > +	struct bio *ret = NULL;
> > +
> > +	BUG_ON(sectors <= 0);
> > +
> > +	/*
> > +	 * If we're being called from underneath generic_make_request() and we
> > +	 * already allocated any bios from this bio set, we risk deadlock if we
> > +	 * use the mempool. So instead, we possibly fail and let the caller punt
> > +	 * to workqueue or somesuch and retry in a safe context.
> > +	 */
> > +	if (current->bio_list)
> > +		gfp &= ~__GFP_WAIT;
> 
> 
> NACK!
> 
> If as you said above in the comment:
> 	if there's any possibility of it being called from under generic_make_request();
> 	it is then the caller's responsibility to ...
> 
> So all the comment needs to say is:
> 	... caller's responsibility to not set __GFP_WAIT at gfp.
> 
> And drop this here. It is up to the caller to decide. If the caller wants he can do
> "if (current->bio_list)" by his own.
> 
> This is a general purpose utility you might not know it's context.
> for example with osdblk above will break.

Well, I'm highly, highly skeptical that using __GFP_WAIT under
generic_make_request() is ever a sane thing to do - it could certainly be
safe in specific circumstances, but it's just such a fragile thing to rely
on: you have to _never_ use the same bio pool more than once. And even then
I bet there are other subtle ways it could break. But you're not the first
to complain about it, and your point about existing code is compelling.

commit ea124f899af29887e24d07497442066572012e5b
Author: Kent Overstreet
Date:   Wed Jul 25 16:25:10 2012 -0700

    block: Introduce new bio_split()

    The new bio_split() can split arbitrary bios - it's not restricted to
    single page bios, like the old bio_split() (previously renamed to
    bio_pair_split()). It also has different semantics - it doesn't allocate
    a struct bio_pair, leaving it up to the caller to handle completions.

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
 EXPORT_SYMBOL(bio_pair_split);
 
 /**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundary; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return. And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset += nbytes;
+			bv->bv_len -= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev = bio->bi_bdev;
+	ret->bi_sector = bio->bi_sector;
+	ret->bi_size = sectors << 9;
+	ret->bi_rw = bio->bi_rw;
+	ret->bi_vcnt = vcnt;
+	ret->bi_max_vecs = vcnt;
+	ret->bi_end_io = bio->bi_end_io;
+	ret->bi_private = bio->bi_private;
+
+	bio->bi_sector += sectors;
+	bio->bi_size -= sectors << 9;
+	bio->bi_idx = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
  * bio_sector_offset - Find hardware sector offset in bio
  * @bio:	bio to inspect
  * @index:	bio_vec index
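
(Editor's illustration, not part of the patch: roughly the caller-side
pattern the BIG FAT WARNING above is asking for, sketched against the v4
bio_split() in this mail. The my_dev/my_split_work/my_handle_bio names and
the driver-private bio_set and workqueue are made up, and completion
handling for the split fragments - which the commit message leaves to the
caller - is omitted.)

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_dev {
	struct bio_set		*split_bioset;	/* private pool for splits */
	struct workqueue_struct	*split_wq;	/* safe context for retries */
	unsigned		max_sectors;
};

struct my_split_work {
	struct work_struct	work;
	struct my_dev		*dev;
	struct bio		*bio;
};

static void my_handle_bio(struct my_dev *dev, struct bio *bio, gfp_t gfp);

static void my_split_work_fn(struct work_struct *work)
{
	struct my_split_work *w = container_of(work, struct my_split_work, work);

	/* Not under generic_make_request() here, so blocking allocations are fine. */
	my_handle_bio(w->dev, w->bio, GFP_NOIO);
	kfree(w);
}

static void my_handle_bio(struct my_dev *dev, struct bio *bio, gfp_t gfp)
{
	struct bio *split;

	while (bio_sectors(bio) > dev->max_sectors) {
		/*
		 * Under generic_make_request() (current->bio_list != NULL) we
		 * must not sleep in the bioset mempool, so mask out __GFP_WAIT
		 * and accept that the split can fail.
		 */
		if (current->bio_list)
			gfp &= ~__GFP_WAIT;

		split = bio_split(bio, dev->max_sectors, gfp, dev->split_bioset);
		if (!split) {
			/* Allocation failed: punt the remainder to a workqueue. */
			struct my_split_work *w = kmalloc(sizeof(*w), GFP_ATOMIC);

			if (!w) {
				bio_endio(bio, -ENOMEM);
				return;
			}

			INIT_WORK(&w->work, my_split_work_fn);
			w->dev = dev;
			w->bio = bio;
			queue_work(dev->split_wq, &w->work);
			return;
		}

		/* NB: ->bi_end_io accounting for the fragments is omitted in this sketch. */
		generic_make_request(split);
	}

	generic_make_request(bio);
}

The point of masking __GFP_WAIT in the caller rather than inside
bio_split() is the one Boaz raises above: the caller knows its context,
the library function doesn't.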