From patchwork Wed Mar 13 01:10:20 2013
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 2260651
Date: Tue, 12 Mar 2013 18:10:20 -0700
From: "Darrick J. Wong"
To: Andrew Morton
Subject: Re: [PATCH] bounce:fix bug, avoid to flush dcache on slab page from jbd2.
Message-ID: <20130313011020.GA5313@blackbox.djwong.org>
References: <5139DB90.5090302@gmail.com>
 <20130312153221.0d26fe5599d4885e51bb0c7c@linux-foundation.org>
In-Reply-To: <20130312153221.0d26fe5599d4885e51bb0c7c@linux-foundation.org>
Cc: Jens Axboe, Jan Kara, Catalin Marinas, Shuge, Will Deacon,
 linux-kernel@vger.kernel.org, Kevin, linux-mm@kvack.org,
 Theodore Ts'o, linux-ext4@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org

On Tue, Mar 12, 2013 at 03:32:21PM -0700, Andrew Morton wrote:
> On Fri, 08 Mar 2013 20:37:36 +0800 Shuge wrote:
> 
> > The bounce code accepts slab pages from jbd2 and flushes the dcache
> > on them.  With VM_DEBUG enabled, this triggers the VM_BUG_ON in
> > page_mapping().  So, check PageSlab to avoid it in __blk_queue_bounce().
> >
> > Bug URL: http://lkml.org/lkml/2013/3/7/56
> >
> > ...
> >
> > --- a/mm/bounce.c
> > +++ b/mm/bounce.c
> > @@ -214,7 +214,8 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> >  		if (rw == WRITE) {
> >  			char *vto, *vfrom;
> > -			flush_dcache_page(from->bv_page);
> > +			if (unlikely(!PageSlab(from->bv_page)))
> > +				flush_dcache_page(from->bv_page);
> >  			vto = page_address(to->bv_page) + to->bv_offset;
> >  			vfrom = kmap(from->bv_page) + from->bv_offset;
> >  			memcpy(vto, vfrom, to->bv_len);
> 
> I guess this is triggered by Catalin's f1a0c4aa0937975b ("arm64: Cache
> maintenance routines"), which added a page_mapping() call to arm64's
> arch/arm64/mm/flush.c:flush_dcache_page().
> 
> What's happening is that jbd2 is using kmalloc() to allocate buffer_head
> data.  That gets submitted down to the BIO layer, and __blk_queue_bounce()
> calls flush_dcache_page(), which in the arm64 case calls page_mapping();
> page_mapping() does VM_BUG_ON(PageSlab) and we get a splat.
> 
> The unusual thing about all of this is that the payload for some disk
> IO is coming from kmalloc() rather than being a user page.  It's oddball,
> but we've done this for ages and should continue to support it.
> 
> Now, a page from kmalloc() cannot be in highmem, so why did the
> bounce code decide to bounce it?
> 
> __blk_queue_bounce() does
> 
> 	/*
> 	 * is destination page below bounce pfn?
> 	 */
> 	if (page_to_pfn(page) <= queue_bounce_pfn(q) && !force)
> 		continue;
> 
> and `force' comes from must_snapshot_stable_pages().  But
> must_snapshot_stable_pages() must have returned false, because if it
> had returned true then it would have been must_snapshot_stable_pages()
> itself that went BUG, since must_snapshot_stable_pages() also calls
> page_mapping().
> 
> So my tentative diagnosis is that arm64 is fishy.  A page which was
> allocated via jbd2_alloc(GFP_NOFS)->kmem_cache_alloc() ended up being
> above arm64's queue_bounce_pfn().
> Can you please do a bit of investigation to work out if this is what
> is happening?  Find out why __blk_queue_bounce() decided to bounce a
> page which shouldn't have been bounced?

That sure is strange.  I didn't see any obvious reasons why we'd end up
with a kmalloc page above queue_bounce_pfn().  But then I don't have any
arm64s either.

> This is all terribly fragile :( afaict if someone sets
> bdi_cap_stable_pages_required() against that jbd2 queue, we're going to
> hit that BUG_ON() again, via must_snapshot_stable_pages()'s
> page_mapping() call.  (Darrick, this means you ;))

Wheeee.  You're right, we shouldn't be calling page_mapping() on slab
pages.  We can keep walking the bio segments to find a non-slab page
that can tell us MS_SNAP_STABLE is set, since we probably won't need
the bounce buffer anyway.  How does something like this look?
(+ the patch above)

--D

Subject: [PATCH] mm: Don't blow up on slab pages being written to disk

Don't assume that all pages attached to a bio are non-slab pages.  This
happens if (for example) jbd2 allocates a buffer out of the slab to hold
frozen data.  If we encounter a slab page, just ignore the page and keep
searching.  Hopefully filesystems are smart enough to guarantee that
slab pages won't be dirtied while they're also being written to disk.

Signed-off-by: Darrick J. Wong
---
 mm/bounce.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/bounce.c b/mm/bounce.c
index 5f89017..af34855 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -199,6 +199,8 @@ static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
 	 */
 	bio_for_each_segment(from, bio, i) {
 		page = from->bv_page;
+		if (PageSlab(page))
+			continue;
 		mapping = page_mapping(page);
 		if (!mapping)
 			continue;