From patchwork Wed Mar 13 01:10:20 2013
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 2260651
Date: Tue, 12 Mar 2013 18:10:20 -0700
From: "Darrick J. Wong"
To: Andrew Morton
Subject: Re: [PATCH] bounce:fix bug, avoid to flush dcache on slab page from jbd2.
Message-ID: <20130313011020.GA5313@blackbox.djwong.org>
References: <5139DB90.5090302@gmail.com>
 <20130312153221.0d26fe5599d4885e51bb0c7c@linux-foundation.org>
In-Reply-To: <20130312153221.0d26fe5599d4885e51bb0c7c@linux-foundation.org>
Cc: Jens Axboe, Jan Kara, Catalin Marinas, Shuge, Will Deacon,
 linux-kernel@vger.kernel.org, Kevin, linux-mm@kvack.org,
 Theodore Ts'o, linux-ext4@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org

On Tue, Mar 12, 2013 at 03:32:21PM -0700, Andrew Morton wrote:
> On Fri, 08 Mar 2013 20:37:36 +0800 Shuge wrote:
> 
> > The bounce code accepts slab pages from jbd2 and flushes the dcache
> > on them.  With VM_DEBUG enabled, this triggers the VM_BUG_ON in
> > page_mapping().  So, check PageSlab to avoid it in __blk_queue_bounce().
> >
> > Bug URL: http://lkml.org/lkml/2013/3/7/56
> >
> > ...
> >
> > --- a/mm/bounce.c
> > +++ b/mm/bounce.c
> > @@ -214,7 +214,8 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> >  		if (rw == WRITE) {
> >  			char *vto, *vfrom;
> > -			flush_dcache_page(from->bv_page);
> > +			if (unlikely(!PageSlab(from->bv_page)))
> > +				flush_dcache_page(from->bv_page);
> >  			vto = page_address(to->bv_page) + to->bv_offset;
> >  			vfrom = kmap(from->bv_page) + from->bv_offset;
> >  			memcpy(vto, vfrom, to->bv_len);
> 
> I guess this is triggered by Catalin's f1a0c4aa0937975b ("arm64: Cache
> maintenance routines"), which added a page_mapping() call to arm64's
> arch/arm64/mm/flush.c:flush_dcache_page().
> 
> What's happening is that jbd2 is using kmalloc() to allocate buffer_head
> data.  That gets submitted down to the BIO layer, and __blk_queue_bounce()
> calls flush_dcache_page(), which in the arm64 case calls page_mapping();
> page_mapping() does VM_BUG_ON(PageSlab) and we get a splat.
> 
> The unusual thing about all of this is that the payload for some disk
> IO is coming from kmalloc() rather than being a user page.  It's oddball,
> but we've done this for ages and should continue to support it.
> 
> Now, a page from kmalloc() cannot be in highmem, so why did the
> bounce code decide to bounce it?
> 
> __blk_queue_bounce() does
> 
> 	/*
> 	 * is destination page below bounce pfn?
> 	 */
> 	if (page_to_pfn(page) <= queue_bounce_pfn(q) && !force)
> 		continue;
> 
> and `force' comes from must_snapshot_stable_pages().  But
> must_snapshot_stable_pages() must have returned false, because if it
> had returned true then it would have been must_snapshot_stable_pages()
> itself that went BUG, since must_snapshot_stable_pages() also calls
> page_mapping().
> 
> So my tentative diagnosis is that arm64 is fishy.  A page which was
> allocated via jbd2_alloc(GFP_NOFS)->kmem_cache_alloc() ended up being
> above arm64's queue_bounce_pfn().
> Can you please do a bit of investigation to work out if this is what
> is happening?  Find out why __blk_queue_bounce() decided to bounce a
> page which shouldn't have been bounced?

That sure is strange.  I didn't see any obvious reasons why we'd end up
with a kmalloc page above queue_bounce_pfn().  But then I don't have any
arm64s either.

> This is all terribly fragile :( afaict if someone sets
> bdi_cap_stable_pages_required() against that jbd2 queue, we're going to
> hit that BUG_ON() again, via must_snapshot_stable_pages()'s
> page_mapping() call.  (Darrick, this means you ;))

Wheeee.  You're right, we shouldn't be calling page_mapping() on slab
pages.  We can keep walking the bio segments to find a non-slab page
that can tell us MS_SNAP_STABLE is set, since we probably won't need
the bounce buffer anyway.  How does something like this look?
(+ the patch above)

--D

Subject: [PATCH] mm: Don't blow up on slab pages being written to disk

Don't assume that all pages attached to a bio are non-slab pages.  This
happens if (for example) jbd2 allocates a buffer out of the slab to hold
frozen data.  If we encounter a slab page, just ignore the page and keep
searching.  Hopefully filesystems are smart enough to guarantee that
slab pages won't be dirtied while they're also being written to disk.

Signed-off-by: Darrick J. Wong
---
 mm/bounce.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/bounce.c b/mm/bounce.c
index 5f89017..af34855 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -199,6 +199,8 @@ static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
 	 */
 	bio_for_each_segment(from, bio, i) {
 		page = from->bv_page;
+		if (PageSlab(page))
+			continue;
 		mapping = page_mapping(page);
 		if (!mapping)
 			continue;