X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 9598581
Date: Wed, 1 Mar 2017 16:05:44 +0100
From: Jan Kara
To: Al Viro
Cc: Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe,
	Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner,
	linux-mm@kvack.org, Andrey Ryabinin, syzkaller
Subject: Re: mm: GPF in bdi_put
Message-ID: <20170301150544.GH20512@quack2.suse.cz>
References: <20170227182755.GR29622@ZenIV.linux.org.uk>
	<20170301142909.GG20512@quack2.suse.cz>
In-Reply-To: <20170301142909.GG20512@quack2.suse.cz>

On Wed 01-03-17 15:29:09, Jan Kara wrote:
> On Mon 27-02-17 18:27:55, Al Viro wrote:
> > On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > > Hello,
> > >
> > > The following program triggers GPF in bdi_put:
> > > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
> >
> > What happens is
> > 	* attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
> > and then promptly destroys the new instance it has created.
> > 	* the only inode created on that sucker (root directory, that
> > is) gets evicted.
> > 	* most of ->evict_inode() is harmless, until it gets to
> >         if (bdev->bd_bdi != &noop_backing_dev_info)
> >                 bdi_put(bdev->bd_bdi);
>
> Thanks for the analysis!
>
> > added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> > Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> > placed initialization into bdget()), we step into shit of varying nastiness,
> > depending on phase of moon, etc.
>
> Yup, I've missed that the root inode of bdev superblock does not go through
> bdget() (in fact I didn't think what happens with root inode for bdev
> superblock at all) and thus bd_bdi is left uninitialized in that case. I'll
> send a fix for that in a while.
>
> > Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> > anyway?  We set ->bd_bdi to something other than noop_backing_dev_info only
> > in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> > the matching bdi_put() not in __blkdev_put()?  Jan?
>
> The problem is writeback code (from flusher work or through sync(2) -
> generally inode_to_bdi() users) can be looking at bdev inode independently
> from it being open. So if they start looking while the bdev is open but the
> dereference happens after it is closed and device removed, we oops. We have
> seen oopses due to this for quite a while. And all the stuff that is done
> in __blkdev_put() is not enough to prevent writeback code from having a
> look whether there is not something to write.
>
> So what we do now is that once we establish valid bd_bdi reference, we
> leave it alone until bdev inode gets evicted. And to handle the case when
> underlying device actually changes, we unhash bdev inode when the device
> gets removed from the system so that it cannot be found by bdget() anymore.

Attached patch fixes the problem for me. I'll post it officially tomorrow
once Al has a chance to reply...

								Honza

From a533c8dd1fb4dbf840cd3adaf68afb6ad6851ddc Mon Sep 17 00:00:00 2001
From: Jan Kara
Date: Wed, 1 Mar 2017 15:31:11 +0100
Subject: [PATCH] block: Initialize bd_bdi on inode initialization

So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes however for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode as can be seen when doing:

	mount -t bdev none /mnt

Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().

Thanks to syzkaller team for finding the problem.

Reported-by: Dmitry Vyukov
Fixes: b1d2dc5659b41741f5a29b2ade76ffb4e5bb13d8
Signed-off-by: Jan Kara
---
 fs/block_dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 77c30f15a02c..2eca00ec4370 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -870,6 +870,7 @@ static void init_once(void *foo)
 #ifdef CONFIG_SYSFS
 	INIT_LIST_HEAD(&bdev->bd_holder_disks);
 #endif
+	bdev->bd_bdi = &noop_backing_dev_info;
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -884,8 +885,10 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
-	if (bdev->bd_bdi != &noop_backing_dev_info)
+	if (bdev->bd_bdi != &noop_backing_dev_info) {
 		bdi_put(bdev->bd_bdi);
+		bdev->bd_bdi = &noop_backing_dev_info;
+	}
 }
 
 static const struct super_operations bdev_sops = {
@@ -988,7 +991,6 @@ struct block_device *bdget(dev_t dev)
 		bdev->bd_contains = NULL;
 		bdev->bd_super = NULL;
 		bdev->bd_inode = inode;
-		bdev->bd_bdi = &noop_backing_dev_info;
 		bdev->bd_block_size = i_blocksize(inode);
 		bdev->bd_part_count = 0;
 		bdev->bd_invalidated = 0;
-- 
2.10.2
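
For readers who want to poke at the invariant outside the kernel, here is a
minimal userspace sketch of what the patch above establishes. It is not
kernel code: struct bdi_model, struct bdev_model, noop_bdi_model and the
bdev_model_*() helpers are all hypothetical stand-ins, and the refcount is a
plain int rather than a kref. The point it tries to show is that when the
sentinel is set in the one-time constructor and restored on evict, evicting
an inode that never went through bdget() (the root inode case) drops no
reference, and an object handed back to the cache is again in its
constructed state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backing_dev_info: only a refcount. */
struct bdi_model {
	int refcount;
};

/* Plays the role of noop_backing_dev_info; it is never refcounted. */
static struct bdi_model noop_bdi_model = { .refcount = 0 };

/* Hypothetical stand-in for struct block_device. */
struct bdev_model {
	struct bdi_model *bd_bdi;
};

/*
 * Models the patched init_once(): it runs for every object in the cache,
 * including the root inode that never passes through bdget().
 */
static void bdev_model_init_once(struct bdev_model *bdev)
{
	bdev->bd_bdi = &noop_bdi_model;
}

/* Models the first open: take a reference on a real bdi and keep it. */
static void bdev_model_attach(struct bdev_model *bdev, struct bdi_model *bdi)
{
	if (bdev->bd_bdi == &noop_bdi_model) {
		bdi->refcount++;
		bdev->bd_bdi = bdi;
	}
}

/*
 * Models the patched bdev_evict_inode(): drop the reference if one was
 * taken, then restore the sentinel so the object is back in the state
 * the constructor left it in.
 */
static void bdev_model_evict(struct bdev_model *bdev)
{
	if (bdev->bd_bdi != &noop_bdi_model) {
		bdev->bd_bdi->refcount--;
		bdev->bd_bdi = &noop_bdi_model;
	}
}
```

Calling bdev_model_init_once() followed directly by bdev_model_evict()
corresponds to the "mount -t bdev none /mnt" case; with the sentinel in
place the evict is a no-op. Calling bdev_model_evict() twice on an attached
object drops only one reference, which is the effect of the reinitialization
added to bdev_evict_inode().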