From: Dan Williams
To: xfs@oss.sgi.com
Date: Thu, 07 Jan 2016 16:43:15 -0800
Subject: [PATCH v3 3/5] block: introduce force_failure_partition() and unmap_dax_inodes()
Message-ID: <20160108004310.36061.4099.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <20160108004254.36061.32864.stgit@dwillia2-desk3.amr.corp.intel.com>
Cc: Jens Axboe, linux-nvdimm@lists.01.org, Dave Chinner, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Jan Kara, linux-fsdevel@vger.kernel.org

Historically we have waited for filesystem-specific heuristics to guess
when a block device is gone. Sometimes this works, but in other cases the
system can hang waiting for the fs to trigger its shutdown protocol.

The initial motivation for this investigation was to prevent DAX mappings
(direct mmap access to persistent memory) from leaking past the lifetime of
the hosting block device. However, Dave points out that these shutdown
operations are needed in other scenarios. Quoting Dave:

    For example, if we detect a free space corruption during allocation,
    it is not safe to trust *any active mapping* because we can't trust
    that we [haven't] handed out the same block to multiple owners. Hence
    on such a filesystem shutdown, we have to prevent any new DAX mapping
    from occurring and invalidate all existing mappings as we cannot allow
    userspace to modify any data or metadata until we've resolved the
    corruption situation.

Introduce unmap_dax_inodes(), which walks a superblock's inode list and
calls unmap_mapping_range() on every DAX inode, and
force_failure_partition(), which looks up the superblock mounted on a
given partition and runs unmap_dax_inodes() on it. Once the request queue
is dead, del_gendisk_queue() calls force_failure_partition() for each
partition and then for the whole disk (partno 0).
Cc: Jan Kara
Cc: Jens Axboe
Cc: Matthew Wilcox
Cc: Ross Zwisler
Suggested-by: Dave Chinner
Signed-off-by: Dan Williams
---
 block/genhd.c      |    7 +++++--
 fs/block_dev.c     |   19 +++++++++++++++++++
 fs/inode.c         |   28 ++++++++++++++++++++++++++++
 include/linux/fs.h |    2 ++
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index a5bb768111cc..ac0d12c4f895 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -717,12 +717,15 @@ void del_gendisk_queue(struct gendisk *disk)
 
 	blk_cleanup_queue(disk->queue);
 
-	/* pass2 the queue is dead */
+	/* pass2 the queue is dead, halt dax */
 	disk_part_iter_init(&piter, disk,
 			DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
-	for_each_part(part, &piter)
+	for_each_part(part, &piter) {
+		force_failure_partition(disk, part->partno);
 		delete_partition(disk, part->partno);
+	}
 	disk_part_iter_exit(&piter);
+	force_failure_partition(disk, 0);
 
 	del_gendisk_end(disk);
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 44d4a1e9244e..9cff33b6baab 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1799,6 +1799,25 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 }
 EXPORT_SYMBOL(__invalidate_device);
 
+void force_failure_partition(struct gendisk *disk, int partno)
+{
+	struct block_device *bdev;
+	struct super_block *sb;
+
+	bdev = bdget_disk(disk, partno);
+	if (!bdev)
+		return;
+
+	sb = get_super(bdev);
+	if (!sb)
+		goto out;
+
+	unmap_dax_inodes(sb);
+	drop_super(sb);
+ out:
+	bdput(bdev);
+}
+
 void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 {
 	struct inode *inode, *old_inode = NULL;
diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..ed62e5f78f35 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -673,6 +673,34 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	return busy;
 }
 
+void unmap_dax_inodes(struct super_block *sb)
+{
+	struct inode *inode, *_inode = NULL;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+				|| !IS_DAX(inode)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+		iput(_inode);
+		_inode = inode;
+		cond_resched();
+
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(_inode);
+}
+EXPORT_SYMBOL(unmap_dax_inodes);
+
 /*
  * Isolate the inode from the LRU in preparation for freeing it.
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..a0d55199e628 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2390,6 +2390,7 @@ extern int revalidate_disk(struct gendisk *);
 extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
+extern void force_failure_partition(struct gendisk *, int);
 #endif
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
@@ -2544,6 +2545,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 
+extern void unmap_dax_inodes(struct super_block *sb);
 extern int inode_init_always(struct super_block *, struct inode *);
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);