From patchwork Fri Apr 21 03:44:37 2017
X-Patchwork-Submitter: Ross Zwisler
X-Patchwork-Id: 9691705
From: Ross Zwisler
To: Andrew Morton, linux-kernel@vger.kernel.org
Cc: Ross Zwisler, Alexander Viro, Alexey Kuznetsov, Andrey Ryabinin,
    Anna Schumaker, Christoph Hellwig, Dan Williams, "Darrick J. Wong",
    Eric Van Hensbergen, Jan Kara, Jens Axboe, Johannes Weiner,
    Konrad Rzeszutek Wilk, Latchesar Ionkov, linux-cifs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-nfs@vger.kernel.org, linux-nvdimm@lists.01.org,
    Matthew Wilcox, Ron Minnich, samba-technical@lists.samba.org,
    Steve French, Trond Myklebust, v9fs-developer@lists.sourceforge.net
Subject: [PATCH 2/2] dax: fix data corruption due to stale mmap reads
Date: Thu, 20 Apr 2017 21:44:37 -0600
Message-Id: <20170421034437.4359-2-ross.zwisler@linux.intel.com>
In-Reply-To: <20170421034437.4359-1-ross.zwisler@linux.intel.com>
References: <20170420191446.GA21694@linux.intel.com>
 <20170421034437.4359-1-ross.zwisler@linux.intel.com>
X-Mailer: git-send-email 2.9.3
X-Mailing-List: linux-fsdevel@vger.kernel.org

Users of DAX can suffer data corruption from stale mmap reads via the
following sequence:

 - open an mmap over a 2MiB hole

 - read from the 2MiB hole, faulting in a 2MiB zero page

 - write to the hole with write(2). The write succeeds, but we
   incorrectly leave the 2MiB zero page mapping intact.

 - via the mmap, read the data that was just written. Since the zero
   page mapping is still intact we read back zeroes instead of the new
   data.
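For reference, the sequence above can be expressed as a small userspace
program. This is a minimal sketch, not a test from this series: the
/mnt/dax mount point and file name are hypothetical, and it assumes a
freshly created file on a DAX filesystem so that the first read faults
in the zero page:

/* Sketch of the stale-read sequence; paths and sizes are illustrative,
 * error handling trimmed for brevity. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;		/* 2MiB, one PMD's worth */
	char buf[4096];
	/* hypothetical file on a DAX-mounted filesystem */
	int fd = open("/mnt/dax/testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);

	if (fd < 0)
		return 1;
	if (ftruncate(fd, len))		/* file is now a 2MiB hole */
		return 1;

	char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;
	memcpy(buf, map, sizeof(buf));	/* fault in the zero page */

	memset(buf, 0xab, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		return 1;		/* write(2) allocates real blocks */

	/* With the stale zero page still mapped, this prints 0x00. */
	printf("read back through mmap: 0x%02x\n", (unsigned char)map[0]);

	munmap(map, len);
	close(fd);
	return 0;
}

On an affected kernel the final read through the mapping returns zeroes;
with this patch applied the freshly written 0xab data is visible.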
We fix this by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations, and by enhancing
__dax_invalidate_mapping_entry() so that it properly unmaps the DAX
entry being removed from the radix tree.

This is based on an initial patch from Jan Kara.

Signed-off-by: Ross Zwisler
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Reported-by: Jan Kara
Cc: <stable@vger.kernel.org> [4.10+]
---
 fs/dax.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 166504c..3f445d5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -468,23 +468,35 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 					  pgoff_t index, bool trunc)
 {
 	int ret = 0;
-	void *entry;
+	void *entry, **slot;
 	struct radix_tree_root *page_tree = &mapping->page_tree;
 
 	spin_lock_irq(&mapping->tree_lock);
-	entry = get_unlocked_mapping_entry(mapping, index, NULL);
+	entry = get_unlocked_mapping_entry(mapping, index, &slot);
 	if (!entry || !radix_tree_exceptional_entry(entry))
 		goto out;
 	if (!trunc &&
 	    (radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_DIRTY) ||
 	     radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)))
 		goto out;
+
+	/*
+	 * Make sure 'entry' remains valid while we drop mapping->tree_lock to
+	 * do the unmap_mapping_range() call.
+	 */
+	entry = lock_slot(mapping, slot);
+	spin_unlock_irq(&mapping->tree_lock);
+
+	unmap_mapping_range(mapping, (loff_t)index << PAGE_SHIFT,
+			(loff_t)PAGE_SIZE << dax_radix_order(entry), 0);
+
+	spin_lock_irq(&mapping->tree_lock);
 	radix_tree_delete(page_tree, index);
 	mapping->nrexceptional--;
 	ret = 1;
 out:
-	put_unlocked_mapping_entry(mapping, index, entry);
 	spin_unlock_irq(&mapping->tree_lock);
+	dax_wake_mapping_entry_waiter(mapping, index, entry, true);
 	return ret;
 }
 
 /*
@@ -999,11 +1011,11 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return -EIO;
 
 	/*
-	 * Write can allocate block for an area which has a hole page mapped
-	 * into page tables. We have to tear down these mappings so that data
-	 * written by write(2) is visible in mmap.
+	 * Write can allocate block for an area which has a hole page or zero
+	 * PMD entry in the radix tree. We have to tear down these mappings so
+	 * that data written by write(2) is visible in mmap.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+	if (iomap->flags & IOMAP_F_NEW) {
 		invalidate_inode_pages2_range(inode->i_mapping,
 				pos >> PAGE_SHIFT,
 				(end - 1) >> PAGE_SHIFT);
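A side note on the unmap_mapping_range() call added above: the length
argument scales with the size of the radix tree entry being removed.
Here is a small worked example, assuming dax_radix_order() returns 0 for
PTE-sized entries and PMD_SHIFT - PAGE_SHIFT for PMD entries (as in
fs/dax.c of this vintage), with the usual x86-64 constants; this is
illustration only, not code from the patch:

#include <stdio.h>

/* Illustrative constants for x86-64 with 4KiB pages and 2MiB PMDs. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SHIFT	21

int main(void)
{
	unsigned long index = 512;	/* page index of the radix tree entry */

	/* start offset, as in the patch: (loff_t)index << PAGE_SHIFT */
	unsigned long start = index << PAGE_SHIFT;
	/* PTE entry: dax_radix_order() == 0, one page is unmapped */
	unsigned long pte_len = PAGE_SIZE << 0;
	/* PMD entry: dax_radix_order() == PMD_SHIFT - PAGE_SHIFT == 9,
	 * so the entire 2MiB zero page mapping is torn down */
	unsigned long pmd_len = PAGE_SIZE << (PMD_SHIFT - PAGE_SHIFT);

	printf("start %#lx: PTE unmaps %lu bytes, PMD unmaps %lu bytes\n",
	       start, pte_len, pmd_len);
	return 0;
}

This prints 4096 bytes for a PTE entry and 2097152 (2MiB) for a PMD
entry, which is why the PMD case must unmap the whole zero page rather
than a single 4KiB page.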