From patchwork Thu May 4 19:59:10 2017
X-Patchwork-Submitter: Ross Zwisler
X-Patchwork-Id: 9712609
From: Ross Zwisler
To: Andrew Morton, linux-kernel@vger.kernel.org
Cc: Ross Zwisler, Alexander Viro, Alexey Kuznetsov, Andrey Ryabinin,
    Anna Schumaker, Christoph Hellwig, Dan Williams, "Darrick J. Wong",
    Eric Van Hensbergen, Jan Kara, Jens Axboe, Johannes Weiner,
    Konrad Rzeszutek Wilk, Latchesar Ionkov, linux-cifs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-nfs@vger.kernel.org, linux-nvdimm@lists.01.org,
    Matthew Wilcox, Ron Minnich, samba-technical@lists.samba.org,
    Steve French, Trond Myklebust, v9fs-developer@lists.sourceforge.net
Subject: [PATCH v2 2/2] dax: fix data corruption due to stale mmap reads
Date: Thu, 4 May 2017 13:59:10 -0600
Message-Id: <20170504195910.11579-2-ross.zwisler@linux.intel.com>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20170504195910.11579-1-ross.zwisler@linux.intel.com>
References: <20170504195910.11579-1-ross.zwisler@linux.intel.com>

Users of DAX can suffer data corruption from stale mmap reads via the
following sequence:

- open an mmap over a 2MiB hole
- read from the 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p).  The write succeeds but we
  incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written.  Since the zero
  page mapping is still intact we read back zeroes instead of the new
  data.

We fix this by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations, and by enhancing
invalidate_inode_pages2_range() so that it properly unmaps the DAX
entries being removed from the radix tree.
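To make the failing sequence concrete, here is a minimal userspace
sketch of it (illustrative only: the mount point, file name, fill byte,
and omitted error handling are my assumptions; the real reproducer is
the fstest linked below):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MiB2 (2UL * 1024 * 1024)

int main(void)
{
	/* Assumes a DAX-mounted filesystem at /mnt/dax (hypothetical). */
	int fd = open("/mnt/dax/testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);
	char buf[4096];

	ftruncate(fd, MiB2);		/* 2MiB hole, no blocks allocated */

	volatile char *map = mmap(NULL, MiB2, PROT_READ, MAP_SHARED, fd, 0);
	char first = map[0];		/* read fault maps the 2MiB zero page */

	memset(buf, 0xab, sizeof(buf));
	pwrite(fd, buf, sizeof(buf), 0);	/* write allocates a block */

	/* Before the fix, the stale zero page mapping is still live, so
	 * this reads back 0x00 instead of 0xab. */
	printf("before write: 0x%02x, after write: 0x%02x (expect 0xab)\n",
	       (unsigned char)first, (unsigned char)map[0]);

	close(fd);
	return 0;
}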
This is based on an initial patch from Jan Kara.

I've written an fstest that triggers this error:
http://www.spinics.net/lists/linux-mm/msg126276.html

Signed-off-by: Ross Zwisler
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Reported-by: Jan Kara
Cc: [4.10+]
---
Changes since v1:
 - Instead of unmapping each DAX entry individually in
   __dax_invalidate_mapping_entry(), unmap the whole range at once
   inside invalidate_inode_pages2_range().  Each unmap requires an rmap
   walk, so a single unmap of the whole range should be less expensive
   than one per entry, plus now we don't have to drop and re-acquire
   the mapping->tree_lock for each entry.  (Jan)

These patches apply cleanly to v4.11 and have passed an xfstest run.
They also apply to v4.10.13 with a little help from git am's 3-way
merge.
---
 fs/dax.c      |  8 ++++----
 mm/truncate.c | 10 ++++++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 166504c..1f2c880 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -999,11 +999,11 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return -EIO;
 
 	/*
-	 * Write can allocate block for an area which has a hole page mapped
-	 * into page tables. We have to tear down these mappings so that data
-	 * written by write(2) is visible in mmap.
+	 * Write can allocate block for an area which has a hole page or zero
+	 * PMD entry in the radix tree. We have to tear down these mappings so
+	 * that data written by write(2) is visible in mmap.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+	if (iomap->flags & IOMAP_F_NEW) {
 		invalidate_inode_pages2_range(inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);
diff --git a/mm/truncate.c b/mm/truncate.c
index c537184..ad40316 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -683,6 +683,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 		cond_resched();
 		index++;
 	}
+
+	/*
+	 * Ensure that any DAX exceptional entries that have been invalidated
+	 * are also unmapped.
+	 */
+	if (dax_mapping(mapping)) {
+		unmap_mapping_range(mapping, (loff_t)start << PAGE_SHIFT,
+				(loff_t)(1 + end - start) << PAGE_SHIFT, 0);
+	}
+
 	cleancache_invalidate_inode(mapping);
 	return ret;
 }
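A side note on the mm/truncate.c hunk: the new unmap_mapping_range()
call converts the page-index range [start, end] into a byte offset and
length so the whole range is torn down with a single rmap walk. A tiny
standalone sketch of that arithmetic (the PAGE_SHIFT value and index
range are assumptions for illustration):

#include <stdio.h>

#define PAGE_SHIFT 12	/* assuming 4KiB pages */

int main(void)
{
	unsigned long start = 0, end = 511;	/* 512 4KiB pages == 2MiB */
	long long holebegin = (long long)start << PAGE_SHIFT;
	long long holelen = (long long)(1 + end - start) << PAGE_SHIFT;

	/* Prints offset=0 len=2097152: one unmap covers the whole 2MiB
	 * zero page mapping instead of one unmap per radix tree entry. */
	printf("unmap byte range: offset=%lld len=%lld\n", holebegin, holelen);
	return 0;
}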