From patchwork Fri Jan 22 21:36:11 2016
X-Patchwork-Submitter: Ross Zwisler
X-Patchwork-Id: 8093241
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, linux-nvdimm@lists.01.org, Dave Chinner,
    Alexander Viro, Jan Kara, linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 3/5] dax: improve documentation for fsync/msync
Date: Fri, 22 Jan 2016 14:36:11 -0700
Message-Id: <1453498573-6328-4-git-send-email-ross.zwisler@linux.intel.com>
X-Mailer: git-send-email 2.5.0
In-Reply-To: <1453498573-6328-1-git-send-email-ross.zwisler@linux.intel.com>
References: <1453498573-6328-1-git-send-email-ross.zwisler@linux.intel.com>

Several of the subtleties and assumptions of the DAX fsync/msync
implementation are not immediately obvious, so document them with
comments.
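
To make the scheme concrete, here is a toy userspace model (illustration
only; none of these names are kernel APIs) of the dirty tracking that
the new comments describe: read faults insert a clean entry, the first
write sets the dirty tag, and fsync/msync flushes only tagged entries:

	#include <stdbool.h>
	#include <stdio.h>

	#define NENTRIES 8

	struct entry {
		bool present;	/* models a populated radix tree slot */
		bool dirty;	/* models the dirty radix tree tag */
	};

	static struct entry tree[NENTRIES];

	/* Read fault: insert a clean entry (PTE case only). */
	static void fault_read(int idx)
	{
		tree[idx].present = true;
	}

	/* First write: make sure an entry exists and tag it dirty. */
	static void fault_write(int idx)
	{
		tree[idx].present = true;
		tree[idx].dirty = true;
	}

	/* fsync/msync: flush only dirty entries, then clean them. */
	static void fsync_flush(void)
	{
		for (int i = 0; i < NENTRIES; i++) {
			if (tree[i].present && tree[i].dirty) {
				printf("flushing entry %d\n", i);
				tree[i].dirty = false;
			}
		}
	}

	int main(void)
	{
		fault_read(1);	/* clean entry: ignored by fsync */
		fault_write(1);	/* dax_pfn_mkwrite() role: set dirty tag */
		fault_write(4);	/* PMD entries are only inserted dirty */
		fsync_flush();	/* flushes entries 1 and 4 only */
		return 0;
	}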
Signed-off-by: Ross Zwisler
Reported-by: Jan Kara
Reviewed-by: Jan Kara
---
 fs/dax.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index d589113..ab2faa9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -335,6 +335,7 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
 	int type, error = 0;
 	void *entry;
 
+	WARN_ON_ONCE(pmd_entry && !dirty);
 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
 	spin_lock_irq(&mapping->tree_lock);
@@ -350,6 +351,13 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
 
 		if (!pmd_entry || type == RADIX_DAX_PMD)
 			goto dirty;
+
+		/*
+		 * We only insert dirty PMD entries into the radix tree.  This
+		 * means we don't need to worry about removing a dirty PTE
+		 * entry and inserting a clean PMD entry, thus reducing the
+		 * range we would flush with a follow-up fsync/msync call.
+		 */
 		radix_tree_delete(&mapping->page_tree, index);
 		mapping->nrexceptional--;
 	}
@@ -912,6 +920,21 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		}
 		dax_unmap_atomic(bdev, &dax);
 
+		/*
+		 * For PTE faults we insert a radix tree entry for reads, and
+		 * leave it clean.  Then on the first write we dirty the radix
+		 * tree entry via the dax_pfn_mkwrite() path.  This sequence
+		 * allows the dax_pfn_mkwrite() call to be simpler and avoid a
+		 * call into get_block() to translate the pgoff to a sector in
+		 * order to be able to create a new radix tree entry.
+		 *
+		 * The PMD path doesn't have an equivalent to
+		 * dax_pfn_mkwrite(), though, so for a read followed by a
+		 * write we traverse all the way through __dax_pmd_fault()
+		 * twice.  This means we can just skip inserting a radix tree
+		 * entry completely on the initial read and just wait until
+		 * the write to insert a dirty entry.
+		 */
 		if (write) {
 			error = dax_radix_entry(mapping, pgoff, dax.sector,
 					true, true);
@@ -985,6 +1008,14 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct file *file = vma->vm_file;
 
+	/*
+	 * We pass NO_SECTOR to dax_radix_entry() because we expect that a
+	 * RADIX_DAX_PTE entry already exists in the radix tree from a
+	 * previous call to __dax_fault().  We just want to look up that PTE
+	 * entry using vmf->pgoff and make sure the dirty tag is set.  This
+	 * saves us from having to make a call to get_block() here to look
+	 * up the sector.
+	 */
 	dax_radix_entry(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);
 	return VM_FAULT_NOPAGE;
 }
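
[Illustration, not part of the patch] Taken together, the
dax_radix_entry(mapping, index, sector, pmd_entry, dirty) call patterns
the comments above document are:

	/* PTE read fault in __dax_fault(): clean entry, sector known */
	dax_radix_entry(mapping, vmf->pgoff, dax.sector, false, false);

	/* PMD write fault above: PMD entries are only ever inserted dirty */
	dax_radix_entry(mapping, pgoff, dax.sector, true, true);

	/* dax_pfn_mkwrite(): entry already exists, just set the dirty tag */
	dax_radix_entry(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);

The first call is inferred from the new comments (the PTE read path is
not shown in these hunks), so treat it as a sketch rather than the
exact call site.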