From patchwork Wed Mar 25 13:47:40 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boaz Harrosh
X-Patchwork-Id: 6091001
Message-ID: <5512BC7C.7060709@plexistor.com>
Date: Wed, 25 Mar 2015 15:47:40 +0200
From: Boaz Harrosh
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
To: Dave Chinner, Matthew Wilcox, Andrew Morton, "Kirill A. Shutemov",
 Jan Kara, Hugh Dickins, Mel Gorman, linux-mm@kvack.org, linux-nvdimm,
 linux-fsdevel, Eryu Guan
References: <5512B961.8070409@plexistor.com>
In-Reply-To: <5512B961.8070409@plexistor.com>
Subject: [Linux-nvdimm] [FIXME] NOT-GOOD: dax: dax_prepare_freeze
List-Id: "Linux-nvdimm developer list."

This is just for reference!

When freezing an FS, we must write-protect all IS_DAX() inodes that
have an mmap mapping. Otherwise applications will be able to modify
previously faulted-in file pages.

I'm actually doing a full unmap_mapping_range() because there is no
readily available "mapping_write_protect"-like functionality. I do not
think it is worth defining one just for this case, only to save some
extra read faults after an fs_freeze. How hot a path is fs_freeze
anyway?

FIXME: As pointed out by Dave, this is completely the wrong fix: we
first need to fsync all cache-dirty inodes, and write-protect only
those. So maybe plug this into the regular sb_sync path, checking the
FREEZE flag.

CC: Dave Chinner
CC: Jan Kara
CC: Matthew Wilcox
NOT-Signed-off-by: Boaz Harrosh
---
 fs/dax.c           | 29 +++++++++++++++++++++++++++++
 fs/super.c         |  3 +++
 include/linux/fs.h |  5 +++++
 3 files changed, 37 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index d0bd1f4..ec99d1c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include "internal.h"
 
 int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 {
@@ -549,3 +550,31 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
 	return dax_zero_page_range(inode, from, length, get_block);
 }
 EXPORT_SYMBOL_GPL(dax_truncate_page);
+
+/* This is meant to be called as part of freeze_super. Otherwise we might
+ * need some extra locking before calling here.
+ */
+void dax_prepare_freeze(struct super_block *sb)
+{
+	struct inode *inode;
+
+	if (!(sb->s_bdev && sb->s_bdev->bd_disk->fops->direct_access))
+		return;
+
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		/* TODO: For freezing we can actually do with write-protecting
+		 * the page. But I cannot find a ready-made function that does
+		 * that for a given mapping (with all the proper locking).
+		 * How performance sensitive is the whole sb_freeze API?
+		 * For now we can just unmap the whole mapping, and pay extra
+		 * on read faults.
+		 */
+		/* NOTE: Do not unmap private COW-mapped pages; they will not
+		 * modify the FS.
+		 */
+		if (IS_DAX(inode))
+			unmap_mapping_range(inode->i_mapping, 0, 0, 0);
+	}
+	spin_unlock(&inode_sb_list_lock);
+}
diff --git a/fs/super.c b/fs/super.c
index 2b7dc90..9ef490c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1329,6 +1329,9 @@ int freeze_super(struct super_block *sb)
 	/* All writers are done so after syncing there won't be dirty data */
 	sync_filesystem(sb);
 
+	/* Need to take care of DAX mmaped inodes */
+	dax_prepare_freeze(sb);
+
 	/* Now wait for internal filesystem counter */
 	sb->s_writers.frozen = SB_FREEZE_FS;
 	smp_wmb();
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 24af817..ac48ba6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2599,6 +2599,11 @@ int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
 int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
 #define dax_mkwrite(vma, vmf, gb)	dax_fault(vma, vmf, gb)
+#ifdef CONFIG_FS_DAX
+void dax_prepare_freeze(struct super_block *sb);
+#else /* !CONFIG_FS_DAX */
+static inline void dax_prepare_freeze(struct super_block *sb){}
+#endif /* !CONFIG_FS_DAX */
 
 #ifdef CONFIG_BLOCK
 typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,