From patchwork Thu Feb 27 05:24:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EED55924 for ; Thu, 27 Feb 2020 05:33:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CE1E62467B for ; Thu, 27 Feb 2020 05:33:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726785AbgB0Fck (ORCPT ); Thu, 27 Feb 2020 00:32:40 -0500 Received: from mga12.intel.com ([192.55.52.136]:15192 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725769AbgB0Fcj (ORCPT ); Thu, 27 Feb 2020 00:32:39 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:38 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="226996288" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:37 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 01/12] fs/xfs: Remove unnecessary initialization of i_rwsem Date: Wed, 26 Feb 2020 21:24:31 -0800 Message-Id: <20200227052442.22524-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny xfs_reinit_inode() -> inode_init_always() already handles calling init_rwsem(i_rwsem). Doing so again is unneeded. Signed-off-by: Ira Weiny --- New for V4: NOTE: This was found while ensuring the new i_aops_sem was properly handled. It seems like this is a layering violation so I think it is worth cleaning up so as to not confuse others. --- fs/xfs/xfs_icache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 8dc2e5414276..836a1f09be03 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -419,6 +419,7 @@ xfs_iget_cache_hit( spin_unlock(&ip->i_flags_lock); rcu_read_unlock(); + ASSERT(!rwsem_is_locked(&inode->i_rwsem)); error = xfs_reinit_inode(mp, inode); if (error) { bool wake; @@ -452,9 +453,6 @@ xfs_iget_cache_hit( ip->i_sick = 0; ip->i_checked = 0; - ASSERT(!rwsem_is_locked(&inode->i_rwsem)); - init_rwsem(&inode->i_rwsem); - spin_unlock(&ip->i_flags_lock); spin_unlock(&pag->pag_ici_lock); } else { From patchwork Thu Feb 27 05:24:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28BA414B4 for ; Thu, 27 Feb 2020 05:33:43 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 117602467B for ; Thu, 27 Feb 2020 05:33:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728525AbgB0Fdj (ORCPT ); Thu, 27 Feb 2020 00:33:39 -0500 Received: from mga09.intel.com ([134.134.136.24]:35462 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726460AbgB0Fck (ORCPT ); Thu, 27 Feb 2020 00:32:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:39 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="241932087" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:38 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Dave Chinner , Jan Kara , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 02/12] fs: Remove unneeded IS_DAX() check Date: Wed, 26 Feb 2020 21:24:32 -0800 Message-Id: <20200227052442.22524-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Remove the check because DAX now has its own read/write methods, and file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on their own. Therefore, it does not matter if the file state is DAX when the iocb flags are created.
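For readers who want to see the behavioural difference in isolation, the following is a minimal userspace sketch of the predicate before and after this change. It is not kernel code: struct model_file, MODEL_O_DIRECT and the host_is_dax field are hypothetical stand-ins for struct file, O_DIRECT and IS_DAX(filp->f_mapping->host).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for O_DIRECT; the real value is arch specific. */
#define MODEL_O_DIRECT 0x4000u

/* Hypothetical model of the two inputs io_is_direct() looked at. */
struct model_file {
	unsigned int f_flags;	/* open(2) flags */
	bool host_is_dax;	/* models IS_DAX(filp->f_mapping->host) */
};

/* Old predicate: a DAX inode was treated as direct I/O. */
static bool io_is_direct_old(const struct model_file *f)
{
	return (f->f_flags & MODEL_O_DIRECT) || f->host_is_dax;
}

/* New predicate: only the open flag matters. */
static bool io_is_direct_new(const struct model_file *f)
{
	return (f->f_flags & MODEL_O_DIRECT) != 0;
}
```

The two versions differ only for a DAX file opened without O_DIRECT, which is the case the commit message argues is already handled by the file system's own IS_DAX() check.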
Reviewed-by: Dave Chinner Reviewed-by: Jan Kara Signed-off-by: Ira Weiny --- Changes from v3: Reword commit message. Reordered to be a 'pre-cleanup' patch --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3cd4fe6b845e..63d1e533a07d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3388,7 +3388,7 @@ extern int file_update_time(struct file *file); static inline bool io_is_direct(struct file *filp) { - return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); + return (filp->f_flags & O_DIRECT); } static inline bool vma_is_dax(struct vm_area_struct *vma) From patchwork Thu Feb 27 05:24:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BE9F14B4 for ; Thu, 27 Feb 2020 05:33:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 554202467B for ; Thu, 27 Feb 2020 05:33:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727349AbgB0Fcl (ORCPT ); Thu, 27 Feb 2020 00:32:41 -0500 Received: from mga02.intel.com ([134.134.136.20]:16215 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725769AbgB0Fck (ORCPT ); Thu, 27 Feb 2020 00:32:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:40 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="410887742" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga005-auth.jf.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:39 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Jan Kara , "Darrick J . Wong" , Alexander Viro , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 03/12] fs/stat: Define DAX statx attribute Date: Wed, 26 Feb 2020 21:24:33 -0800 Message-Id: <20200227052442.22524-4-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny To let users determine whether a file is currently operating in the DAX state (effective DAX), define a statx attribute value and set that attribute if the effective DAX flag is set. To go along with this we propose the following addition to the statx man page: STATX_ATTR_DAX The file is in the DAX (cpu direct access) state. DAX state attempts to minimize software cache effects for both I/O and memory mappings of this file. It requires a file system which has been configured to support DAX. DAX generally assumes all accesses are via cpu load / store instructions which can minimize overhead for small accesses, but may adversely affect cpu utilization for large transfers. File I/O is done directly to/from user-space buffers and memory mapped I/O may be performed with direct memory mappings that bypass kernel page cache. While the DAX property tends to result in data being transferred synchronously, it does not give the same guarantees as O_SYNC, where data and the necessary metadata are transferred together.
A DAX file may support being mapped with the MAP_SYNC flag, which enables a program to use CPU cache flush instructions to persist CPU store operations without an explicit fsync(2). See mmap(2) for more information. Reviewed-by: Jan Kara Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from V2: Update man page text with comments from Darrick, Jan, Dan, and Dave. --- fs/stat.c | 3 +++ include/uapi/linux/stat.h | 1 + 2 files changed, 4 insertions(+) diff --git a/fs/stat.c b/fs/stat.c index 030008796479..894699c74dde 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat, if (IS_AUTOMOUNT(inode)) stat->attributes |= STATX_ATTR_AUTOMOUNT; + if (IS_DAX(inode)) + stat->attributes |= STATX_ATTR_DAX; + if (inode->i_op->getattr) return inode->i_op->getattr(path, stat, request_mask, query_flags); diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index ad80a5c885d5..e5f9d5517f6b 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -169,6 +169,7 @@ struct statx { #define STATX_ATTR_ENCRYPTED 0x00000800 /* [I] File requires key to decrypt in fs */ #define STATX_ATTR_AUTOMOUNT 0x00001000 /* Dir: Automount trigger */ #define STATX_ATTR_VERITY 0x00100000 /* [I] Verity protected file */ +#define STATX_ATTR_DAX 0x00002000 /* [I] File is DAX */ #endif /* _UAPI_LINUX_STAT_H */ From patchwork Thu Feb 27 05:24:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4AB85924 for ; Thu, 27 Feb 2020 05:33:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3387B2467B for ; Thu, 27 Feb 2020 05:33:34 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727492AbgB0Fcm (ORCPT ); Thu, 27 Feb 2020 00:32:42 -0500 Received: from mga18.intel.com ([134.134.136.126]:43439 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727399AbgB0Fcm (ORCPT ); Thu, 27 Feb 2020 00:32:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:41 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="385047236" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:40 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 04/12] fs/xfs: Isolate the physical DAX flag from enabled Date: Wed, 26 Feb 2020 21:24:34 -0800 Message-Id: <20200227052442.22524-5-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny xfs_ioctl_setattr_dax_invalidate() currently checks if the DAX flag is changing as a quick check. But the implementation mixes the physical (XFS_DIFLAG2_DAX) and the enabled (S_DAX) DAX flags. Remove the use of the enabled flag when determining if a change of the physical flag is required. Furthermore, we want the physical flag, XFS_DIFLAG2_DAX, to be changed regardless of if the underlying storage can support DAX or not. 
The enabled flag, IS_DAX(), will be set later IFF the inode supports dax in a follow on patch. Signed-off-by: Ira Weiny --- Changes from V3: Remove the underlying storage support check Rework commit message Reorder patch --- fs/xfs/xfs_ioctl.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index d42de92cb283..25e12ce85075 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1190,28 +1190,16 @@ xfs_ioctl_setattr_dax_invalidate( int *join_flags) { struct inode *inode = VFS_I(ip); - struct super_block *sb = inode->i_sb; int error; *join_flags = 0; - /* - * It is only valid to set the DAX flag on regular files and - * directories on filesystems where the block size is equal to the page - * size. On directories it serves as an inherited hint so we don't - * have to check the device for dax support or flush pagecache. - */ - if (fa->fsx_xflags & FS_XFLAG_DAX) { - struct xfs_buftarg *target = xfs_inode_buftarg(ip); - - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize)) - return -EINVAL; - } - /* If the DAX state is not changing, we have nothing to do here. 
*/ - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode)) + if ((fa->fsx_xflags & FS_XFLAG_DAX) && + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) return 0; - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode)) + if (!(fa->fsx_xflags & FS_XFLAG_DAX) && + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) return 0; if (S_ISDIR(inode->i_mode)) From patchwork Thu Feb 27 05:24:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 019C814B4 for ; Thu, 27 Feb 2020 05:33:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DF2062467B for ; Thu, 27 Feb 2020 05:33:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728396AbgB0Fco (ORCPT ); Thu, 27 Feb 2020 00:32:44 -0500 Received: from mga11.intel.com ([192.55.52.93]:58414 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727399AbgB0Fcn (ORCPT ); Thu, 27 Feb 2020 00:32:43 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:42 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="238289232" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:42 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 05/12] fs/xfs: Create function xfs_inode_enable_dax() Date: Wed, 26 Feb 2020 21:24:35 -0800 Message-Id: <20200227052442.22524-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny xfs_inode_supports_dax() should reflect if the inode can support DAX not that it is enabled for DAX. Change the use of xfs_inode_supports_dax() to reflect only if the inode and underlying storage support dax. Add a new function xfs_inode_enable_dax() which reflects if the inode should be enabled for DAX. Signed-off-by: Ira Weiny --- Changes from v3: Update functions and names to be more clear Update commit message Merge with 'fs/xfs: Clean up DAX support check' don't allow IS_DAX() on a directory use STATIC macro for static make xfs_inode_supports_dax() static --- fs/xfs/xfs_iops.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 81f2f93caec0..ff711efc5247 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1237,19 +1237,18 @@ static const struct inode_operations xfs_inline_symlink_inode_operations = { }; /* Figure out if this file actually supports DAX. */ -static bool +STATIC bool xfs_inode_supports_dax( struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; /* Only supported on non-reflinked files. */ - if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip)) + if (xfs_is_reflink_inode(ip)) return false; - /* DAX mount option or DAX iflag must be set. */ - if (!(mp->m_flags & XFS_MOUNT_DAX) && - !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) + /* Only supported on regular files. 
*/ + if (!S_ISREG(VFS_I(ip)->i_mode)) return false; /* Block size must match page size */ @@ -1260,6 +1259,20 @@ xfs_inode_supports_dax( return xfs_inode_buftarg(ip)->bt_daxdev != NULL; } +STATIC bool +xfs_inode_enable_dax( + struct xfs_inode *ip) +{ + if (!xfs_inode_supports_dax(ip)) + return false; + + if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) + return true; + if (ip->i_mount->m_flags & XFS_MOUNT_DAX) + return true; + return false; +} + STATIC void xfs_diflags_to_iflags( struct inode *inode, @@ -1278,7 +1291,7 @@ xfs_diflags_to_iflags( inode->i_flags |= S_SYNC; if (flags & XFS_DIFLAG_NOATIME) inode->i_flags |= S_NOATIME; - if (xfs_inode_supports_dax(ip)) + if (xfs_inode_enable_dax(ip)) inode->i_flags |= S_DAX; } From patchwork Thu Feb 27 05:24:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C794014B4 for ; Thu, 27 Feb 2020 05:33:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9B9302467B for ; Thu, 27 Feb 2020 05:33:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728425AbgB0Fcp (ORCPT ); Thu, 27 Feb 2020 00:32:45 -0500 Received: from mga14.intel.com ([192.55.52.115]:20119 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728387AbgB0Fco (ORCPT ); Thu, 27 Feb 2020 00:32:44 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:43 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="256595149" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) 
by orsmga002-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:42 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 06/12] fs: Add locking for a dynamic address space operations state Date: Wed, 26 Feb 2020 21:24:36 -0800 Message-Id: <20200227052442.22524-7-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny DAX requires special address space operations (aops). Changing DAX state therefore requires changing those aops. However, many functions require aops to remain consistent through a deep call stack. Define a vfs level inode rwsem to protect aops throughout call stacks which require them. Finally, define calls to be used in subsequent patches when aops usage needs to be quiesced by the file system. Signed-off-by: Ira Weiny --- Changes from V4: Fix 0-day build issue. Changes from V3: Convert from global to a per-inode rwsem Remove pre-optimization Remove static branch stuff Change function names from inode_dax_state_* to inode_aops_* I kept 'inode' as the synchronization is at the inode level now (probably where it belongs)... and I still prefer *_[down|up]_[read|write] as those names better reflect the use and interaction between users (readers) and writers better. 'XX_start_aop()' would have to be matched with something like 'XX_wait_for_aop_user()' and 'XX_release_aop_users()' or something which does not make sense on the 'writer' side. 
Changes from V2 Rebase on linux-next-08-02-2020 Fix locking order Change all references from mode to state where appropriate add CONFIG_FS_DAX requirement for state change Use a static branch to enable locking only when a dax capable device has been seen. Move the lock to a global vfs lock this does a few things 1) preps us better for ext4 support 2) removes funky callbacks from inode ops 3) remove complexity from XFS and probably from ext4 later We can do this because 1) the locking order is required to be at the highest level anyway, so why complicate xfs 2) We had to move the sem to the super_block because it is too heavy for the inode. 3) After internal discussions with Dan we decided that this would be easier, just as performant, and with slightly less overhead than in the VFS SB. We also change the function names to up/down; read/write as appropriate. Previous names were oversimplified. Update comments and documentation squash: add locking squash:... --- Documentation/filesystems/vfs.rst | 16 ++++++++ fs/attr.c | 1 + fs/inode.c | 15 ++++++-- fs/iomap/buffered-io.c | 1 + fs/open.c | 4 ++ fs/stat.c | 2 + fs/xfs/xfs_icache.c | 1 + include/linux/fs.h | 64 ++++++++++++++++++++++++++++++- mm/fadvise.c | 7 +++- mm/filemap.c | 4 ++ mm/huge_memory.c | 1 + mm/khugepaged.c | 2 + mm/util.c | 9 ++++- 13 files changed, 119 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 7d4d09dd5e6d..4a10a232f8e2 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -934,6 +934,22 @@ cache in your filesystem. The following members are defined: Called during swapoff on files where swap_activate was successful. +Changing DAX 'state' dynamically +---------------------------------- + +Some file systems which support DAX want to be able to change the DAX state +dynamically. 
To switch the state safely we lock the inode state in all "normal" +file system operations and restrict state changes to those operations. The +specific rules are: + + 1) the direct_IO address_space_operation must be supported in all + potential a_ops vectors for any state supported by the inode. + + 3) DAX state changes shall not be allowed while the file is mmap'ed + 4) For non-mmaped operations the VFS layer must take the read lock for any + use of IS_DAX() + 5) Filesystems take the write lock when changing DAX states. + The File Object =============== diff --git a/fs/attr.c b/fs/attr.c index b4bbdbd4c8ca..9b15f73d1079 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -332,6 +332,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de if (error) return error; + /* DAX read state should already be held here */ if (inode->i_op->setattr) error = inode->i_op->setattr(dentry, attr); else diff --git a/fs/inode.c b/fs/inode.c index 7d57068b6b7a..6e4f1cc872f2 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -200,6 +200,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode) #endif inode->i_flctx = NULL; this_cpu_inc(nr_inodes); + init_rwsem(&inode->i_aops_sem); return 0; out: @@ -1616,11 +1617,19 @@ EXPORT_SYMBOL(iput); */ int bmap(struct inode *inode, sector_t *block) { - if (!inode->i_mapping->a_ops->bmap) - return -EINVAL; + int ret = 0; + + inode_aops_down_read(inode); + if (!inode->i_mapping->a_ops->bmap) { + ret = -EINVAL; + goto err; + } *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); - return 0; + +err: + inode_aops_up_read(inode); + return ret; } EXPORT_SYMBOL(bmap); #endif diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7c84c4c027c4..e313a34d5fa6 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -999,6 +999,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = offset_in_page(pos); bytes = min_t(loff_t, PAGE_SIZE - offset, count); + /* DAX state 
read should already be held here */ if (IS_DAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else diff --git a/fs/open.c b/fs/open.c index 0788b3715731..3abf0bfac462 100644 --- a/fs/open.c +++ b/fs/open.c @@ -59,10 +59,12 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs, if (ret) newattrs.ia_valid |= ret | ATTR_FORCE; + inode_aops_down_read(dentry->d_inode); inode_lock(dentry->d_inode); /* Note any delegations or leases have already been broken: */ ret = notify_change(dentry, &newattrs, NULL); inode_unlock(dentry->d_inode); + inode_aops_up_read(dentry->d_inode); return ret; } @@ -306,7 +308,9 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EOPNOTSUPP; file_start_write(file); + inode_aops_down_read(inode); ret = file->f_op->fallocate(file, mode, offset, len); + inode_aops_up_read(inode); /* * Create inotify and fanotify events. diff --git a/fs/stat.c b/fs/stat.c index 894699c74dde..274b3ccc82b1 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -79,8 +79,10 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat, if (IS_AUTOMOUNT(inode)) stat->attributes |= STATX_ATTR_AUTOMOUNT; + inode_aops_down_read(inode); if (IS_DAX(inode)) stat->attributes |= STATX_ATTR_DAX; + inode_aops_up_read(inode); if (inode->i_op->getattr) return inode->i_op->getattr(path, stat, request_mask, diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 836a1f09be03..3e83a97dc047 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -420,6 +420,7 @@ xfs_iget_cache_hit( rcu_read_unlock(); ASSERT(!rwsem_is_locked(&inode->i_rwsem)); + ASSERT(!rwsem_is_locked(&inode->i_aops_sem)); error = xfs_reinit_inode(mp, inode); if (error) { bool wake; diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d1e533a07d..22cc1aa980f5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -359,6 +360,11 @@ typedef struct { typedef 
int (*read_actor_t)(read_descriptor_t *, struct page *, unsigned long, unsigned long); +/** + * NOTE: DO NOT define new functions in address_space_operations without first + * considering how dynamic DAX states are to be supported. See the + * inode_aops_*_read() functions + */ struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); @@ -735,6 +741,7 @@ struct inode { #endif void *i_private; /* fs or device private pointer */ + struct rw_semaphore i_aops_sem; } __randomize_layout; struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode); @@ -1817,6 +1824,11 @@ struct block_device_operations; struct iov_iter; +/** + * NOTE: DO NOT define new functions in file_operations without first + * considering how dynamic address_space operations are to be supported. See + * the inode_aops_*_read() functions in this file. + */ struct file_operations { struct module *owner; loff_t (*llseek) (struct file *, loff_t, int); @@ -1889,16 +1901,64 @@ struct inode_operations { int (*set_acl)(struct inode *, struct posix_acl *, int); } ____cacheline_aligned; +#if defined(CONFIG_FS_DAX) +/* + * Filesystems wishing to support dynamic DAX states must do the following. + * + * 1) the direct_IO address_space_operation must be supported in all potential + * a_ops vectors for any state supported by the inode. This is because the + * direct_IO function is used as a flag long before the function is called. + + * 3) DAX state changes shall not be allowed while the file is mmap'ed + * 4) For non-mmaped operations the VFS layer must take the read lock for any + * use of IS_DAX() + * 5) Filesystems take the write lock when changing DAX states. 
+ */ +static inline void inode_aops_down_read(struct inode *inode) +{ + down_read(&inode->i_aops_sem); +} +static inline void inode_aops_up_read(struct inode *inode) +{ + up_read(&inode->i_aops_sem); +} +static inline void inode_aops_down_write(struct inode *inode) +{ + down_write(&inode->i_aops_sem); +} +static inline void inode_aops_up_write(struct inode *inode) +{ + up_write(&inode->i_aops_sem); +} +#else /* !CONFIG_FS_DAX */ +#define inode_aops_down_read(inode) do { (void)(inode); } while (0) +#define inode_aops_up_read(inode) do { (void)(inode); } while (0) +#define inode_aops_down_write(inode) do { (void)(inode); } while (0) +#define inode_aops_up_write(inode) do { (void)(inode); } while (0) +#endif /* CONFIG_FS_DAX */ + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { - return file->f_op->read_iter(kio, iter); + struct inode *inode = file_inode(kio->ki_filp); + ssize_t ret; + + inode_aops_down_read(inode); + ret = file->f_op->read_iter(kio, iter); + inode_aops_up_read(inode); + return ret; } static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { - return file->f_op->write_iter(kio, iter); + struct inode *inode = file_inode(kio->ki_filp); + ssize_t ret; + + inode_aops_down_read(inode); + ret = file->f_op->write_iter(kio, iter); + inode_aops_up_read(inode); + return ret; } static inline int call_mmap(struct file *file, struct vm_area_struct *vma) diff --git a/mm/fadvise.c b/mm/fadvise.c index 4f17c83db575..6a30febb11e0 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -48,6 +48,8 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) bdi = inode_to_bdi(mapping->host); if (IS_DAX(inode) || (bdi == &noop_backing_dev_info)) { + int ret = 0; + switch (advice) { case POSIX_FADV_NORMAL: case POSIX_FADV_RANDOM: @@ -58,9 +60,10 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) /* no bad return value, but ignore advice 
*/ break; default: - return -EINVAL; + ret = -EINVAL; } - return 0; + + return ret; } /* diff --git a/mm/filemap.c b/mm/filemap.c index 1784478270e1..3a7863ba51b9 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2293,6 +2293,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) * and return. Otherwise fallthrough to buffered io for * the rest of the read. Buffered reads will not work for * DAX files, so don't bother trying. + * + * IS_DAX is protected under ->read_iter lock */ if (retval < 0 || !count || iocb->ki_pos >= size || IS_DAX(inode)) @@ -3377,6 +3379,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * holes, for example. For DAX files, a buffered write will * not succeed (even if it did, DAX does not handle dirty * page-cache pages correctly). + * + * IS_DAX is protected under ->write_iter lock */ if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) goto out; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b08b199f9a11..3d05bd10d83e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -572,6 +572,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long ret; loff_t off = (loff_t)pgoff << PAGE_SHIFT; + /* Should not need locking here because mmap is not allowed */ if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) goto out; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b679908743cb..f048178e2b93 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1592,9 +1592,11 @@ static void collapse_file(struct mm_struct *mm, } else { /* !is_shmem */ if (!page || xa_is_value(page)) { xas_unlock_irq(&xas); + inode_aops_down_read(file->f_inode); page_cache_sync_readahead(mapping, &file->f_ra, file, index, PAGE_SIZE); + inode_aops_up_read(file->f_inode); /* drain pagevecs to help isolate_lru_page() */ lru_add_drain(); page = find_lock_page(mapping, index); diff --git a/mm/util.c b/mm/util.c index 988d11e6c17c..a4fb0670137d 100644 --- a/mm/util.c 
+++ b/mm/util.c @@ -501,11 +501,18 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, ret = security_mmap_file(file, prot, flag); if (!ret) { - if (down_write_killable(&mm->mmap_sem)) + if (file) + inode_aops_down_read(file_inode(file)); + if (down_write_killable(&mm->mmap_sem)) { + if (file) + inode_aops_up_read(file_inode(file)); return -EINTR; + } ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, &populate, &uf); up_write(&mm->mmap_sem); + if (file) + inode_aops_up_read(file_inode(file)); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); From patchwork Thu Feb 27 05:24:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407773 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41D80924 for ; Thu, 27 Feb 2020 05:33:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2BCDE2467D for ; Thu, 27 Feb 2020 05:33:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728438AbgB0Fcq (ORCPT ); Thu, 27 Feb 2020 00:32:46 -0500 Received: from mga18.intel.com ([134.134.136.126]:43439 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728410AbgB0Fcp (ORCPT ); Thu, 27 Feb 2020 00:32:45 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:44 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="317670179" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:44 -0800 From: 
ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 07/12] fs: Prevent DAX state change if file is mmap'ed Date: Wed, 26 Feb 2020 21:24:37 -0800 Message-Id: <20200227052442.22524-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Page faults need to ensure the inode DAX configuration is correct and consistent with the vmf information at the time of the fault. There is no easy way to ensure the vmf information is correct if a DAX change is in progress. Furthermore, there is no good use case to require changing DAX configs while the file is mmap'ed. Track mmap's of the file and fail the DAX change if the file is mmap'ed. Signed-off-by: Ira Weiny --- Changes from V2: move 'i_mapped' to struct address_space and rename mmap_count Add inode_has_mappings() helper for FS's Change reference to "mode" to "state" --- Changes from V3: Fix htmldoc error from the kbuild test robot. 
Reported-by: kbuild test robot Rebase cleanups --- fs/inode.c | 1 + fs/xfs/xfs_ioctl.c | 9 +++++++++ include/linux/fs.h | 7 +++++++ mm/mmap.c | 19 +++++++++++++++++-- 4 files changed, 34 insertions(+), 2 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 6e4f1cc872f2..613a045075bd 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -372,6 +372,7 @@ static void __address_space_init_once(struct address_space *mapping) INIT_LIST_HEAD(&mapping->private_list); spin_lock_init(&mapping->private_lock); mapping->i_mmap = RB_ROOT_CACHED; + atomic64_set(&mapping->mmap_count, 0); } void address_space_init_once(struct address_space *mapping) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 25e12ce85075..498fae2ef9f6 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1207,6 +1207,15 @@ xfs_ioctl_setattr_dax_invalidate( /* lock, flush and invalidate mapping in preparation for flag change */ xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + + /* + * If there is a mapping in place we must remain in our current state. + */ + if (inode_has_mappings(inode)) { + error = -EBUSY; + goto out_unlock; + } + error = filemap_write_and_wait(inode->i_mapping); if (error) goto out_unlock; diff --git a/include/linux/fs.h b/include/linux/fs.h index 22cc1aa980f5..cf18a3c38562 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -438,6 +438,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping, * @nr_thps: Number of THPs in the pagecache (non-shmem only). * @i_mmap: Tree of private and shared mappings. * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable. + * @mmap_count: The number of times this AS has been mmap'ed * @nrpages: Number of page entries, protected by the i_pages lock. * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock. * @writeback_index: Writeback starts here. 
@@ -459,6 +460,7 @@ struct address_space { #endif struct rb_root_cached i_mmap; struct rw_semaphore i_mmap_rwsem; + atomic64_t mmap_count; unsigned long nrpages; unsigned long nrexceptional; pgoff_t writeback_index; @@ -1937,6 +1939,11 @@ static inline void inode_aops_up_write(struct inode *inode) #define inode_aops_up_write(inode) do { (void)(inode); } while (0) #endif /* CONFIG_FS_DAX */ +static inline bool inode_has_mappings(struct inode *inode) +{ + return (atomic64_read(&inode->i_mapping->mmap_count) != 0); +} + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { diff --git a/mm/mmap.c b/mm/mmap.c index 7cc2562b99fd..6bb16a0996b5 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -171,12 +171,17 @@ void unlink_file_vma(struct vm_area_struct *vma) static struct vm_area_struct *remove_vma(struct vm_area_struct *vma) { struct vm_area_struct *next = vma->vm_next; + struct file *f = vma->vm_file; might_sleep(); if (vma->vm_ops && vma->vm_ops->close) vma->vm_ops->close(vma); - if (vma->vm_file) - fput(vma->vm_file); + if (f) { + struct inode *inode = file_inode(f); + if (inode) + atomic64_dec(&inode->i_mapping->mmap_count); + fput(f); + } mpol_put(vma_policy(vma)); vm_area_free(vma); return next; @@ -1830,6 +1835,16 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_page_prot(vma); + /* + * Track if there is mapping in place such that a state change + * does not occur on a file which is mapped + */ + if (file) { + struct inode *inode = file_inode(file); + + atomic64_inc(&inode->i_mapping->mmap_count); + } + return addr; unmap_and_free_vma: From patchwork Thu Feb 27 05:24:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3EEE924 for ; 
Thu, 27 Feb 2020 05:33:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8DA602467D for ; Thu, 27 Feb 2020 05:33:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728592AbgB0FdY (ORCPT ); Thu, 27 Feb 2020 00:33:24 -0500 Received: from mga05.intel.com ([192.55.52.43]:37515 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728416AbgB0Fcp (ORCPT ); Thu, 27 Feb 2020 00:32:45 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="232052990" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:44 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 08/12] fs/xfs: Hold off aops users while changing DAX state Date: Wed, 26 Feb 2020 21:24:38 -0800 Message-Id: <20200227052442.22524-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny XFS requires the use of the aops of an inode to be quiesced prior to changing the aops vector to/from the DAX vector. Take the aops write lock while changing DAX state. 
Signed-off-by: Ira Weiny --- Changes from v4: Open code the aops write lock Obsolete: Clean up lock name in comments Obsolete: Change #define: XFS_DAX_EXCL => XFS_AOPSLOCK_EXCL Changes from v3: Change locking function names to reflect changes in previous patches. Changes from V2: Change name of patch (WAS: fs/xfs: Add lock/unlock state to xfs) Remove the xfs specific lock and move to the vfs layer. We still use XFS_LOCK_DAX_EXCL to be able to pass this flag through to the transaction code. But we no longer have a lock specific to xfs. This removes a lot of code from the XFS layer, preps us for using this in ext4, and is actually more straight forward now that all the locking requirements are better known. Fix locking order comment Rework for new 'state' names (Other comments on the previous patch are not applicable with new patch as much of the code was removed in favor of the vfs level lock) --- fs/xfs/xfs_ioctl.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 498fae2ef9f6..6c4d4ea3b6b6 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1444,7 +1444,11 @@ xfs_ioctl_setattr( * or cancel time, so need to be passed through to * xfs_ioctl_setattr_get_trans() so it can apply them to the join call * appropriately. + * + * Further we hold off aops users until we have completed any potential + * changing of aops due to attribute changes. 
*/ + inode_aops_down_write(VFS_I(ip)); code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags); if (code) goto error_free_dquots; @@ -1527,6 +1531,7 @@ xfs_ioctl_setattr( xfs_qm_dqrele(udqp); xfs_qm_dqrele(pdqp); + inode_aops_up_write(VFS_I(ip)); return code; error_trans_cancel: @@ -1534,6 +1539,7 @@ xfs_ioctl_setattr( error_free_dquots: xfs_qm_dqrele(udqp); xfs_qm_dqrele(pdqp); + inode_aops_up_write(VFS_I(ip)); return code; } @@ -1603,7 +1609,11 @@ xfs_ioc_setxflags( * or cancel time, so need to be passed through to * xfs_ioctl_setattr_get_trans() so it can apply them to the join call * appropriately. + * + * Further we hold off aops users until we have completed any potential + * changing of aops due to attribute changes. */ + inode_aops_down_write(VFS_I(ip)); error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags); if (error) goto out_drop_write; @@ -1630,6 +1640,7 @@ xfs_ioc_setxflags( error = xfs_trans_commit(tp); out_drop_write: mnt_drop_write_file(filp); + inode_aops_up_write(VFS_I(ip)); return error; } From patchwork Thu Feb 27 05:24:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09774924 for ; Thu, 27 Feb 2020 05:32:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D669B24679 for ; Thu, 27 Feb 2020 05:32:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728463AbgB0Fcr (ORCPT ); Thu, 27 Feb 2020 00:32:47 -0500 Received: from mga01.intel.com ([192.55.52.88]:17726 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728427AbgB0Fcq (ORCPT ); Thu, 27 Feb 2020 00:32:46 -0500 X-Amp-Result: SKIPPED(no attachment in message) 
X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="230676227" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 09/12] fs/xfs: Clean up locking in dax invalidate Date: Wed, 26 Feb 2020 21:24:39 -0800 Message-Id: <20200227052442.22524-10-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Define a variable to hold the lock flags to ensure that the correct locks are returned or released on error. 
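A minimal sketch of the pattern, assuming hypothetical userspace stand-ins (`demo_ilock()`, `demo_iunlock()` and the `DEMO_*` flag values are invented for illustration and are not the XFS definitions): the lock mask is computed once, so the lock call, the value handed back through join_flags, and the error-path unlock cannot drift apart.

```c
#include <errno.h>

/* Hypothetical flag values; the real XFS lock flags differ. */
#define DEMO_MMAPLOCK_EXCL	0x1
#define DEMO_IOLOCK_EXCL	0x2

/* Hypothetical inode: 'held' records lock bits, 'busy' forces an error. */
struct demo_xinode {
	int held;
	int busy;
};

static void demo_ilock(struct demo_xinode *ip, int flags)
{
	ip->held |= flags;
}

static void demo_iunlock(struct demo_xinode *ip, int flags)
{
	ip->held &= ~flags;
}

static int demo_invalidate(struct demo_xinode *ip, int *join_flags)
{
	/* Single source of truth for which locks this function takes. */
	int flags = DEMO_MMAPLOCK_EXCL | DEMO_IOLOCK_EXCL;
	int error;

	*join_flags = 0;
	demo_ilock(ip, flags);

	if (ip->busy) {
		error = -EBUSY;
		goto out_unlock;
	}

	/* Success: the caller inherits exactly the locks taken above. */
	*join_flags = flags;
	return 0;

out_unlock:
	/* Failure: release exactly the locks taken above. */
	demo_iunlock(ip, flags);
	return error;
}
```

If a later change adds another lock to the mask, only the one assignment to `flags` needs to change.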
Signed-off-by: Ira Weiny --- fs/xfs/xfs_ioctl.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 6c4d4ea3b6b6..40ae791cfb41 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1190,7 +1190,7 @@ xfs_ioctl_setattr_dax_invalidate( int *join_flags) { struct inode *inode = VFS_I(ip); - int error; + int error, flags; *join_flags = 0; @@ -1205,8 +1205,10 @@ xfs_ioctl_setattr_dax_invalidate( if (S_ISDIR(inode->i_mode)) return 0; + flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; + /* lock, flush and invalidate mapping in preparation for flag change */ - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_ilock(ip, flags); /* * If there is a mapping in place we must remain in our current state. @@ -1223,11 +1225,11 @@ xfs_ioctl_setattr_dax_invalidate( if (error) goto out_unlock; - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; + *join_flags = flags; return 0; out_unlock: - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_iunlock(ip, flags); return error; } From patchwork Thu Feb 27 05:24:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A9A5814B4 for ; Thu, 27 Feb 2020 05:33:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 935E124683 for ; Thu, 27 Feb 2020 05:33:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728550AbgB0FdJ (ORCPT ); Thu, 27 Feb 2020 00:33:09 -0500 Received: from mga01.intel.com ([192.55.52.88]:17726 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728451AbgB0Fcr (ORCPT ); Thu, 27 Feb 2020 00:32:47 -0500 X-Amp-Result: SKIPPED(no attachment 
in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="438703029" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 10/12] fs/xfs: Allow toggle of effective DAX flag Date: Wed, 26 Feb 2020 21:24:40 -0800 Message-Id: <20200227052442.22524-11-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Now that locking of the inode aops is in place we can allow a DAX state change. Use the new xfs_inode_enable_dax() to set IS_DAX() as needed and update the aops vector after the enabled state change. Signed-off-by: Ira Weiny --- Changes from v4: Adjust for open coding of the aops lock Changes from V3: Remove static branch stuff. Fix bugs found by Jeff by using xfs_inode_enable_dax() Condition xfs_ioctl_setattr_dax_invalidate() on CONFIG_FS_DAX Changes from V2: Add in lock_dax_state_static_key static branch enabling. 
Rebase updates --- fs/xfs/xfs_inode.h | 2 ++ fs/xfs/xfs_ioctl.c | 20 +++++++++++++++++--- fs/xfs/xfs_iops.c | 17 ++++++++++++----- 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 492e53992fa9..7d4968915dec 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -466,6 +466,8 @@ int xfs_break_layouts(struct inode *inode, uint *iolock, /* from xfs_iops.c */ extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); +extern void xfs_setup_a_ops(struct xfs_inode *ip); +extern bool xfs_inode_enable_dax(struct xfs_inode *ip); /* * When setting up a newly allocated inode, we need to call diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 40ae791cfb41..38e91de44a6f 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1123,12 +1123,11 @@ xfs_diflags_to_linux( inode->i_flags |= S_NOATIME; else inode->i_flags &= ~S_NOATIME; -#if 0 /* disabled until the flag switching races are sorted out */ - if (xflags & FS_XFLAG_DAX) + + if (xfs_inode_enable_dax(ip)) inode->i_flags |= S_DAX; else inode->i_flags &= ~S_DAX; -#endif } static int @@ -1183,6 +1182,7 @@ xfs_ioctl_setattr_xflags( * so that the cache invalidation is atomic with respect to the DAX flag * manipulation. 
*/ +#if defined(CONFIG_FS_DAX) static int xfs_ioctl_setattr_dax_invalidate( struct xfs_inode *ip, @@ -1233,6 +1233,16 @@ xfs_ioctl_setattr_dax_invalidate( return error; } +#else /* !CONFIG_FS_DAX */ +static int +xfs_ioctl_setattr_dax_invalidate( + struct xfs_inode *ip, + struct fsxattr *fa, + int *join_flags) +{ + return 0; +} +#endif /* * Set up the transaction structure for the setattr operation, checking that we @@ -1524,6 +1534,8 @@ xfs_ioctl_setattr( else ip->i_d.di_cowextsize = 0; + xfs_setup_a_ops(ip); + code = xfs_trans_commit(tp); /* @@ -1639,6 +1651,8 @@ xfs_ioc_setxflags( goto out_drop_write; } + xfs_setup_a_ops(ip); + error = xfs_trans_commit(tp); out_drop_write: mnt_drop_write_file(filp); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index ff711efc5247..8f023be39705 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax( return xfs_inode_buftarg(ip)->bt_daxdev != NULL; } -STATIC bool +bool xfs_inode_enable_dax( struct xfs_inode *ip) { @@ -1355,6 +1355,16 @@ xfs_setup_inode( } } +void xfs_setup_a_ops(struct xfs_inode *ip) +{ + struct inode *inode = &ip->i_vnode; + + if (IS_DAX(inode)) + inode->i_mapping->a_ops = &xfs_dax_aops; + else + inode->i_mapping->a_ops = &xfs_address_space_operations; +} + void xfs_setup_iops( struct xfs_inode *ip) @@ -1365,10 +1375,7 @@ xfs_setup_iops( case S_IFREG: inode->i_op = &xfs_inode_operations; inode->i_fop = &xfs_file_operations; - if (IS_DAX(inode)) - inode->i_mapping->a_ops = &xfs_dax_aops; - else - inode->i_mapping->a_ops = &xfs_address_space_operations; + xfs_setup_a_ops(ip); break; case S_IFDIR: if (xfs_sb_version_hasasciici(&XFS_M(inode->i_sb)->m_sb)) From patchwork Thu Feb 27 05:24:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99C8314B4 for ; Thu, 27 Feb 2020 05:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8187F24683 for ; Thu, 27 Feb 2020 05:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728574AbgB0FdP (ORCPT ); Thu, 27 Feb 2020 00:33:15 -0500 Received: from mga09.intel.com ([134.134.136.24]:35478 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728441AbgB0Fcq (ORCPT ); Thu, 27 Feb 2020 00:32:46 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:46 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="436917067" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:45 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , kbuild test robot , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH V5 11/12] fs/xfs: Remove xfs_diflags_to_linux() Date: Wed, 26 Feb 2020 21:24:41 -0800 Message-Id: <20200227052442.22524-12-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com> References: <20200227052442.22524-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny The functionality in xfs_diflags_to_linux() is now identical to xfs_diflags_to_iflags(). 
Remove xfs_diflags_to_linux() and call xfs_diflags_to_iflags() directly. While we are here simplify xfs_diflags_to_iflags() to take just struct xfs_inode. And use xfs_ip2xflags() to ensure future diflags are included correctly. Signed-off-by: kbuild test robot Signed-off-by: Ira Weiny --- Changes from V4: Pick up lkp build suggestion (make xfs_inode_enable_dax() static) --- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_ioctl.c | 32 +------------------------------- fs/xfs/xfs_iops.c | 31 +++++++++++++++++++------------ 3 files changed, 21 insertions(+), 44 deletions(-) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 7d4968915dec..2c8c2804f88b 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -467,7 +467,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock, extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); extern void xfs_setup_a_ops(struct xfs_inode *ip); -extern bool xfs_inode_enable_dax(struct xfs_inode *ip); +extern void xfs_diflags_to_iflags(struct xfs_inode *ip); /* * When setting up a newly allocated inode, we need to call diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 38e91de44a6f..bd67359a3251 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1100,36 +1100,6 @@ xfs_flags2diflags2( return di_flags2; } -STATIC void -xfs_diflags_to_linux( - struct xfs_inode *ip) -{ - struct inode *inode = VFS_I(ip); - unsigned int xflags = xfs_ip2xflags(ip); - - if (xflags & FS_XFLAG_IMMUTABLE) - inode->i_flags |= S_IMMUTABLE; - else - inode->i_flags &= ~S_IMMUTABLE; - if (xflags & FS_XFLAG_APPEND) - inode->i_flags |= S_APPEND; - else - inode->i_flags &= ~S_APPEND; - if (xflags & FS_XFLAG_SYNC) - inode->i_flags |= S_SYNC; - else - inode->i_flags &= ~S_SYNC; - if (xflags & FS_XFLAG_NOATIME) - inode->i_flags |= S_NOATIME; - else - inode->i_flags &= ~S_NOATIME; - - if (xfs_inode_enable_dax(ip)) - inode->i_flags |= S_DAX; - else - inode->i_flags &= ~S_DAX; -} - static int 
xfs_ioctl_setattr_xflags( struct xfs_trans *tp, @@ -1167,7 +1137,7 @@ xfs_ioctl_setattr_xflags( ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags); ip->i_d.di_flags2 = di_flags2; - xfs_diflags_to_linux(ip); + xfs_diflags_to_iflags(ip); xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 8f023be39705..c63cf0b73b75 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax( return xfs_inode_buftarg(ip)->bt_daxdev != NULL; } -bool +static bool xfs_inode_enable_dax( struct xfs_inode *ip) { @@ -1273,26 +1273,33 @@ xfs_inode_enable_dax( return false; } -STATIC void +void xfs_diflags_to_iflags( - struct inode *inode, struct xfs_inode *ip) { - uint16_t flags = ip->i_d.di_flags; - - inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | - S_NOATIME | S_DAX); + struct inode *inode = VFS_I(ip); + uint16_t diflags = xfs_ip2xflags(ip); - if (flags & XFS_DIFLAG_IMMUTABLE) + if (diflags & FS_XFLAG_IMMUTABLE) inode->i_flags |= S_IMMUTABLE; - if (flags & XFS_DIFLAG_APPEND) + else + inode->i_flags &= ~S_IMMUTABLE; + if (diflags & FS_XFLAG_APPEND) inode->i_flags |= S_APPEND; - if (flags & XFS_DIFLAG_SYNC) + else + inode->i_flags &= ~S_APPEND; + if (diflags & FS_XFLAG_SYNC) inode->i_flags |= S_SYNC; - if (flags & XFS_DIFLAG_NOATIME) + else + inode->i_flags &= ~S_SYNC; + if (diflags & FS_XFLAG_NOATIME) inode->i_flags |= S_NOATIME; + else + inode->i_flags &= ~S_NOATIME; if (xfs_inode_enable_dax(ip)) inode->i_flags |= S_DAX; + else + inode->i_flags &= ~S_DAX; } /* @@ -1321,7 +1328,7 @@ xfs_setup_inode( inode->i_gid = xfs_gid_to_kgid(ip->i_d.di_gid); i_size_write(inode, ip->i_d.di_size); - xfs_diflags_to_iflags(inode, ip); + xfs_diflags_to_iflags(ip); if (S_ISDIR(inode->i_mode)) { /* From patchwork Thu Feb 27 05:24:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11407759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E6DC159A for ; Thu, 27 Feb 2020 05:33:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 77AD92467F for ; Thu, 27 Feb 2020 05:33:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728503AbgB0Fc7 (ORCPT ); Thu, 27 Feb 2020 00:32:59 -0500 Received: from mga01.intel.com ([192.55.52.88]:17728 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728453AbgB0Fcr (ORCPT ); Thu, 27 Feb 2020 00:32:47 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:46 -0800 X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="261320404" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2020 21:32:46 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH V5 12/12] Documentation/dax: Update Usage section
Date: Wed, 26 Feb 2020 21:24:42 -0800
Message-Id: <20200227052442.22524-13-ira.weiny@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200227052442.22524-1-ira.weiny@intel.com>
References: <20200227052442.22524-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Ira Weiny

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny
---
 Documentation/filesystems/dax.txt | 84 ++++++++++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..32e37c550f76 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -20,8 +20,88 @@ Usage
 If you have a block device which supports DAX, you can make a filesystem
 on it as usual. The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the filesystem.
+
+Enabling DAX on an individual file basis (XFS)
+----------------------------------------------
+
+There are 2 per-file dax flags. One is a physical configuration setting and
+the other a currently enabled state.
+
+The physical configuration setting is maintained on individual file and
+directory inodes. It is preserved within the file system. This 'physical'
+config setting can be set using an ioctl and/or an application such as "xfs_io
+-c 'chattr [-+]x'". Files and directories automatically inherit their physical
+dax setting from their parent directory when created. Therefore, setting the
+physical dax setting at directory creation time can be used to set a default
+behavior for that sub-tree. Doing so on the root directory acts to set a
+default for the entire file system.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+        dax: a,e
+        no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+        dax: a,b,c,d
+        no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a/b/c
+mkdir a/b/c/d
+
+        dax: c,d
+        no dax: a,b
+
+
+The current inode enabled state is set when a file inode is loaded and it is
+determined that the underlying media supports dax.
+
+statx can be used to query the file's current enabled state. NOTE that a
+directory will never be operating in a dax state. Therefore, the dax config
+state must be queried to see what config state a file or sub-directory will
+inherit from a directory.
+
+NOTE: Setting a file or directory's config state with xfs_io is possible even
+if the underlying media does not support dax.
+
+
+Enabling dax on a file system wide basis ('-o dax' mount option)
+----------------------------------------------------------------
+
+The physical dax configuration of all files can be overridden using a mount
+option. In summary:
+
+        (physical flag || mount option) && capable device == dax in effect
+        ( || <'-o dax'> ) && capable device ==
+
+To enable the mount override, use "-o dax" on the command line or add
+'dax' to the options in /etc/fstab.
+
+Using the mount option does not change the physical configured state of
+individual files. Therefore, remounting _without_ the mount option will allow
+the file system to set each file's enabled state directly based on its config
+setting.
+
+NOTE: Setting a file or directory's physical config state is possible while the
+file system is mounted with the dax override. However, the file's enabled
+state will continue to be overridden and "dax enabled" until the mount option
+is removed and a remount performed. At that point the file's physical config
+state dictates the enabled state.
 
 
 Implementation Tips for Block Driver Writers