From patchwork Sat Feb 8 19:34:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C59E014E3 for ; Sat, 8 Feb 2020 19:35:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AE87C2464B for ; Sat, 8 Feb 2020 19:35:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727532AbgBHTet (ORCPT ); Sat, 8 Feb 2020 14:34:49 -0500 Received: from mga14.intel.com ([192.55.52.115]:38679 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727478AbgBHTet (ORCPT ); Sat, 8 Feb 2020 14:34:49 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:47 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="250782575" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:48 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Jan Kara , "Darrick J . Wong" , Alexander Viro , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 01/12] fs/stat: Define DAX statx attribute Date: Sat, 8 Feb 2020 11:34:34 -0800 Message-Id: <20200208193445.27421-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny In order for users to determine if a file is currently operating in DAX state (effective DAX), define a statx attribute value and set that attribute if the effective DAX flag is set. To go along with this we propose the following addition to the statx man page: STATX_ATTR_DAX The file is in the DAX (CPU direct access) state. DAX state attempts to minimize software cache effects for both I/O and memory mappings of this file. It requires a file system which has been configured to support DAX. DAX generally assumes all accesses are via CPU load/store instructions, which can minimize overhead for small accesses but may adversely affect CPU utilization for large transfers. File I/O is done directly to/from user-space buffers and memory mapped I/O may be performed with direct memory mappings that bypass the kernel page cache. While the DAX property tends to result in data being transferred synchronously, it does not give the same guarantees as O_SYNC, where data and the necessary metadata are transferred together. A DAX file may support being mapped with the MAP_SYNC flag, which enables a program to use CPU cache flush instructions to persist CPU store operations without an explicit fsync(2). See mmap(2) for more information. Reviewed-by: Jan Kara Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from V2: Update man page text with comments from Darrick, Jan, Dan, and Dave. 
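Illustration only, not part of the patch: a minimal user-space sketch of how a caller might test the proposed attribute once a struct statx has been filled in. The helper name file_is_dax() is hypothetical, and the fallback bit value is the one proposed in this version of the patch; a released kernel may assign a different bit, so the definition from the installed headers should win whenever it exists. The sketch also assumes a libc whose <sys/stat.h> exposes struct statx.

```c
/* Sketch only: user-space check for the DAX statx attribute. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>	/* struct statx; assumes a libc that exposes it */

/*
 * Fallback for headers that predate the attribute. The value is the
 * one proposed in this patch (an assumption for illustration); prefer
 * the installed headers' definition when it is available.
 */
#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00002000
#endif

/*
 * Hypothetical helper: report whether the kernel flagged the file as
 * being in the effective DAX state. Only stx_attributes is consulted,
 * because the patch sets the bit there and nowhere else.
 */
bool file_is_dax(const struct statx *stx)
{
	assert(stx != NULL);
	return (stx->stx_attributes & STATX_ATTR_DAX) != 0;
}
```

A caller would typically fill the buffer first, for example with statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx), and only then pass it to the helper. Since the state can change between calls, the result is a snapshot and not a guarantee.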
fs/stat.c | 3 +++ include/uapi/linux/stat.h | 1 + 2 files changed, 4 insertions(+) diff --git a/fs/stat.c b/fs/stat.c index 030008796479..894699c74dde 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat, if (IS_AUTOMOUNT(inode)) stat->attributes |= STATX_ATTR_AUTOMOUNT; + if (IS_DAX(inode)) + stat->attributes |= STATX_ATTR_DAX; + if (inode->i_op->getattr) return inode->i_op->getattr(path, stat, request_mask, query_flags); diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index ad80a5c885d5..e5f9d5517f6b 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -169,6 +169,7 @@ struct statx { #define STATX_ATTR_ENCRYPTED 0x00000800 /* [I] File requires key to decrypt in fs */ #define STATX_ATTR_AUTOMOUNT 0x00001000 /* Dir: Automount trigger */ #define STATX_ATTR_VERITY 0x00100000 /* [I] Verity protected file */ +#define STATX_ATTR_DAX 0x00002000 /* [I] File is DAX */ #endif /* _UAPI_LINUX_STAT_H */ From patchwork Sat Feb 8 19:34:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371839 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AF388139A for ; Sat, 8 Feb 2020 19:35:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 984F624125 for ; Sat, 8 Feb 2020 19:35:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727840AbgBHTfl (ORCPT ); Sat, 8 Feb 2020 14:35:41 -0500 Received: from mga01.intel.com ([192.55.52.88]:59763 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727560AbgBHTeu (ORCPT ); Sat, 8 Feb 2020 14:34:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from 
orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:49 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="312366777" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:49 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 02/12] fs/xfs: Isolate the physical DAX flag from effective Date: Sat, 8 Feb 2020 11:34:35 -0800 Message-Id: <20200208193445.27421-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny xfs_ioctl_setattr_dax_invalidate() currently checks if the DAX flag is changing as a quick check. But the implementation mixes the physical (XFS_DIFLAG2_DAX) and effective (S_DAX) DAX flags. Remove the use of the effective flag when determining if a change of the physical flag is required. Signed-off-by: Ira Weiny --- fs/xfs/xfs_ioctl.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index d42de92cb283..1a57be696810 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1209,9 +1209,11 @@ xfs_ioctl_setattr_dax_invalidate( } /* If the DAX state is not changing, we have nothing to do here. 
*/ - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode)) + if ((fa->fsx_xflags & FS_XFLAG_DAX) && + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) return 0; - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode)) + if (!(fa->fsx_xflags & FS_XFLAG_DAX) && + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) return 0; if (S_ISDIR(inode->i_mode)) From patchwork Sat Feb 8 19:34:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371831 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6788D14E3 for ; Sat, 8 Feb 2020 19:35:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5128624125 for ; Sat, 8 Feb 2020 19:35:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727573AbgBHTeu (ORCPT ); Sat, 8 Feb 2020 14:34:50 -0500 Received: from mga02.intel.com ([134.134.136.20]:29572 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727555AbgBHTeu (ORCPT ); Sat, 8 Feb 2020 14:34:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="226810223" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:49 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 03/12] fs/xfs: Separate functionality of xfs_inode_supports_dax() Date: Sat, 8 Feb 2020 11:34:36 -0800 Message-Id: <20200208193445.27421-4-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny xfs_inode_supports_dax() should reflect if the inode can support DAX not that it is enabled for DAX. Leave that to other helper functions. Change the caller of xfs_inode_supports_dax() to call xfs_inode_use_dax() which reflects new logic to override the effective DAX flag with either the mount option or the physical DAX flag. To make the logic clear create 2 helper functions for the mount and physical flag. Signed-off-by: Ira Weiny --- fs/xfs/xfs_iops.c | 32 ++++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 81f2f93caec0..a7db50d923d4 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1236,6 +1236,15 @@ static const struct inode_operations xfs_inline_symlink_inode_operations = { .update_time = xfs_vn_update_time, }; +static bool +xfs_inode_mount_is_dax( + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + + return (mp->m_flags & XFS_MOUNT_DAX) == XFS_MOUNT_DAX; +} + /* Figure out if this file actually supports DAX. */ static bool xfs_inode_supports_dax( @@ -1247,11 +1256,6 @@ xfs_inode_supports_dax( if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip)) return false; - /* DAX mount option or DAX iflag must be set. 
*/ - if (!(mp->m_flags & XFS_MOUNT_DAX) && - !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) - return false; - /* Block size must match page size */ if (mp->m_sb.sb_blocksize != PAGE_SIZE) return false; @@ -1260,6 +1264,22 @@ xfs_inode_supports_dax( return xfs_inode_buftarg(ip)->bt_daxdev != NULL; } +static bool +xfs_inode_is_dax( + struct xfs_inode *ip) +{ + return (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) == XFS_DIFLAG2_DAX; +} + +static bool +xfs_inode_use_dax( + struct xfs_inode *ip) +{ + return xfs_inode_supports_dax(ip) && + (xfs_inode_mount_is_dax(ip) || + xfs_inode_is_dax(ip)); +} + STATIC void xfs_diflags_to_iflags( struct inode *inode, @@ -1278,7 +1298,7 @@ xfs_diflags_to_iflags( inode->i_flags |= S_SYNC; if (flags & XFS_DIFLAG_NOATIME) inode->i_flags |= S_NOATIME; - if (xfs_inode_supports_dax(ip)) + if (xfs_inode_use_dax(ip)) inode->i_flags |= S_DAX; } From patchwork Sat Feb 8 19:34:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371811 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E7F214E3 for ; Sat, 8 Feb 2020 19:35:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 570CD2464B for ; Sat, 8 Feb 2020 19:35:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727598AbgBHTex (ORCPT ); Sat, 8 Feb 2020 14:34:53 -0500 Received: from mga05.intel.com ([192.55.52.43]:32901 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727563AbgBHTeu (ORCPT ); Sat, 8 Feb 2020 14:34:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 X-IronPort-AV: 
E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="225859201" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 04/12] fs/xfs: Clean up DAX support check Date: Sat, 8 Feb 2020 11:34:37 -0800 Message-Id: <20200208193445.27421-5-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Rather than open coding xfs_inode_supports_dax() in xfs_ioctl_setattr_dax_invalidate() export xfs_inode_supports_dax() and call it in preparation for swapping dax flags. This also means updating xfs_inode_supports_dax() to return true for a directory. Signed-off-by: Ira Weiny --- fs/xfs/xfs_ioctl.c | 16 +++------------- fs/xfs/xfs_iops.c | 8 ++++++-- fs/xfs/xfs_iops.h | 2 ++ 3 files changed, 11 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 1a57be696810..da1eb2bdb386 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1190,23 +1190,13 @@ xfs_ioctl_setattr_dax_invalidate( int *join_flags) { struct inode *inode = VFS_I(ip); - struct super_block *sb = inode->i_sb; int error; *join_flags = 0; - /* - * It is only valid to set the DAX flag on regular files and - * directories on filesystems where the block size is equal to the page - * size. On directories it serves as an inherited hint so we don't - * have to check the device for dax support or flush pagecache. 
- */ - if (fa->fsx_xflags & FS_XFLAG_DAX) { - struct xfs_buftarg *target = xfs_inode_buftarg(ip); - - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize)) - return -EINVAL; - } + if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX && + !xfs_inode_supports_dax(ip)) + return -EINVAL; /* If the DAX state is not changing, we have nothing to do here. */ if ((fa->fsx_xflags & FS_XFLAG_DAX) && diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index a7db50d923d4..eebec159d873 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1246,14 +1246,18 @@ xfs_inode_mount_is_dax( } /* Figure out if this file actually supports DAX. */ -static bool +bool xfs_inode_supports_dax( struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; /* Only supported on non-reflinked files. */ - if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip)) + if (xfs_is_reflink_inode(ip)) + return false; + + /* Only supported on regular files and directories. */ + if (!(S_ISREG(VFS_I(ip)->i_mode) || S_ISDIR(VFS_I(ip)->i_mode))) return false; /* Block size must match page size */ diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h index 4d24ff309f59..f24fec8de1d6 100644 --- a/fs/xfs/xfs_iops.h +++ b/fs/xfs/xfs_iops.h @@ -24,4 +24,6 @@ extern int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap, extern int xfs_vn_setattr_nonsize(struct dentry *dentry, struct iattr *vap); extern int xfs_vn_setattr_size(struct dentry *dentry, struct iattr *vap); +extern bool xfs_inode_supports_dax(struct xfs_inode *ip); + #endif /* __XFS_IOPS_H__ */ From patchwork Sat Feb 8 19:34:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43D2F139A for ; Sat, 8 Feb 2020 19:35:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2207621741 for ; Sat, 8 Feb 2020 19:35:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727756AbgBHTfP (ORCPT ); Sat, 8 Feb 2020 14:35:15 -0500 Received: from mga07.intel.com ([134.134.136.100]:41243 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727555AbgBHTev (ORCPT ); Sat, 8 Feb 2020 14:34:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="432925594" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Jan Kara , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 05/12] fs: remove unneeded IS_DAX() check Date: Sat, 8 Feb 2020 11:34:38 -0800 Message-Id: <20200208193445.27421-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny The IS_DAX() check in io_is_direct() causes a race between changing the DAX state and creating the iocb flags. Remove the check because DAX now emulates the page cache API and therefore it does not matter if the file state is DAX or not when the iocb flags are created. 
Reviewed-by: Jan Kara Signed-off-by: Ira Weiny --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3cd4fe6b845e..63d1e533a07d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3388,7 +3388,7 @@ extern int file_update_time(struct file *file); static inline bool io_is_direct(struct file *filp) { - return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); + return (filp->f_flags & O_DIRECT); } static inline bool vma_is_dax(struct vm_area_struct *vma) From patchwork Sat Feb 8 19:34:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371817 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 654E714E3 for ; Sat, 8 Feb 2020 19:35:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4D7492253D for ; Sat, 8 Feb 2020 19:35:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727767AbgBHTfQ (ORCPT ); Sat, 8 Feb 2020 14:35:16 -0500 Received: from mga11.intel.com ([192.55.52.93]:1353 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727586AbgBHTev (ORCPT ); Sat, 8 Feb 2020 14:34:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="431203209" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:50 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , 
Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 06/12] fs/xfs: Check if the inode supports DAX under lock Date: Sat, 8 Feb 2020 11:34:39 -0800 Message-Id: <20200208193445.27421-7-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny One of the checks for an inode supporting DAX is if the inode is reflinked. During a non-DAX to DAX state change we could race with the file being reflinked and end up with a reflinked file being in DAX state. Prevent this race by checking for DAX support under the MMAP_LOCK. Signed-off-by: Ira Weiny --- fs/xfs/xfs_ioctl.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index da1eb2bdb386..4ff402fd6636 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1194,10 +1194,6 @@ xfs_ioctl_setattr_dax_invalidate( *join_flags = 0; - if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX && - !xfs_inode_supports_dax(ip)) - return -EINVAL; - /* If the DAX state is not changing, we have nothing to do here. 
*/ if ((fa->fsx_xflags & FS_XFLAG_DAX) && (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) @@ -1211,6 +1207,13 @@ xfs_ioctl_setattr_dax_invalidate( /* lock, flush and invalidate mapping in preparation for flag change */ xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + + if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX && + !xfs_inode_supports_dax(ip)) { + error = -EINVAL; + goto out_unlock; + } + error = filemap_write_and_wait(inode->i_mapping); if (error) goto out_unlock; From patchwork Sat Feb 8 19:34:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371827 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A2072186E for ; Sat, 8 Feb 2020 19:35:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6D99C21741 for ; Sat, 8 Feb 2020 19:35:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727720AbgBHTfP (ORCPT ); Sat, 8 Feb 2020 14:35:15 -0500 Received: from mga02.intel.com ([134.134.136.20]:29572 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727588AbgBHTew (ORCPT ); Sat, 8 Feb 2020 14:34:52 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="255763687" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. 
Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 07/12] fs: Add locking for a dynamic DAX state Date: Sat, 8 Feb 2020 11:34:40 -0800 Message-Id: <20200208193445.27421-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny DAX requires special address space operations but many other functions check the IS_DAX() state. While DAX is a property of the inode we prefer a lock at the super block level because of the overhead of a rwsem within the inode. Define a vfs per superblock percpu rw semaphore to lock the DAX state while performing various VFS layer operations. Write lock calls are provided here but are used in subsequent patches by the file systems themselves. Signed-off-by: Ira Weiny --- Changes from V2 Rebase on linux-next-08-02-2020 Fix locking order Change all references from mode to state where appropriate add CONFIG_FS_DAX requirement for state change Use a static branch to enable locking only when a dax capable device has been seen. Move the lock to a global vfs lock this does a few things 1) preps us better for ext4 support 2) removes funky callbacks from inode ops 3) remove complexity from XFS and probably from ext4 later We can do this because 1) the locking order is required to be at the highest level anyway, so why complicate xfs 2) We had to move the sem to the super_block because it is too heavy for the inode. 3) After internal discussions with Dan we decided that this would be easier, just as performant, and with slightly less overhead than in the VFS SB. We also change the function names to up/down; read/write as appropriate. 
Previous names were oversimplified. Update comments and documentation Documentation/filesystems/vfs.rst | 17 +++++++ fs/attr.c | 1 + fs/dax.c | 3 ++ fs/inode.c | 14 ++++-- fs/iomap/buffered-io.c | 1 + fs/open.c | 4 ++ fs/stat.c | 2 + fs/super.c | 3 ++ include/linux/fs.h | 78 ++++++++++++++++++++++++++++++- mm/fadvise.c | 10 +++- mm/filemap.c | 4 ++ mm/huge_memory.c | 1 + mm/khugepaged.c | 2 + mm/madvise.c | 3 ++ mm/util.c | 9 +++- 15 files changed, 144 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 7d4d09dd5e6d..cd011ceb4b72 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -934,6 +934,23 @@ cache in your filesystem. The following members are defined: Called during swapoff on files where swap_activate was successful. +Changing DAX 'state' dynamically +---------------------------------- + +Some file systems which support DAX want to be able to change the DAX state +dynamically. To switch the state safely we lock the inode state in all "normal" +file system operations and restrict state changes to those operations. The +specific rules are: + + 1) the direct_IO address_space_operation must be supported in all + potential a_ops vectors for any state supported by the inode. + 2) FS's should enable the static branch lock_dax_state_static_key when a DAX + capable device is detected. + 3) DAX state changes shall not be allowed while the file is mmap'ed + 4) For non-mmaped operations the VFS layer must take the read lock for any + use of IS_DAX() + 5) Filesystems take the write lock when changing DAX states. 
+ The File Object =============== diff --git a/fs/attr.c b/fs/attr.c index b4bbdbd4c8ca..9b15f73d1079 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -332,6 +332,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de if (error) return error; + /* DAX read state should already be held here */ if (inode->i_op->setattr) error = inode->i_op->setattr(dentry, attr); else diff --git a/fs/dax.c b/fs/dax.c index 1f1f0201cad1..96136866f151 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -30,6 +30,9 @@ #define CREATE_TRACE_POINTS #include +DEFINE_STATIC_KEY_FALSE(lock_dax_state_static_key); +EXPORT_SYMBOL(lock_dax_state_static_key); + static inline unsigned int pe_order(enum page_entry_size pe_size) { if (pe_size == PE_SIZE_PTE) diff --git a/fs/inode.c b/fs/inode.c index 7d57068b6b7a..7d0227f9e3e8 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1616,11 +1616,19 @@ EXPORT_SYMBOL(iput); */ int bmap(struct inode *inode, sector_t *block) { - if (!inode->i_mapping->a_ops->bmap) - return -EINVAL; + int ret = 0; + + inode_dax_state_down_read(inode); + if (!inode->i_mapping->a_ops->bmap) { + ret = -EINVAL; + goto err; + } *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); - return 0; + +err: + inode_dax_state_up_read(inode); + return ret; } EXPORT_SYMBOL(bmap); #endif diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7c84c4c027c4..e313a34d5fa6 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -999,6 +999,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = offset_in_page(pos); bytes = min_t(loff_t, PAGE_SIZE - offset, count); + /* DAX state read should already be held here */ if (IS_DAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else diff --git a/fs/open.c b/fs/open.c index 0788b3715731..148980e30611 100644 --- a/fs/open.c +++ b/fs/open.c @@ -59,10 +59,12 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs, if (ret) newattrs.ia_valid |= ret | 
ATTR_FORCE; + inode_dax_state_down_read(dentry->d_inode); inode_lock(dentry->d_inode); /* Note any delegations or leases have already been broken: */ ret = notify_change(dentry, &newattrs, NULL); inode_unlock(dentry->d_inode); + inode_dax_state_up_read(dentry->d_inode); return ret; } @@ -306,7 +308,9 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EOPNOTSUPP; file_start_write(file); + inode_dax_state_down_read(inode); ret = file->f_op->fallocate(file, mode, offset, len); + inode_dax_state_up_read(inode); /* * Create inotify and fanotify events. diff --git a/fs/stat.c b/fs/stat.c index 894699c74dde..bf8841314c08 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -79,8 +79,10 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat, if (IS_AUTOMOUNT(inode)) stat->attributes |= STATX_ATTR_AUTOMOUNT; + inode_dax_state_down_read(inode); if (IS_DAX(inode)) stat->attributes |= STATX_ATTR_DAX; + inode_dax_state_up_read(inode); if (inode->i_op->getattr) return inode->i_op->getattr(path, stat, request_mask, diff --git a/fs/super.c b/fs/super.c index cd352530eca9..3e26e3a1d860 100644 --- a/fs/super.c +++ b/fs/super.c @@ -51,6 +51,9 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = { "sb_internal", }; +DEFINE_PERCPU_RWSEM(sb_dax_rwsem); +EXPORT_SYMBOL(sb_dax_rwsem); + /* * One thing we have to be careful of with a per-sb shrinker is that we don't * drop the last active reference to the superblock from within the shrinker. diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d1e533a07d..1a22cd94c4ab 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -359,6 +360,11 @@ typedef struct { typedef int (*read_actor_t)(read_descriptor_t *, struct page *, unsigned long, unsigned long); +/** + * NOTE: DO NOT define new functions in address_space_operations without first + * considering how dynamic DAX states are to be supported. 
See the + * inode_dax_state_*_read() functions + */ struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); @@ -1817,6 +1823,11 @@ struct block_device_operations; struct iov_iter; +/** + * NOTE: DO NOT define new functions in file_operations without first + * considering how dynamic DAX states are to be supported. See the + * inode_dax_state_*_read() functions + */ struct file_operations { struct module *owner; loff_t (*llseek) (struct file *, loff_t, int); @@ -1889,16 +1900,79 @@ struct inode_operations { int (*set_acl)(struct inode *, struct posix_acl *, int); } ____cacheline_aligned; +#if defined(CONFIG_FS_DAX) +/* + * Filesystems wishing to support dynamic DAX states must do the following. + * + * 1) the direct_IO address_space_operation must be supported in all + * potential a_ops vectors for any state supported by the inode. + * 2) FS's should enable the static branch lock_dax_state_static_key when a DAX + * capable device is detected. + * 3) DAX state changes shall not be allowed while the file is mmap'ed + * 4) For non-mmaped operations the VFS layer must take the read lock for any + * use of IS_DAX() + * 5) Filesystems take the write lock when changing DAX states. 
+ */ +DECLARE_STATIC_KEY_FALSE(lock_dax_state_static_key); +extern struct percpu_rw_semaphore sb_dax_rwsem; +static inline void inode_dax_state_down_read(struct inode *inode) +{ + if (!static_branch_unlikely(&lock_dax_state_static_key)) + return; + percpu_down_read(&sb_dax_rwsem); +} +static inline void inode_dax_state_up_read(struct inode *inode) +{ + if (!static_branch_unlikely(&lock_dax_state_static_key)) + return; + percpu_up_read(&sb_dax_rwsem); +} +static inline void inode_dax_state_down_write(struct inode *inode) +{ + if (!static_branch_unlikely(&lock_dax_state_static_key)) + return; + percpu_down_write(&sb_dax_rwsem); +} +static inline void inode_dax_state_up_write(struct inode *inode) +{ + if (!static_branch_unlikely(&lock_dax_state_static_key)) + return; + percpu_up_write(&sb_dax_rwsem); +} +static inline void enable_dax_state_static_branch(void) +{ + static_branch_enable(&lock_dax_state_static_key); +} +#else /* !CONFIG_FS_DAX */ +#define inode_dax_state_down_read(inode) do { (void)(inode); } while (0) +#define inode_dax_state_up_read(inode) do { (void)(inode); } while (0) +#define inode_dax_state_down_write(inode) do { (void)(inode); } while (0) +#define inode_dax_state_up_write(inode) do { (void)(inode); } while (0) +#define enable_dax_state_static_branch() +#endif /* CONFIG_FS_DAX */ + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { - return file->f_op->read_iter(kio, iter); + struct inode *inode = file_inode(kio->ki_filp); + ssize_t ret; + + inode_dax_state_down_read(inode); + ret = file->f_op->read_iter(kio, iter); + inode_dax_state_up_read(inode); + return ret; } static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { - return file->f_op->write_iter(kio, iter); + struct inode *inode = file_inode(kio->ki_filp); + ssize_t ret; + + inode_dax_state_down_read(inode); + ret = file->f_op->write_iter(kio, iter); + inode_dax_state_up_read(inode); + return ret; } 
static inline int call_mmap(struct file *file, struct vm_area_struct *vma) diff --git a/mm/fadvise.c b/mm/fadvise.c index 4f17c83db575..ac85eb778c74 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -47,7 +47,10 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) bdi = inode_to_bdi(mapping->host); + inode_dax_state_down_read(inode); if (IS_DAX(inode) || (bdi == &noop_backing_dev_info)) { + int ret = 0; + switch (advice) { case POSIX_FADV_NORMAL: case POSIX_FADV_RANDOM: @@ -58,10 +61,13 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) /* no bad return value, but ignore advice */ break; default: - return -EINVAL; + ret = -EINVAL; } - return 0; + + inode_dax_state_up_read(inode); + return ret; } + inode_dax_state_up_read(inode); /* * Careful about overflows. Len == 0 means "as much as possible". Use diff --git a/mm/filemap.c b/mm/filemap.c index 1784478270e1..3a7863ba51b9 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2293,6 +2293,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) * and return. Otherwise fallthrough to buffered io for * the rest of the read. Buffered reads will not work for * DAX files, so don't bother trying. + * + * IS_DAX is protected under ->read_iter lock */ if (retval < 0 || !count || iocb->ki_pos >= size || IS_DAX(inode)) @@ -3377,6 +3379,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * holes, for example. For DAX files, a buffered write will * not succeed (even if it did, DAX does not handle dirty * page-cache pages correctly). 
+ * + * IS_DAX is protected under ->write_iter lock */ if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) goto out; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b08b199f9a11..3d05bd10d83e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -572,6 +572,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long ret; loff_t off = (loff_t)pgoff << PAGE_SHIFT; + /* Should not need locking here because mmap is not allowed */ if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) goto out; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b679908743cb..3bec46277886 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1592,9 +1592,11 @@ static void collapse_file(struct mm_struct *mm, } else { /* !is_shmem */ if (!page || xa_is_value(page)) { xas_unlock_irq(&xas); + inode_dax_state_down_read(file->f_inode); page_cache_sync_readahead(mapping, &file->f_ra, file, index, PAGE_SIZE); + inode_dax_state_up_read(file->f_inode); /* drain pagevecs to help isolate_lru_page() */ lru_add_drain(); page = find_lock_page(mapping, index); diff --git a/mm/madvise.c b/mm/madvise.c index 43b47d3fae02..419b7c26216b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -275,10 +275,13 @@ static long madvise_willneed(struct vm_area_struct *vma, return -EBADF; #endif + inode_dax_state_down_read(file_inode(file)); if (IS_DAX(file_inode(file))) { + inode_dax_state_up_read(file_inode(file)); /* no bad return value, but ignore advice */ return 0; } + inode_dax_state_up_read(file_inode(file)); /* * Filesystem's fadvise may need to take various locks. 
We need to diff --git a/mm/util.c b/mm/util.c index 988d11e6c17c..8dfb9958f2a6 100644 --- a/mm/util.c +++ b/mm/util.c @@ -501,11 +501,18 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, ret = security_mmap_file(file, prot, flag); if (!ret) { - if (down_write_killable(&mm->mmap_sem)) + if (file) + inode_dax_state_down_read(file_inode(file)); + if (down_write_killable(&mm->mmap_sem)) { + if (file) + inode_dax_state_up_read(file_inode(file)); return -EINTR; + } ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, &populate, &uf); up_write(&mm->mmap_sem); + if (file) + inode_dax_state_up_read(file_inode(file)); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); From patchwork Sat Feb 8 19:34:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371833 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 94486139A for ; Sat, 8 Feb 2020 19:35:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7DB702253D for ; Sat, 8 Feb 2020 19:35:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727815AbgBHTff (ORCPT ); Sat, 8 Feb 2020 14:35:35 -0500 Received: from mga01.intel.com ([192.55.52.88]:59763 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727587AbgBHTev (ORCPT ); Sat, 8 Feb 2020 14:34:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="280279360" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by 
fmsmga003-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800
From: ira.weiny@intel.com
To: linux-kernel@vger.kernel.org
Cc: Ira Weiny, Alexander Viro, "Darrick J. Wong", Dan Williams, Dave Chinner, Christoph Hellwig, "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 08/12] fs/xfs: Clarify lockdep dependency for xfs_isilocked()
Date: Sat, 8 Feb 2020 11:34:41 -0800
Message-Id: <20200208193445.27421-9-ira.weiny@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com>
References: <20200208193445.27421-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Ira Weiny

xfs_isilocked() can't work fully without CONFIG_LOCKDEP. However, making
xfs_isilocked() dependent on CONFIG_LOCKDEP is not feasible because it is
used for more than the i_rwsem. Therefore a short-circuit was provided via
debug_locks. However, this caused confusion while working through the xfs
locking.

Rather than use debug_locks as a flag, specify this clearly using
IS_ENABLED(CONFIG_LOCKDEP).
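
To illustrate the distinction the message draws, here is a minimal user-space sketch, not kernel code. Every name in it (DEMO_LOCKDEP_BUILT_IN, demo_debug_locks, demo_lock_held and the two helpers) is a hypothetical stand-in chosen for this example. It only contrasts a short-circuit driven by a runtime flag with one driven by a build-time constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; none of these are kernel symbols. */
#define DEMO_LOCKDEP_BUILT_IN 1   /* plays the role of IS_ENABLED(CONFIG_LOCKDEP) */
static int demo_debug_locks = 1;  /* plays the role of the runtime debug_locks flag */
static bool demo_lock_held;       /* what a lockdep-style "is held" query would report */

/* Old shape: a runtime flag decides whether the held-state query is consulted. */
static bool demo_isilocked_runtime(void)
{
	return !demo_debug_locks || demo_lock_held;
}

/* New shape: a build-time constant decides, independent of any runtime state. */
static bool demo_isilocked_buildtime(void)
{
	return !DEMO_LOCKDEP_BUILT_IN || demo_lock_held;
}
```

In this model, clearing demo_debug_locks makes the first helper report "held" unconditionally, while the second keeps consulting demo_lock_held. That is the kind of hidden runtime dependency the patch removes from the exclusive i_rwsem check.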
Signed-off-by: Ira Weiny
---
Changes from V2:
	This patch is new for V3

 fs/xfs/xfs_inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c5077e6326c7..35df324875db 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -364,7 +364,7 @@ xfs_isilocked(

 	if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
 		if (!(lock_flags & XFS_IOLOCK_SHARED))
-			return !debug_locks ||
+			return !IS_ENABLED(CONFIG_LOCKDEP) ||
 				lockdep_is_held_type(&VFS_I(ip)->i_rwsem, 0);
 		return rwsem_is_locked(&VFS_I(ip)->i_rwsem);
 	}

From patchwork Sat Feb 8 19:34:42 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 11371821
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 96EE2139A for ; Sat, 8 Feb 2020 19:35:31 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7FABE24125 for ; Sat, 8 Feb 2020 19:35:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727742AbgBHTfP (ORCPT ); Sat, 8 Feb 2020 14:35:15 -0500
Received: from mga12.intel.com ([192.55.52.136]:58204 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727589AbgBHTew (ORCPT ); Sat, 8 Feb 2020 14:34:52 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800
X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="265404086"
Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:51 -0800
From: ira.weiny@intel.com
To:
linux-kernel@vger.kernel.org
Cc: Ira Weiny, Alexander Viro, "Darrick J. Wong", Dan Williams, Dave Chinner, Christoph Hellwig, "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 09/12] fs/xfs: Add write DAX lock to xfs layer
Date: Sat, 8 Feb 2020 11:34:42 -0800
Message-Id: <20200208193445.27421-10-ira.weiny@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com>
References: <20200208193445.27421-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Ira Weiny

XFS requires regular files to be locked for write while changing to/from
DAX state.

Take the DAX write lock while changing DAX state.

We define a new XFS_DAX_EXCL lock type to carry the lock through to
transaction completion.

Signed-off-by: Ira Weiny
---
Changes from V2:
	Change name of patch (WAS: fs/xfs: Add lock/unlock state to xfs)
	Remove the xfs specific lock and move to the vfs layer.
		We still use XFS_DAX_EXCL to be able to pass this
		flag through to the transaction code.  But we no longer
		have a lock specific to xfs.  This removes a lot of code
		from the XFS layer, preps us for using this in ext4, and
		is actually more straightforward now that all the locking
		requirements are better known.
Fix locking order comment Rework for new 'state' names (Other comments on the previous patch are not applicable with new patch as much of the code was removed in favor of the vfs level lock) fs/xfs/xfs_inode.c | 22 ++++++++++++++++++++-- fs/xfs/xfs_inode.h | 7 +++++-- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 35df324875db..0c7b1855e0c8 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -142,12 +142,12 @@ xfs_ilock_attr_map_shared( * * Basic locking order: * - * i_rwsem -> i_mmap_lock -> page_lock -> i_ilock + * s_dax_sem -> i_rwsem -> i_mmap_lock -> page_lock -> i_ilock * * mmap_sem locking order: * * i_rwsem -> page lock -> mmap_sem - * mmap_sem -> i_mmap_lock -> page_lock + * s_dax_sem -> mmap_sem -> i_mmap_lock -> page_lock * * The difference in mmap_sem locking order mean that we cannot hold the * i_mmap_lock over syscall based read(2)/write(2) based IO. These IO paths can @@ -182,6 +182,9 @@ xfs_ilock( (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0); + if (lock_flags & XFS_DAX_EXCL) + inode_dax_state_down_write(VFS_I(ip)); + if (lock_flags & XFS_IOLOCK_EXCL) { down_write_nested(&VFS_I(ip)->i_rwsem, XFS_IOLOCK_DEP(lock_flags)); @@ -224,6 +227,8 @@ xfs_ilock_nowait( * You can't set both SHARED and EXCL for the same lock, * and only XFS_IOLOCK_SHARED, XFS_IOLOCK_EXCL, XFS_ILOCK_SHARED, * and XFS_ILOCK_EXCL are valid values to set in lock_flags. 
+ * + * XFS_DAX_* is not allowed */ ASSERT((lock_flags & (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)) != (XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)); @@ -232,6 +237,7 @@ xfs_ilock_nowait( ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) != (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0); + ASSERT((lock_flags & XFS_DAX_EXCL) == 0); if (lock_flags & XFS_IOLOCK_EXCL) { if (!down_write_trylock(&VFS_I(ip)->i_rwsem)) @@ -318,6 +324,9 @@ xfs_iunlock( else if (lock_flags & XFS_ILOCK_SHARED) mrunlock_shared(&ip->i_lock); + if (lock_flags & XFS_DAX_EXCL) + inode_dax_state_up_write(VFS_I(ip)); + trace_xfs_iunlock(ip, lock_flags, _RET_IP_); } @@ -333,6 +342,8 @@ xfs_ilock_demote( ASSERT(lock_flags & (XFS_IOLOCK_EXCL|XFS_MMAPLOCK_EXCL|XFS_ILOCK_EXCL)); ASSERT((lock_flags & ~(XFS_IOLOCK_EXCL|XFS_MMAPLOCK_EXCL|XFS_ILOCK_EXCL)) == 0); + /* XFS_DAX_* is not allowed */ + ASSERT((lock_flags & XFS_DAX_EXCL) == 0); if (lock_flags & XFS_ILOCK_EXCL) mrdemote(&ip->i_lock); @@ -465,6 +476,9 @@ xfs_lock_inodes( ASSERT(!(lock_mode & XFS_ILOCK_EXCL) || inodes <= XFS_ILOCK_MAX_SUBCLASS + 1); + /* XFS_DAX_* is not allowed */ + ASSERT((lock_mode & XFS_DAX_EXCL) == 0); + if (lock_mode & XFS_IOLOCK_EXCL) { ASSERT(!(lock_mode & (XFS_MMAPLOCK_EXCL | XFS_ILOCK_EXCL))); } else if (lock_mode & XFS_MMAPLOCK_EXCL) @@ -566,6 +580,10 @@ xfs_lock_two_inodes( ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); + /* XFS_DAX_* is not allowed */ + ASSERT((ip0_mode & XFS_DAX_EXCL) == 0); + ASSERT((ip1_mode & XFS_DAX_EXCL) == 0); + ASSERT(ip0->i_ino != ip1->i_ino); if (ip0->i_ino > ip1->i_ino) { diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 492e53992fa9..25fe20740bf7 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -278,10 +278,12 @@ static inline void xfs_ifunlock(struct xfs_inode *ip) #define XFS_ILOCK_SHARED (1<<3) #define XFS_MMAPLOCK_EXCL (1<<4) #define 
XFS_MMAPLOCK_SHARED (1<<5) +#define XFS_DAX_EXCL (1<<6) #define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \ | XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \ - | XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED) + | XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED \ + | XFS_DAX_EXCL) #define XFS_LOCK_FLAGS \ { XFS_IOLOCK_EXCL, "IOLOCK_EXCL" }, \ @@ -289,7 +291,8 @@ static inline void xfs_ifunlock(struct xfs_inode *ip) { XFS_ILOCK_EXCL, "ILOCK_EXCL" }, \ { XFS_ILOCK_SHARED, "ILOCK_SHARED" }, \ { XFS_MMAPLOCK_EXCL, "MMAPLOCK_EXCL" }, \ - { XFS_MMAPLOCK_SHARED, "MMAPLOCK_SHARED" } + { XFS_MMAPLOCK_SHARED, "MMAPLOCK_SHARED" }, \ + { XFS_DAX_EXCL, "DAX_EXCL" } /* From patchwork Sat Feb 8 19:34:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B16FB2D34 for ; Sat, 8 Feb 2020 19:35:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 909AA2464B for ; Sat, 8 Feb 2020 19:35:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727634AbgBHTex (ORCPT ); Sat, 8 Feb 2020 14:34:53 -0500 Received: from mga09.intel.com ([134.134.136.24]:36194 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727597AbgBHTex (ORCPT ); Sat, 8 Feb 2020 14:34:53 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:52 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="346755558" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 
2020 11:34:51 -0800
From: ira.weiny@intel.com
To: linux-kernel@vger.kernel.org
Cc: Ira Weiny, Alexander Viro, "Darrick J. Wong", Dan Williams, Dave Chinner, Christoph Hellwig, "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 10/12] fs: Prevent DAX state change if file is mmap'ed
Date: Sat, 8 Feb 2020 11:34:43 -0800
Message-Id: <20200208193445.27421-11-ira.weiny@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com>
References: <20200208193445.27421-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Ira Weiny

Page faults need to ensure the inode DAX configuration is correct and
consistent with the vmf information at the time of the fault. There is no
easy way to ensure the vmf information is correct if a DAX change is in
progress.

Furthermore, there is no good use case to require changing DAX configs
while the file is mmap'ed.

Track mmaps of the file and fail the DAX change if the file is mmap'ed.
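
The counting scheme described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: struct demo_mapping and the demo_* functions are hypothetical names, C11 atomics stand in for atomic64_t, and the DAX state lock that serializes the check against new mmap calls is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; the struct and function names are illustrative. */
struct demo_mapping {
	atomic_llong mmap_count;  /* number of live mappings of the file */
	bool dax;                 /* current effective DAX state */
};

/* Called when a mapping is created (the patch does this in mmap_region()). */
static void demo_map(struct demo_mapping *m)
{
	atomic_fetch_add(&m->mmap_count, 1);
}

/* Called when a mapping is torn down (the patch does this in remove_vma()). */
static void demo_unmap(struct demo_mapping *m)
{
	atomic_fetch_sub(&m->mmap_count, 1);
}

/* Same test as the inode_has_mappings() helper: any live mapping counts. */
static bool demo_has_mappings(struct demo_mapping *m)
{
	return atomic_load(&m->mmap_count) != 0;
}

/* Refuse the state change with -EBUSY while the file is mapped. */
static int demo_set_dax(struct demo_mapping *m, bool enable)
{
	if (demo_has_mappings(m))
		return -EBUSY;
	m->dax = enable;
	return 0;
}
```

A caller in this model sees -EBUSY from demo_set_dax() between demo_map() and the matching demo_unmap(), and success once the count returns to zero.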
Signed-off-by: Ira Weiny --- Changes from V2: move 'i_mapped' to struct address_space and rename mmap_count Add inode_has_mappings() helper for FS's Change reference to "mode" to "state" fs/inode.c | 1 + fs/xfs/xfs_ioctl.c | 8 ++++++++ include/linux/fs.h | 6 ++++++ mm/mmap.c | 19 +++++++++++++++++-- 4 files changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 7d0227f9e3e8..bca5c9093542 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -371,6 +371,7 @@ static void __address_space_init_once(struct address_space *mapping) INIT_LIST_HEAD(&mapping->private_list); spin_lock_init(&mapping->private_lock); mapping->i_mmap = RB_ROOT_CACHED; + atomic64_set(&mapping->mmap_count, 0); } void address_space_init_once(struct address_space *mapping) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 4ff402fd6636..faba232b1f31 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1214,6 +1214,14 @@ xfs_ioctl_setattr_dax_invalidate( goto out_unlock; } + /* + * If there is a mapping in place we must remain in our current state. 
+ */ + if (inode_has_mappings(inode)) { + error = -EBUSY; + goto out_unlock; + } + error = filemap_write_and_wait(inode->i_mapping); if (error) goto out_unlock; diff --git a/include/linux/fs.h b/include/linux/fs.h index 1a22cd94c4ab..3e0121626d94 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -459,6 +459,7 @@ struct address_space { #endif struct rb_root_cached i_mmap; struct rw_semaphore i_mmap_rwsem; + atomic64_t mmap_count; unsigned long nrpages; unsigned long nrexceptional; pgoff_t writeback_index; @@ -1951,6 +1952,11 @@ static inline void enable_dax_state_static_branch(void) #define enable_dax_state_static_branch() #endif /* CONFIG_FS_DAX */ +static inline bool inode_has_mappings(struct inode *inode) +{ + return (atomic64_read(&inode->i_mapping->mmap_count) != 0); +} + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { diff --git a/mm/mmap.c b/mm/mmap.c index 7cc2562b99fd..6bb16a0996b5 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -171,12 +171,17 @@ void unlink_file_vma(struct vm_area_struct *vma) static struct vm_area_struct *remove_vma(struct vm_area_struct *vma) { struct vm_area_struct *next = vma->vm_next; + struct file *f = vma->vm_file; might_sleep(); if (vma->vm_ops && vma->vm_ops->close) vma->vm_ops->close(vma); - if (vma->vm_file) - fput(vma->vm_file); + if (f) { + struct inode *inode = file_inode(f); + if (inode) + atomic64_dec(&inode->i_mapping->mmap_count); + fput(f); + } mpol_put(vma_policy(vma)); vm_area_free(vma); return next; @@ -1830,6 +1835,16 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_page_prot(vma); + /* + * Track if there is mapping in place such that a state change + * does not occur on a file which is mapped + */ + if (file) { + struct inode *inode = file_inode(file); + + atomic64_inc(&inode->i_mapping->mmap_count); + } + return addr; unmap_and_free_vma: From patchwork Sat Feb 8 19:34:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 
1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E01D139A for ; Sat, 8 Feb 2020 19:35:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EB2DC21741 for ; Sat, 8 Feb 2020 19:35:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727655AbgBHTez (ORCPT ); Sat, 8 Feb 2020 14:34:55 -0500 Received: from mga18.intel.com ([134.134.136.126]:56785 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727563AbgBHTex (ORCPT ); Sat, 8 Feb 2020 14:34:53 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:53 -0800 X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="225732414" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:52 -0800 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org Cc: Ira Weiny , Alexander Viro , "Darrick J. Wong" , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. 
Ts'o" , Jan Kara , linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 11/12] fs/xfs: Clean up locking in dax invalidate Date: Sat, 8 Feb 2020 11:34:44 -0800 Message-Id: <20200208193445.27421-12-ira.weiny@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com> References: <20200208193445.27421-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Ira Weiny Define a variable to hold the lock flags to ensure that the correct locks are returned or released on error. Signed-off-by: Ira Weiny --- fs/xfs/xfs_ioctl.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index faba232b1f31..0134c9e65efb 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1190,7 +1190,7 @@ xfs_ioctl_setattr_dax_invalidate( int *join_flags) { struct inode *inode = VFS_I(ip); - int error; + int error, flags; *join_flags = 0; @@ -1205,8 +1205,10 @@ xfs_ioctl_setattr_dax_invalidate( if (S_ISDIR(inode->i_mode)) return 0; + flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; + /* lock, flush and invalidate mapping in preparation for flag change */ - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_ilock(ip, flags); if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX && !xfs_inode_supports_dax(ip)) { @@ -1229,11 +1231,11 @@ xfs_ioctl_setattr_dax_invalidate( if (error) goto out_unlock; - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; + *join_flags = flags; return 0; out_unlock: - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_iunlock(ip, flags); return error; } From patchwork Sat Feb 8 19:34:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11371803 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A60118C6 for ; Sat, 8 Feb 2020 19:35:06 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4353421741 for ; Sat, 8 Feb 2020 19:35:06 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727692AbgBHTe7 (ORCPT ); Sat, 8 Feb 2020 14:34:59 -0500
Received: from mga03.intel.com ([134.134.136.65]:60930 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727597AbgBHTez (ORCPT ); Sat, 8 Feb 2020 14:34:55 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:54 -0800
X-IronPort-AV: E=Sophos;i="5.70,418,1574150400"; d="scan'208";a="221147459"
Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.157]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Feb 2020 11:34:53 -0800
From: ira.weiny@intel.com
To: linux-kernel@vger.kernel.org
Cc: Ira Weiny, Alexander Viro, "Darrick J. Wong", Dan Williams, Dave Chinner, Christoph Hellwig, "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 12/12] fs/xfs: Allow toggle of effective DAX flag
Date: Sat, 8 Feb 2020 11:34:45 -0800
Message-Id: <20200208193445.27421-13-ira.weiny@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200208193445.27421-1-ira.weiny@intel.com>
References: <20200208193445.27421-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Ira Weiny

Now that locking of the inode is in place we can allow a DAX state change
while under the new lock.
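
As a rough illustration of what the state change amounts to, the sketch below flips a flag and re-selects the matching operations vector in one step. It is a simplified user-space model with hypothetical names (struct demo_inode, demo_setup_a_ops(), demo_toggle_dax()), and it assumes the caller already holds the exclusive DAX state lock; it is not the XFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the XFS structures. */
struct demo_aops {
	const char *name;
};

static const struct demo_aops demo_dax_aops = { "dax" };
static const struct demo_aops demo_pagecache_aops = { "page-cache" };

struct demo_inode {
	bool dax;                       /* effective DAX state */
	const struct demo_aops *a_ops;  /* operations vector currently in use */
};

/* Pick the ops vector that matches the current flag. */
static void demo_setup_a_ops(struct demo_inode *inode)
{
	inode->a_ops = inode->dax ? &demo_dax_aops : &demo_pagecache_aops;
}

/*
 * Flip the flag and re-select the ops vector together. The caller is
 * assumed to hold the exclusive DAX state lock, so no reader can observe
 * the flag and the vector out of step.
 */
static void demo_toggle_dax(struct demo_inode *inode, bool enable)
{
	inode->dax = enable;
	demo_setup_a_ops(inode);
}
```

Keeping the flag update and the vector selection together under the write side of the lock is what lets readers that took the read side treat IS_DAX() and the a_ops pointer as consistent.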
Signed-off-by: Ira Weiny --- Changes from V2: Add in lock_dax_state_static_key static branch enabling. fs/xfs/xfs_inode.h | 1 + fs/xfs/xfs_ioctl.c | 15 ++++++++++++--- fs/xfs/xfs_iops.c | 15 +++++++++++---- fs/xfs/xfs_super.c | 16 +++++++++------- 4 files changed, 33 insertions(+), 14 deletions(-) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 25fe20740bf7..0064f8eef41d 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -469,6 +469,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock, /* from xfs_iops.c */ extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); +extern void xfs_setup_a_ops(struct xfs_inode *ip); /* * When setting up a newly allocated inode, we need to call diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 0134c9e65efb..0736cf45223d 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1123,12 +1123,11 @@ xfs_diflags_to_linux( inode->i_flags |= S_NOATIME; else inode->i_flags &= ~S_NOATIME; -#if 0 /* disabled until the flag switching races are sorted out */ if (xflags & FS_XFLAG_DAX) inode->i_flags |= S_DAX; else inode->i_flags &= ~S_DAX; -#endif + } static int @@ -1194,6 +1193,10 @@ xfs_ioctl_setattr_dax_invalidate( *join_flags = 0; +#if !defined(CONFIG_FS_DAX) + return -EINVAL; +#endif + /* If the DAX state is not changing, we have nothing to do here. 
*/ if ((fa->fsx_xflags & FS_XFLAG_DAX) && (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) @@ -1205,7 +1208,7 @@ xfs_ioctl_setattr_dax_invalidate( if (S_ISDIR(inode->i_mode)) return 0; - flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; + flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL | XFS_DAX_EXCL; /* lock, flush and invalidate mapping in preparation for flag change */ xfs_ilock(ip, flags); @@ -1526,6 +1529,9 @@ xfs_ioctl_setattr( else ip->i_d.di_cowextsize = 0; + if (join_flags & XFS_DAX_EXCL) + xfs_setup_a_ops(ip); + code = xfs_trans_commit(tp); /* @@ -1635,6 +1641,9 @@ xfs_ioc_setxflags( goto out_drop_write; } + if (join_flags & XFS_DAX_EXCL) + xfs_setup_a_ops(ip); + error = xfs_trans_commit(tp); out_drop_write: mnt_drop_write_file(filp); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index eebec159d873..15ef51c7f0b3 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1366,6 +1366,16 @@ xfs_setup_inode( } } +void xfs_setup_a_ops(struct xfs_inode *ip) +{ + struct inode *inode = &ip->i_vnode; + + if (IS_DAX(inode)) + inode->i_mapping->a_ops = &xfs_dax_aops; + else + inode->i_mapping->a_ops = &xfs_address_space_operations; +} + void xfs_setup_iops( struct xfs_inode *ip) @@ -1376,10 +1386,7 @@ xfs_setup_iops( case S_IFREG: inode->i_op = &xfs_inode_operations; inode->i_fop = &xfs_file_operations; - if (IS_DAX(inode)) - inode->i_mapping->a_ops = &xfs_dax_aops; - else - inode->i_mapping->a_ops = &xfs_address_space_operations; + xfs_setup_a_ops(ip); break; case S_IFDIR: if (xfs_sb_version_hasasciici(&XFS_M(inode->i_sb)->m_sb)) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 2094386af8ac..af57fe07b56d 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1332,6 +1332,7 @@ xfs_fc_fill_super( struct xfs_mount *mp = sb->s_fs_info; struct inode *root; int flags = 0, error; + bool rtdev_is_dax = false, datadev_is_dax; mp->m_super = sb; @@ -1437,17 +1438,18 @@ xfs_fc_fill_super( if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) sb->s_flags |= SB_I_VERSION; - 
if (mp->m_flags & XFS_MOUNT_DAX) { - bool rtdev_is_dax = false, datadev_is_dax; + datadev_is_dax = bdev_dax_supported(mp->m_ddev_targp->bt_bdev, + sb->s_blocksize); + if (mp->m_rtdev_targp) + rtdev_is_dax = bdev_dax_supported(mp->m_rtdev_targp->bt_bdev, + sb->s_blocksize); + if (rtdev_is_dax || datadev_is_dax) + enable_dax_state_static_branch(); + if (mp->m_flags & XFS_MOUNT_DAX) { xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"); - datadev_is_dax = bdev_dax_supported(mp->m_ddev_targp->bt_bdev, - sb->s_blocksize); - if (mp->m_rtdev_targp) - rtdev_is_dax = bdev_dax_supported( - mp->m_rtdev_targp->bt_bdev, sb->s_blocksize); if (!rtdev_is_dax && !datadev_is_dax) { xfs_alert(mp, "DAX unsupported by block device. Turning off DAX.");