From patchwork Tue Jan 9 14:10:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 10152131 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7DCA660223 for ; Tue, 9 Jan 2018 14:11:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6E464256E6 for ; Tue, 9 Jan 2018 14:11:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 62D0828784; Tue, 9 Jan 2018 14:11:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2374628888 for ; Tue, 9 Jan 2018 14:11:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756351AbeAIOLJ (ORCPT ); Tue, 9 Jan 2018 09:11:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:41920 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756286AbeAIOLH (ORCPT ); Tue, 9 Jan 2018 09:11:07 -0500 Received: from tleilax.poochiereds.net (cpe-71-70-156-158.nc.res.rr.com [71.70.156.158]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E0AF42173C; Tue, 9 Jan 2018 14:11:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E0AF42173C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=jlayton@kernel.org From: Jeff Layton To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk, linux-nfs@vger.kernel.org, bfields@fieldses.org, neilb@suse.de, jack@suse.de, linux-ext4@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, linux-xfs@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com, dsterba@suse.com, linux-integrity@vger.kernel.org, zohar@linux.vnet.ibm.com, dmitry.kasatkin@gmail.com, linux-afs@lists.infradead.org, dhowells@redhat.com, jaltman@auristor.com, krzk@kernel.org Subject: [PATCH v5 01/19] fs: new API for handling inode->i_version Date: Tue, 9 Jan 2018 09:10:41 -0500 Message-Id: <20180109141059.25929-2-jlayton@kernel.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180109141059.25929-1-jlayton@kernel.org> References: <20180109141059.25929-1-jlayton@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jeff Layton Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara --- fs/btrfs/file.c | 1 + fs/btrfs/inode.c | 1 + fs/btrfs/ioctl.c | 1 + fs/btrfs/xattr.c | 1 + fs/ext4/inode.c | 1 + fs/ext4/namei.c | 1 + fs/inode.c | 1 + include/linux/fs.h | 15 --- include/linux/iversion.h | 236 +++++++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 243 insertions(+), 15 deletions(-) create mode 100644 include/linux/iversion.h diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index eb1bac7c8553..c95d7b2efefb 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "transaction.h" diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e1a7f3cb5be9..27f008b33fc1 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "transaction.h" diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 2ef8acaac688..aa452c9e2eff 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "transaction.h" diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c index 2c7e53f9ff1b..5258c1714830 100644 --- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "ctree.h" #include "btrfs_inode.h" #include "transaction.h" diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 7df2c5644e59..fa5d8bc52d2d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -39,6 +39,7 @@ #include #include #include +#include #include "ext4_jbd2.h" #include "xattr.h" diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 798b3ac680db..bcf0dff517be 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "ext4.h" #include "ext4_jbd2.h" diff --git a/fs/inode.c b/fs/inode.c index 03102d6ef044..19e72f500f71 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -18,6 +18,7 @@ #include /* for inode_has_buffers */ #include #include +#include #include #include "internal.h" diff --git a/include/linux/fs.h b/include/linux/fs.h index 511fbaabf624..76382c24e9d0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode *inode) mark_inode_dirty(inode); } -/** - * inode_inc_iversion - increments i_version - * @inode: inode that need to be updated - * - * Every time the inode is modified, the i_version field will be incremented. - * The filesystem has to be mounted with i_version flag - */ - -static inline void inode_inc_iversion(struct inode *inode) -{ - spin_lock(&inode->i_lock); - inode->i_version++; - spin_unlock(&inode->i_lock); -} - enum file_time_flags { S_ATIME = 1, S_MTIME = 2, diff --git a/include/linux/iversion.h b/include/linux/iversion.h new file mode 100644 index 000000000000..d09cc3a08740 --- /dev/null +++ b/include/linux/iversion.h @@ -0,0 +1,236 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_IVERSION_H +#define _LINUX_IVERSION_H + +#include + +/* + * The change attribute (i_version) is mandated by NFSv4 and is mostly for + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must + * appear different to observers if there was a change to the inode's data or + * metadata since it was last queried. + * + * Observers see the i_version as a 64-bit number that never changes. If it + * remains the same since it was last checked, then nothing has changed in the + * inode. If it's different then something has changed. Observers cannot infer + * anything about the nature or magnitude of the changes from the value, only + * that the inode has changed in some fashion. + * + * Not all filesystems properly implement the i_version counter. Subsystems that + * want to use i_version field on an inode should first check whether the + * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro). + * + * Those that set SB_I_VERSION will automatically have their i_version counter + * incremented on writes to normal files. If the SB_I_VERSION is not set, then + * the VFS will not touch it on writes, and the filesystem can use it how it + * wishes. Note that the filesystem is always responsible for updating the + * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.). + * We consider these sorts of filesystems to have a kernel-managed i_version. + * + * Note that some filesystems (e.g. NFS and AFS) just use the field to store + * a server-provided value (for the most part). For that reason, those + * filesystems do not set SB_I_VERSION. These filesystems are considered to + * have a self-managed i_version. + */ + +/** + * inode_set_iversion_raw - set i_version to the specified raw value + * @inode: inode to set + * @new: new i_version value to set + * + * Set @inode's i_version field to @new. This function is for use by + * filesystems that self-manage the i_version. + * + * For example, the NFS client stores its NFSv4 change attribute in this way, + * and the AFS client stores the data_version from the server here. + */ +static inline void +inode_set_iversion_raw(struct inode *inode, u64 new) +{ + inode->i_version = new; +} + +/** + * inode_set_iversion - set i_version to a particular value + * @inode: inode to set + * @new: new i_version value to set + * + * Set @inode's i_version field to @new. This function is for filesystems with + * a kernel-managed i_version. + * + * For now, this just does the same thing as the _raw variant. + */ +static inline void +inode_set_iversion(struct inode *inode, u64 new) +{ + inode_set_iversion_raw(inode, new); +} + +/** + * inode_set_iversion_queried - set i_version to a particular value and set + * flag to indicate that it has been viewed + * @inode: inode to set + * @new: new i_version value to set + * + * When loading in an i_version value from a backing store, we typically don't + * know whether it was previously viewed before being stored or not. Thus, we + * must assume that it was, to ensure that any changes will result in the + * value changing. + * + * This function will set the inode's i_version, and possibly flag the value + * as if it has already been viewed at least once. + * + * For now, this just does what inode_set_iversion does. + */ +static inline void +inode_set_iversion_queried(struct inode *inode, u64 new) +{ + inode_set_iversion(inode, new); +} + +/** + * inode_maybe_inc_iversion - increments i_version + * @inode: inode with the i_version that should be updated + * @force: increment the counter even if it's not necessary + * + * Every time the inode is modified, the i_version field must be seen to have + * changed by any observer. + * + * In this implementation, we always increment it after taking the i_lock to + * ensure that we don't race with other incrementors. + * + * Returns true if counter was bumped, and false if it wasn't. + */ +static inline bool +inode_maybe_inc_iversion(struct inode *inode, bool force) +{ + spin_lock(&inode->i_lock); + inode->i_version++; + spin_unlock(&inode->i_lock); + return true; +} + +/** + * inode_inc_iversion - forcibly increment i_version + * @inode: inode that needs to be updated + * + * Forcbily increment the i_version field. This always results in a change to + * the observable value. + */ +static inline void +inode_inc_iversion(struct inode *inode) +{ + inode_maybe_inc_iversion(inode, true); +} + +/** + * inode_iversion_need_inc - is the i_version in need of being incremented? + * @inode: inode to check + * + * Returns whether the inode->i_version counter needs incrementing on the next + * change. + * + * For now, we assume that it always does. + */ +static inline bool +inode_iversion_need_inc(struct inode *inode) +{ + return true; +} + +/** + * inode_peek_iversion_raw - grab a "raw" iversion value + * @inode: inode from which i_version should be read + * + * Grab a "raw" inode->i_version value and return it. The i_version is not + * flagged or converted in any way. This is mostly used to access a self-managed + * i_version. + * + * With those filesystems, we want to treat the i_version as an entirely + * opaque value. + */ +static inline u64 +inode_peek_iversion_raw(const struct inode *inode) +{ + return inode->i_version; +} + +/** + * inode_inc_iversion_raw - forcibly increment raw i_version + * @inode: inode that needs to be updated + * + * Forcbily increment the raw i_version field. This always results in a change + * to the raw value. + * + * NFS will use the i_version field to store the value from the server. It + * mostly treats it as opaque, but in the case where it holds a write + * delegation, it must increment the value itself. This function does that. + */ +static inline void +inode_inc_iversion_raw(struct inode *inode) +{ + inode_inc_iversion(inode); +} + +/** + * inode_peek_iversion - read i_version without flagging it to be incremented + * @inode: inode from which i_version should be read + * + * Read the inode i_version counter for an inode without registering it as a + * query. + * + * This is typically used by local filesystems that need to store an i_version + * on disk. In that situation, it's not necessary to flag it as having been + * viewed, as the result won't be used to gauge changes from that point. + */ +static inline u64 +inode_peek_iversion(const struct inode *inode) +{ + return inode_peek_iversion_raw(inode); +} + +/** + * inode_query_iversion - read i_version for later use + * @inode: inode from which i_version should be read + * + * Read the inode i_version counter. This should be used by callers that wish + * to store the returned i_version for later comparison. This will guarantee + * that a later query of the i_version will result in a different value if + * anything has changed. + * + * This implementation just does a peek. + */ +static inline u64 +inode_query_iversion(struct inode *inode) +{ + return inode_peek_iversion(inode); +} + +/** + * inode_cmp_iversion_raw - check whether the raw i_version counter has changed + * @inode: inode to check + * @old: old value to check against its i_version + * + * Compare the current raw i_version counter with a previous one. Returns 0 if + * they are the same or non-zero if they are different. + */ +static inline s64 +inode_cmp_iversion_raw(const struct inode *inode, u64 old) +{ + return (s64)inode_peek_iversion_raw(inode) - (s64)old; +} + +/** + * inode_cmp_iversion - check whether the i_version counter has changed + * @inode: inode to check + * @old: old value to check against its i_version + * + * Compare an i_version counter with a previous one. Returns 0 if they are + * the same or non-zero if they are different. + */ +static inline s64 +inode_cmp_iversion(const struct inode *inode, u64 old) +{ + return (s64)inode_peek_iversion(inode) - (s64)old; +} +#endif