[v2,05/19] afs: convert to new i_version API

Message ID 20171216134656.15561-6-jlayton@kernel.org (mailing list archive)
State New, archived
Commit Message

Jeff Layton Dec. 16, 2017, 1:46 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

For AFS, it's generally treated as an opaque value, so we use the
*_raw variants of the API here.

Note that AFS has quite a different definition for this counter. AFS
only increments it on changes to the data, not for the metadata. We'll
need to reconcile that somehow if we ever want to present this to
userspace via statx.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/afs/fsclient.c | 2 +-
 fs/afs/inode.c    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
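
The difference between the raw and non-raw accessors can be sketched in a small userspace model. This is only an illustration, assuming the layout used by the merged include/linux/iversion.h, where the non-raw helpers reserve the low bit of the counter for a "queried" flag. The `model_` names and `struct model_inode` are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: the kernel keeps i_version inside struct inode. */
struct model_inode {
	uint64_t i_version;
};

/* Assumption: the non-raw helpers reserve the low bit for a flag. */
#define MODEL_QUERIED_SHIFT 1

/* Raw setter: store the value untouched, all 64 bits preserved. */
static inline void model_set_iversion_raw(struct model_inode *inode,
					  uint64_t val)
{
	inode->i_version = val;
}

/* Raw getter: return exactly what was stored. */
static inline uint64_t model_peek_iversion_raw(const struct model_inode *inode)
{
	return inode->i_version;
}

/* Non-raw setter: shift the counter up to leave room for the flag bit. */
static inline void model_set_iversion(struct model_inode *inode, uint64_t val)
{
	/* The top bit would be lost by the shift, so only 63 bits fit. */
	assert((val >> 63) == 0);
	model_set_iversion_raw(inode, val << MODEL_QUERIED_SHIFT);
}

/* Non-raw getter: drop the flag bit and return the counter. */
static inline uint64_t model_peek_iversion(const struct model_inode *inode)
{
	return inode->i_version >> MODEL_QUERIED_SHIFT;
}
```

Under that assumption, an opaque 64-bit value supplied by a server survives a round trip only through the raw pair, which is consistent with AFS using the *_raw variants.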

Comments

Jeffrey E Altman Dec. 16, 2017, 4:18 p.m. UTC | #1
Hi Jeff,

A few thoughts on AFS usage below which might impact a future revision
of the API.  I hope they are useful.

On 12/16/2017 8:49 AM, Jeff Layton wrote:
> On Sat, 2017-12-16 at 08:46 -0500, Jeff Layton wrote:
>> From: Jeff Layton <jlayton@redhat.com>
>>
>> For AFS, it's generally treated as an opaque value, so we use the
>> *_raw variants of the API here.
>>
>> Note that AFS has quite a different definition for this counter. AFS
>> only increments it on changes to the data, not for the metadata. We'll
>> need to reconcile that somehow if we ever want to present this to
>> userspace via statx.
>>

From the patch series notes:

"The inode->i_version field is supposed to be a value that changes
whenever there is any data or metadata change to the inode. Some
filesystems use it internally to detect directory changes during
readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
will also use it to optimize away some remeasurement if it's available.
NFS and AFS just use it to store an opaque change attribute from the
server.

"Only btrfs, ext4, and xfs increment it for data changes. Because of
this, these filesystems must log the inode to disk whenever the
i_version counter changes. That has a non-zero performance impact,
especially on write-heavy workloads, because we end up dirtying the
inode metadata on every write, not just when the times change. [1]"


The AFS/AuriStorFS data version is an unsigned 64-bit value that is
incremented by the file server as part of a data-changing operation: for
files, a StoreData; for directories, entry manipulations such as create,
rename, and delete.  This data version is used to tag the version of
any subset of the data stream for caching and replication purposes.

As Jeff notes, the AFS data version is not incremented for metadata
changes.  Metadata cannot be trusted by clients without acquiring a
callback promise from a fileserver.  The callback promise will either be
satisfied by the issuing fileserver sending a CallBack notification that
the metadata is no longer valid OR the callback promise will expire.
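
The callback rule described above can be sketched as follows. This is a minimal userspace model, not kAFS code: `struct hyp_callback` and the `hyp_callback_*` helpers are hypothetical names, and time is modelled as a plain count of seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side record of a callback promise. */
struct hyp_callback {
	bool     broken;     /* set when a CallBack notification arrives */
	uint64_t expires_at; /* time (seconds) at which the promise lapses */
};

/* The fileserver issued a promise valid for 'lifetime' seconds. */
static inline void hyp_callback_grant(struct hyp_callback *cb,
				      uint64_t now, uint64_t lifetime)
{
	assert(lifetime > 0);
	cb->broken = false;
	cb->expires_at = now + lifetime;
}

/* The fileserver told us the cached metadata is no longer valid. */
static inline void hyp_callback_break(struct hyp_callback *cb)
{
	cb->broken = true;
}

/* Cached metadata is trusted only while the promise is intact and
 * has not yet expired. */
static inline bool hyp_callback_valid(const struct hyp_callback *cb,
				      uint64_t now)
{
	return !cb->broken && now < cb->expires_at;
}
```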

Something else that is important to note is that local data changes
that occur under a valid callback promise are assumed to be newer than
the data on the fileserver.  It might be useful if the new i_version API
supported major and minor version numbers.  AFS implementations would
store the fileserver provided data version number as the major version
and would increment the minor version when local changes have been made
which have yet to be stored back to the fileserver.  This functionality
would be especially useful if disconnected operations were implemented
for the AFS implementation.

It might also be useful to separate metadata version and data version
although some filesystems would set the same value to both.  For AFS,
the metadata major version would be the timestamp at which the callback was
issued.
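
The major/minor idea could look roughly like the sketch below. Nothing here is an existing kernel type or API: `struct hyp_version` and its helpers are hypothetical, and the ordering rule (major first, then minor) is one possible reading of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-part version, not an existing kernel type. */
struct hyp_version {
	uint64_t major; /* data version last supplied by the fileserver */
	uint64_t minor; /* local changes not yet stored back */
};

/* A status fetch or completed store: adopt the server's value. */
static inline void hyp_version_from_server(struct hyp_version *v,
					   uint64_t data_version)
{
	v->major = data_version;
	v->minor = 0;
}

/* A local modification made under a valid callback promise. */
static inline void hyp_version_local_change(struct hyp_version *v)
{
	assert(v->minor != UINT64_MAX); /* guard against wraparound */
	v->minor++;
}

/* True while local changes are waiting to be stored back. */
static inline bool hyp_version_is_dirty(const struct hyp_version *v)
{
	return v->minor != 0;
}

/* Order by major, then minor; returns -1, 0 or 1. */
static inline int hyp_version_cmp(const struct hyp_version *a,
				  const struct hyp_version *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	return 0;
}
```

With this ordering, a locally modified copy compares as newer than the server copy it was derived from, and older than any later data version issued by the fileserver.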

Jeffrey Altman
Jeff Layton Dec. 16, 2017, 4:40 p.m. UTC | #2
On Sat, 2017-12-16 at 11:18 -0500, Jeffrey Altman wrote:
> Hi Jeff,
> 
> A few thoughts on AFS usage below which might impact a future revision
> of the API.  I hope they are useful.
> 
> On 12/16/2017 8:49 AM, Jeff Layton wrote:
> > On Sat, 2017-12-16 at 08:46 -0500, Jeff Layton wrote:
> > > From: Jeff Layton <jlayton@redhat.com>
> > > 
> > > For AFS, it's generally treated as an opaque value, so we use the
> > > *_raw variants of the API here.
> > > 
> > > Note that AFS has quite a different definition for this counter. AFS
> > > only increments it on changes to the data, not for the metadata. We'll
> > > need to reconcile that somehow if we ever want to present this to
> > > userspace via statx.
> > > 
> 
> From the patch series notes:
> 
> "The inode->i_version field is supposed to be a value that changes
> whenever there is any data or metadata change to the inode. Some
> filesystems use it internally to detect directory changes during
> readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
> will also use it to optimize away some remeasurement if it's available.
> NFS and AFS just use it to store an opaque change attribute from the
> server.
> 
> "Only btrfs, ext4, and xfs increment it for data changes. Because of
> this, these filesystems must log the inode to disk whenever the
> i_version counter changes. That has a non-zero performance impact,
> especially on write-heavy workloads, because we end up dirtying the
> inode metadata on every write, not just when the times change. [1]"
> 
> 
> The AFS/AuriStorFS data version is an unsigned 64-bit value that is
> incremented by the file server as part of a data-changing operation: for
> files, a StoreData; for directories, entry manipulations such as create,
> rename, and delete.  This data version is used to tag the version of
> any subset of the data stream for caching and replication purposes.
> 
> As Jeff notes, the AFS data version is not incremented for metadata
> changes.  Metadata cannot be trusted by clients without acquiring a
> callback promise from a fileserver.  The callback promise will either be
> satisfied by the issuing fileserver sending a CallBack notification that
> the metadata is no longer valid OR the callback promise will expire.
> 
> Something else that is important to note is that local data changes
> that occur under a valid callback promise are assumed to be newer than
> the data on the fileserver.  It might be useful if the new i_version API
> supported major and minor version numbers.  AFS implementations would
> store the fileserver provided data version number as the major version
> and would increment the minor version when local changes have been made
> which have yet to be stored back to the fileserver.  This functionality
> would be especially useful if disconnected operations were implemented
> for the AFS implementation.
> 
> It might also be useful to separate metadata version and data version
> although some filesystems would set the same value to both.  For AFS,
> the metadata major version would be the timestamp at which the callback was
> issued.
> 
> Jeffrey Altman

Thanks. That seems like a rather specialized use case.

If we did want to go that route, we'd probably need to give filesystems
a way to overload how i_version is handled and queried (maybe some new
inode ops?).

Given that nothing ever looks at the i_version in kAFS now, I don't
have a lot of incentive to do anything along those lines in this set. I
think this patchset will probably make it simpler to do something like
that in the future, if you were motivated to do so though.
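
One shape the "new inode ops" idea could take is an optional per-filesystem hook with a default fallback. The sketch below is a userspace illustration only: `struct hyp_inode`, `struct hyp_iversion_ops`, and every function in it are hypothetical and do not correspond to anything in the kernel or in this patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hyp_inode;

/* Hypothetical hook table a filesystem could supply. */
struct hyp_iversion_ops {
	uint64_t (*query)(const struct hyp_inode *inode);
};

struct hyp_inode {
	uint64_t i_version;                    /* generic counter */
	uint64_t fs_version;                   /* filesystem-private value */
	const struct hyp_iversion_ops *iv_ops; /* NULL: use the default */
};

/* Generic query: defer to the filesystem's hook when one is set,
 * otherwise report the generic counter. */
static inline uint64_t hyp_query_iversion(const struct hyp_inode *inode)
{
	assert(inode != NULL);
	if (inode->iv_ops && inode->iv_ops->query)
		return inode->iv_ops->query(inode);
	return inode->i_version;
}

/* Example override: report the filesystem's own notion of the version. */
static inline uint64_t hyp_fs_query(const struct hyp_inode *inode)
{
	return inode->fs_version;
}

static const struct hyp_iversion_ops hyp_fs_iversion_ops = {
	.query = hyp_fs_query,
};
```

Filesystems that leave the hook unset would keep the generic behaviour, so only a filesystem with unusual versioning semantics would need to provide one.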

Patch

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index b90ef39ae914..67ed9bb5fe31 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -124,7 +124,7 @@  static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
 		vnode->vfs_inode.i_ctime.tv_sec	= status->mtime_client;
 		vnode->vfs_inode.i_mtime	= vnode->vfs_inode.i_ctime;
 		vnode->vfs_inode.i_atime	= vnode->vfs_inode.i_ctime;
-		vnode->vfs_inode.i_version	= data_version;
+		inode_set_iversion_raw(&vnode->vfs_inode, data_version);
 	}
 
 	expected_version = status->data_version;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3415eb7484f6..af9577210a46 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -89,7 +89,7 @@  static int afs_inode_map_status(struct afs_vnode *vnode, struct key *key)
 	inode->i_atime		= inode->i_mtime = inode->i_ctime;
 	inode->i_blocks		= 0;
 	inode->i_generation	= vnode->fid.unique;
-	inode->i_version	= vnode->status.data_version;
+	inode_set_iversion_raw(inode, vnode->status.data_version);
 	inode->i_mapping->a_ops	= &afs_fs_aops;
 
 	read_sequnlock_excl(&vnode->cb_lock);
@@ -218,7 +218,7 @@  struct inode *afs_iget_autocell(struct inode *dir, const char *dev_name,
 	inode->i_ctime.tv_nsec	= 0;
 	inode->i_atime		= inode->i_mtime = inode->i_ctime;
 	inode->i_blocks		= 0;
-	inode->i_version	= 0;
+	inode_set_iversion_raw(inode, 0);
 	inode->i_generation	= 0;
 
 	set_bit(AFS_VNODE_PSEUDODIR, &vnode->flags);