diff mbox series

[v3,1/3] ext4: don't increase iversion counter for ea_inodes

Message ID 20220812123727.46397-1-lczerner@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v3,1/3] ext4: don't increase iversion counter for ea_inodes | expand

Commit Message

Lukas Czerner Aug. 12, 2022, 12:37 p.m. UTC
ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
v2, v3: no change

 fs/ext4/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Christian Brauner Aug. 12, 2022, 1:04 p.m. UTC | #1
On Fri, Aug 12, 2022 at 02:37:25PM +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
> 
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Jeff Layton Aug. 12, 2022, 6:42 p.m. UTC | #2
On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
> 
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> v2, v3: no change
> 
>  fs/ext4/inode.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 601214453c3a..2a220be34caa 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>  	}
>  	ext4_fc_track_inode(handle, inode);
>  
> -	if (IS_I_VERSION(inode))
> +	/*
> +	 * ea_inodes are using i_version for storing reference count, don't
> +	 * mess with it
> +	 */
> +	if (IS_I_VERSION(inode) &&
> +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
>  		inode_inc_iversion(inode);
>  
>  	/* the do_update_inode consumes one bh->b_count */


I've spent some time writing tests for the i_version counter (still
quite rough right now), and what I've found is that this particular
inode_inc_iversion results in the counter being bumped on _reads_ as
well as writes, due to the atime changing. This call to
inode_inc_iversion seems to make no sense, as we aren't bumping the
mtime here.

I'm still working on and testing this, but I think we'll probably just
want to remove this inode_inc_iversion entirely, and leave the i_version
bumping for normal files to happen when the timestamps are updated. So
far, my testing seems to indicate that that does the right thing.

Hopefully I'll have some testcases + patches for this next week
sometime.

Cheers,
Jan Kara Aug. 16, 2022, 11:52 a.m. UTC | #3
On Fri 12-08-22 14:42:36, Jeff Layton wrote:
> On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> > ea_inodes are using i_version for storing part of the reference count so
> > we really need to leave it alone.
> > 
> > The problem can be reproduced by xfstest ext4/026 when iversion is
> > enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> > inodes in ext4_mark_iloc_dirty().
> > 
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > v2, v3: no change
> > 
> >  fs/ext4/inode.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 601214453c3a..2a220be34caa 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> >  	}
> >  	ext4_fc_track_inode(handle, inode);
> >  
> > -	if (IS_I_VERSION(inode))
> > +	/*
> > +	 * ea_inodes are using i_version for storing reference count, don't
> > +	 * mess with it
> > +	 */
> > +	if (IS_I_VERSION(inode) &&
> > +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> >  		inode_inc_iversion(inode);
> >  
> >  	/* the do_update_inode consumes one bh->b_count */
> 
> 
> I've spent some time writing tests for the i_version counter (still
> quite rough right now), and what I've found is that this particular
> inode_inc_iversion results in the counter being bumped on _reads_ as
> well as writes, due to the atime changing. This call to
> inode_inc_iversion seems to make no sense, as we aren't bumping the
> mtime here.
> 
> I'm still working on and testing this, but I think we'll probably just
> want to remove this inode_inc_iversion entirely, and leave the i_version
> bumping for normal files to happen when the timestamps are updated. So
> far, my testing seems to indicate that that does the right thing.

I agree that inode_inc_iversion() may be overly agressive here but where
else does get iversion updated for things like inode owner update or
permission changes?

								Honza
Jeff Layton Aug. 16, 2022, 12:18 p.m. UTC | #4
On Tue, 2022-08-16 at 13:52 +0200, Jan Kara wrote:
> On Fri 12-08-22 14:42:36, Jeff Layton wrote:
> > On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> > > ea_inodes are using i_version for storing part of the reference count so
> > > we really need to leave it alone.
> > > 
> > > The problem can be reproduced by xfstest ext4/026 when iversion is
> > > enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> > > inodes in ext4_mark_iloc_dirty().
> > > 
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > v2, v3: no change
> > > 
> > >  fs/ext4/inode.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 601214453c3a..2a220be34caa 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> > >  	}
> > >  	ext4_fc_track_inode(handle, inode);
> > >  
> > > -	if (IS_I_VERSION(inode))
> > > +	/*
> > > +	 * ea_inodes are using i_version for storing reference count, don't
> > > +	 * mess with it
> > > +	 */
> > > +	if (IS_I_VERSION(inode) &&
> > > +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > >  		inode_inc_iversion(inode)
> > >  
> > >  	/* the do_update_inode consumes one bh->b_count */
> > 
> > 
> > I've spent some time writing tests for the i_version counter (still
> > quite rough right now), and what I've found is that this particular
> > inode_inc_iversion results in the counter being bumped on _reads_ as
> > well as writes, due to the atime changing. This call to
> > inode_inc_iversion seems to make no sense, as we aren't bumping the
> > mtime here.
> > 
> > I'm still working on and testing this, but I think we'll probably just
> > want to remove this inode_inc_iversion entirely, and leave the i_version
> > bumping for normal files to happen when the timestamps are updated. So
> > far, my testing seems to indicate that that does the right thing.
> 
> I agree that inode_inc_iversion() may be overly agressive here but where
> else does get iversion updated for things like inode owner update or
> permission changes?
> 
> 								Honza

If we remove it here, then both the setattr and setxattr codepaths will
need to explicitly bump the iversion counter. Note that we update the
ctime in those paths too, so that gives us a guidepost as to when we
should update i_version. xfs will need similar changes, but btrfs turns
out to already do the right thing.

I'm planning to post my latest patches in just a bit.
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 601214453c3a..2a220be34caa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5731,7 +5731,12 @@  int ext4_mark_iloc_dirty(handle_t *handle,
 	}
 	ext4_fc_track_inode(handle, inode);
 
-	if (IS_I_VERSION(inode))
+	/*
+	 * ea_inodes are using i_version for storing reference count, don't
+	 * mess with it
+	 */
+	if (IS_I_VERSION(inode) &&
+	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */