
[v2,2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

Message ID 20220803105340.17377-2-lczerner@redhat.com (mailing list archive)
State New, archived
Series [v2,1/3] ext4: don't increase iversion counter for ea_inodes

Commit Message

Lukas Czerner Aug. 3, 2022, 10:53 a.m. UTC
Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
true, however ext4 will only update the on-disk inode in
->dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally, xfs_fs_dirty_inode() was
changed to accommodate I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan

 fs/fs-writeback.c  | 34 ++++++++++++++++++++++------------
 fs/xfs/xfs_super.c |  3 ++-
 include/linux/fs.h |  6 +++---
 3 files changed, 27 insertions(+), 16 deletions(-)

Comments

Eric Biggers Aug. 5, 2022, 8:05 a.m. UTC | #1
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..2243797badf2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *			The inode itself only has dirty timestamps, and the
>   *			lazytime mount option is enabled.  We keep track of this
>   *			separately from I_DIRTY_SYNC in order to implement
>   *			lazytime.  This gets cleared if I_DIRTY_INODE
> - *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
> - *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> - *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + *			in place.

I'm still having a hard time understanding the new semantics.  The first
sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
itself only has dirty timestamps", right?

Also, have you checked all the places that I_DIRTY_TIME is used and verified
they do the right thing now?  What about inode_is_dirtytime_only()?

Also what is the precise meaning of the flags argument to ->dirty_inode now?

	sb->s_op->dirty_inode(inode,
			flags & (I_DIRTY_INODE | I_DIRTY_TIME));

Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

- Eric
Lukas Czerner Aug. 5, 2022, 12:23 p.m. UTC | #2
On Fri, Aug 05, 2022 at 01:05:45AM -0700, Eric Biggers wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9ad5e3520fae..2243797badf2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *			The inode itself only has dirty timestamps, and the
> >   *			lazytime mount option is enabled.  We keep track of this
> >   *			separately from I_DIRTY_SYNC in order to implement
> >   *			lazytime.  This gets cleared if I_DIRTY_INODE
> > - *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
> > - *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> > - *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> > + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + *			in place.
> 
> I'm still having a hard time understanding the new semantics.  The first
> sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
> itself only has dirty timestamps", right?

The problem is that it was always assumed that I_DIRTY_INODE supersedes
I_DIRTY_TIME and so it would get cleared in __mark_inode_dirty() when we
have I_DIRTY_INODE. That's true, we call sb->s_op->dirty_inode(), the
time update gets pushed into on-disk inode structure, I_DIRTY_TIME
cleared and it will get queued for writeback.

Any subsequent dirtying with I_DIRTY_TIME gets ignored simply because
I_DIRTY_INODE is already set in i_state. But in ext4 this time update
will never get pushed into on disk inode and there is no I_DIRTY_TIME so
once the writeback is done we've lost all those I_DIRTY_TIME updates in
between even if there was a sync.

Now, we still clear I_DIRTY_TIME when we get I_DIRTY_INODE, but any
subsequent I_DIRTY_TIME only updates won't be ignored and we set it into
i_state. After the writeback is done it'll be moved to b_dirty_time
list.
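
To make the before/after behaviour concrete, here is a minimal userspace
sketch of how i_state changes. It is only a model: the function names are
hypothetical, the flag values are illustrative stand-ins for the ones in
include/linux/fs.h, and locking, the ->dirty_inode() call and the writeback
lists are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the i_state bits in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY_TIME     (1u << 11)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)

/* Hypothetical model of the old i_state update in __mark_inode_dirty(). */
unsigned int model_mark_dirty_old(unsigned int i_state, unsigned int flags)
{
	bool dirtytime = false;

	if (flags & I_DIRTY_INODE)
		flags &= ~I_DIRTY_TIME;	/* I_DIRTY_INODE supersedes I_DIRTY_TIME */
	else
		dirtytime = (flags & I_DIRTY_TIME) != 0;

	/* A timestamp-only request is ignored on an already dirty inode. */
	if (dirtytime && (i_state & I_DIRTY_INODE))
		return i_state;

	if (flags & I_DIRTY_INODE)
		i_state &= ~I_DIRTY_TIME;
	return i_state | flags;
}

/* Hypothetical model of the i_state update with this patch applied. */
unsigned int model_mark_dirty_new(unsigned int i_state, unsigned int flags)
{
	if (flags & I_DIRTY_INODE) {
		/*
		 * A pending timestamp piggybacks on this dirtying: it is
		 * cleared from i_state up front and handed to the fs.
		 */
		i_state &= ~I_DIRTY_TIME;
		flags &= ~I_DIRTY_TIME;
	}
	/* A later I_DIRTY_TIME-only request is no longer dropped. */
	return i_state | flags;
}

/* Walk through the case described above. */
void model_mark_dirty_check(void)
{
	/* Inode already has I_DIRTY_SYNC, then a lazytime update arrives. */
	assert(!(model_mark_dirty_old(I_DIRTY_SYNC, I_DIRTY_TIME) & I_DIRTY_TIME));
	assert(model_mark_dirty_new(I_DIRTY_SYNC, I_DIRTY_TIME) & I_DIRTY_TIME);
}
```

In this model the only difference is the second case: an I_DIRTY_TIME-only
request on an inode that already has I_DIRTY_INODE is remembered instead of
being dropped.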

So I am not sure how you would like it to be reworded. Would simply
removing the 'only' be ok?

> 
> Also, have you checked all the places that I_DIRTY_TIME is used and verified
> they do the right thing now?  What about inode_is_dirtytime_only()?

Yes, that's fine, despite the slightly misleading name ;)

> 
> Also what is the precise meaning of the flags argument to ->dirty_inode now?
> 
> 	sb->s_op->dirty_inode(inode,
> 			flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> 
> Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

Don't know. It already doesn't mention I_DIRTY_SYNC, which can be there
as well. Additionally, it can have I_DIRTY_TIME to inform the fs that we
have a dirty timestamp as well (in case of lazytime).

Perhaps we can add:

If the inode has a dirty timestamp and lazytime is enabled, I_DIRTY_TIME
will be set in the flags.

-Lukas

> 
> - Eric
>
Dave Chinner Aug. 7, 2022, 11:08 p.m. UTC | #3
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> Currently the I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
> true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
> 
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
> 
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE. Also make sure that the case is properly handled in
> writeback_single_inode() as well. Additionally, xfs_fs_dirty_inode() was
> changed to accommodate I_DIRTY_TIME in flag.
> 
> Thanks Jan Kara for suggestions on how to make this work properly.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Suggested-by: Jan Kara <jack@suse.cz>
> ---
> v2: Reworked according to suggestions from Jan

....

> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aa977c7ea370..cff05a4771b5 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
>  
>  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
>  		return;
> -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> +	    !((inode->i_state | flag) & I_DIRTY_TIME))
>  		return;

My eyes, they bleed. The dirty time code was already a horrid
abomination, and this makes it worse.

From looking at the code, I cannot work out what the new semantics
for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work
out what the condition in this new code is supposed to be doing. I
*can't verify it is correct* by reading the code.

Can you please add a comment here explaining the conditions where we
don't have to log a new timestamp update?

Also, if "flag" now contains multiple flags, can you rename it
"flags"?

Cheers,

Dave.
Lukas Czerner Aug. 8, 2022, 10:26 a.m. UTC | #4
On Mon, Aug 08, 2022 at 09:08:10AM +1000, Dave Chinner wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > Currently the I_DIRTY_TIME will never get set if the inode already has
> > I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
> > true, however ext4 will only update the on-disk inode in
> > ->dirty_inode(), not on actual writeback. As a result if the inode
> > already has I_DIRTY_INODE state by the time we get to
> > __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> > into on-disk inode and will not get updated until the next I_DIRTY_INODE
> > update, which might never come if we crash or get a power failure.
> > 
> > The problem can be reproduced on ext4 by running xfstest generic/622
> > with -o iversion mount option.
> > 
> > Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> > I_DIRTY_INODE. Also make sure that the case is properly handled in
> > writeback_single_inode() as well. Additionally, xfs_fs_dirty_inode() was
> > changed to accommodate I_DIRTY_TIME in flag.
> > 
> > Thanks Jan Kara for suggestions on how to make this work properly.
> > 
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > ---
> > v2: Reworked according to suggestions from Jan
> 
> ....
> 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index aa977c7ea370..cff05a4771b5 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
> >  
> >  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> >  		return;
> > -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> > +	    !((inode->i_state | flag) & I_DIRTY_TIME))
> >  		return;
> 
> My eyes, they bleed. The dirty time code was already a horrid
> abomination, and this makes it worse.
> 
> From looking at the code, I cannot work out what the new semantics
> for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work

Hi Dave,

please see the other thread for this patch with Eric Biggers, where I
try to explain and suggest a change to the documentation. Does it make
sense to you, or am I missing something?

https://marc.info/?l=linux-ext4&m=165970194205621&w=2

> out what the condition in this new code is supposed to be doing. I
> *can't verify it is correct* by reading the code.

__mark_inode_dirty() needed to be changed to clear I_DIRTY_TIME from
i_state *before* we call ->dirty_inode(), to avoid a race where we would
lose a timestamp update that comes just a little later, after the
->dirty_inode() call with I_DIRTY_INODE.

But that would break xfs, so I decided to keep the condition and loosen
the requirement so that I_DIRTY_TIME can also be set in 'flag', not just
in i_state. Hence the abomination.

> 
> Can you please add a comment here explaining the conditions where we
> don't have to log a new timestamp update?

How about something like this?

Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC) and
has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be either already
set in i_state, or passed in flags possibly together with I_DIRTY_SYNC.
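
Spelled out as a standalone predicate, the old and new checks could be
sketched like this. The helper names are hypothetical, the flag values are
illustrative stand-ins for the ones in include/linux/fs.h, and the
SB_LAZYTIME test that precedes the check in xfs_fs_dirty_inode() is left
out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the i_state bits in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_TIME     (1u << 11)

/* Old check: exactly I_DIRTY_SYNC was passed and i_state has I_DIRTY_TIME. */
bool xfs_logs_timestamp_old(unsigned int i_state, unsigned int flag)
{
	return flag == I_DIRTY_SYNC && (i_state & I_DIRTY_TIME) != 0;
}

/*
 * New check: ignoring I_DIRTY_TIME, exactly I_DIRTY_SYNC was passed, and
 * I_DIRTY_TIME is present either in i_state or in the passed flags.
 */
bool xfs_logs_timestamp_new(unsigned int i_state, unsigned int flags)
{
	if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC)
		return false;
	return ((i_state | flags) & I_DIRTY_TIME) != 0;
}

/* The case the patch adds: I_DIRTY_TIME arrives in flags, not in i_state. */
void xfs_logs_timestamp_check(void)
{
	assert(!xfs_logs_timestamp_old(0u, I_DIRTY_SYNC | I_DIRTY_TIME));
	assert(xfs_logs_timestamp_new(0u, I_DIRTY_SYNC | I_DIRTY_TIME));
}
```

Since __mark_inode_dirty() now clears I_DIRTY_TIME from i_state before the
call, the flags argument is where the bit will normally show up; the
i_state half of the test is kept so the old case still passes.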

> 
> Also, if "flag" now contains multiple flags, can you rename it
> "flags"?

Sure, I can do that.

Thanks!
-Lukas

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
>
Eric Biggers Aug. 12, 2022, 6:20 p.m. UTC | #5
On Fri, Aug 05, 2022 at 02:23:06PM +0200, Lukas Czerner wrote:
> > 
> > Also what is the precise meaning of the flags argument to ->dirty_inode now?
> > 
> > 	sb->s_op->dirty_inode(inode,
> > 			flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> > 
> > Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
> 
> Don't know. It already doesn't mention I_DIRTY_SYNC, which can be there as
> well.

Well, it didn't really need to because there were only two possibilities:
datasync and not datasync.  This patch changes that.

> Additionally, it can have I_DIRTY_TIME to inform the fs we have a
> dirty timestamp as well (in case of lazytime).

This is introduced by this patch.
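
As a rough illustration of how the argument changes, the value handed to
->dirty_inode() can be modelled as below. This is a sketch only: the helper
names are hypothetical, the flag values are illustrative stand-ins for the
ones in include/linux/fs.h, and both helpers assume the caller already
passed at least one I_DIRTY_INODE bit.

```c
#include <assert.h>

/* Illustrative stand-ins for the i_state bits in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY_TIME     (1u << 11)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)

/* Before the patch: only the I_DIRTY_INODE bits reach the filesystem. */
unsigned int dirty_inode_arg_old(unsigned int flags)
{
	return flags & I_DIRTY_INODE;
}

/*
 * With the patch: a timestamp already pending in i_state is folded into
 * flags, and I_DIRTY_TIME is let through the mask.
 */
unsigned int dirty_inode_arg_new(unsigned int i_state, unsigned int flags)
{
	if (i_state & I_DIRTY_TIME)
		flags |= I_DIRTY_TIME;
	return flags & (I_DIRTY_INODE | I_DIRTY_TIME);
}

/* A pending lazytime timestamp now becomes visible to the filesystem. */
void dirty_inode_arg_check(void)
{
	assert(dirty_inode_arg_old(I_DIRTY_SYNC) == I_DIRTY_SYNC);
	assert(dirty_inode_arg_new(I_DIRTY_TIME, I_DIRTY_SYNC) ==
	       (I_DIRTY_SYNC | I_DIRTY_TIME));
}
```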

- Eric

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@  static int writeback_single_inode(struct inode *inode,
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
 		inode_cgwb_move_to_attached(inode, wb);
-	else if (!(inode->i_state & I_SYNC_QUEUED) &&
-		 (inode->i_state & I_DIRTY))
-		redirty_tail_locked(inode, wb);
+	else if (!(inode->i_state & I_SYNC_QUEUED)) {
+		if ((inode->i_state & I_DIRTY))
+			redirty_tail_locked(inode, wb);
+		else if (inode->i_state & I_DIRTY_TIME) {
+			inode->dirtied_when = jiffies;
+			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		}
+	}
 
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@  void __mark_inode_dirty(struct inode *inode, int flags)
 	trace_writeback_mark_inode_dirty(inode, flags);
 
 	if (flags & I_DIRTY_INODE) {
+
+		/* Inode timestamp update will piggyback on this dirtying */
+		if (inode->i_state & I_DIRTY_TIME) {
+			spin_lock(&inode->i_lock);
+			if (inode->i_state & I_DIRTY_TIME) {
+				inode->i_state &= ~I_DIRTY_TIME;
+				flags |= I_DIRTY_TIME;
+			}
+			spin_unlock(&inode->i_lock);
+		}
+
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@  void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		trace_writeback_dirty_inode_start(inode, flags);
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+			sb->s_op->dirty_inode(inode,
+				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
 		trace_writeback_dirty_inode(inode, flags);
 
 		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@  void __mark_inode_dirty(struct inode *inode, int flags)
 	 */
 	smp_mb();
 
-	if (((inode->i_state & flags) == flags) ||
-	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+	if ((inode->i_state & flags) == flags)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
-		goto out_unlock_inode;
 	if ((inode->i_state & flags) != flags) {
 		const int was_dirty = inode->i_state & I_DIRTY;
 
 		inode_attach_wb(inode, NULL);
 
-		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
-		if (flags & I_DIRTY_INODE)
-			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
 
 		/*
@@ -2486,7 +2497,6 @@  void __mark_inode_dirty(struct inode *inode, int flags)
 out_unlock:
 	if (wb)
 		spin_unlock(&wb->list_lock);
-out_unlock_inode:
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aa977c7ea370..cff05a4771b5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -658,7 +658,8 @@  xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
+	    !((inode->i_state | flag) & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..2243797badf2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,9 +2245,9 @@  static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			lazytime mount option is enabled.  We keep track of this
  *			separately from I_DIRTY_SYNC in order to implement
  *			lazytime.  This gets cleared if I_DIRTY_INODE
- *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
- *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ *			in place.
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and