Message ID | 20210105005452.92521-12-ebiggers@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lazytime fixes and cleanups | expand |
On Mon 04-01-21 16:54:50, Eric Biggers wrote: > From: Eric Biggers <ebiggers@google.com> > > Add a lazytime_expired method to 'struct super_operations'. Filesystems > can implement this to be notified when an inode's lazytime timestamps > have expired and need to be written to disk. > > This avoids any potential ambiguity with > ->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic > dirtying of the inode, not just a lazytime timestamp expiration. > In particular, this will be useful for XFS. > > If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to > be called. > > Note that there are three cases where we have to make sure to call > lazytime_expired(): > > - __writeback_single_inode(): inode is being written now > - vfs_fsync_range(): inode is going to be synced > - iput(): inode is going to be evicted > > In the latter two cases, the inode still needs to be put on the > writeback list. So, we can't just replace the calls to > mark_inode_dirty_sync() with lazytime_expired(). Instead, add a new > flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty(). > It's like I_DIRTY_SYNC, except it causes the filesystem to be notified > of a lazytime expiration rather than a generic I_DIRTY_SYNC. > > Signed-off-by: Eric Biggers <ebiggers@google.com> Hum, seeing this patch I kind of wonder: Why don't we dirty the inode after expiring the lazytime timestamps with I_DIRTY_SYNC | I_DIRTY_TIME_EXPIRED and propagate I_DIRTY_TIME_EXPIRED even to ->dirty_inode() where XFS can catch it and act? Functionally it would be the same but we'd save a bunch of generic code and ->lazytime_expired helper used just by a single filesystem... Honza > --- > Documentation/filesystems/locking.rst | 2 ++ > Documentation/filesystems/vfs.rst | 10 ++++++++++ > fs/fs-writeback.c | 27 ++++++++++++++++++++++----- > fs/inode.c | 2 +- > fs/sync.c | 2 +- > include/linux/fs.h | 7 ++++++- > 6 files changed, 42 insertions(+), 8 deletions(-) > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > index c0f2c7586531b..53088e2a93b69 100644 > --- a/Documentation/filesystems/locking.rst > +++ b/Documentation/filesystems/locking.rst > @@ -150,6 +150,7 @@ prototypes:: > void (*free_inode)(struct inode *); > void (*destroy_inode)(struct inode *); > void (*dirty_inode) (struct inode *, int flags); > + void (*lazytime_expired) (struct inode *); > int (*write_inode) (struct inode *, struct writeback_control *wbc); > int (*drop_inode) (struct inode *); > void (*evict_inode) (struct inode *); > @@ -175,6 +176,7 @@ alloc_inode: > free_inode: called from RCU callback > destroy_inode: > dirty_inode: > +lazytime_expired: > write_inode: > drop_inode: !!!inode->i_lock!!! > evict_inode: > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst > index 287b80948a40b..02531b1421d01 100644 > --- a/Documentation/filesystems/vfs.rst > +++ b/Documentation/filesystems/vfs.rst > @@ -231,6 +231,7 @@ filesystem. As of kernel 2.6.22, the following members are defined: > void (*destroy_inode)(struct inode *); > > void (*dirty_inode) (struct inode *, int flags); > + void (*lazytime_expired) (struct inode *); > int (*write_inode) (struct inode *, int); > void (*drop_inode) (struct inode *); > void (*delete_inode) (struct inode *); > @@ -275,6 +276,15 @@ or bottom half). > not its data. If the update needs to be persisted by fdatasync(), > then I_DIRTY_DATASYNC will be set in the flags argument. > > +``lazytime_expired`` > + when the lazytime mount option is enabled, this method is > + called when an inode's in-memory updated timestamps have > + expired and thus need to be written to disk. This happens > + when the timestamps have been in memory for too long, when the > + inode is going to be evicted, or when userspace triggers a > + sync. If this method is not implemented, then > + ->dirty_inode(inode, I_DIRTY_SYNC) is called instead. > + > ``write_inode`` > this method is called when the VFS needs to write an inode to > disc. The second parameter indicates whether the write should > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index ed76112bd067b..f187fc3f854e4 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -1441,6 +1441,14 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb, > } > } > > +static void lazytime_expired(struct inode *inode) > +{ > + if (inode->i_sb->s_op->lazytime_expired) > + inode->i_sb->s_op->lazytime_expired(inode); > + else if (inode->i_sb->s_op->dirty_inode) > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC); > +} > + > /* > * Write out an inode and its dirty pages. Do not update the writeback list > * linkage. That is left to the caller. The caller is also responsible for > @@ -1520,8 +1528,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > * isn't enough. Don't call mark_inode_dirty_sync(), as that > * would put the inode back on the dirty list. > */ > - if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode) > - inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC); > + if (dirty & I_DIRTY_TIME) > + lazytime_expired(inode); > > err = write_inode(inode, wbc); > if (ret == 0) > @@ -2231,8 +2239,9 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode) > * > * @inode: inode to mark > * @flags: what kind of dirty, e.g. I_DIRTY_SYNC. This can be a combination of > - * multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined > - * with I_DIRTY_PAGES. > + * multiple I_DIRTY_* flags, except that: > + * - I_DIRTY_TIME can't be combined with I_DIRTY_PAGES > + * - I_DIRTY_TIME_EXPIRED must be used by itself > * > * Mark an inode as dirty. We notify the filesystem, then update the inode's > * dirty flags. Then, if needed we add the inode to the appropriate dirty list. > @@ -2260,7 +2269,15 @@ void __mark_inode_dirty(struct inode *inode, int flags) > > trace_writeback_mark_inode_dirty(inode, flags); > > - if (flags & I_DIRTY_INODE) { > + if (flags & I_DIRTY_TIME_EXPIRED) { > + /* > + * Notify the filesystem about a lazytime timestamp expiration. > + * Afterwards, this case just turns into I_DIRTY_SYNC. > + */ > + WARN_ON_ONCE(flags & ~I_DIRTY_TIME_EXPIRED); > + lazytime_expired(inode); > + flags = I_DIRTY_SYNC; > + } else if (flags & I_DIRTY_INODE) { > /* > * Notify the filesystem about the inode being dirtied, so that > * (if needed) it can update on-disk fields and journal the > diff --git a/fs/inode.c b/fs/inode.c > index d0fa43d8e9d5c..039b201a4743a 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -1673,7 +1673,7 @@ void iput(struct inode *inode) > atomic_inc(&inode->i_count); > spin_unlock(&inode->i_lock); > trace_writeback_lazytime_iput(inode); > - mark_inode_dirty_sync(inode); > + __mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED); > goto retry; > } > iput_final(inode); > diff --git a/fs/sync.c b/fs/sync.c > index 1373a610dc784..363071a3528e3 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -196,7 +196,7 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync) > if (!file->f_op->fsync) > return -EINVAL; > if (!datasync && (inode->i_state & I_DIRTY_TIME)) > - mark_inode_dirty_sync(inode); > + __mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED); > return file->f_op->fsync(file, start, end, datasync); > } > EXPORT_SYMBOL(vfs_fsync_range); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 45a0303b2aeb6..8c5f5c5e62be4 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1935,7 +1935,8 @@ struct super_operations { > void (*destroy_inode)(struct inode *); > void (*free_inode)(struct inode *); > > - void (*dirty_inode) (struct inode *, int flags); > + void (*dirty_inode) (struct inode *, int flags); > + void (*lazytime_expired)(struct inode *); > int (*write_inode) (struct inode *, struct writeback_control *wbc); > int (*drop_inode) (struct inode *); > void (*evict_inode) (struct inode *); > @@ -2108,6 +2109,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, > * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e. > * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in > * i_state, but not both. I_DIRTY_PAGES may still be set. > + * I_DIRTY_TIME_EXPIRED Passed to __mark_inode_dirty() to indicate the intent to > + * expire the inode's timestamps. Not stored in i_state. > + * > * I_NEW Serves as both a mutex and completion notification. > * New inodes set I_NEW. If two processes both create > * the same inode, one of them will release its inode and > @@ -2173,6 +2177,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, > #define I_DIO_WAKEUP (1 << __I_DIO_WAKEUP) > #define I_LINKABLE (1 << 10) > #define I_DIRTY_TIME (1 << 11) > +#define I_DIRTY_TIME_EXPIRED (1 << 12) > #define I_WB_SWITCH (1 << 13) > #define I_OVL_INUSE (1 << 14) > #define I_CREATING (1 << 15) > -- > 2.30.0 >
On Thu, Jan 07, 2021 at 03:02:28PM +0100, Jan Kara wrote: > On Mon 04-01-21 16:54:50, Eric Biggers wrote: > > From: Eric Biggers <ebiggers@google.com> > > > > Add a lazytime_expired method to 'struct super_operations'. Filesystems > > can implement this to be notified when an inode's lazytime timestamps > > have expired and need to be written to disk. > > > > This avoids any potential ambiguity with > > ->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic > > dirtying of the inode, not just a lazytime timestamp expiration. > > In particular, this will be useful for XFS. > > > > If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to > > be called. > > > > Note that there are three cases where we have to make sure to call > > lazytime_expired(): > > > > - __writeback_single_inode(): inode is being written now > > - vfs_fsync_range(): inode is going to be synced > > - iput(): inode is going to be evicted > > > > In the latter two cases, the inode still needs to be put on the > > writeback list. So, we can't just replace the calls to > > mark_inode_dirty_sync() with lazytime_expired(). Instead, add a new > > flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty(). > > It's like I_DIRTY_SYNC, except it causes the filesystem to be notified > > of a lazytime expiration rather than a generic I_DIRTY_SYNC. > > > > Signed-off-by: Eric Biggers <ebiggers@google.com> > > Hum, seeing this patch I kind of wonder: Why don't we dirty the inode after > expiring the lazytime timestamps with I_DIRTY_SYNC | I_DIRTY_TIME_EXPIRED > and propagate I_DIRTY_TIME_EXPIRED even to ->dirty_inode() where XFS can > catch it and act? Functionally it would be the same but we'd save a bunch > of generic code and ->lazytime_expired helper used just by a single > filesystem... > Yes, that would be equivalent to what this patch does. Either way, note that if we also use your suggestion for patch #1, then that already fixes the XFS bug, since i_state will start containing I_DIRTY_TIME when ->dirty_inode(I_DIRTY_SYNC) is called. So xfs_fs_dirty_inode() will start working as intended. That makes introducing ->lazytime_expired (or equivalently I_DIRTY_TIME_EXPIRED) kind of useless since it wouldn't actually fix anything. So I'm tempted to just drop it. The XFS developers might have a different opinion though, as they were the ones who requested it originally: https://lore.kernel.org/r/20200312143445.GA19160@infradead.org https://lore.kernel.org/r/20200325092057.GA25483@infradead.org https://lore.kernel.org/r/20200325154759.GY29339@magnolia https://lore.kernel.org/r/20200312223913.GL10776@dread.disaster.area Any thoughts from anyone about whether we should still introduce a separate notification for lazytime expiration, vs. just using ->dirty_inode(I_DIRTY_SYNC) with I_DIRTY_TIME in i_state? - Eric
On Thu, Jan 07, 2021 at 02:05:57PM -0800, Eric Biggers wrote: > The XFS developers might have a different opinion though, as they were the ones > who requested it originally: > > https://lore.kernel.org/r/20200312143445.GA19160@infradead.org > https://lore.kernel.org/r/20200325092057.GA25483@infradead.org > https://lore.kernel.org/r/20200325154759.GY29339@magnolia > https://lore.kernel.org/r/20200312223913.GL10776@dread.disaster.area > > Any thoughts from anyone about whether we should still introduce a separate > notification for lazytime expiration, vs. just using ->dirty_inode(I_DIRTY_SYNC) > with I_DIRTY_TIME in i_state? I still find the way ->dirty_inode is used very confusing, but with this series and Jan's first patch I think we have a good enough state for now and don't need to add a method just for XFS. I still think it might make sense to eventually revisit how file systems are notified about dirtying.
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index c0f2c7586531b..53088e2a93b69 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -150,6 +150,7 @@ prototypes:: void (*free_inode)(struct inode *); void (*destroy_inode)(struct inode *); void (*dirty_inode) (struct inode *, int flags); + void (*lazytime_expired) (struct inode *); int (*write_inode) (struct inode *, struct writeback_control *wbc); int (*drop_inode) (struct inode *); void (*evict_inode) (struct inode *); @@ -175,6 +176,7 @@ alloc_inode: free_inode: called from RCU callback destroy_inode: dirty_inode: +lazytime_expired: write_inode: drop_inode: !!!inode->i_lock!!! evict_inode: diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 287b80948a40b..02531b1421d01 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -231,6 +231,7 @@ filesystem. As of kernel 2.6.22, the following members are defined: void (*destroy_inode)(struct inode *); void (*dirty_inode) (struct inode *, int flags); + void (*lazytime_expired) (struct inode *); int (*write_inode) (struct inode *, int); void (*drop_inode) (struct inode *); void (*delete_inode) (struct inode *); @@ -275,6 +276,15 @@ or bottom half). not its data. If the update needs to be persisted by fdatasync(), then I_DIRTY_DATASYNC will be set in the flags argument. +``lazytime_expired`` + when the lazytime mount option is enabled, this method is + called when an inode's in-memory updated timestamps have + expired and thus need to be written to disk. This happens + when the timestamps have been in memory for too long, when the + inode is going to be evicted, or when userspace triggers a + sync. If this method is not implemented, then + ->dirty_inode(inode, I_DIRTY_SYNC) is called instead. + ``write_inode`` this method is called when the VFS needs to write an inode to disc. The second parameter indicates whether the write should diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index ed76112bd067b..f187fc3f854e4 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1441,6 +1441,14 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb, } } +static void lazytime_expired(struct inode *inode) +{ + if (inode->i_sb->s_op->lazytime_expired) + inode->i_sb->s_op->lazytime_expired(inode); + else if (inode->i_sb->s_op->dirty_inode) + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC); +} + /* * Write out an inode and its dirty pages. Do not update the writeback list * linkage. That is left to the caller. The caller is also responsible for @@ -1520,8 +1528,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) * isn't enough. Don't call mark_inode_dirty_sync(), as that * would put the inode back on the dirty list. */ - if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode) - inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC); + if (dirty & I_DIRTY_TIME) + lazytime_expired(inode); err = write_inode(inode, wbc); if (ret == 0) @@ -2231,8 +2239,9 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode) * * @inode: inode to mark * @flags: what kind of dirty, e.g. I_DIRTY_SYNC. This can be a combination of - * multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined - * with I_DIRTY_PAGES. + * multiple I_DIRTY_* flags, except that: + * - I_DIRTY_TIME can't be combined with I_DIRTY_PAGES + * - I_DIRTY_TIME_EXPIRED must be used by itself * * Mark an inode as dirty. We notify the filesystem, then update the inode's * dirty flags. Then, if needed we add the inode to the appropriate dirty list. @@ -2260,7 +2269,15 @@ void __mark_inode_dirty(struct inode *inode, int flags) trace_writeback_mark_inode_dirty(inode, flags); - if (flags & I_DIRTY_INODE) { + if (flags & I_DIRTY_TIME_EXPIRED) { + /* + * Notify the filesystem about a lazytime timestamp expiration. + * Afterwards, this case just turns into I_DIRTY_SYNC. + */ + WARN_ON_ONCE(flags & ~I_DIRTY_TIME_EXPIRED); + lazytime_expired(inode); + flags = I_DIRTY_SYNC; + } else if (flags & I_DIRTY_INODE) { /* * Notify the filesystem about the inode being dirtied, so that * (if needed) it can update on-disk fields and journal the diff --git a/fs/inode.c b/fs/inode.c index d0fa43d8e9d5c..039b201a4743a 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1673,7 +1673,7 @@ void iput(struct inode *inode) atomic_inc(&inode->i_count); spin_unlock(&inode->i_lock); trace_writeback_lazytime_iput(inode); - mark_inode_dirty_sync(inode); + __mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED); goto retry; } iput_final(inode); diff --git a/fs/sync.c b/fs/sync.c index 1373a610dc784..363071a3528e3 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -196,7 +196,7 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync) if (!file->f_op->fsync) return -EINVAL; if (!datasync && (inode->i_state & I_DIRTY_TIME)) - mark_inode_dirty_sync(inode); + __mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED); return file->f_op->fsync(file, start, end, datasync); } EXPORT_SYMBOL(vfs_fsync_range); diff --git a/include/linux/fs.h b/include/linux/fs.h index 45a0303b2aeb6..8c5f5c5e62be4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1935,7 +1935,8 @@ struct super_operations { void (*destroy_inode)(struct inode *); void (*free_inode)(struct inode *); - void (*dirty_inode) (struct inode *, int flags); + void (*dirty_inode) (struct inode *, int flags); + void (*lazytime_expired)(struct inode *); int (*write_inode) (struct inode *, struct writeback_control *wbc); int (*drop_inode) (struct inode *); void (*evict_inode) (struct inode *); @@ -2108,6 +2109,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e. * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in * i_state, but not both. I_DIRTY_PAGES may still be set. + * I_DIRTY_TIME_EXPIRED Passed to __mark_inode_dirty() to indicate the intent to + * expire the inode's timestamps. Not stored in i_state. + * * I_NEW Serves as both a mutex and completion notification. * New inodes set I_NEW. If two processes both create * the same inode, one of them will release its inode and @@ -2173,6 +2177,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, #define I_DIO_WAKEUP (1 << __I_DIO_WAKEUP) #define I_LINKABLE (1 << 10) #define I_DIRTY_TIME (1 << 11) +#define I_DIRTY_TIME_EXPIRED (1 << 12) #define I_WB_SWITCH (1 << 13) #define I_OVL_INUSE (1 << 14) #define I_CREATING (1 << 15)