diff mbox series

vfs/ext4: Fixed a potential problem related to an infinite loop

Message ID 20240926221103.24423-1-linmaxi@gmail.com (mailing list archive)
State New
Headers show
Series vfs/ext4: Fixed a potential problem related to an infinite loop | expand

Commit Message

Max Brener Sept. 26, 2024, 10:11 p.m. UTC
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306

This patch fixes a potential infinite journal-truncate-print problem.
When systemd's journald is called, ftruncate syscall is called. If anywhere down the call stack of ftruncate a printk of some sort happens, it triggers journald again
therefore an infinite loop is established. Some example of such situation (aside from the example given in the added link): ext4_es_print_tree(struct inode*) is one of
the functions being called as a part of the call stack of ftruncate (System.journal is an extent file). In this function, some debug prints occur.
There can be more printk statements down the call stack (and if not, then there might be in the future). 
This patch does two things: prevents an infinite loop, and optimizes truncate operation by avoiding doing it if it is a no-op call.

To fix this issue:
Add  a new inode flag S_TRUNCATED which helps in stopping such an infinite loop by marking an in-memory inode as already truncated.
When ext4_setattr() is triggered, it won't call ext4_truncate() if the size of the file is unchanged and the file was truncated at least once
(which might be necessary for removing preallocated blocks). 
Calling truncate for the second time with no size change, is a no-op call, therefore the call is avoided.
Next, turn on the flag of the inode if it is truncated in ext4_truncate().
In ext4_setattr, avoid calling ext4_truncate if the inode is already truncated and the size of the file is not meant to change in the current call.
Finally, zero S_TRUNCATED flag of the inode at fallocate() when it is being called with FALLOC_FL_KEEP_SIZE flag.

Signed-off-by: Max Brener <linmaxi@gmail.com>
---
 fs/ext4/inode.c    | 5 ++++-
 fs/open.c          | 3 +++
 include/linux/fs.h | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

Comments

Theodore Ts'o Sept. 27, 2024, 3:50 p.m. UTC | #1
On Fri, Sep 27, 2024 at 01:11:03AM +0300, Max Brener wrote:
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306
> 
> This patch fixes a potential infinite journal-truncate-print
> problem.  When systemd's journald is called, ftruncate syscall is
> called. If anywhere down the call stack of ftruncate a printk of
> some sort happens, it triggers journald again therefore an infinite
> loop is established.

This isn't a good justification for this change; in general, whenever
you have code paths which get triggered when a logging daemon is
triggered, whether it's systemd-journald, or syslog, and this can
cause this kind of infinite loop.  For example, suppose you are using
remote logging (where a log message gets sent over the network via the
remote syslog facility), and anything in the networking stack triggers
a printk, that will also trigger an "infinite loop".  This falls in
the "Doctor, doctor, it hurts when I do that --- so don't do that!"

In this particular situation, journald is doing something silly/stupid
which is whenver a message is logged, it is issuing a no-op ftruncate
to the journald log file.  It's also worth noting that ext4's truncate
path does *not* trigger a printk unless something really haw gone
wrong (e.g., a WARN_ON when a kernel bug has happened and flags in the
in-memory get erronously set, or the file system gets corrupted and
this gets reported via ext4_error()).  The reporter discovered this by
explicitly adding a printk in their privatea kernel sources, and in
general, when you add random changes to the kernel, any unfortunate
consequences are not something that upstream code can be expected to
defend against.

For context, see: https://bugzilla.kernel.org/show_bug.cgi?id=219306

We can justify an optimization here so that in the case of
silly/stupid userspace programs which are constnatly calling
truncate(2) which are no-ops, we can optimize ext4's handling of these
silly/stupid programs.  The ext4_truncate() code path causes starting
a journal handle, adding the inode to the orphan list, and then
removing it at the end of the truncate.  In the case where sopme
program calls truncate() in a tight loop, we can optimize the
behaviour.  It's not a high priority optimization, but if given that
we can't necessarily change silly/stupid userspace programmers, it can
be something that we can do if the patch is too invasive.

HOWEVER....


> To fix this issue:
> Add  a new inode flag S_TRUNCATED which helps in stopping such an infinite loop by marking an in-memory inode as already truncated.

Adding a generic VFS-level flag is not something that we can justify
here.  The VFS maintainers would NACK such a change, and deservedly
so.

What I had in mind was to define a new EXT4 state flag, say,
EXT4_STATE_TRUNCATED, and then test, set, and clear it using
ext4_{test,set,clear}_inode_state().

Cheers,

					- Ted
Jan Kara Sept. 30, 2024, 9:56 a.m. UTC | #2
On Fri 27-09-24 08:50:19, Theodore Ts'o wrote:
> On Fri, Sep 27, 2024 at 01:11:03AM +0300, Max Brener wrote:
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306
> > 
> > This patch fixes a potential infinite journal-truncate-print
> > problem.  When systemd's journald is called, ftruncate syscall is
> > called. If anywhere down the call stack of ftruncate a printk of
> > some sort happens, it triggers journald again therefore an infinite
> > loop is established.
> 
> This isn't a good justification for this change; in general, whenever
> you have code paths which get triggered when a logging daemon is
> triggered, whether it's systemd-journald, or syslog, and this can
> cause this kind of infinite loop.  For example, suppose you are using
> remote logging (where a log message gets sent over the network via the
> remote syslog facility), and anything in the networking stack triggers
> a printk, that will also trigger an "infinite loop".  This falls in
> the "Doctor, doctor, it hurts when I do that --- so don't do that!"
> 
> In this particular situation, journald is doing something silly/stupid
> which is whenver a message is logged, it is issuing a no-op ftruncate
> to the journald log file.  It's also worth noting that ext4's truncate
> path does *not* trigger a printk unless something really haw gone
> wrong (e.g., a WARN_ON when a kernel bug has happened and flags in the
> in-memory get erronously set, or the file system gets corrupted and
> this gets reported via ext4_error()).  The reporter discovered this by
> explicitly adding a printk in their privatea kernel sources, and in
> general, when you add random changes to the kernel, any unfortunate
> consequences are not something that upstream code can be expected to
> defend against.
> 
> For context, see: https://bugzilla.kernel.org/show_bug.cgi?id=219306

Right, loops like these are not something we should be fixing in the
kernel.

> We can justify an optimization here so that in the case of
> silly/stupid userspace programs which are constnatly calling
> truncate(2) which are no-ops, we can optimize ext4's handling of these
> silly/stupid programs.  The ext4_truncate() code path causes starting
> a journal handle, adding the inode to the orphan list, and then
> removing it at the end of the truncate.  In the case where sopme
> program calls truncate() in a tight loop, we can optimize the
> behaviour.  It's not a high priority optimization, but if given that
> we can't necessarily change silly/stupid userspace programmers, it can
> be something that we can do if the patch is too invasive.
> 
> HOWEVER....
> 
> 
> > To fix this issue:
> > Add  a new inode flag S_TRUNCATED which helps in stopping such an infinite loop by marking an in-memory inode as already truncated.
> 
> Adding a generic VFS-level flag is not something that we can justify
> here.  The VFS maintainers would NACK such a change, and deservedly
> so.
> 
> What I had in mind was to define a new EXT4 state flag, say,
> EXT4_STATE_TRUNCATED, and then test, set, and clear it using
> ext4_{test,set,clear}_inode_state().

Agreed as well. I'll also note that keeping such flag uptodate is not as
simple as it seems because there are various places that may be allocating
blocks beyond EOF (for example extending writes) and that rely on
ext4_truncate() removing them so one needs to be careful to capture all the
places where the "truncated" state needs to be cleared.

								Honza
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..c2a9e9be23e2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4193,6 +4193,8 @@  int ext4_truncate(struct inode *inode)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
 
+	inode->i_flags |= S_TRUNCATED;
+
 out_stop:
 	/*
 	 * If this was a simple ftruncate() and the file will remain alive,
@@ -5492,7 +5494,8 @@  int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		 * Call ext4_truncate() even if i_size didn't change to
 		 * truncate possible preallocated blocks.
 		 */
-		if (attr->ia_size <= oldsize) {
+		if (attr->ia_size < oldsize ||
+			(attr->ia_size == oldsize && !IS_TRUNCATED(inode))) {
 			rc = ext4_truncate(inode);
 			if (rc)
 				error = rc;
diff --git a/fs/open.c b/fs/open.c
index daf1b55ca818..3f34dda79479 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -329,6 +329,9 @@  int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (!file->f_op->fallocate)
 		return -EOPNOTSUPP;
 
+	if (mode & FALLOC_FL_MODE_MASK & FALLOC_FL_KEEP_SIZE)
+		inode->i_flags &= (~S_TRUNCATED);
+
 	file_start_write(file);
 	ret = file->f_op->fallocate(file, mode, offset, len);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6b8df574729c..ae613481dc29 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2272,6 +2272,7 @@  struct super_operations {
 #define S_CASEFOLD	(1 << 15) /* Casefolded file */
 #define S_VERITY	(1 << 16) /* Verity file (using fs/verity/) */
 #define S_KERNEL_FILE	(1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
+#define S_TRUNCATED (1 << 18) /* Is the file truncated */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -2328,6 +2329,7 @@  static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
+#define IS_TRUNCATED(inode)	((inode)->i_flags & S_TRUNCATED)
 
 static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
 				   struct inode *inode)