[2/3] fs: Introduce cmdline argument exceed_file_max_panic

Message ID 1591425140-20613-2-git-send-email-yangtiezhu@loongson.cn (mailing list archive)
State New, archived
Headers show
Series [1/3] fs: Use get_max_files() instead of files_stat.max_files in alloc_empty_file() | expand

Commit Message

Tiezhu Yang June 6, 2020, 6:32 a.m. UTC
It is important to ensure that files that are opened always get closed.
Failing to close files results in file descriptor leaks. One common
answer to this problem is to simply raise the limit of open file handles
and then restart the server every day or every few hours; that is not
a good idea for long-lived servers if there are no leaks.

If there is a file descriptor leak, then once the file-max limit is
reached the system no longer works properly; at worst the user can do
nothing, and it is even impossible to execute the reboot command due to
too many open files in the system. In order to reboot automatically and
recover to a normal state, introduce a new cmdline argument,
exceed_file_max_panic, which lets the user control whether to call
panic() in this case.
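For context, the counters involved live in procfs on Linux; a minimal
sketch of inspecting them (the value in the commented-out line is purely
illustrative, not taken from the patch):

```shell
cat /proc/sys/fs/file-nr    # prints: allocated  unused  maximum
cat /proc/sys/fs/file-max   # the system-wide cap checked by alloc_empty_file()
# Raising the cap (root only) is the usual mitigation mentioned above:
# echo 2097152 > /proc/sys/fs/file-max   # or: sysctl -w fs.file-max=...
```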

We can reproduce this problem with the following simple test:

[yangtiezhu@linux ~]$ cat exceed_file_max_test.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
	int fd;

	while (1) {
		fd = open("/usr/include/stdio.h", O_RDONLY);
		if (fd == -1)
			fprintf(stderr, "%s\n", "open failed");
	}

	return 0;
}
[yangtiezhu@linux ~]$ cat exceed_file_max_test.sh
#!/bin/bash

gcc exceed_file_max_test.c -o exceed_file_max_test.bin -Wall

while true
do
	./exceed_file_max_test.bin >/dev/null 2>&1 &
done
[yangtiezhu@linux ~]$ sh exceed_file_max_test.sh &
[yangtiezhu@linux ~]$ reboot
bash: start pipeline: pgrp pipe: Too many open files in system
bash: /usr/sbin/reboot: Too many open files in system

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 fs/file_table.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Matthew Wilcox June 6, 2020, 2:13 p.m. UTC | #1
On Sat, Jun 06, 2020 at 02:32:19PM +0800, Tiezhu Yang wrote:
> It is important to ensure that files that are opened always get closed.
> Failing to close files can result in file descriptor leaks. One common
> answer to this problem is to just raise the limit of open file handles
> and then restart the server every day or every few hours, this is not
> a good idea for long-lived servers if there is no leaks.
> 
> If there exists file descriptor leaks, when file-max limit reached, we
> can see that the system can not work well and at worst the user can do
> nothing, it is even impossible to execute reboot command due to too many
> open files in system. In order to reboot automatically to recover to the
> normal status, introduce a new cmdline argument exceed_file_max_panic for
> user to control whether to call panic in this case.

ulimit -n is your friend.
Al Viro June 6, 2020, 2:28 p.m. UTC | #2
On Sat, Jun 06, 2020 at 02:32:19PM +0800, Tiezhu Yang wrote:
> It is important to ensure that files that are opened always get closed.
> Failing to close files can result in file descriptor leaks. One common
> answer to this problem is to just raise the limit of open file handles
> and then restart the server every day or every few hours, this is not
> a good idea for long-lived servers if there is no leaks.
> 
> If there exists file descriptor leaks, when file-max limit reached, we
> can see that the system can not work well and at worst the user can do
> nothing, it is even impossible to execute reboot command due to too many
> open files in system. In order to reboot automatically to recover to the
> normal status, introduce a new cmdline argument exceed_file_max_panic for
> user to control whether to call panic in this case.

What the hell?  You are modifying the path for !CAP_SYS_ADMIN.  IOW,
you've just handed an ability to panic the box to any non-privileged
process.

NAK.  That makes no sense whatsoever.  Note that root is *NOT* affected
by any of that, so you can bloody well have a userland process running
as root and checking the number of files once in a while.  And doing
whatever it wants to do, up to and including reboot/writing to
/proc/sys/sysrq-trigger, etc.  Or just looking at the leaky processes
and killing them, with a nastygram along the lines of "$program appears
to be leaking descriptors; LART the authors of that FPOS if they can
be located" sent into log/over mail/etc.

Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index 26516d0..6943945 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -121,6 +121,17 @@  static struct file *__alloc_file(int flags, const struct cred *cred)
 	return f;
 }
 
+static bool exceed_file_max_panic;
+static int __init exceed_file_max_panic_setup(char *str)
+{
+	pr_info("Call panic when exceed file-max limit\n");
+	exceed_file_max_panic = true;
+
+	return 1;
+}
+
+__setup("exceed_file_max_panic", exceed_file_max_panic_setup);
+
 /* Find an unused file structure and return a pointer to it.
  * Returns an error pointer if some error happend e.g. we over file
  * structures limit, run out of memory or operation is not permitted.
@@ -159,6 +170,9 @@  struct file *alloc_empty_file(int flags, const struct cred *cred)
 	if (get_nr_files() > old_max) {
 		pr_info("VFS: file-max limit %lu reached\n", get_max_files());
 		old_max = get_nr_files();
+
+		if (exceed_file_max_panic)
+			panic("VFS: Too many open files in system\n");
 	}
 	return ERR_PTR(-ENFILE);
 }