mbox series

[0/6] Maintain the relative size of fs.file-max and fs.nr_open

Message ID 20241123180901.181825-1-alexjlzheng@tencent.com (mailing list archive)
Headers show
Series Maintain the relative size of fs.file-max and fs.nr_open | expand

Message

Jinliang Zheng Nov. 23, 2024, 6:08 p.m. UTC
According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
fs.file-max represent the number of file-handles that can be opened
by each process and the entire system, respectively.

Therefore, it's necessary to maintain a relative size between them,
meaning we should ensure that files_stat.max_files is not less than
sysctl_nr_open.

However, this point is overlooked in the current kernel code, and
this patchset aims to rectify this. Additionally, patch 0001 fixes
the type issue with the sysctl_nr_open handler.

Jinliang Zheng (6):
  fs: fix proc_handler for sysctl_nr_open
  fs: make files_stat globally visible
  sysctl: refactor __do_proc_doulongvec_minmax()
  sysctl: ensure files_stat.max_files is not less than sysctl_nr_open
  sysctl: ensure sysctl_nr_open is not greater than files_stat.max_files
  fs: synchronize the access of fs.file-max and fs.nr_open

 fs/file_table.c        | 15 ++++++++---
 include/linux/fs.h     |  2 ++
 include/linux/sysctl.h |  4 +++
 kernel/sysctl.c        | 59 ++++++++++++++++++++++++++++++++++++++----
 4 files changed, 71 insertions(+), 9 deletions(-)

Comments

Al Viro Nov. 23, 2024, 6:27 p.m. UTC | #1
On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> fs.file-max represent the number of file-handles that can be opened
> by each process and the entire system, respectively.
> 
> Therefore, it's necessary to maintain a relative size between them,
> meaning we should ensure that files_stat.max_files is not less than
> sysctl_nr_open.

NAK.

You are confusing descriptors (nr_open) and open IO channels (max_files).

We very well _CAN_ have more of the former.  For further details,
RTFM dup(2) or any introductory Unix textbook.
Al Viro Nov. 23, 2024, 7:32 p.m. UTC | #2
On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > fs.file-max represent the number of file-handles that can be opened
> > by each process and the entire system, respectively.
> > 
> > Therefore, it's necessary to maintain a relative size between them,
> > meaning we should ensure that files_stat.max_files is not less than
> > sysctl_nr_open.
> 
> NAK.
> 
> You are confusing descriptors (nr_open) and open IO channels (max_files).
> 
> We very well _CAN_ have more of the former.  For further details,
> RTFM dup(2) or any introductory Unix textbook.

Short version: there are 3 different notions -
	1) file as a collection of data kept by filesystem. Such things as
contents, ownership, permissions, timestamps belong there.
	2) IO channel used to access one of (1).  open(2) creates such;
things like current position in file, whether it's read-only or read-write
open, etc. belong there.  It does not belong to a process - after fork(),
child has access to all open channels parent had when it had spawned
a child.  If you open a file in parent, read 10 bytes from it, then spawn
a child that reads 10 more bytes and exits, then have parent read another
5 bytes, the first read by parent will have read bytes 0 to 9, read by
child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
Position is a property of IO channel; it belongs neither to underlying
file (otherwise another process opening the file and reading from it
would play havoc on your process) nor to process (otherwise reads done
by child would not have affected the parent and the second read from
parent would have gotten bytes 10 to 14).  Same goes for access mode -
it belongs to IO channel.
	3) file descriptor - a number that has a meaning only in context
of a process and refers to IO channel.	That's what system calls use
to identify the IO channel to operate upon; open() picks a descriptor
unused by the calling process, associates the new channel with it and
returns that descriptor (a number) to caller.  Multiple descriptors can
refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
associates it with the same IO channel fd currently refers to.

	IO channels are not directly exposed to userland, but they are
very much present in Unix-style IO API.  Note that results of e.g.
	int fd1 = open("/etc/issue", 0);
	int fd2 = open("/etc/issue", 0);
and
	int fd1 = open("/etc/issue", 0);
	int fd2 = dup(fd1);
are not identical, even though in both cases fd1 and fd2 are opened
descriptors and reading from them will access the contents of the
/etc/issue; in the former case the positions being accessed by read from
fd1 and fd2 will be independent, in the latter they will be shared.

	It's really quite basic - Unix Programming 101 stuff.  It's not
just that POSIX requires that and that any Unix behaves that way,
anything even remotely Unix-like will be like that.

	You won't find the words 'IO channel' in POSIX, but I refuse
to use the term they have chosen instead - 'file description'.	Yes,
alongside with 'file descriptor', in the contexts where the distinction
between these notions is quite important.  I would rather not say what
I really think of those unsung geniuses, lest CoC gets overexcited...

	Anyway, in casual conversations the expression 'opened file'
usually refers to that thing.  Which is somewhat clumsy (sounds like
'file on filesystem that happens to be opened'), but usually it's
good enough.  If you need to be pedantic (e.g. when explaining that
material in aforementioned Unix Programming 101 class), 'IO channel'
works well enough, IME.