mbox series

[0/6] Maintain the relative size of fs.file-max and fs.nr_open

Message ID 20241123180901.181825-1-alexjlzheng@tencent.com
Headers show
Series Maintain the relative size of fs.file-max and fs.nr_open | expand

Message

Jinliang Zheng Nov. 23, 2024, 6:08 p.m. UTC
According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
fs.file-max represent the number of file-handles that can be opened
by each process and the entire system, respectively.

Therefore, it's necessary to maintain a relative size between them,
meaning we should ensure that files_stat.max_files is not less than
sysctl_nr_open.

However, this point is overlooked in the current kernel code, and
this patchset aims to rectify this. Additionally, patch 0001 fixes
the type issue with the sysctl_nr_open handler.

Jinliang Zheng (6):
  fs: fix proc_handler for sysctl_nr_open
  fs: make files_stat globally visible
  sysctl: refactor __do_proc_doulongvec_minmax()
  sysctl: ensure files_stat.max_files is not less than sysctl_nr_open
  sysctl: ensure sysctl_nr_open is not greater than files_stat.max_files
  fs: synchronize the access of fs.file-max and fs.nr_open

 fs/file_table.c        | 15 ++++++++---
 include/linux/fs.h     |  2 ++
 include/linux/sysctl.h |  4 +++
 kernel/sysctl.c        | 59 ++++++++++++++++++++++++++++++++++++++----
 4 files changed, 71 insertions(+), 9 deletions(-)

Comments

Al Viro Nov. 23, 2024, 6:27 p.m. UTC | #1
On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> fs.file-max represent the number of file-handles that can be opened
> by each process and the entire system, respectively.
> 
> Therefore, it's necessary to maintain a relative size between them,
> meaning we should ensure that files_stat.max_files is not less than
> sysctl_nr_open.

NAK.

You are confusing descriptors (nr_open) and open IO channels (max_files).

We very well _CAN_ have more of the former.  For further details,
RTFM dup(2) or any introductory Unix textbook.
Al Viro Nov. 23, 2024, 7:32 p.m. UTC | #2
On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > fs.file-max represent the number of file-handles that can be opened
> > by each process and the entire system, respectively.
> > 
> > Therefore, it's necessary to maintain a relative size between them,
> > meaning we should ensure that files_stat.max_files is not less than
> > sysctl_nr_open.
> 
> NAK.
> 
> You are confusing descriptors (nr_open) and open IO channels (max_files).
> 
> We very well _CAN_ have more of the former.  For further details,
> RTFM dup(2) or any introductory Unix textbook.

Short version: there are 3 different notions -
	1) file as a collection of data kept by filesystem. Such things as
contents, ownership, permissions, timestamps belong there.
	2) IO channel used to access one of (1).  open(2) creates such;
things like current position in file, whether it's read-only or read-write
open, etc. belong there.  It does not belong to a process - after fork(),
child has access to all open channels parent had when it had spawned
a child.  If you open a file in parent, read 10 bytes from it, then spawn
a child that reads 10 more bytes and exits, then have parent read another
5 bytes, the first read by parent will have read bytes 0 to 9, read by
child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
Position is a property of IO channel; it belongs neither to underlying
file (otherwise another process opening the file and reading from it
would play havoc on your process) nor to process (otherwise reads done
by child would not have affected the parent and the second read from
parent would have gotten bytes 10 to 14).  Same goes for access mode -
it belongs to IO channel.
	3) file descriptor - a number that has a meaning only in context
of a process and refers to IO channel.	That's what system calls use
to identify the IO channel to operate upon; open() picks a descriptor
unused by the calling process, associates the new channel with it and
returns that descriptor (a number) to caller.  Multiple descriptors can
refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
associates it with the same IO channel fd currently refers to.

	IO channels are not directly exposed to userland, but they are
very much present in Unix-style IO API.  Note that results of e.g.
	int fd1 = open("/etc/issue", 0);
	int fd2 = open("/etc/issue", 0);
and
	int fd1 = open("/etc/issue", 0);
	int fd2 = dup(fd1);
are not identical, even though in both cases fd1 and fd2 are opened
descriptors and reading from them will access the contents of the
/etc/issue; in the former case the positions being accessed by read from
fd1 and fd2 will be independent, in the latter they will be shared.

	It's really quite basic - Unix Programming 101 stuff.  It's not
just that POSIX requires that and that any Unix behaves that way,
anything even remotely Unix-like will be like that.

	You won't find the words 'IO channel' in POSIX, but I refuse
to use the term they have chosen instead - 'file description'.	Yes,
alongside with 'file descriptor', in the contexts where the distinction
between these notions is quite important.  I would rather not say what
I really think of those unsung geniuses, lest CoC gets overexcited...

	Anyway, in casual conversations the expression 'opened file'
usually refers to that thing.  Which is somewhat clumsy (sounds like
'file on filesystem that happens to be opened'), but usually it's
good enough.  If you need to be pedantic (e.g. when explaining that
material in aforementioned Unix Programming 101 class), 'IO channel'
works well enough, IME.
Theodore Ts'o Nov. 24, 2024, 2:30 a.m. UTC | #3
On Sat, Nov 23, 2024 at 07:32:27PM +0000, Al Viro wrote:
> 
> 	You won't find the words 'IO channel' in POSIX, but I refuse
> to use the term they have chosen instead - 'file description'.	Yes,
> alongside with 'file descriptor', in the contexts where the distinction
> between these notions is quite important.

What I tend to do is use the term "struct file" instead.  The "file
descriptor" literally is an integer index into an array of "struct
file" pointers.

"struct file" is how things are actually implemented in Linux and most
Unix systems.  And while it's admittedly ugly to use an implementation
detail as an abstract term, it's infinitely less ugly than Posix's
"file description".  :-)

						- Ted
Jinliang Zheng Nov. 24, 2024, 9:48 a.m. UTC | #4
On Sat, 23 Nov 2024 19:32:27 +0000, Al Viro wrote:
> On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> > On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > > fs.file-max represent the number of file-handles that can be opened
> > > by each process and the entire system, respectively.
> > > 
> > > Therefore, it's necessary to maintain a relative size between them,
> > > meaning we should ensure that files_stat.max_files is not less than
> > > sysctl_nr_open.
> > 
> > NAK.
> > 
> > You are confusing descriptors (nr_open) and open IO channels (max_files).
> > 
> > We very well _CAN_ have more of the former.  For further details,
> > RTFM dup(2) or any introductory Unix textbook.
> 
> Short version: there are 3 different notions -
> 	1) file as a collection of data kept by filesystem. Such things as
> contents, ownership, permissions, timestamps belong there.
> 	2) IO channel used to access one of (1).  open(2) creates such;
> things like current position in file, whether it's read-only or read-write
> open, etc. belong there.  It does not belong to a process - after fork(),
> child has access to all open channels parent had when it had spawned
> a child.  If you open a file in parent, read 10 bytes from it, then spawn
> a child that reads 10 more bytes and exits, then have parent read another
> 5 bytes, the first read by parent will have read bytes 0 to 9, read by
> child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
> Position is a property of IO channel; it belongs neither to underlying
> file (otherwise another process opening the file and reading from it
> would play havoc on your process) nor to process (otherwise reads done
> by child would not have affected the parent and the second read from
> parent would have gotten bytes 10 to 14).  Same goes for access mode -
> it belongs to IO channel.

I'm sorry that I don't know much about the implementation of UNIX, but
specific to the implementation of Linux, struct file is more like a
combination of what you said 1) and 2).

But I see your point, I missed the dup() case. dup() will occupy the
element position of the fdtable->fd array, but will not create a new
struct file.

Thank you.
Jinliang Zheng

> 	3) file descriptor - a number that has a meaning only in context
> of a process and refers to IO channel.	That's what system calls use
> to identify the IO channel to operate upon; open() picks a descriptor
> unused by the calling process, associates the new channel with it and
> returns that descriptor (a number) to caller.  Multiple descriptors can
> refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
> associates it with the same IO channel fd currently refers to.
> 
> 	IO channels are not directly exposed to userland, but they are
> very much present in Unix-style IO API.  Note that results of e.g.
> 	int fd1 = open("/etc/issue", 0);
> 	int fd2 = open("/etc/issue", 0);
> and
> 	int fd1 = open("/etc/issue", 0);
> 	int fd2 = dup(fd1);
> are not identical, even though in both cases fd1 and fd2 are opened
> descriptors and reading from them will access the contents of the
> /etc/issue; in the former case the positions being accessed by read from
> fd1 and fd2 will be independent, in the latter they will be shared.
> 
> 	It's really quite basic - Unix Programming 101 stuff.  It's not
> just that POSIX requires that and that any Unix behaves that way,
> anything even remotely Unix-like will be like that.
> 
> 	You won't find the words 'IO channel' in POSIX, but I refuse
> to use the term they have chosen instead - 'file description'.	Yes,
> alongside with 'file descriptor', in the contexts where the distinction
> between these notions is quite important.  I would rather not say what
> I really think of those unsung geniuses, lest CoC gets overexcited...
> 
> 	Anyway, in casual conversations the expression 'opened file'
> usually refers to that thing.  Which is somewhat clumsy (sounds like
> 'file on filesystem that happens to be opened'), but usually it's
> good enough.  If you need to be pedantic (e.g. when explaining that
> material in aforementioned Unix Programming 101 class), 'IO channel'
> works well enough, IME.
Theodore Ts'o Nov. 24, 2024, 3:59 p.m. UTC | #5
On Sun, Nov 24, 2024 at 05:48:13PM +0800, Jinliang Zheng wrote:
> > 
> > Short version: there are 3 different notions -
> > 	1) file as a collection of data kept by filesystem. Such things as
> > contents, ownership, permissions, timestamps belong there.
> > 	2) IO channel used to access one of (1).  open(2) creates such;
> > things like current position in file, whether it's read-only or read-write
> > open, etc. belong there.  It does not belong to a process - after fork(),
> > ...
>
> I'm sorry that I don't know much about the implementation of UNIX, but
> specific to the implementation of Linux, struct file is more like a
> combination of what you said 1) and 2).

This is incorrect.  In Linux (and historical implementations of Unix)
struct file is precisely (2).  The struct file has a pointer to a
struct dentry, which in turn has a pointer to a struct inode.  So a
struct file *refers* to (1), but it is *not* (1).

       	    	     	     	       	     - Ted