Message ID | 20240717124534.1200735-1-cleger@rivosinc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] osdep: add a qemu_close_all_open_fd() helper | expand |
Hi Clément, On 17/7/24 14:45, Clément Léger wrote: > Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on > POSIX"), the maximum number of file descriptors that can be opened are > raised to nofile.rlim_max. On recent debian distro, this yield a maximum > of 1073741816 file descriptors. Now, when forking to start > qemu-bridge-helper, this actually calls close() on the full possible file > descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which > takes a considerable amount of time. In order to reduce that time, > factorize existing code to close all open files descriptors in a new > qemu_close_all_open_fd() function. This function uses various methods > to close all the open file descriptors ranging from the most efficient > one to the least one. It also accepts an ordered array of file > descriptors that should not be closed since this is required by the > callers that calls it after forking. > > Signed-off-by: Clément Léger <cleger@rivosinc.com> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> > ---- FYI git tools parse 3 '-', not 4. > v4: > - Add a comment saying that qemu_close_all_fds() can take a NULL skip > array and nskip == 0 > - Added an assert in qemu_close_all_fds() to check for skip/nskip > parameters > - Fix spurious tabs instead of spaces > - Applied checkpatch > - v3: https://lore.kernel.org/qemu-devel/20240716144006.6571-1-cleger@rivosinc.com/ > +void qemu_close_all_open_fd(const int *skip, unsigned int nskip) > +{ > + int open_max = sysconf(_SC_OPEN_MAX); > + unsigned int cur_skip = 0; > + int i; > + > + assert(skip != NULL || nskip == 0); > + > + if (qemu_close_all_open_fd_close_range(skip, nskip)) { > + return; > + } > + > + if (qemu_close_all_open_fd_proc(skip, nskip)) { > + return; > + } > + > + /* Fallback */ > + for (i = 0; i < open_max; i++) { > + if (cur_skip < nskip && i == skip[cur_skip]) { > + cur_skip++; > + continue; > + } > + close(i); > + } > +} Build failure on windows: ../util/osdep.c: In function 'qemu_close_all_open_fd': ../util/osdep.c:725:20: error: implicit declaration of function 'sysconf'; did you mean 'swscanf'? [-Wimplicit-function-declaration] 725 | int open_max = sysconf(_SC_OPEN_MAX); | ^~~~~~~ | swscanf ../util/osdep.c:725:20: error: nested extern declaration of 'sysconf' [-Werror=nested-externs] ../util/osdep.c:725:28: error: '_SC_OPEN_MAX' undeclared (first use in this function); did you mean 'FOPEN_MAX'? 725 | int open_max = sysconf(_SC_OPEN_MAX); | ^~~~~~~~~~~~ | FOPEN_MAX ../util/osdep.c:725:28: note: each undeclared identifier is reported only once for each function it appears in
On 23/07/2024 08:24, Philippe Mathieu-Daudé wrote: > Hi Clément, > > On 17/7/24 14:45, Clément Léger wrote: >> Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on >> POSIX"), the maximum number of file descriptors that can be opened are >> raised to nofile.rlim_max. On recent debian distro, this yield a maximum >> of 1073741816 file descriptors. Now, when forking to start >> qemu-bridge-helper, this actually calls close() on the full possible file >> descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which >> takes a considerable amount of time. In order to reduce that time, >> factorize existing code to close all open files descriptors in a new >> qemu_close_all_open_fd() function. This function uses various methods >> to close all the open file descriptors ranging from the most efficient >> one to the least one. It also accepts an ordered array of file >> descriptors that should not be closed since this is required by the >> callers that calls it after forking. >> >> Signed-off-by: Clément Léger <cleger@rivosinc.com> >> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> >> ---- > > FYI git tools parse 3 '-', not 4. Hi Philippe, Thanks for the info, I was not aware of that ! I'll use 3 '-' from now on. > >> v4: >> - Add a comment saying that qemu_close_all_fds() can take a NULL skip >> array and nskip == 0 >> - Added an assert in qemu_close_all_fds() to check for skip/nskip >> parameters >> - Fix spurious tabs instead of spaces >> - Applied checkpatch >> - v3: >> https://lore.kernel.org/qemu-devel/20240716144006.6571-1-cleger@rivosinc.com/ > > >> +void qemu_close_all_open_fd(const int *skip, unsigned int nskip) >> +{ >> + int open_max = sysconf(_SC_OPEN_MAX); >> + unsigned int cur_skip = 0; >> + int i; >> + >> + assert(skip != NULL || nskip == 0); >> + >> + if (qemu_close_all_open_fd_close_range(skip, nskip)) { >> + return; >> + } >> + >> + if (qemu_close_all_open_fd_proc(skip, nskip)) { >> + return; >> + } >> + >> + /* Fallback */ >> + for (i = 0; i < open_max; i++) { >> + if (cur_skip < nskip && i == skip[cur_skip]) { >> + cur_skip++; >> + continue; >> + } >> + close(i); >> + } >> +} > > Build failure on windows: > > ../util/osdep.c: In function 'qemu_close_all_open_fd': > ../util/osdep.c:725:20: error: implicit declaration of function > 'sysconf'; did you mean 'swscanf'? [-Wimplicit-function-declaration] > 725 | int open_max = sysconf(_SC_OPEN_MAX); > | ^~~~~~~ > | swscanf > ../util/osdep.c:725:20: error: nested extern declaration of 'sysconf' > [-Werror=nested-externs] > ../util/osdep.c:725:28: error: '_SC_OPEN_MAX' undeclared (first use in > this function); did you mean 'FOPEN_MAX'? > 725 | int open_max = sysconf(_SC_OPEN_MAX); > | ^~~~~~~~~~~~ > | FOPEN_MAX > ../util/osdep.c:725:28: note: each undeclared identifier is reported > only once for each function it appears in > Should I move this chunk of code in oslib-posix.c and stub that function for win32 ? It seems like this code was not built for windows before I added it in osdep, which means it is not needed for win32. What's your advice on that ? Thanks, Clément
Hi On Wed, Jul 17, 2024 at 4:48 PM Clément Léger <cleger@rivosinc.com> wrote: > Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on > POSIX"), the maximum number of file descriptors that can be opened are > raised to nofile.rlim_max. On recent debian distro, this yield a maximum > of 1073741816 file descriptors. Now, when forking to start > qemu-bridge-helper, this actually calls close() on the full possible file > descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which > takes a considerable amount of time. In order to reduce that time, > factorize existing code to close all open files descriptors in a new > qemu_close_all_open_fd() function. This function uses various methods > to close all the open file descriptors ranging from the most efficient > one to the least one. It also accepts an ordered array of file > descriptors that should not be closed since this is required by the > callers that calls it after forking. > > Signed-off-by: Clément Léger <cleger@rivosinc.com> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> > GLib already implemented those kinds of portable facilities. I wonder why launch_script() is not using glib gspawn API. async-teardown should use g_clownfrom() when glib >= 2.80. my 2c
On Tue, Jul 23, 2024 at 09:16:15AM +0200, Clément Léger wrote: > > > On 23/07/2024 08:24, Philippe Mathieu-Daudé wrote: > > Hi Clément, > > > > On 17/7/24 14:45, Clément Léger wrote: > >> Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on > >> POSIX"), the maximum number of file descriptors that can be opened are > >> raised to nofile.rlim_max. On recent debian distro, this yield a maximum > >> of 1073741816 file descriptors. Now, when forking to start > >> qemu-bridge-helper, this actually calls close() on the full possible file > >> descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which > >> takes a considerable amount of time. In order to reduce that time, > >> factorize existing code to close all open files descriptors in a new > >> qemu_close_all_open_fd() function. This function uses various methods > >> to close all the open file descriptors ranging from the most efficient > >> one to the least one. It also accepts an ordered array of file > >> descriptors that should not be closed since this is required by the > >> callers that calls it after forking. > >> > >> Signed-off-by: Clément Léger <cleger@rivosinc.com> > >> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> > >> ---- > > > > FYI git tools parse 3 '-', not 4. > > Hi Philippe, > > Thanks for the info, I was not aware of that ! I'll use 3 '-' from now on. > > > > >> v4: > >> - Add a comment saying that qemu_close_all_fds() can take a NULL skip > >> array and nskip == 0 > >> - Added an assert in qemu_close_all_fds() to check for skip/nskip > >> parameters > >> - Fix spurious tabs instead of spaces > >> - Applied checkpatch > >> - v3: > >> https://lore.kernel.org/qemu-devel/20240716144006.6571-1-cleger@rivosinc.com/ > > > > > >> +void qemu_close_all_open_fd(const int *skip, unsigned int nskip) > >> +{ > >> + int open_max = sysconf(_SC_OPEN_MAX); > >> + unsigned int cur_skip = 0; > >> + int i; > >> + > >> + assert(skip != NULL || nskip == 0); > >> + > >> + if (qemu_close_all_open_fd_close_range(skip, nskip)) { > >> + return; > >> + } > >> + > >> + if (qemu_close_all_open_fd_proc(skip, nskip)) { > >> + return; > >> + } > >> + > >> + /* Fallback */ > >> + for (i = 0; i < open_max; i++) { > >> + if (cur_skip < nskip && i == skip[cur_skip]) { > >> + cur_skip++; > >> + continue; > >> + } > >> + close(i); > >> + } > >> +} > > > > Build failure on windows: > > > > ../util/osdep.c: In function 'qemu_close_all_open_fd': > > ../util/osdep.c:725:20: error: implicit declaration of function > > 'sysconf'; did you mean 'swscanf'? [-Wimplicit-function-declaration] > > 725 | int open_max = sysconf(_SC_OPEN_MAX); > > | ^~~~~~~ > > | swscanf > > ../util/osdep.c:725:20: error: nested extern declaration of 'sysconf' > > [-Werror=nested-externs] > > ../util/osdep.c:725:28: error: '_SC_OPEN_MAX' undeclared (first use in > > this function); did you mean 'FOPEN_MAX'? > > 725 | int open_max = sysconf(_SC_OPEN_MAX); > > | ^~~~~~~~~~~~ > > | FOPEN_MAX > > ../util/osdep.c:725:28: note: each undeclared identifier is reported > > only once for each function it appears in > > > > Should I move this chunk of code in oslib-posix.c and stub that function > for win32 ? It seems like this code was not built for windows before I > added it in osdep, which means it is not needed for win32. Yes, I think that'll be OK. The fork/exec concept isn't present on Windows so we also don't need to close all FDs. With regards, Daniel
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index 191916f38e..43a035d756 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -757,6 +757,14 @@ static inline void qemu_reset_optind(void) int qemu_fdatasync(int fd); +/** + * Close all open file descriptors except the ones supplied in the @skip array + * + * @skip: ordered array of distinct file descriptors that should not be closed + * @nskip: number of entries in the @skip array. + */ +void qemu_close_all_open_fd(const int *skip, unsigned int nskip); + /** * Sync changes made to the memory mapped file back to the backing * storage. For POSIX compliant systems this will fallback diff --git a/net/tap.c b/net/tap.c index 51f7aec39d..45cb963142 100644 --- a/net/tap.c +++ b/net/tap.c @@ -385,6 +385,23 @@ static TAPState *net_tap_fd_init(NetClientState *peer, return s; } +static void close_all_fds_after_fork(int excluded_fd) +{ + const int skip_fd[] = {STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, + excluded_fd}; + unsigned int nskip = ARRAY_SIZE(skip_fd); + + /* + * skip_fd must be an ordered array of distinct fds, exclude + * excluded_fd if already included in the [STDIN_FILENO - STDERR_FILENO] + * range + */ + if (excluded_fd <= STDERR_FILENO) { + nskip--; + } + qemu_close_all_open_fd(skip_fd, nskip); +} + static void launch_script(const char *setup_script, const char *ifname, int fd, Error **errp) { @@ -400,13 +417,7 @@ static void launch_script(const char *setup_script, const char *ifname, return; } if (pid == 0) { - int open_max = sysconf(_SC_OPEN_MAX), i; - - for (i = 3; i < open_max; i++) { - if (i != fd) { - close(i); - } - } + close_all_fds_after_fork(fd); parg = args; *parg++ = (char *)setup_script; *parg++ = (char *)ifname; @@ -490,17 +501,11 @@ static int net_bridge_run_helper(const char *helper, const char *bridge, return -1; } if (pid == 0) { - int open_max = sysconf(_SC_OPEN_MAX), i; char *fd_buf = NULL; char *br_buf = NULL; char *helper_cmd = NULL; - for (i = 3; i < open_max; i++) { - if (i != sv[1]) { - close(i); - } - } - + close_all_fds_after_fork(sv[1]); fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]); if (strrchr(helper, ' ') || strrchr(helper, '\t')) { diff --git a/system/async-teardown.c b/system/async-teardown.c index 396963c091..9148ee8d04 100644 --- a/system/async-teardown.c +++ b/system/async-teardown.c @@ -26,40 +26,6 @@ static pid_t the_ppid; -/* - * Close all open file descriptors. - */ -static void close_all_open_fd(void) -{ - struct dirent *de; - int fd, dfd; - DIR *dir; - -#ifdef CONFIG_CLOSE_RANGE - int r = close_range(0, ~0U, 0); - if (!r) { - /* Success, no need to try other ways. */ - return; - } -#endif - - dir = opendir("/proc/self/fd"); - if (!dir) { - /* If /proc is not mounted, there is nothing that can be done. */ - return; - } - /* Avoid closing the directory. */ - dfd = dirfd(dir); - - for (de = readdir(dir); de; de = readdir(dir)) { - fd = atoi(de->d_name); - if (fd != dfd) { - close(fd); - } - } - closedir(dir); -} - static void hup_handler(int signal) { /* Check every second if this process has been reparented. */ @@ -85,9 +51,8 @@ static int async_teardown_fn(void *arg) /* * Close all file descriptors that might have been inherited from the * main qemu process when doing clone, needed to make libvirt happy. - * Not using close_range for increased compatibility with older kernels. */ - close_all_open_fd(); + qemu_close_all_open_fd(NULL, 0); /* Set up a handler for SIGHUP and unblock SIGHUP. */ sigaction(SIGHUP, &sa, NULL); diff --git a/util/osdep.c b/util/osdep.c index 770369831b..2f95748a4a 100644 --- a/util/osdep.c +++ b/util/osdep.c @@ -613,3 +613,135 @@ int qemu_fdatasync(int fd) return fsync(fd); #endif } + +static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int nskip) +{ +#ifdef CONFIG_LINUX + struct dirent *de; + int fd, dfd; + bool close_fd; + DIR *dir; + unsigned int i, skip_start = 0, skip_end = nskip; + + dir = opendir("/proc/self/fd"); + if (!dir) { + /* If /proc is not mounted, there is nothing that can be done. */ + return false; + } + /* Avoid closing the directory. */ + dfd = dirfd(dir); + + for (de = readdir(dir); de; de = readdir(dir)) { + if (de->d_name[0] == '.') { + continue; + } + fd = atoi(de->d_name); + close_fd = true; + if (fd == dfd) { + close_fd = false; + } else { + for (i = skip_start; i < skip_end; i++) { + if (fd < skip[i]) { + /* We are below the next skipped fd, break */ + break; + } else if (fd == skip[i]) { + close_fd = false; + /* Restrict the range as we found fds matching start/end */ + if (i == skip_start) { + skip_start++; + } else if (i == skip_end) { + skip_end--; + } + break; + } + } + } + if (close_fd) { + close(fd); + } + } + closedir(dir); + + return true; +#else + return false; +#endif +} + + +static bool qemu_close_all_open_fd_close_range(const int *skip, + unsigned int nskip) +{ +#ifdef CONFIG_CLOSE_RANGE + int max_fd = sysconf(_SC_OPEN_MAX) - 1; + int first = 0, last = max_fd; + unsigned int cur_skip = 0; + int ret; + + do { + /* Find the start boundary of the range to close */ + while (cur_skip < nskip && first == skip[cur_skip]) { + cur_skip++; + first++; + } + + /* Find the upper boundary of the range to close */ + if (cur_skip < nskip) { + last = skip[cur_skip] - 1; + } + /* + * Adjust the maximum fd to close if it's above what the system + * supports + */ + if (last > max_fd) { + last = max_fd; + /* + * We can directly skip the remaining skip fds since the current + * one is already above the maximum supported one. + */ + cur_skip = nskip; + } + /* If last was adjusted, we might be > first, bail out */ + if (first > last) { + break; + } + + ret = close_range(first, last, 0); + if (ret < 0) { + return false; + } + first = last + 1; + last = max_fd; + } while (cur_skip < nskip); + + return true; +#else + return false; +#endif +} + +void qemu_close_all_open_fd(const int *skip, unsigned int nskip) +{ + int open_max = sysconf(_SC_OPEN_MAX); + unsigned int cur_skip = 0; + int i; + + assert(skip != NULL || nskip == 0); + + if (qemu_close_all_open_fd_close_range(skip, nskip)) { + return; + } + + if (qemu_close_all_open_fd_proc(skip, nskip)) { + return; + } + + /* Fallback */ + for (i = 0; i < open_max; i++) { + if (cur_skip < nskip && i == skip[cur_skip]) { + cur_skip++; + continue; + } + close(i); + } +}