[v3,0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large

Message ID	20230617053621.50359-1-bmeng@tinylab.org (mailing list archive)
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Bin Meng <bmeng@tinylab.org> To: qemu-devel@nongnu.org Cc: Zhangjin Wu <falcon@tinylab.org>, Alberto Faria <afaria@redhat.com>, Claudio Imbrenda <imbrenda@linux.ibm.com>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Jason Wang <jasowang@redhat.com>, Juan Quintela <quintela@redhat.com>, Kevin Wolf <kwolf@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9_Lureau?= <marcandre.lureau@redhat.com>, Markus Armbruster <armbru@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Nikita Ivanov <nivanov@cloudlinux.com>, Paolo Bonzini <pbonzini@redhat.com>, Thomas Huth <thuth@redhat.com>, Xuzhou Cheng <xuzhou.cheng@windriver.com> Subject: [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large Date: Sat, 17 Jun 2023 13:36:15 +0800 Message-Id: <20230617053621.50359-1-bmeng@tinylab.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Feedback-ID: bizesmtp:tinylab.org:qybglogicsvrgz:qybglogicsvrgz7a-0 Received-SPF: pass client-ip=43.154.221.58; envelope-from=bmeng@tinylab.org; helo=bg4.exmail.qq.com X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_SBL=0.141, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large \| expand [v3,0/6] net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large [v3,1/6] tests/tcg/cris: Fix the coding style [v3,2/6] tests/tcg/cris: Correct the off-by-one error [v3,3/6] util/async-teardown: Fall back to close fds one by one [v3,4/6] util/osdep: Introduce qemu_close_range() [v3,5/6] util/async-teardown: Use qemu_close_range() to close fds [v3,6/6] net: tap: Use qemu_close_range() to close fds

Message ID

20230617053621.50359-1-bmeng@tinylab.org (mailing list archive)

Headers

From: Bin Meng <bmeng@tinylab.org>
To: qemu-devel@nongnu.org
Cc: Zhangjin Wu <falcon@tinylab.org>, Alberto Faria <afaria@redhat.com>,
 Claudio Imbrenda <imbrenda@linux.ibm.com>,
 "Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
 Jason Wang <jasowang@redhat.com>, Juan Quintela <quintela@redhat.com>,
 Kevin Wolf <kwolf@redhat.com>,
 =?utf-8?q?Marc-Andr=C3=A9_Lureau?= <marcandre.lureau@redhat.com>,
 Markus Armbruster <armbru@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>,
 Nikita Ivanov <nivanov@cloudlinux.com>, Paolo Bonzini <pbonzini@redhat.com>,
 Thomas Huth <thuth@redhat.com>, Xuzhou Cheng <xuzhou.cheng@windriver.com>
Subject: [PATCH v3 0/6] net/tap: Fix QEMU frozen issue when the maximum number
 of file descriptors is very large
Date: Sat, 17 Jun 2023 13:36:15 +0800
Message-Id: <20230617053621.50359-1-bmeng@tinylab.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Feedback-ID: bizesmtp:tinylab.org:qybglogicsvrgz:qybglogicsvrgz7a-0
Received-SPF: pass client-ip=43.154.221.58; envelope-from=bmeng@tinylab.org;
 helo=bg4.exmail.qq.com
X-Spam_score_int: -17
X-Spam_score: -1.8
X-Spam_bar: -
X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001,
 RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_SBL=0.141, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org

Series

net/tap: Fix QEMU frozen issue when the maximum number of file descriptors is very large | expand

Message

Bin Meng June 17, 2023, 5:36 a.m. UTC

Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
is set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks frozen during
start-up.

The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
doesn't need to manually close the fds for child process as the proper
O_CLOEXEC flag should have been set properly on files with its own
codes, QEMU uses a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.

Modern Linux and BSDs have the close_range() call we can use to do the
job, and on Linux we have one more way to walk through /proc/self/fd
to complete the task efficiently, which is what qemu_close_range()
does, a new API we add in util/osdep.c.

V1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/

Changes in v3:
- fix win32 build failure
- limit the last_fd of qemu_close_range() to sysconf(_SC_OPEN_MAX)

Changes in v2:
- new patch: "tests/tcg/cris: Fix the coding style"
- new patch: "tests/tcg/cris: Correct the off-by-one error"
- new patch: "util/async-teardown: Fall back to close fds one by one"
- new patch: "util/osdep: Introduce qemu_close_range()"
- new patch: "util/async-teardown: Use qemu_close_range() to close fds"
- Change to use qemu_close_range() to close fds for child process efficiently
- v1 link: https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/

Bin Meng (4):
  tests/tcg/cris: Fix the coding style
  tests/tcg/cris: Correct the off-by-one error
  util/async-teardown: Fall back to close fds one by one
  util/osdep: Introduce qemu_close_range()

Zhangjin Wu (2):
  util/async-teardown: Use qemu_close_range() to close fds
  net: tap: Use qemu_close_range() to close fds

 include/qemu/osdep.h                |  1 +
 net/tap.c                           | 23 ++++++------
 tests/tcg/cris/libc/check_openpf5.c | 57 ++++++++++++++---------------
 util/async-teardown.c               | 38 +------------------
 util/osdep.c                        | 48 ++++++++++++++++++++++++
 5 files changed, 89 insertions(+), 78 deletions(-)

Comments

Richard Henderson June 19, 2023, 7:13 a.m. UTC | #1

On 6/17/23 07:36, Bin Meng wrote:
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.
> 
> The close-on-exec flag (O_CLOEXEC) was introduced since Linux kernel
> 2.6.23, FreeBSD 8.3, OpenBSD 5.0, Solaris 11. While it's true QEMU
> doesn't need to manually close the fds for child process as the proper
> O_CLOEXEC flag should have been set properly on files with its own
> codes, QEMU uses a huge number of 3rd party libraries and we don't
> trust them to reliably be using O_CLOEXEC on everything they open.
> 
> Modern Linux and BSDs have the close_range() call we can use to do the
> job, and on Linux we have one more way to walk through /proc/self/fd
> to complete the task efficiently, which is what qemu_close_range()
> does, a new API we add in util/osdep.c.
> 
> V1 link:https://lore.kernel.org/qemu-devel/20230406112041.798585-1-bmeng@tinylab.org/
> 
> Changes in v3:
> - fix win32 build failure
> - limit the last_fd of qemu_close_range() to sysconf(_SC_OPEN_MAX)

Sorry, I didn't see this and sent some comments against v2.
Though using _SC_OPEN_MAX was one of them, so that's nice.  :-)


r~

Michael Tokarev June 28, 2023, 5:13 p.m. UTC | #2

17.06.2023 08:36, Bin Meng wrote:
> 
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> is set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks frozen during
> start-up.

What's the reason to close all these file descriptors in the first place?

No other software I know does this.

For some situations, such closing is actually bad, -- think, eg,

   flock lockfile qemu-system-foo ...  --

this one opens a file, locks it using fcntl/flock, and executes the
command, keeping the file descriptor open across exec, so the file
stays locked until the process terminates. This works and works well.
Qemu with its let's-close-everything approach breaks this.

Why? :)

/mjt