[08/23] common: fix pkill by running test program in a separate session

From: Darrick J. Wong <djwong@kernel.org>

Run each test program with a separate session id so that we can tell
pkill to kill all processes of a given name, but only within our own
session id.  This /should/ suffice to run multiple fstests on the same
machine without one instance shooting down processes of another
instance.
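
As a rough sketch of the idea (not the code from this patch), pkill can
be limited to the caller's own session with its --session option, where
session id 0 is translated into the session pkill itself runs in.  The
wrapper name session_pkill and the odd sleep duration used as a marker
are made up for this illustration; setsid is assumed to be the
util-linux tool.

```shell
#!/bin/bash
# Hypothetical wrapper: kill matching processes, but only those that
# share the calling script's session.  pkill inherits our session id,
# and "--session 0" tells it to search only that session.
session_pkill() {
	pkill --session 0 "$@"
}

# Demo: one sleep in our own session, one in a brand-new session.
pidfile=$(mktemp)
sleep 31337 &
own_pid=$!
setsid sh -c 'echo $$ > "$1"; exec sleep 31337' sh "$pidfile" &
sleep 1				# give both processes time to start
other_pid=$(cat "$pidfile")

# -x -f: the whole command line must be exactly "sleep 31337".
session_pkill -x -f 'sleep 31337'

own_status=0
wait "$own_pid" 2>/dev/null || own_status=$?	# 128 + SIGTERM

if kill -0 "$other_pid" 2>/dev/null; then
	other_alive=yes		# the other session was left alone
else
	other_alive=no
fi

# Clean up the process that lives in the other session.
kill "$other_pid" 2>/dev/null || true
rm -f "$pidfile"
echo "own sleep exit status: $own_status, other session alive: $other_alive"
```

Only the sleep that shares the script's session is signalled; the one
started under setsid survives until the explicit cleanup.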

This fixes a general problem with using "pkill --parent" -- if the
process being targeted is not a direct descendant of the bash script
calling pkill, then pkill will not do anything.  The scrub stress tests
make use of multiple background subshells, which is why a ^C in the
parent process fails to kill fsx/fsstress.
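
The parent-matching problem can be reproduced outside fstests with a
few lines of shell.  This is only an illustration using sleep as a
stand-in for fsx/fsstress; the variable names are made up.

```shell
#!/bin/bash
# script -> background subshell -> sleep.  The subshell, not the
# script, is the parent of sleep.
( sleep 300 & wait ) &
subshell_pid=$!
sleep 1				# let the subshell fork its child

# Looking only at direct children of the script finds nothing ...
direct_status=0
pgrep --parent $$ -x sleep >/dev/null || direct_status=$?

# ... while looking at children of the subshell finds the process.
nested_status=0
pgrep --parent "$subshell_pid" -x sleep >/dev/null || nested_status=$?

# Clean up: kill the nested sleep so that the subshell can exit.
pkill --parent "$subshell_pid" -x sleep || true
wait "$subshell_pid" 2>/dev/null || true
echo "direct children: $direct_status, subshell children: $nested_status"
```

pgrep and pkill share their matching logic, so a "pkill --parent $$"
in the script would miss the nested sleep in the same way.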

This is necessary to fix SOAK_DURATION runtime constraints for all the
scrub stress tests.  However, there is a cost -- the test program no
longer runs with the same controlling tty as ./check, which means that
^Z doesn't work and SIGINT/SIGQUIT are set to SIG_IGN.  IOWs, if a test
wants to kill its subprocesses, it must use another signal such as
SIGPIPE.  Fortunately, bash doesn't whine about children dying due to
fatal signals if the children run in a different session id.

I also explored alternate designs, and this was the least unsatisfying:

a) Setting the process group didn't work because background subshells
are assigned a new group id.

b) Constraining the pkill/pgrep search to a cgroup could work, but we'd
have to set up a cgroup in which to run the fstest.

c) Putting test subprocesses in a systemd sub-scope and telling systemd
to kill the sub-scope could work because ./check can already use systemd
scopes to ensure that all child processes of a test are killed.  However,
this is an *optional* feature, which means that we'd have to require
systemd.

d) Constraining the pkill/pgrep search to a particular mount namespace
could work, but we already have tests that set up their own mount
namespaces, which means the constrained pgrep will not find all child
processes of a test.

e) Constraining to any other type of namespace (uts, pid, etc) might not
work because those namespaces might not be enabled.

f) Revert check-parallel and go back to one fstests instance per system.
Zorro already chose not to revert.

So.  Change _run_seq to create the ./$seq process with a new session
id, update _su calls to use the same session as the parent test, update
all the pkill sites to use a wrapper so that we only target processes
created by *this* instance of fstests, and update SIGINT to SIGPIPE.
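
For illustration only, starting a program under a new session id might
look like the sketch below.  It is not the _run_seq change itself: the
helper name run_in_new_session is hypothetical, and it assumes the
util-linux setsid, whose --wait option returns the exit status of the
program it ran.

```shell
#!/bin/bash
# Hypothetical helper: run a command as the leader of a new session
# and hand its exit status back to the caller.
run_in_new_session() {
	setsid --wait "$@"
}

# Compare our session id with the one seen by the launched program.
my_sid=$(ps -o sid= -p $$ | tr -d ' ')
child_sid=$(run_in_new_session sh -c 'ps -o sid= -p $$' | tr -d ' ')
echo "caller session: $my_sid, launched program session: $child_sid"
```

Anything the launched program forks inherits the new session id unless
it calls setsid again, which is what lets a session-scoped pkill find
it later.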

Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 check             |   33 ++++++++++++++++++++++++++++-----
 common/fuzzy      |   17 ++++++++---------
 common/rc         |   12 ++++++++++--
 tests/generic/310 |    6 +++---
 tests/generic/561 |    2 +-
 5 files changed, 50 insertions(+), 20 deletions(-)

Message ID	173706974197.1927324.9208284704325894988.stgit@frogsfrogsfrogs (mailing list archive)
State	New, archived
Date	Thu, 16 Jan 2025 15:27:15 -0800
To	zlang@redhat.com, djwong@kernel.org
Cc	hch@lst.de, fstests@vger.kernel.org, linux-xfs@vger.kernel.org
In-Reply-To	<173706974044.1927324.7824600141282028094.stgit@frogsfrogsfrogs>
Series	[01/23] generic/476: fix fsstress process management
	[02/23] metadump: make non-local function variables more obvious
	[03/23] metadump: fix cleanup for v1 metadump testing
	[04/23] generic/482: _run_fsstress needs the test filesystem
	[05/23] generic/019: don't fail if fio crashes while shutting down
	[06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx
	[07/23] common/rc: create a wrapper for the su command
	[08/23] common: fix pkill by running test program in a separate session
	[09/23] unmount: resume logging of stdout and stderr for filtering
	[10/23] mkfs: don't hardcode log size
	[11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown
	[12/23] preamble: fix missing _kill_fsstress
	[13/23] generic/650: revert SOAK DURATION changes
	[14/23] generic/032: fix pinned mount failure
	[15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails
	[16/23] fuzzy: don't use readarray for xfsfind output
	[17/23] fuzzy: always stop the scrub fsstress loop on error
	[18/23] fuzzy: port fsx and fsstress loop to use --duration
	[19/23] common/rc: don't copy fsstress to $TEST_DIR
	[20/23] fix _require_scratch_duperemove ordering
	[21/23] fsstress: fix a memory leak
	[22/23] fsx: fix leaked log file pointer
	[23/23] build: initialize stack variables to zero by default
