From patchwork Tue Jul 30 01:13:06 2024
X-Patchwork-Id: 13746132
Date: Mon, 29 Jul 2024 18:13:06 -0700
Subject: [PATCH 1/6] xfs_scrub: allow auxiliary pathnames for sandboxing
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848872.1349910.15225595668476876570.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Later in this series, we'll tighten up the security on the xfs_scrub
service so that it can't escape. However, sandboxing the service
involves making the host filesystem as inaccessible as possible, with
the filesystem to scrub bind mounted onto a known location within the
sandbox. Hence we keep the positional path for reporting and add a new
-M argument to tell scrub which path it should actually open.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 man/man8/xfs_scrub.8 |    9 ++++++++-
 scrub/phase1.c       |    4 ++--
 scrub/vfs.c          |    2 +-
 scrub/xfs_scrub.c    |   11 ++++++++---
 scrub/xfs_scrub.h    |    5 ++++-
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index b9f253e1b..615401127 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -4,7 +4,7 @@ xfs_scrub \- check and repair the contents of a mounted XFS filesystem
 .SH SYNOPSIS
 .B xfs_scrub
 [
-.B \-abCemnTvx
+.B \-abCeMmnTvx
 ]
 .I mount-point
 .br
@@ -79,6 +79,13 @@ behavior.
 .B \-k
 Do not call TRIM on the free space.
 .TP
+.BI \-M " real-mount-point"
+Open this path for issuing scrub system calls to the kernel.
+The positional
+.I mount-point
+parameter will be used for displaying informational messages and logging.
+This parameter exists to enable process sandboxing for service mode.
+.TP
 .BI \-m " file"
 Search this file for mounted filesystems instead of /etc/mtab.
 .TP
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 1b3f6e8eb..516d929d6 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -146,7 +146,7 @@ phase1_func(
 	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
 	 * with the (XFS driver) kernel.
 	 */
-	error = -xfd_open(&ctx->mnt, ctx->mntpoint,
+	error = -xfd_open(&ctx->mnt, ctx->actual_mntpoint,
 			O_RDONLY | O_NOATIME | O_DIRECTORY);
 	if (error) {
 		if (error == EPERM)
@@ -199,7 +199,7 @@ _("Not an XFS filesystem."));
 		return error;
 	}
 
-	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+	error = path_to_fshandle(ctx->actual_mntpoint, &ctx->fshandle,
 			&ctx->fshandle_len);
 	if (error) {
 		str_errno(ctx, _("getting fshandle"));
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 22c19485a..fca9a4cf3 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -249,7 +249,7 @@ scan_fs_tree(
 		goto out_cond;
 	}
 
-	ret = queue_subdir(ctx, &sft, &wq, ctx->mntpoint, true);
+	ret = queue_subdir(ctx, &sft, &wq, ctx->actual_mntpoint, true);
 	if (ret) {
 		str_liberror(ctx, ret, _("queueing directory scan"));
 		goto out_wq;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 296d814ec..d7cef115d 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -725,7 +725,7 @@ main(
 	pthread_mutex_init(&ctx.lock, NULL);
 	ctx.mode = SCRUB_MODE_REPAIR;
 	ctx.error_action = ERRORS_CONTINUE;
-	while ((c = getopt(argc, argv, "a:bC:de:km:no:TvxV")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bC:de:kM:m:no:TvxV")) != EOF) {
 		switch (c) {
 		case 'a':
 			ctx.max_errors = cvt_u64(optarg, 10);
@@ -769,6 +769,9 @@ main(
 		case 'k':
 			want_fstrim = false;
 			break;
+		case 'M':
+			ctx.actual_mntpoint = optarg;
+			break;
 		case 'm':
 			mtab = optarg;
 			break;
@@ -823,6 +826,8 @@ main(
 		usage();
 
 	ctx.mntpoint = argv[optind];
+	if (!ctx.actual_mntpoint)
+		ctx.actual_mntpoint = ctx.mntpoint;
 
 	stdout_isatty = isatty(STDOUT_FILENO);
 	stderr_isatty = isatty(STDERR_FILENO);
@@ -840,7 +845,7 @@ main(
 		return SCRUB_RET_OPERROR;
 
 	/* Find the mount record for the passed-in argument. */
-	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+	if (stat(ctx.actual_mntpoint, &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
 			_("%s: could not stat: %s: %s\n"),
 			progname, argv[optind], strerror(errno));
@@ -863,7 +868,7 @@ main(
 	}
 
 	fs_table_initialise(0, NULL, 0, NULL);
-	fsp = fs_table_lookup_mount(ctx.mntpoint);
+	fsp = fs_table_lookup_mount(ctx.actual_mntpoint);
 	if (!fsp) {
 		fprintf(stderr, _("%s: Not a XFS mount point.\n"),
 				ctx.mntpoint);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 7d48f4bad..b0aa9fcc6 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -38,9 +38,12 @@ enum error_action {
 struct scrub_ctx {
 	/* Immutable scrub state. */
 
-	/* Strings we need for presentation */
+	/* Mountpoint we use for presentation */
 	char			*mntpoint;
 
+	/* Actual VFS path to the filesystem */
+	char			*actual_mntpoint;
+
 	/* Mountpoint info */
 	struct stat		mnt_sb;
 	struct statvfs		mnt_sv;

From patchwork Tue Jul 30 01:13:22 2024
X-Patchwork-Id: 13746133
Date: Mon, 29 Jul 2024 18:13:22 -0700
Subject: [PATCH 2/6] xfs_scrub.service: reduce background CPU usage to less than one core if possible
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848887.1349910.1654048860772249392.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, the xfs_scrub background service is configured to use -b,
which means that the program runs completely serially. However, even
using all of one CPU core with idle priority may be enough to cause
thermal throttling and unwanted fan noise on smaller systems (e.g.
laptops) with fast storage.

Let's try to avoid this (at least on systemd) by using cgroups to limit
the program's usage to slightly more than half of one CPU (60%) and by
lowering its nice priority in the scheduler. What we /really/ want is
to run steadily on an efficiency core, but there doesn't seem to be a
means to ask the scheduler not to ramp up the CPU frequency for a
particular task.

While we're at it, group the resource limit directives together.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/Makefile                   |    7 ++++++-
 scrub/system-xfs_scrub.slice     |   30 ++++++++++++++++++++++++++++++
 scrub/xfs_scrub@.service.in      |   12 ++++++++++--
 scrub/xfs_scrub_all.service.in   |    4 ++++
 scrub/xfs_scrub_fail@.service.in |    4 ++++
 5 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 scrub/system-xfs_scrub.slice

diff --git a/scrub/Makefile b/scrub/Makefile
index 8ccc67d01..2a257e080 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,7 +18,12 @@ XFS_SCRUB_FAIL_PROG = xfs_scrub_fail
 XFS_SCRUB_ARGS = -b -n
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
-SYSTEMD_SERVICES = $(scrub_svcname) xfs_scrub_all.service xfs_scrub_all.timer xfs_scrub_fail@.service
+SYSTEMD_SERVICES=\
+	$(scrub_svcname) \
+	xfs_scrub_fail@.service \
+	xfs_scrub_all.service \
+	xfs_scrub_all.timer \
+	system-xfs_scrub.slice
 OPTIONAL_TARGETS += $(SYSTEMD_SERVICES)
 endif
 ifeq ($(HAVE_CROND),yes)
diff --git a/scrub/system-xfs_scrub.slice b/scrub/system-xfs_scrub.slice
new file mode 100644
index 000000000..95cd4f745
--- /dev/null
+++ b/scrub/system-xfs_scrub.slice
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022-2024 Oracle. All Rights Reserved.
+# Author: Darrick J. Wong
+
+[Unit]
+Description=xfs_scrub background service slice
+Before=slices.target
+
+[Slice]
+
+# If the CPU usage cgroup controller is available, don't use more than 60% of a
+# single core for all background processes.
+CPUQuota=60%
+CPUAccounting=true
+
+[Install]
+# As of systemd 249, the systemd cgroupv2 configuration code will drop resource
+# controllers from the root and system.slice cgroups at startup if it doesn't
+# find any direct dependencies that require a given controller. Newly
+# activated units with resource control directives are created under the system
+# slice but do not cause a reconfiguration of the slice's resource controllers.
+# Hence we cannot put CPUQuota= into the xfs_scrub service units directly.
+#
+# For the CPUQuota directive to have any effect, we must therefore create an
+# explicit definition file for the slice that systemd creates to contain the
+# xfs_scrub instance units (e.g. xfs_scrub@.service) and we must configure this
+# slice as a dependency of the system slice to establish the direct dependency
+# relation.
+WantedBy=system.slice
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 05e5293ee..855fe4de4 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -18,8 +18,16 @@ PrivateTmp=no
 AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
 NoNewPrivileges=yes
 User=nobody
-IOSchedulingClass=idle
-CPUSchedulingPolicy=idle
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
 SyslogIdentifier=%N
+
+# Run scrub with minimal CPU and IO priority so that nothing else will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 347cd6e66..96be90e74 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -14,3 +14,7 @@ Type=oneshot
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub_all
 SyslogIdentifier=xfs_scrub_all
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 96a2ed5da..32012ec35 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -14,3 +14,7 @@ ExecStart=@pkg_libexec_dir@/xfs_scrub_fail "${EMAIL_ADDR}" %f
 User=mail
 Group=mail
 SupplementaryGroups=systemd-journal
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice

From patchwork Tue Jul 30 01:13:37 2024
X-Patchwork-Id: 13746134
Date: Mon, 29 Jul 2024 18:13:37 -0700
Subject: [PATCH 3/6] xfs_scrub: use dynamic users when running as a systemd service
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Helle Vaanzinn , Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848903.1349910.11661470792666399204.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Several years ago, systemd introduced the DynamicUser directive, which
allocates a new unique user/group id, runs a service with those ids,
and deletes them after the service exits. This is a good replacement
for User=nobody, since it eliminates the risk that services sharing
the nobody account can interfere with each other.

Make this transition ahead of all the other security tightenings that
will land in the next few patches, and add credits for the people who
suggested the change and reviewed it.
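For illustration only (this is not code from xfs_scrub or systemd), a
process could check whether it was started with a transient DynamicUser=
identity by testing its uid against the range that systemd documents for
dynamic users, 61184 through 65519 (0xEF00 through 0xFFEF). The macro and
function names below are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Range from which systemd allocates DynamicUser= uids/gids.
 * Hypothetical names, for illustration only.
 */
#define DYNAMIC_UID_MIN	((uid_t)61184)
#define DYNAMIC_UID_MAX	((uid_t)65519)

/* Return true if @uid falls inside systemd's dynamic user range. */
bool uid_is_dynamic(uid_t uid)
{
	return uid >= DYNAMIC_UID_MIN && uid <= DYNAMIC_UID_MAX;
}

/* Print the caller's real uid/gid and whether the uid looks dynamic. */
void report_identity(void)
{
	uid_t uid = getuid();

	printf("uid=%u gid=%u dynamic=%s\n", (unsigned int)uid,
	       (unsigned int)getgid(), uid_is_dynamic(uid) ? "yes" : "no");
}
```

Called from a program launched under DynamicUser=true, report_identity()
would be expected to print dynamic=yes, whereas a service running as
User=nobody typically reports uid 65534, which lies outside that range.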

Link: https://0pointer.net/blog/dynamic-users-with-systemd.html
Suggested-by: Helle Vaanzinn
Reviewed-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong
---
 scrub/xfs_scrub@.service.in |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 855fe4de4..52068add8 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -17,7 +17,6 @@ ProtectHome=read-only
 PrivateTmp=no
 AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
 NoNewPrivileges=yes
-User=nobody
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
 SyslogIdentifier=%N
@@ -31,3 +30,6 @@ Nice=19
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# Dynamically create a user that isn't root
+DynamicUser=true

From patchwork Tue Jul 30 01:13:53 2024
X-Patchwork-Id: 13746135
Date: Mon, 29 Jul 2024 18:13:53 -0700
Subject: [PATCH 4/6] xfs_scrub: tighten up the security on the background systemd service
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848917.1349910.13460116022520022971.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub has to run with some elevated privileges. Minimize
the risk of xfs_scrub escaping its service container or contaminating
the rest of the system by using systemd's sandboxing controls to
prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub@.service' in systemd 249.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/xfs_scrub@.service.in |   81 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 8 deletions(-)

diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 52068add8..a8dd9052f 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -8,17 +8,21 @@ Description=Online XFS Metadata Check for %f
 OnFailure=xfs_scrub_fail@%i.service
 Documentation=man:xfs_scrub(8)
 
+# Explicitly require the capabilities that this program needs
+ConditionCapability=CAP_SYS_ADMIN
+ConditionCapability=CAP_FOWNER
+ConditionCapability=CAP_DAC_OVERRIDE
+ConditionCapability=CAP_DAC_READ_SEARCH
+ConditionCapability=CAP_SYS_RAWIO
+
+# Must be a mountpoint
+ConditionPathIsMountPoint=%f
+RequiresMountsFor=%f
+
 [Service]
 Type=oneshot
-PrivateNetwork=true
-ProtectSystem=full
-ProtectHome=read-only
-# Disable private /tmp just in case %f is a path under /tmp.
-PrivateTmp=no
-AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
-NoNewPrivileges=yes
 Environment=SERVICE_MODE=1
-ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f
+ExecStart=@sbindir@/xfs_scrub @scrub_args@ -M /tmp/scrub/ %f
 SyslogIdentifier=%N
 
 # Run scrub with minimal CPU and IO priority so that nothing else will starve.
@@ -31,5 +35,66 @@ Nice=19
 # can control resource usage.
 Slice=system-xfs_scrub.slice
 
+# No realtime CPU scheduling
+RestrictRealtime=true
+
 # Dynamically create a user that isn't root
 DynamicUser=true
+
+# Make the entire filesystem readonly and /home inaccessible, then bind mount
+# the filesystem we're supposed to be checking into our private /tmp dir.
+# 'norbind' means that we don't bind anything under that original mount.
+ProtectSystem=strict
+ProtectHome=yes
+PrivateTmp=true
+BindPaths=%f:/tmp/scrub:norbind
+
+# Don't let scrub complain about paths in /etc/projects that have been hidden
+# by our sandboxing. scrub doesn't care about project ids anyway.
+InaccessiblePaths=-/etc/projects
+
+# No network access
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=none
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default Linux personality
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=true
+
+# xfs_scrub doesn't create files
+UMask=7777
+
+# No access to hardware /dev files except for block devices
+ProtectClock=true
+DevicePolicy=closed
+DeviceAllow=block-*

From patchwork Tue Jul 30 01:14:08 2024
X-Patchwork-Id: 13746136
Date: Mon, 29 Jul 2024 18:14:08 -0700
Subject: [PATCH 5/6] xfs_scrub_fail: tighten up the security on the background systemd service
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848929.1349910.8374450657608066953.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub_fail has to run with enough privileges to access
the journal contents for a given scrub run and to send a report via
email. Minimize the risk of xfs_scrub_fail escaping its service
container or contaminating the rest of the system by using systemd's
sandboxing controls to prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_fail@.service' in systemd 249.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/xfs_scrub_fail@.service.in |   55 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 32012ec35..2c879afd6 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -18,3 +18,58 @@ SupplementaryGroups=systemd-journal
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# Make the entire filesystem readonly and /home inaccessible.
+ProtectSystem=full
+ProtectHome=yes
+PrivateTmp=true
+RestrictSUIDSGID=true
+
+# Emailing reports requires network access, but not the ability to change the
+# hostname.
+ProtectHostname=true
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Can't hide /proc because journalctl needs it to find various pieces of log
+# information
+#ProcSubset=pid
+
+# Only allow the default Linux personality
+LockPersonality=true
+
+# No memory pages that are both writable and executable
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub_fail doesn't need any special privileges
+CapabilityBoundingSet=
+NoNewPrivileges=true
+
+# Failure reporting shouldn't create world-readable files
+UMask=0077
+
+# Clean up any IPC objects when this unit stops
+RemoveIPC=true
+
+# No access to hardware device files
+PrivateDevices=true
+ProtectClock=true

From patchwork Tue Jul 30 01:14:24 2024
X-Patchwork-Id: 13746137
Date: Mon, 29 Jul 2024 18:14:24 -0700
Subject: [PATCH 6/6] xfs_scrub_all: tighten up the security on the background systemd service
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org
Message-ID: <172229848942.1349910.11059506353208746133.stgit@frogsfrogsfrogs>
In-Reply-To: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
References: <172229848851.1349910.300458734867859926.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub_all has to run with enough privileges to find
mounted XFS filesystems and the device associated with each mount, and
to start xfs_scrub@ sub-services. Minimize the risk of xfs_scrub_all
escaping its service container or contaminating the rest of the system
by using systemd's sandboxing controls to prohibit as much access as
possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_all.service' in systemd 249.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 scrub/xfs_scrub_all.service.in |   62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 96be90e74..478cd8d05 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -18,3 +18,65 @@ SyslogIdentifier=xfs_scrub_all
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# Run scrub_all with minimal CPU and IO priority so that nothing will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# No special privileges, but we still have to run as root so that we can
+# contact the service manager to start the sub-units.
+CapabilityBoundingSet=
+NoNewPrivileges=true
+RestrictSUIDSGID=true
+
+# Make the entire filesystem readonly. We don't want to hide anything because
+# we need to find all mounted XFS filesystems in the host.
+ProtectSystem=strict
+ProtectHome=read-only
+PrivateTmp=false
+
+# No network access except to the systemd control socket
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=AF_UNIX
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# Media scan stamp file shouldn't be readable by regular users
+UMask=0077
+
+# lsblk ignores mountpoints if it can't find the device files, so we cannot
+# hide them
+#ProtectClock=true
+#PrivateDevices=true