From patchwork Fri Dec 30 22:18:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085177
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 75C07C4332F
 for ; Sat, 31 Dec 2022 00:36:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S235722AbiLaAgJ (ORCPT );
 Fri, 30 Dec 2022 19:36:09 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34506 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235750AbiLaAgH (ORCPT );
 Fri, 30 Dec 2022 19:36:07 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D987512A8D
 for ; Fri, 30 Dec 2022 16:36:06 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dfw.source.kernel.org (Postfix) with ESMTPS id 688EA61CAA
 for ; Sat, 31 Dec 2022 00:36:06 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C087EC433D2;
 Sat, 31 Dec 2022 00:36:05 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1672446965;
 bh=ivCyPMTxgydXyBsNfhSM9J+xa6rIloBozJrCTvqQ/Bw=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=rOMyZlxS6yrPUMmex4oYNN1iSXRV3K3KKfAkz5lBFFXbMTu7cVbg7hmPAk2Pg4c7p
 FygmuJevvu6TgrCeMkV9xlh8LqIipt6r8e12+gWrmiUPLPBtBRuRAw0v9DjGQEzH+w
 WeMx2gpqHW6Ubu6LxQbNLVGFMGQR7LH/ZHvR11htLBIhzdNGUG6lvPSgcoOD/ksO4P
 rkalAZ3JEq7ReYJzxfl0VkiBbz1q8R6vCYKedG6ccJPtixGCHXyufnpvfrq3XNSZeT
 PlMcZlYWkr+L6apfGSQSA4vNJsT3kYiTx+jVyJsqNQ5Ltl3waBDPxjmKqoUI9JqAoL
 VM61ZhmoITkBA==
Subject: [PATCH 1/5] xfs_scrub: allow
 auxiliary pathnames for sandboxing
From: "Darrick J. Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:34 -0800
Message-ID: <167243871478.718298.14119656193739596554.stgit@magnolia>
In-Reply-To: <167243871464.718298.4729609315819255063.stgit@magnolia>
References: <167243871464.718298.4729609315819255063.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

In a later patch, we'll tighten up the security on the xfs_scrub service
so that it can't escape.  However, sandboxing the service involves making
the host filesystem as inaccessible as possible, with the filesystem to
scrub bind mounted onto a known location within the sandbox.  Hence we
need one path for reporting and a new SERVICE_MOUNTPOINT environment
variable to tell scrub what it should actually be trying to open.

Signed-off-by: Darrick J. Wong
---
 doc/README-env-vars.txt |    2 ++
 scrub/phase1.c          |    4 ++--
 scrub/vfs.c             |    2 +-
 scrub/xfs_scrub.c       |    9 +++++++--
 scrub/xfs_scrub.h       |    5 ++++-
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/doc/README-env-vars.txt b/doc/README-env-vars.txt
index eec59a82513..d7984df8202 100644
--- a/doc/README-env-vars.txt
+++ b/doc/README-env-vars.txt
@@ -24,3 +24,5 @@ XFS_SCRUB_THREADS -- start exactly this number of threads
 Available even in non-debug mode:
 SERVICE_MODE       -- compress all error codes to 1 for LSB
                       service action compliance
+SERVICE_MOUNTPOINT -- actual path to open for issuing kernel
+                      scrub calls
diff --git a/scrub/phase1.c b/scrub/phase1.c
index faa554f1e1e..80fd0c6e27c 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -146,7 +146,7 @@ phase1_func(
 	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
 	 * with the (XFS driver) kernel.
 	 */
-	error = -xfd_open(&ctx->mnt, ctx->mntpoint,
+	error = -xfd_open(&ctx->mnt, ctx->actual_mntpoint,
 			O_RDONLY | O_NOATIME | O_DIRECTORY);
 	if (error) {
 		if (error == EPERM)
@@ -199,7 +199,7 @@ _("Not an XFS filesystem."));
 		return error;
 	}
 
-	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+	error = path_to_fshandle(ctx->actual_mntpoint, &ctx->fshandle,
 			&ctx->fshandle_len);
 	if (error) {
 		str_errno(ctx, _("getting fshandle"));
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 85ee2694b00..c64c6c41105 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -249,7 +249,7 @@ scan_fs_tree(
 		goto out_cond;
 	}
 
-	ret = queue_subdir(ctx, &sft, &wq, ctx->mntpoint, true);
+	ret = queue_subdir(ctx, &sft, &wq, ctx->actual_mntpoint, true);
 	if (ret) {
 		str_liberror(ctx, ret, _("queueing directory scan"));
 		goto out_wq;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index bdee8e4fdae..23d8fec5d9b 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -118,6 +118,8 @@
  * Available even in non-debug mode:
  * SERVICE_MODE		-- compress all error codes to 1 for LSB
  *			   service action compliance
+ * SERVICE_MOUNTPOINT	-- actual path to open for issuing kernel
+ *			   scrub calls
  */
 
 /* Program name; needed for libfrog error reports. */
@@ -739,6 +741,9 @@ main(
 		usage();
 
 	ctx.mntpoint = argv[optind];
+	ctx.actual_mntpoint = getenv("SERVICE_MOUNTPOINT");
+	if (!ctx.actual_mntpoint)
+		ctx.actual_mntpoint = ctx.mntpoint;
 
 	stdout_isatty = isatty(STDOUT_FILENO);
 	stderr_isatty = isatty(STDERR_FILENO);
@@ -756,7 +761,7 @@ main(
 		return SCRUB_RET_OPERROR;
 
 	/* Find the mount record for the passed-in argument. */
-	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+	if (stat(ctx.actual_mntpoint, &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
 			_("%s: could not stat: %s: %s\n"),
 			progname, argv[optind], strerror(errno));
@@ -779,7 +784,7 @@ main(
 	}
 
 	fs_table_initialise(0, NULL, 0, NULL);
-	fsp = fs_table_lookup_mount(ctx.mntpoint);
+	fsp = fs_table_lookup_mount(ctx.actual_mntpoint);
 	if (!fsp) {
 		fprintf(stderr, _("%s: Not a XFS mount point.\n"),
 				ctx.mntpoint);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 004d2d02587..2ef8b2e5066 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -36,9 +36,12 @@ enum error_action {
 struct scrub_ctx {
 	/* Immutable scrub state. */
 
-	/* Strings we need for presentation */
+	/* Mountpoint we use for presentation */
 	char			*mntpoint;
 
+	/* Actual VFS path to the filesystem */
+	char			*actual_mntpoint;
+
 	/* Mountpoint info */
 	struct stat		mnt_sb;
 	struct statvfs		mnt_sv;

From patchwork Fri Dec 30 22:18:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 13085178
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 0EC35C4332F
 for ; Sat, 31 Dec 2022 00:36:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S235798AbiLaAg0 (ORCPT );
 Fri, 30 Dec 2022 19:36:26 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34556 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235750AbiLaAgZ (ORCPT );
 Fri, 30 Dec 2022 19:36:25 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AF5112A9B
 for ; Fri, 30 Dec 2022 16:36:24 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ams.source.kernel.org (Postfix) with ESMTPS id B643CB81DCF
 for ; Sat, 31 Dec 2022 00:36:22 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 502E9C433EF;
 Sat, 31 Dec 2022 00:36:21 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1672446981;
 bh=O3uUCjKJzQlH10kn+DlvqbqN7inSc81HH1d2Xt9Rer4=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=Uut3/c9ppS0ukw3TDWmZRX/ZNw4+NYNZaIt2tM2uMG89O8vYqlemWw5O6OzKLYs7G
 K3ZTpbpXHqDvk7+VFw7Y2Q9DpuTmpJ1ihb1obCYO3ZDUJkseV9b0r832AcVxIijttp
 Ch3A8v6nSqlg/Av+XASD/614O5S7adKRqpxsysmB/AX9h3uBZvuj3BErJ4TrwABCAC
 K/czqvWSAe+nGoT1urSWJ5aLf8kudmXmN2mgTisDGWQw7Z2is5H4sVICsZKE0Ds0jB
 BmJpX8RAI1ohbqvzW0E3MZiQwhU1jUpm9WBAFlQ+KcaYHF2eaSWJKW4gcAw4Pdd6QZ
 E02QRyIqTsqaQ==
Subject: [PATCH 2/5] xfs_scrub.service: reduce CPU usage to 60% when possible
From: "Darrick J.
 Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:34 -0800
Message-ID: <167243871491.718298.15672577969573659238.stgit@magnolia>
In-Reply-To: <167243871464.718298.4729609315819255063.stgit@magnolia>
References: <167243871464.718298.4729609315819255063.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, the xfs_scrub background service is configured to use -b,
which means that the program runs completely serially.  However, even
using 100% of one CPU with idle priority may be enough to cause thermal
throttling and unwanted fan noise on smaller systems (e.g. laptops) with
fast IO systems.

Let's try to avoid this (at least on systemd) by using cgroups to limit
the program's usage to 60% of one CPU and lowering the nice priority in
the scheduler.  What we /really/ want is to run steadily on an
efficiency core, but there doesn't seem to be a means to ask the
scheduler not to ramp up the CPU frequency for a particular task.

While we're at it, group the resource limit directives together.

Signed-off-by: Darrick J. Wong
---
 scrub/Makefile                   |    7 ++++++-
 scrub/system-xfs_scrub.slice     |   30 ++++++++++++++++++++++++++++++
 scrub/xfs_scrub@.service.in      |   12 ++++++++++--
 scrub/xfs_scrub_all.service.in   |    4 ++++
 scrub/xfs_scrub_fail@.service.in |    4 ++++
 5 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 scrub/system-xfs_scrub.slice

diff --git a/scrub/Makefile b/scrub/Makefile
index 2dc0fe1935c..1c36621b400 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -15,7 +15,12 @@ XFS_SCRUB_ALL_PROG = xfs_scrub_all
 XFS_SCRUB_ARGS = -b -n
 ifeq ($(HAVE_SYSTEMD),yes)
 INSTALL_SCRUB += install-systemd
-SYSTEMD_SERVICES = xfs_scrub@.service xfs_scrub_all.service xfs_scrub_all.timer xfs_scrub_fail@.service
+SYSTEMD_SERVICES=\
+	xfs_scrub@.service \
+	xfs_scrub_fail@.service \
+	xfs_scrub_all.service \
+	xfs_scrub_all.timer \
+	system-xfs_scrub.slice
 OPTIONAL_TARGETS += $(SYSTEMD_SERVICES)
 endif
 ifeq ($(HAVE_CROND),yes)
diff --git a/scrub/system-xfs_scrub.slice b/scrub/system-xfs_scrub.slice
new file mode 100644
index 00000000000..051cbb14108
--- /dev/null
+++ b/scrub/system-xfs_scrub.slice
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2022 Oracle.  All Rights Reserved.
+# Author: Darrick J. Wong
+
+[Unit]
+Description=xfs_scrub background service slice
+Before=slices.target
+
+[Slice]
+
+# If the CPU usage cgroup controller is available, don't use more than 60% of a
+# single core for all background processes.
+CPUQuota=60%
+CPUAccounting=true
+
+[Install]
+# As of systemd 249, the systemd cgroupv2 configuration code will drop resource
+# controllers from the root and system.slice cgroups at startup if it doesn't
+# find any direct dependencies that require a given controller.  Newly
+# activated units with resource control directives are created under the system
+# slice but do not cause a reconfiguration of the slice's resource controllers.
+# Hence we cannot put CPUQuota= into the xfs_scrub service units directly.
+#
+# For the CPUQuota directive to have any effect, we must therefore create an
+# explicit definition file for the slice that systemd creates to contain the
+# xfs_scrub instance units (e.g. xfs_scrub@.service) and we must configure this
+# slice as a dependency of the system slice to establish the direct dependency
+# relation.
+WantedBy=system.slice
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index c8662fc85a6..3c64252de49 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -18,8 +18,16 @@ PrivateTmp=no
 AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
 NoNewPrivileges=yes
 User=nobody
-IOSchedulingClass=idle
-CPUSchedulingPolicy=idle
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %I
 SyslogIdentifier=%N
+
+# Run scrub with minimal CPU and IO priority so that nothing else will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index b874eb6f757..ae4135033dd 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -14,3 +14,7 @@ Type=oneshot
 Environment=SERVICE_MODE=1
 ExecStart=@sbindir@/xfs_scrub_all
 SyslogIdentifier=xfs_scrub_all
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index ac0cb2e283b..591486599ce 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -14,3 +14,7 @@ ExecStart=@pkg_lib_dir@/@pkg_name@/xfs_scrub_fail "${EMAIL_ADDR}" %I
 User=mail
 Group=mail
 SupplementaryGroups=systemd-journal
+
+# Create the service underneath the scrub background service slice so that we
+# can control resource usage.
+Slice=system-xfs_scrub.slice

From patchwork Fri Dec 30 22:18:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085179
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 8726DC4332F
 for ; Sat, 31 Dec 2022 00:36:42 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S235801AbiLaAgl (ORCPT );
 Fri, 30 Dec 2022 19:36:41 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34586 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235750AbiLaAgk (ORCPT );
 Fri, 30 Dec 2022 19:36:40 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6522F12A9B
 for ; Fri, 30 Dec 2022 16:36:39 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ams.source.kernel.org (Postfix) with ESMTPS id 1908AB80883
 for ; Sat, 31 Dec 2022 00:36:38 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id D04A8C433EF;
 Sat, 31 Dec 2022 00:36:36 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202;
 t=1672446996;
 bh=xy18g4wuD3ySapQDCLPeEQVkt6IvM8/lkW6E35+ksHE=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=VVsvtVFEAdxQzOunkCcim5cfcHurvg+hz9rnt+VO2aRlLb41Sf075tMfTrd0USs4V
 itQmr5P9fhMPQ4Eoh4eSUNKVVvdN5iAT7a4DHs3axFcHY0E02jI6LSSZws4Fmokxcy
 8vJY8mByKJrk/SW87QWZJmadzMplPdUgzsNW06hcNfd66oUBOYC5D3nBDUr1oNniHc
 xoWm3g6/kEPidFkFclYFLL1UDTKH4y0I1mbi+T4USXhS8EeikRBb75mJzu7ZKtue+o
 4fL24fZA9kvIOt9UWJtJbfzNBO6Mt4UADRY/wG8faQvmu3Qek+cVqvtnWpK/Dq0UhN
 eK/8ChT5PajYw==
Subject: [PATCH 3/5] xfs_scrub: tighten up the security on the background
 systemd service
From: "Darrick J. Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:35 -0800
Message-ID: <167243871504.718298.11721955751660856262.stgit@magnolia>
In-Reply-To: <167243871464.718298.4729609315819255063.stgit@magnolia>
References: <167243871464.718298.4729609315819255063.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub has to run with some elevated privileges.  Minimize
the risk of xfs_scrub escaping its service container or contaminating
the rest of the system by using systemd's sandboxing controls to
prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub@.service' in systemd 249.

Signed-off-by: Darrick J. Wong
---
 scrub/xfs_scrub@.service.in |   73 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
index 3c64252de49..39af00d4b73 100644
--- a/scrub/xfs_scrub@.service.in
+++ b/scrub/xfs_scrub@.service.in
@@ -10,15 +10,8 @@ Documentation=man:xfs_scrub(8)
 
 [Service]
 Type=oneshot
-PrivateNetwork=true
-ProtectSystem=full
-ProtectHome=read-only
-# Disable private /tmp just in case %i is a path under /tmp.
-PrivateTmp=no
-AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
-NoNewPrivileges=yes
-User=nobody
 Environment=SERVICE_MODE=1
+Environment=SERVICE_MOUNTPOINT=/tmp/scrub
 ExecStart=@sbindir@/xfs_scrub @scrub_args@ %I
 SyslogIdentifier=%N
 
@@ -31,3 +24,67 @@ Nice=19
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# No realtime CPU scheduling
+RestrictRealtime=true
+
+# Dynamically create a user that isn't root
+DynamicUser=true
+
+# Make the entire filesystem readonly and /home inaccessible, then bind mount
+# the filesystem we're supposed to be checking into our private /tmp dir.
+# 'norbind' means that we don't bind anything under that original mount.
+ProtectSystem=strict
+ProtectHome=yes
+PrivateTmp=true
+BindPaths=/%I:/tmp/scrub:norbind
+
+# Don't let scrub complain about paths in /etc/projects that have been hidden
+# by our sandboxing.  scrub doesn't care about project ids anyway.
+InaccessiblePaths=-/etc/projects
+
+# No network access
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=none
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub needs these privileges to run, and no others
+CapabilityBoundingSet=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=true
+
+# xfs_scrub doesn't create files
+UMask=7777
+
+# No access to hardware /dev files except for block devices
+ProtectClock=true
+DevicePolicy=closed
+DeviceAllow=block-*

From patchwork Fri Dec 30 22:18:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 13085180
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 22C40C4332F
 for ; Sat, 31 Dec 2022 00:37:03 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S235841AbiLaAhC (ORCPT );
 Fri, 30 Dec 2022 19:37:02 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34620 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235813AbiLaAgy (ORCPT );
 Fri, 30 Dec 2022 19:36:54 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79C1E1EAC0
 for ; Fri, 30 Dec 2022 16:36:53 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dfw.source.kernel.org (Postfix) with ESMTPS id 10A4E61CF1
 for ; Sat, 31 Dec 2022 00:36:53 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 71D6BC433D2;
 Sat, 31 Dec 2022 00:36:52 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1672447012;
 bh=yYLA/WWglwQAwiwi9u3fDtHGUtZL56FWgE/IomGSDDY=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=RRkpv4SYOMHNEjXDmdixNzf8lMJLHGfMrzyMeBA+yx6kwifIayZ+Y64cO8MZu2Q6S
 Mxi8o8QLJselOQHKPsYCIF3RV4dz8JL0oIHGhuStwhQo6ji2XdHiZuEVmWzglj9xzR
 gdY6myTrwQVkj5SkjmsiMmVMFPNRAYSkU7ktjif6wRrkV3GLM927KQXgQV0WzcE7qf
 9OLmP3HcRHp6yUcslveVNp4rVJvpxyruNC//FzF6fIB4N5qoEy+obApGsgaDgy+hUI
 3dW+iiXVRrLq6YBto5mj/9Bbi5vSnFba0OAjSbnFk/DE3m91R/4ApX+mw++D+kKUaP
 j87LDIv8+rJzw==
Subject: [PATCH 4/5] xfs_scrub_fail: tighten up the security on the
 background systemd service
From: "Darrick J.
 Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:35 -0800
Message-ID: <167243871517.718298.1106619899786924335.stgit@magnolia>
In-Reply-To: <167243871464.718298.4729609315819255063.stgit@magnolia>
References: <167243871464.718298.4729609315819255063.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub_fail has to run with enough privileges to access
the journal contents for a given scrub run and to send a report via
email.  Minimize the risk of xfs_scrub_fail escaping its service
container or contaminating the rest of the system by using systemd's
sandboxing controls to prohibit as much access as possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_fail@.service' in systemd 249.

Signed-off-by: Darrick J. Wong
---
 scrub/xfs_scrub_fail@.service.in |   56 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
index 591486599ce..2c36c47ab02 100644
--- a/scrub/xfs_scrub_fail@.service.in
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -18,3 +18,59 @@ SupplementaryGroups=systemd-journal
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# Mount /usr, /boot, /efi, and /etc read-only, make /home inaccessible, and
+# give the service a private /tmp directory.
+ProtectSystem=full
+ProtectHome=yes
+PrivateTmp=true
+RestrictSUIDSGID=true
+
+# Emailing reports requires network access, but not the ability to change the
+# hostname.
+ProtectHostname=true
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Can't hide /proc because journalctl needs it to find various pieces of log
+# information
+#ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# xfs_scrub_fail needs no capabilities at all
+CapabilityBoundingSet=
+NoNewPrivileges=true
+
+# Failure reporting shouldn't create world-readable files
+UMask=0077
+
+# Clean up any IPC objects when this unit stops
+RemoveIPC=true
+
+# No access to hardware device files
+PrivateDevices=true
+ProtectClock=true

From patchwork Fri Dec 30 22:18:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 13085181
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 13BE7C4332F
 for ; Sat, 31 Dec 2022 00:37:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S235802AbiLaAhK (ORCPT );
 Fri, 30 Dec 2022 19:37:10 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34640 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S235750AbiLaAhJ (ORCPT );
 Fri, 30 Dec 2022 19:37:09 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 046711E3FE
 for ; Fri, 30 Dec 2022 16:37:09 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dfw.source.kernel.org (Postfix) with ESMTPS id 9547A61CF1
 for ; Sat, 31 Dec 2022 00:37:08 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02140C433D2;
 Sat, 31 Dec 2022 00:37:07 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1672447028;
 bh=CWYkcDjmuCYanAa5NSUJAbgh3lrsl8DtS8Jo2ChSEig=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=tahRgNEZkRDlED4X9/lVEqERkDz2neMfjNldbFjNl/A0h/hb+Ni3trlOvWT/Don1p
 vW4WNhBuCBVoUt3c0IYTY3XqJwPsSlQuWQ3ELXvGWxUAnEVZZQobK5+qqIkf/WsPQP
 K6bk4WWmJf3EoO6PqNJ4s0p+Wn9znm96Imom/mYzA/z9F60wt3tm+cuPtRLH6wqPOb
 Cm6n3OZYvshMV2OeX4e0hZAwHsvhmI2JI2QiWTPshOI3Yt458sMoJfEDE74wxJC+e+
 /ApVBhqEm9uyJD6cx71EiIVGhqTIYZ7zpvHWcpzlcDc3HMZasWH3dGqqP4oKTFqit5
 zf2YlLoqBuC6Q==
Subject: [PATCH 5/5] xfs_scrub_all: tighten up the security on the
 background systemd service
From: "Darrick J.
 Wong"
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:18:35 -0800
Message-ID: <167243871531.718298.13745628368000596845.stgit@magnolia>
In-Reply-To: <167243871464.718298.4729609315819255063.stgit@magnolia>
References: <167243871464.718298.4729609315819255063.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Currently, xfs_scrub_all has to run with enough privileges to find
mounted XFS filesystems and the device associated with that mount and to
start xfs_scrub@ sub-services.  Minimize the risk of xfs_scrub_all
escaping its service container or contaminating the rest of the system
by using systemd's sandboxing controls to prohibit as much access as
possible.

The directives added by this patch were recommended by the command
'systemd-analyze security xfs_scrub_all.service' in systemd 249.

Signed-off-by: Darrick J. Wong
---
 scrub/xfs_scrub_all.service.in |   62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index ae4135033dd..c1c6012b47d 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -18,3 +18,65 @@ SyslogIdentifier=xfs_scrub_all
 # Create the service underneath the scrub background service slice so that we
 # can control resource usage.
 Slice=system-xfs_scrub.slice
+
+# Run scrub_all with minimal CPU and IO priority so that nothing will starve.
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+CPUAccounting=true
+Nice=19
+
+# No realtime scheduling
+RestrictRealtime=true
+
+# No special privileges, but we still have to run as root so that we can
+# contact the service manager to start the sub-units.
+CapabilityBoundingSet=
+NoNewPrivileges=true
+RestrictSUIDSGID=true
+
+# Make the entire filesystem readonly.  We don't want to hide anything because
+# we need to find all mounted XFS filesystems in the host.
+ProtectSystem=strict
+ProtectHome=read-only
+PrivateTmp=false
+
+# No network access except to the systemd control socket
+PrivateNetwork=true
+ProtectHostname=true
+RestrictAddressFamilies=AF_UNIX
+IPAddressDeny=any
+
+# Don't let the program mess with the kernel configuration at all
+ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+ProtectControlGroups=true
+ProtectProc=invisible
+RestrictNamespaces=true
+
+# Hide everything in /proc, even /proc/mounts
+ProcSubset=pid
+
+# Only allow the default personality Linux
+LockPersonality=true
+
+# No writable memory pages
+MemoryDenyWriteExecute=true
+
+# Don't let our mounts leak out to the host
+PrivateMounts=true
+
+# Restrict system calls to the native arch and only enough to get things going
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@privileged
+SystemCallFilter=~@resources
+SystemCallFilter=~@mount
+
+# Media scan stamp file shouldn't be readable by regular users
+UMask=0077
+
+# lsblk ignores mountpoints if it can't find the device files, so we cannot
+# hide them
+#ProtectClock=true
+#PrivateDevices=true
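
Note on the reporting/actual path split from patch 1: the fallback logic
can be sketched as a small standalone C helper.  This is only a minimal
sketch, not the xfsprogs code.  The names struct scrub_paths,
resolve_scrub_paths() and resolve_scrub_paths_from_env() are hypothetical
and were made up for illustration; only the SERVICE_MOUNTPOINT variable
name and the "fall back to the command-line path" behaviour are taken
from the diff in patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two path fields added to struct scrub_ctx. */
struct scrub_paths {
	const char	*mntpoint;		/* path used in reports */
	const char	*actual_mntpoint;	/* path that is really opened */
};

/*
 * Hypothetical helper: the reporting path is always the command-line
 * argument; the path to open is the override if one is given, and the
 * command-line argument otherwise.
 */
void
resolve_scrub_paths(
	struct scrub_paths	*sp,
	const char		*cli_path,
	const char		*override)
{
	assert(sp != NULL && cli_path != NULL);

	sp->mntpoint = cli_path;
	sp->actual_mntpoint = override ? override : cli_path;
}

/* Same thing, taking the override from the SERVICE_MOUNTPOINT variable. */
void
resolve_scrub_paths_from_env(
	struct scrub_paths	*sp,
	const char		*cli_path)
{
	resolve_scrub_paths(sp, cli_path, getenv("SERVICE_MOUNTPOINT"));
}
```

With the unit file from patch 3, the service would be started with the
real mount point as its argument and SERVICE_MOUNTPOINT=/tmp/scrub in its
environment, so error messages keep naming the real mount point while
every open goes through the bind mount inside the sandbox.  Passing the
override as a parameter is a choice made for this sketch so that the
logic can be exercised without touching the process environment.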