diff mbox series

xfs_scrub_all.timer: don't run if /var/lib/xfsprogs is readonly

Message ID 20250122020025.GL1611770@frogsfrogsfrogs (mailing list archive)
State New
Headers show
Series xfs_scrub_all.timer: don't run if /var/lib/xfsprogs is readonly | expand

Commit Message

Darrick J. Wong Jan. 22, 2025, 2 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

The xfs_scrub_all program wants to write a state file into the package
state dir to keep track of how recently it performed a media scan.
Don't allow the systemd timer to run if that path isn't writable.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 267ae610a3d90f ("xfs_scrub_all: enable periodic file data scrubs automatically")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/Makefile               |    6 +++++-
 scrub/xfs_scrub_all.timer.in |    3 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)
 rename scrub/{xfs_scrub_all.timer => xfs_scrub_all.timer.in} (77%)

Comments

Christoph Hellwig Jan. 22, 2025, 6:02 a.m. UTC | #1
On Tue, Jan 21, 2025 at 06:00:25PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The xfs_scrub_all program wants to write a state file into the package
> state dir to keep track of how recently it performed a media scan.
> Don't allow the systemd timer to run if that path isn't writable.

Why would the path not be writable?  Do we need a different place for
it that is guaranteed to be writable even for setups they try to keep
much of the system read-only?
Darrick J. Wong Jan. 22, 2025, 7:18 a.m. UTC | #2
On Wed, Jan 22, 2025 at 07:02:30AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 21, 2025 at 06:00:25PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The xfs_scrub_all program wants to write a state file into the package
> > state dir to keep track of how recently it performed a media scan.
> > Don't allow the systemd timer to run if that path isn't writable.
> 
> Why would the path not be writable?  Do we need a different place for
> it that is guaranteed to be writable even for setups they try to keep
> much of the system read-only?

Eh, it's mostly to shut up systems where even /var/lib is readonly.
IIRC systemd's volatile mode isn't quite that silly, but it'd be nice
to avoid xfs_scrub_all repeatedly failing every time the timer fires.

OTOH maybe a better solution is just to run scrub in media scan mode if
the media scan stamp file can't be opened for writing?

--D
Christoph Hellwig Jan. 22, 2025, 7:22 a.m. UTC | #3
On Tue, Jan 21, 2025 at 11:18:29PM -0800, Darrick J. Wong wrote:
> Eh, it's mostly to shut up systems where even /var/lib is readonly.

What are those systems?  They must still provide some place writable
for semi-persistant data, so we should look for that?

> IIRC systemd's volatile mode isn't quite that silly, but it'd be nice
> to avoid xfs_scrub_all repeatedly failing every time the timer fires.
> 
> OTOH maybe a better solution is just to run scrub in media scan mode if
> the media scan stamp file can't be opened for writing?

Or not run it at all?  Either way I'd like to understand what causes
this.
Darrick J. Wong Jan. 22, 2025, 7:34 p.m. UTC | #4
On Wed, Jan 22, 2025 at 08:22:47AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 21, 2025 at 11:18:29PM -0800, Darrick J. Wong wrote:
> > Eh, it's mostly to shut up systems where even /var/lib is readonly.
> 
> What are those systems?  They must still provide some place writable
> for semi-persistant data, so we should look for that?

The particular place that I noticed this was on my fstests fleet, where
the root filesystem is an ro nfs4 export.  I forgot to configure an
overlayfs for /var/lib/xfsprogs, so when I upgraded it to xfsprogs 6.12
and left the VMs running on a Sunday morning, they tried to start
xfs_scrub_all and failed.  Then the monitoring systems emailed me about
that, and I got 150 emails. :(

This /should/ be a pretty uncommon situation since (AFAICT) most
readonly-root systems set up a writable (and possibly volatile)
/var/lib, but I thought I should just turn off the timer if it's going
to fail anyway.

> > IIRC systemd's volatile mode isn't quite that silly, but it'd be nice
> > to avoid xfs_scrub_all repeatedly failing every time the timer fires.
> > 
> > OTOH maybe a better solution is just to run scrub in media scan mode if
> > the media scan stamp file can't be opened for writing?
> 
> Or not run it at all?  Either way I'd like to understand what causes
> this.

<nod>

--D
diff mbox series

Patch

diff --git a/scrub/Makefile b/scrub/Makefile
index 1e1109048c2a83..934b9062651bf1 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -108,10 +108,14 @@  endif
 # Automatically trigger a media scan once per month
 XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_INTERVAL=1mo
 
-LDIRT = $(XFS_SCRUB_ALL_PROG) $(XFS_SCRUB_FAIL_PROG) *.service *.cron
+LDIRT = $(XFS_SCRUB_ALL_PROG) $(XFS_SCRUB_FAIL_PROG) *.service *.cron xfs_scrub_all.timer
 
 default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG) $(XFS_SCRUB_FAIL_PROG) $(OPTIONAL_TARGETS)
 
+xfs_scrub_all.timer: xfs_scrub_all.timer.in $(builddefs)
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@pkg_state_dir@|$(PKG_STATE_DIR)|g" < $< > $@
+
 xfs_scrub_all: xfs_scrub_all.in $(builddefs)
 	@echo "    [SED]    $@"
 	$(Q)$(SED) -e "s|@sbindir@|$(PKG_SBIN_DIR)|g" \
diff --git a/scrub/xfs_scrub_all.timer b/scrub/xfs_scrub_all.timer.in
similarity index 77%
rename from scrub/xfs_scrub_all.timer
rename to scrub/xfs_scrub_all.timer.in
index f0c557fc380391..9008f036d496c0 100644
--- a/scrub/xfs_scrub_all.timer
+++ b/scrub/xfs_scrub_all.timer.in
@@ -1,10 +1,11 @@ 
 # SPDX-License-Identifier: GPL-2.0-or-later
 #
-# Copyright (C) 2018-2024 Oracle.  All Rights Reserved.
+# Copyright (C) 2018-2025 Oracle.  All Rights Reserved.
 # Author: Darrick J. Wong <djwong@kernel.org>
 
 [Unit]
 Description=Periodic XFS Online Metadata Check for All Filesystems
+ConditionPathIsReadWrite=@pkg_state_dir@
 
 [Timer]
 # Run on Sunday at 3:10am, to avoid running afoul of DST changes