diff mbox series

[v2,5/9] xfs_db: add inobtcnt upgrade path

Message ID 20201116211351.GT9695@magnolia (mailing list archive)
State Superseded, archived
Headers show
Series None | expand

Commit Message

Darrick J. Wong Nov. 16, 2020, 9:13 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Enable users to upgrade their filesystems to support inode btree block
counters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: set inprogress to force repair (which xfs_admin immediately does),
clean up the code to pass around fewer arguments, and try to revert the
change if we hit io errors
---
 db/sb.c              |  149 ++++++++++++++++++++++++++++++++++++++++++++++++--
 db/xfs_admin.sh      |    6 ++
 man/man8/xfs_admin.8 |   16 +++++
 3 files changed, 163 insertions(+), 8 deletions(-)

Comments

Eric Sandeen Nov. 18, 2020, 9:05 p.m. UTC | #1
On 11/16/20 3:13 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Enable users to upgrade their filesystems to support inode btree block
> counters.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: set inprogress to force repair (which xfs_admin immediately does),
> clean up the code to pass around fewer arguments, and try to revert the
> change if we hit io errors
> ---

sooooo the inprogress thing sets off some unexpected behavior.

In testing this, I noticed that if we have inprogress set, and uknown features/
version on disk, we go looking for backup superblocks and actually end up
corrupting the filesystem before bailing out:

# xfs_repair /dev/pmem0p2 
Phase 1 - find and verify superblock...
bad primary superblock - filesystem mkfs-in-progress bit set !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Superblock has unknown compat/rocompat/incompat features (0x0/0x8/0x0).
Using a more recent xfs_repair is recommended.
Found unsupported filesystem features.  Exiting now.

# xfs_db -c check /dev/pmem0p2
disconnected inode 129, nlink 1
disconnected inode 130, nlink 1

so this seems to have exposed a hole in how repair deals with unknown features
when the inprogress bit is set.

And TBH scampering off to find backup superblocks to "repair" an inprogress
filesystem seems like ... not the right thing to do after a feature upgrade.

I'm not sure what's better, but 

> bad primary superblock - filesystem mkfs-in-progress bit set !!!

seems ... unexpected for this purpose.

-Eric
Darrick J. Wong Nov. 20, 2020, 1:05 a.m. UTC | #2
On Wed, Nov 18, 2020 at 03:05:42PM -0600, Eric Sandeen wrote:
> On 11/16/20 3:13 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Enable users to upgrade their filesystems to support inode btree block
> > counters.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: set inprogress to force repair (which xfs_admin immediately does),
> > clean up the code to pass around fewer arguments, and try to revert the
> > change if we hit io errors
> > ---
> 
> sooooo the inprogress thing sets off some unexpected behavior.
> 
> In testing this, I noticed that if we have inprogress set, and uknown features/
> version on disk, we go looking for backup superblocks and actually end up
> corrupting the filesystem before bailing out:
> 
> # xfs_repair /dev/pmem0p2 
> Phase 1 - find and verify superblock...
> bad primary superblock - filesystem mkfs-in-progress bit set !!!
> 
> attempting to find secondary superblock...
> .found candidate secondary superblock...
> verified secondary superblock...
> writing modified primary superblock
> sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
> resetting superblock realtime bitmap inode pointer to 129
> sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
> resetting superblock realtime summary inode pointer to 130
> Superblock has unknown compat/rocompat/incompat features (0x0/0x8/0x0).
> Using a more recent xfs_repair is recommended.
> Found unsupported filesystem features.  Exiting now.
> 
> # xfs_db -c check /dev/pmem0p2
> disconnected inode 129, nlink 1
> disconnected inode 130, nlink 1
> 
> so this seems to have exposed a hole in how repair deals with unknown features
> when the inprogress bit is set.
> 
> And TBH scampering off to find backup superblocks to "repair" an inprogress
> filesystem seems like ... not the right thing to do after a feature upgrade.
> 
> I'm not sure what's better, but 
> 
> > bad primary superblock - filesystem mkfs-in-progress bit set !!!
> 
> seems ... unexpected for this purpose.

Yeah.  Dave suggested that I use an incompat flag for this, so I think
I'll do that instead since inprogress is such a mess.

--D

> -Eric
>
Eric Sandeen Nov. 20, 2020, 4:10 a.m. UTC | #3
On 11/19/20 7:05 PM, Darrick J. Wong wrote:
> On Wed, Nov 18, 2020 at 03:05:42PM -0600, Eric Sandeen wrote:

...

>> so this seems to have exposed a hole in how repair deals with unknown features
>> when the inprogress bit is set.
>>
>> And TBH scampering off to find backup superblocks to "repair" an inprogress
>> filesystem seems like ... not the right thing to do after a feature upgrade.
>>
>> I'm not sure what's better, but 
>>
>>> bad primary superblock - filesystem mkfs-in-progress bit set !!!
>>
>> seems ... unexpected for this purpose.
> 
> Yeah.  Dave suggested that I use an incompat flag for this, so I think
> I'll do that instead since inprogress is such a mess.

And I'll try to get to validating V5 features earlier in the repair cycle ...

-Eric
diff mbox series

Patch

diff --git a/db/sb.c b/db/sb.c
index e3b1fe0b2e6e..cfc2e32023fc 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -620,6 +620,117 @@  do_version(xfs_agnumber_t agno, uint16_t version, uint32_t features)
 	return 1;
 }
 
+struct v5feat {
+	uint32_t		compat;
+	uint32_t		ro_compat;
+	uint32_t		incompat;
+	uint32_t		log_incompat;
+};
+
+static void
+get_v5_features(
+	struct xfs_mount	*mp,
+	struct v5feat		*feat)
+{
+	feat->compat = mp->m_sb.sb_features_compat;
+	feat->ro_compat = mp->m_sb.sb_features_ro_compat;
+	feat->incompat = mp->m_sb.sb_features_incompat;
+	feat->log_incompat = mp->m_sb.sb_features_log_incompat;
+}
+
+static bool
+set_v5_features(
+	struct xfs_mount	*mp,
+	const struct v5feat	*upgrade)
+{
+	struct xfs_sb		tsb;
+	struct v5feat		old;
+	xfs_agnumber_t		agno = 0;
+	xfs_agnumber_t		revert_agno = 0;
+
+	if (upgrade->compat == mp->m_sb.sb_features_compat &&
+	    upgrade->ro_compat == mp->m_sb.sb_features_ro_compat &&
+	    upgrade->incompat == mp->m_sb.sb_features_incompat &&
+	    upgrade->log_incompat == mp->m_sb.sb_features_log_incompat)
+		return true;
+
+	dbprintf(_("Upgrading V5 filesystem\n"));
+
+	/* Upgrade primary superblock. */
+	if (!get_sb(agno, &tsb))
+		goto fail;
+
+	/* Save old values */
+	old.compat = tsb.sb_features_compat;
+	old.ro_compat = tsb.sb_features_ro_compat;
+	old.incompat = tsb.sb_features_incompat;
+	old.log_incompat = tsb.sb_features_log_incompat;
+
+	/* Update feature flags and force user to run repair before mounting. */
+	tsb.sb_features_compat |= upgrade->compat;
+	tsb.sb_features_ro_compat |= upgrade->ro_compat;
+	tsb.sb_features_incompat |= upgrade->incompat;
+	tsb.sb_features_log_incompat |= upgrade->log_incompat;
+	tsb.sb_inprogress = 1;
+	libxfs_sb_to_disk(iocur_top->data, &tsb);
+
+	/* Write new primary superblock */
+	write_cur();
+	if (!iocur_top->bp || iocur_top->bp->b_error)
+		goto fail;
+
+	/* Update the secondary superblocks, or revert. */
+	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
+		if (!get_sb(agno, &tsb)) {
+			agno--;
+			goto revert;
+		}
+
+		/* Set features on secondary suepr. */
+		tsb.sb_features_compat |= upgrade->compat;
+		tsb.sb_features_ro_compat |= upgrade->ro_compat;
+		tsb.sb_features_incompat |= upgrade->incompat;
+		tsb.sb_features_log_incompat |= upgrade->log_incompat;
+		libxfs_sb_to_disk(iocur_top->data, &tsb);
+		write_cur();
+
+		/* Write or abort. */
+		if (!iocur_top->bp || iocur_top->bp->b_error)
+			goto revert;
+	}
+
+	/* All superblocks updated, update the incore values. */
+	mp->m_sb.sb_features_compat |= upgrade->compat;
+	mp->m_sb.sb_features_ro_compat |= upgrade->ro_compat;
+	mp->m_sb.sb_features_incompat |= upgrade->incompat;
+	mp->m_sb.sb_features_log_incompat |= upgrade->log_incompat;
+
+	dbprintf(_("Upgraded V5 filesystem.  Please run xfs_repair.\n"));
+	return true;
+
+revert:
+	/*
+	 * Try to revert feature flag changes, and don't worry if we fail.
+	 * We're probably in a mess anyhow, and the admin will have to run
+	 * repair anyways.
+	 */
+	for (revert_agno = 0; revert_agno <= agno; revert_agno++) {
+		if (!get_sb(revert_agno, &tsb))
+			continue;
+
+		tsb.sb_features_compat = old.compat;
+		tsb.sb_features_ro_compat = old.ro_compat;
+		tsb.sb_features_incompat = old.incompat;
+		tsb.sb_features_log_incompat = old.log_incompat;
+		libxfs_sb_to_disk(iocur_top->data, &tsb);
+		write_cur();
+	}
+fail:
+	dbprintf(
+_("Failed to upgrade V5 filesystem at AG %d, please run xfs_repair.\n"), agno);
+	return false;
+}
+
 static char *
 version_string(
 	xfs_sb_t	*sbp)
@@ -692,12 +803,7 @@  version_string(
 	return s;
 }
 
-/*
- * XXX: this only supports reading and writing to version 4 superblock fields.
- * V5 superblocks always define certain V4 feature bits - they are blocked from
- * being changed if a V5 sb is detected, but otherwise v5 superblock features
- * are not handled here.
- */
+/* Upgrade a superblock to support a feature. */
 static int
 version_f(
 	int		argc,
@@ -708,6 +814,9 @@  version_f(
 	xfs_agnumber_t	ag;
 
 	if (argc == 2) {	/* WRITE VERSION */
+		struct v5feat	v5features;
+
+		get_v5_features(mp, &v5features);
 
 		if ((x.isreadonly & LIBXFS_ISREADONLY) || !expert_mode) {
 			dbprintf(_("%s: not in expert mode, writing disabled\n"),
@@ -716,7 +825,28 @@  version_f(
 		}
 
 		/* Logic here derived from the IRIX xfs_chver(1M) script. */
-		if (!strcasecmp(argv[1], "extflg")) {
+		if (!strcasecmp(argv[1], "inobtcount")) {
+			if (xfs_sb_version_hasinobtcounts(&mp->m_sb)) {
+				dbprintf(
+		_("inode btree counter feature is already enabled\n"));
+				exitcode = 1;
+				return 1;
+			}
+			if (!xfs_sb_version_hasfinobt(&mp->m_sb)) {
+				dbprintf(
+		_("inode btree counter feature cannot be enabled on filesystems lacking free inode btrees\n"));
+				exitcode = 1;
+				return 1;
+			}
+			if (!xfs_sb_version_hascrc(&mp->m_sb)) {
+				dbprintf(
+		_("inode btree counter feature cannot be enabled on pre-V5 filesystems\n"));
+				exitcode = 1;
+				return 1;
+			}
+
+			v5features.ro_compat |= XFS_SB_FEAT_RO_COMPAT_INOBTCNT;
+		} else if (!strcasecmp(argv[1], "extflg")) {
 			switch (XFS_SB_VERSION_NUM(&mp->m_sb)) {
 			case XFS_SB_VERSION_1:
 				version = 0x0004 | XFS_SB_VERSION_EXTFLGBIT;
@@ -807,6 +937,11 @@  version_f(
 			mp->m_sb.sb_versionnum = version;
 			mp->m_sb.sb_features2 = features;
 		}
+
+		if (!set_v5_features(mp, &v5features)) {
+			exitcode = 1;
+			return 1;
+		}
 	}
 
 	if (argc == 3) {	/* VERSIONNUM + FEATURES2 */
diff --git a/db/xfs_admin.sh b/db/xfs_admin.sh
index bd325da2f776..41a14d4521ba 100755
--- a/db/xfs_admin.sh
+++ b/db/xfs_admin.sh
@@ -9,7 +9,7 @@  DB_OPTS=""
 REPAIR_OPTS=""
 USAGE="Usage: xfs_admin [-efjlpuV] [-c 0|1] [-L label] [-U uuid] device [logdev]"
 
-while getopts "efjlpuc:L:U:V" c
+while getopts "efjlpuc:L:O:U:V" c
 do
 	case $c in
 	c)	REPAIR_OPTS=$REPAIR_OPTS" -c lazycount="$OPTARG;;
@@ -19,6 +19,9 @@  do
 	l)	DB_OPTS=$DB_OPTS" -r -c label";;
 	L)	DB_OPTS=$DB_OPTS" -c 'label "$OPTARG"'";;
 	p)	DB_OPTS=$DB_OPTS" -c 'version projid32bit'";;
+	O)	DB_OPTS=$DB_OPTS" -c 'version "$OPTARG"'";
+		# Force repair to run by adding a single space to REPAIR_OPTS
+		REPAIR_OPTS="$REPAIR_OPTS ";;
 	u)	DB_OPTS=$DB_OPTS" -r -c uuid";;
 	U)	DB_OPTS=$DB_OPTS" -c 'uuid "$OPTARG"'";;
 	V)	xfs_db -p xfs_admin -V
@@ -48,6 +51,7 @@  case $# in
 		fi
 		if [ -n "$REPAIR_OPTS" ]
 		then
+			echo "Running xfs_repair to ensure filesystem consistency."
 			# Hide normal repair output which is sent to stderr
 			# assuming the filesystem is fine when a user is
 			# running xfs_admin.
diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8
index 8afc873fb50a..65ca6afc1e12 100644
--- a/man/man8/xfs_admin.8
+++ b/man/man8/xfs_admin.8
@@ -6,6 +6,8 @@  xfs_admin \- change parameters of an XFS filesystem
 [
 .B \-eflpu
 ] [
+.BR \-O " feature"
+] [
 .BR "\-c 0" | 1
 ] [
 .B \-L
@@ -103,6 +105,20 @@  The filesystem label can be cleared using the special "\c
 " value for
 .IR label .
 .TP
+.BI \-O " feature"
+Add a new feature to the filesystem.
+Only one feature can be specified at a time.
+Features are as follows:
+.RS 0.7i
+.TP
+.B inobtcount
+Upgrade the filesystem to support the inode btree counters feature.
+This reduces mount time by caching the size of the inode btrees in the
+allocation group metadata.
+Once enabled, the filesystem will not be writable by older kernels.
+The filesystem cannot be downgraded after this feature is enabled.
+.RE
+.TP
 .BI \-U " uuid"
 Set the UUID of the filesystem to
 .IR uuid .