From patchwork Sun Dec 31 20:42:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507465 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5AC69BA22 for ; Sun, 31 Dec 2023 20:42:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Zj0DAVOa" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3CD6C433C8; Sun, 31 Dec 2023 20:42:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704055334; bh=4BU3PmoSsnUtaVJ5Hwf0NJxgyA32fJwEUDNQWEQBQtY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Zj0DAVOaifqeAAnvVrmK0ZaZ8p8aiUTC4eT0zLOIPVXWEW/ZeWYxAwXwY0FbOKKP2 LoCCYNWr+zMlGECUnQk6XkTPFZ1wWUelUGQ4LK7sD1Yfb4IQiGq4EJgULeGnrI+JS+ qTKE5TvNEbpalFmWNzJwqFLJygxaVxhPj9mqpzjjUC6kD5u9vzqoqUX+g216UDRKiu 4ZsjBcsUS938XH0CvTRS41gTi8O5DBX+6v5q0MDZq/wK3QPyTqVWad7PIHfVe4WB4T /LhgwL6s4CEDUri4qTjp0aeLcRC4LfjN4JbFV6scNBWvHj2Z/I6L/f4pAR5jrb0whm G9s2XzSrHt21g== Date: Sun, 31 Dec 2023 12:42:14 -0800 Subject: [PATCH 1/4] docs: update the parent pointers documentation to the final version From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404839494.1756140.2227622227607708094.stgit@frogsfrogsfrogs> In-Reply-To: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> References: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we've decided on the ondisk format of parent pointers, update the documentation to reflect that. Signed-off-by: Darrick J. Wong --- .../filesystems/xfs-online-fsck-design.rst | 91 +++++++++++--------- 1 file changed, 50 insertions(+), 41 deletions(-) diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst index 827fcd49fe6d5..8fb0fc30f3fa4 100644 --- a/Documentation/filesystems/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs-online-fsck-design.rst @@ -4464,10 +4464,11 @@ reconstruction of filesystem space metadata. The parent pointer feature, however, makes total directory reconstruction possible. -XFS parent pointers include the dirent name and location of the entry within -the parent directory. +XFS parent pointers contain the information needed to identify the +corresponding directory entry in the parent directory. In other words, child files use extended attributes to store pointers to -parents in the form ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``. +parents in the form ``(parent_inum, parent_gen, dirent_name_hash) → +(dirent_name)``. The directory checking process can be strengthened to ensure that the target of each dirent also contains a parent pointer pointing back to the dirent. Likewise, each parent pointer can be checked by ensuring that the target of @@ -4475,8 +4476,6 @@ each parent pointer is a directory and that it contains a dirent matching the parent pointer. Both online and offline repair can use this strategy. -**Note**: The ondisk format of parent pointers is not yet finalized. - +--------------------------------------------------------------------------+ | **Historical Sidebar**: | +--------------------------------------------------------------------------+ @@ -4518,8 +4517,54 @@ Both online and offline repair can use this strategy. | Chandan increased the maximum extent counts of both data and attribute | | forks, thereby ensuring that the extended attribute structure can grow | | to handle the maximum hardlink count of any file. | +| | +| For this second effort, the ondisk parent pointer format as originally | +| proposed was ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``. | +| The format was changed during development to eliminate the requirement | +| of repair tools needing to to ensure that the ``dirent_pos`` field | +| always matched when reconstructing a directory. | +| | +| There were a few other ways to have solved that problem: | +| | +| 1. The field could be designated advisory, since the other three values | +| are sufficient to find the entry in the parent. | +| However, this makes indexed key lookup impossible while repairs are | +| ongoing. | +| | +| 2. We could allow creating directory entries at specified offsets, which | +| solves the referential integrity problem but runs the risk that | +| dirent creation will fail due to conflicts with the free space in the | +| directory. | +| | +| These conflicts could be resolved by appending the directory entry | +| and amending the xattr code to support updating an xattr key and | +| reindexing the dabtree, though this would have to be performed with | +| the parent directory still locked. | +| | +| 3. Same as above, but remove the old parent pointer entry and add a new | +| one atomically. | +| | +| 4. Change the ondisk xattr format to | +| ``(parent_inum, name) → (parent_gen)``, which would provide the attr | +| name uniqueness that we require, without forcing repair code to | +| update the dirent position. | +| Unfortunately, this requires changes to the xattr code to support | +| attr names as long as 263 bytes. | +| | +| 5. Change the ondisk xattr format to ``(parent_inum, hash(name)) → | +| (name, parent_gen)``. | +| If the hash is sufficiently resistant to collisions (e.g. sha256) | +| then this should provide the attr name uniqueness that we require. | +| Names shorter than 247 bytes could be stored directly. | +| | +| In the end, it was decided that the hash collisions of #5 were not a | +| serious issue because the directory/attr btree can handle multiple | +| identical extended attribute keys. | +| Reusing the dirent name hash instead of sha256 is much faster and would | +| result in a more compact ondisk format. | +--------------------------------------------------------------------------+ + Case Study: Repairing Directories with Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4568,42 +4613,6 @@ The proposed patchset is the `_ series. -**Unresolved Question**: How will repair ensure that the ``dirent_pos`` fields -match in the reconstructed directory? - -*Answer*: There are a few ways to solve this problem: - -1. The field could be designated advisory, since the other three values are - sufficient to find the entry in the parent. - However, this makes indexed key lookup impossible while repairs are ongoing. - -2. We could allow creating directory entries at specified offsets, which solves - the referential integrity problem but runs the risk that dirent creation - will fail due to conflicts with the free space in the directory. - - These conflicts could be resolved by appending the directory entry and - amending the xattr code to support updating an xattr key and reindexing the - dabtree, though this would have to be performed with the parent directory - still locked. - -3. Same as above, but remove the old parent pointer entry and add a new one - atomically. - -4. Change the ondisk xattr format to ``(parent_inum, name) → (parent_gen)``, - which would provide the attr name uniqueness that we require, without - forcing repair code to update the dirent position. - Unfortunately, this requires changes to the xattr code to support attr - names as long as 263 bytes. - -5. Change the ondisk xattr format to ``(parent_inum, hash(name)) → - (name, parent_gen)``. - If the hash is sufficiently resistant to collisions (e.g. sha256) then - this should provide the attr name uniqueness that we require. - Names shorter than 247 bytes could be stored directly. - -Discussion is ongoing under the `parent pointers patch deluge -`_. - Case Study: Repairing Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From patchwork Sun Dec 31 20:42:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507466 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9EBD8BA2B for ; Sun, 31 Dec 2023 20:42:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cP0/0OCu" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6DA55C433C7; Sun, 31 Dec 2023 20:42:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704055350; bh=GjP/OI7KVJG80amlTIgiLbalpoispiGGu3uj6hkzYyU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=cP0/0OCuJ6ynCutnKprvNWFENsRQcyWjGUHkHuEwMCNtj+bLzQXn0RmA/k7uBxtUS 5fFWvZSgVmNHToSv2O7QFswrq053pvEprxSvSdEn8BtV8tDLY+CXi9ALbm+9fq9FIl hRs+YgblZM/+qjQ1ynwIDtywIQrC85PMYmVYhBZvKkNDx+8invozYUa3M1f0Ez28PS MLyKb/lXl97uR96qHSjigc0FEIWbH10WVNSKVSQA1JlDy1nXXMxUm6coHs+xHx53au 3VI/M9oQ9ks3f9bktClyVJ3YwFj646AmgJvYhAhvNr/7+tZCsojpvA3rN4F2gB9rtP cpYwRKrwYVf7w== Date: Sun, 31 Dec 2023 12:42:29 -0800 Subject: [PATCH 2/4] docs: update online directory and parent pointer repair sections From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404839510.1756140.12621480111001198603.stgit@frogsfrogsfrogs> In-Reply-To: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> References: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Update the case studies of online directory and parent pointer reconstruction to reflect what they actually do in the final version. Signed-off-by: Darrick J. Wong --- .../filesystems/xfs-online-fsck-design.rst | 58 +++++++++++--------- 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst index 8fb0fc30f3fa4..f5ba59b335f8e 100644 --- a/Documentation/filesystems/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs-online-fsck-design.rst @@ -4572,8 +4572,9 @@ Directory rebuilding uses a :ref:`coordinated inode scan ` and a :ref:`directory entry live update hook ` as follows: 1. Set up a temporary directory for generating the new directory structure, - an xfblob for storing entry names, and an xfarray for stashing directory - updates. + an xfblob for storing entry names, and an xfarray for stashing the fixed + size fields involved in a directory update: ``(child inumber, add vs. + remove, name cookie, ftype)``. 2. Set up an inode scanner and hook into the directory entry code to receive updates on directory operations. @@ -4582,35 +4583,34 @@ a :ref:`directory entry live update hook ` as follows: pointer references the directory of interest. If so: - a. Stash an addname entry for this dirent in the xfarray for later. + a. Stash the parent pointer name and an addname entry for this dirent in the + xfblob and xfarray, respectively. - b. When finished scanning that file, flush the stashed updates to the - temporary directory. + b. When finished scanning that file or the kernel memory consumption exceeds + a threshold, flush the stashed updates to the temporary directory. 4. For each live directory update received via the hook, decide if the child has already been scanned. If so: - a. Stash an addname or removename entry for this dirent update in the - xfarray for later. + a. Stash the parent pointer name an addname or removename entry for this + dirent update in the xfblob and xfarray for later. We cannot write directly to the temporary directory because hook functions are not allowed to modify filesystem metadata. Instead, we stash updates in the xfarray and rely on the scanner thread to apply the stashed updates to the temporary directory. -5. When the scan is complete, atomically swap the contents of the temporary - directory and the directory being repaired. +5. When the scan is complete, replay any stashed entries in the xfarray. + +6. Atomically swap the contents of the temporary directory and the directory + being repaired. The temporary directory now contains the damaged directory structure. 6. Reap the temporary directory. -7. Update the dirent position field of parent pointers as necessary. - This may require the queuing of a substantial number of xattr log intent - items. - The proposed patchset is the `parent pointers directory repair -`_ +`_ series. Case Study: Repairing Parent Pointers @@ -4620,8 +4620,9 @@ Online reconstruction of a file's parent pointer information works similarly to directory reconstruction: 1. Set up a temporary file for generating a new extended attribute structure, - an `xfblob` for storing parent pointer names, and an xfarray for - stashing parent pointer updates. + an xfblob for storing parent pointer names, and an xfarray for stashing the + fixed size fields involved in a parent pointer update: ``(parent inumber, + parent generation, add vs. remove, name cookie)``. 2. Set up an inode scanner and hook into the directory entry code to receive updates on directory operations. @@ -4630,34 +4631,37 @@ directory reconstruction: dirent references the file of interest. If so: - a. Stash an addpptr entry for this parent pointer in the xfblob and xfarray - for later. + a. Stash the dirent name and an addpptr entry for this parent pointer in the + xfblob and xfarray, respectively. - b. When finished scanning the directory, flush the stashed updates to the - temporary directory. + b. When finished scanning the directory or the kernel memory consumption + exceeds a threshold, flush the stashed updates to the temporary file. 4. For each live directory update received via the hook, decide if the parent has already been scanned. If so: - a. Stash an addpptr or removepptr entry for this dirent update in the - xfarray for later. + a. Stash the dirent name and an addpptr or removepptr entry for this dirent + update in the xfblob and xfarray for later. We cannot write parent pointers directly to the temporary file because hook functions are not allowed to modify filesystem metadata. Instead, we stash updates in the xfarray and rely on the scanner thread to apply the stashed parent pointer updates to the temporary file. -5. Copy all non-parent pointer extended attributes to the temporary file. +5. When the scan is complete, replay any stashed entries in the xfarray. -6. When the scan is complete, atomically swap the attribute fork of the - temporary file and the file being repaired. - The temporary file now contains the damaged extended attribute structure. +6. Copy all non-parent pointer extended attributes to the temporary file. + +7. Atomically swap the attribute fork of the temporary file and the file being + repaired. + The temporary file now contains the old extended attribute structure + containing the damaged parent pointers. 7. Reap the temporary file. The proposed patchset is the `parent pointers repair -`_ +`_ series. Digression: Offline Checking of Parent Pointers From patchwork Sun Dec 31 20:42:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507467 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 25AF1BA22 for ; Sun, 31 Dec 2023 20:42:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ji92xeu0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E7D0FC433C8; Sun, 31 Dec 2023 20:42:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704055366; bh=lb3YhD/0+nI8Bit9Z0z1iUBHzfKLFIE2onwgnC76JeI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=ji92xeu06WfxtIscRNGt5J09zCN077mhRlDNonvqOIPOZmLVbMAh/Vl9iNOaANAQW I/Xl5nSV5dw2PJRSyKLCMXcfpTBi7LdBP2E0EKsMhDJuFVOpVnByxqM4v1js+pVvrS s2+JSp6leWt5q0LNylUuINKo/JrFnWGKRh/DEGcvWjGqUUJGX8iP9f8GJPewQFHiVQ zypa9XGWcpqCIbFIgQu4syGICXN5cSwN+wktmnLfWqxb03I1jJSaOIVPnj17bDvqMl l/mMJEcOZfjtPCAiE+fZ+eW18ImRtNtinxYalLldtU/6IVBDbt4tAGH49+E5B3ootQ gQRzXVGhGVB0w== Date: Sun, 31 Dec 2023 12:42:45 -0800 Subject: [PATCH 3/4] docs: update offline parent pointer repair strategy From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404839525.1756140.8535506633322967883.stgit@frogsfrogsfrogs> In-Reply-To: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> References: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now update how xfs_repair checks and repairs parent pointer info. Signed-off-by: Darrick J. Wong --- .../filesystems/xfs-online-fsck-design.rst | 79 +++++++++++++++----- 1 file changed, 58 insertions(+), 21 deletions(-) diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst index f5ba59b335f8e..04e445d79f2d4 100644 --- a/Documentation/filesystems/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs-online-fsck-design.rst @@ -4672,26 +4672,54 @@ files are erased long before directory tree connectivity checks are performed. Parent pointer checks are therefore a second pass to be added to the existing connectivity checks: -1. After the set of surviving files has been established (i.e. phase 6), +1. After the set of surviving files has been established (phase 6), walk the surviving directories of each AG in the filesystem. This is already performed as part of the connectivity checks. -2. For each directory entry found, record the name in an xfblob, and store - ``(child_ag_inum, parent_inum, parent_gen, dirent_pos)`` tuples in a - per-AG in-memory slab. +2. For each directory entry found, + + a. If the name has already been stored in the xfblob, then use that cookie + and skip the next step. + + b. Otherwise, record the name in an xfblob, and remember the xfblob cookie. + Unique mappings are critical for + + 1. Deduplicating names to reduce memory usage, and + + 2. Creating a stable sort key for the parent pointer indexes so that the + parent pointer validation described below will work. + + c. Store ``(child_ag_inum, parent_inum, parent_gen, name_hash, name_len, + name_cookie)`` tuples in a per-AG in-memory slab. 3. For each AG in the filesystem, - a. Sort the per-AG tuples in order of child_ag_inum, parent_inum, and - dirent_pos. + a. Sort the per-AG tuple set in order of ``child_ag_inum``, ``parent_inum``, + ``name_hash``, and ``name_cookie``. + Having a single ``name_cookie`` for each ``name`` is critical for + handling the uncommon case of a directory containing multiple hardlinks + to the same file where all the names hash to the same value. b. For each inode in the AG, 1. Scan the inode for parent pointers. - Record the names in a per-file xfblob, and store ``(parent_inum, - parent_gen, dirent_pos)`` tuples in a per-file slab. + For each parent pointer found, - 2. Sort the per-file tuples in order of parent_inum, and dirent_pos. + a. Validate the ondisk parent pointer. + If validation fails, move on to the next parent pointer in the + file. + + b. If the name has already been stored in the xfblob, then use that + cookie and skip the next step. + + c. Record the name in a per-file xfblob, and remember the xfblob + cookie. + + d. Store ``(parent_inum, parent_gen, name_hash, name_len, + name_cookie)`` tuples in a per-file slab. + + 2. Sort the per-file tuples in order of ``parent_inum``, ``name_hash``, + and ``name_cookie``. 3. Position one slab cursor at the start of the inode's records in the per-AG tuple slab. @@ -4700,28 +4728,37 @@ connectivity checks: 4. Position a second slab cursor at the start of the per-file tuple slab. - 5. Iterate the two cursors in lockstep, comparing the parent_ino and - dirent_pos fields of the records under each cursor. + 5. Iterate the two cursors in lockstep, comparing the ``parent_ino``, + ``name_hash``, and ``name_cookie`` fields of the records under each + cursor: - a. Tuples in the per-AG list but not the per-file list are missing and - need to be written to the inode. + a. If the per-AG cursor is at a lower point in the keyspace than the + per-file cursor, then the per-AG cursor points to a missing parent + pointer. + Add the parent pointer to the inode and advance the per-AG + cursor. - b. Tuples in the per-file list but not the per-AG list are dangling - and need to be removed from the inode. + b. If the per-file cursor is at a lower point in the keyspace than + the per-AG cursor, then the per-file cursor points to a dangling + parent pointer. + Remove the parent pointer from the inode and advance the per-file + cursor. - c. For tuples in both lists, update the parent_gen and name components - of the parent pointer if necessary. + c. Otherwise, both cursors point at the same parent pointer. + Update the parent_gen component if necessary. + Advance both cursors. 4. Move on to examining link counts, as we do today. The proposed patchset is the `offline parent pointers repair -`_ +`_ series. -Rebuilding directories from parent pointers in offline repair is very -challenging because it currently uses a single-pass scan of the filesystem -during phase 3 to decide which files are corrupt enough to be zapped. +Rebuilding directories from parent pointers in offline repair would be very +challenging because xfs_repair currently uses two single-pass scans of the +filesystem during phases 3 and 4 to decide which files are corrupt enough to be +zapped. This scan would have to be converted into a multi-pass scan: 1. The first pass of the scan zaps corrupt inodes, forks, and attributes From patchwork Sun Dec 31 20:43:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507468 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2142BA2E for ; Sun, 31 Dec 2023 20:43:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BzmL+DuY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8EF2AC433C7; Sun, 31 Dec 2023 20:43:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704055381; bh=omEykhFRDC+59zM4UYnykqrU4uM9EigjUFr44JkJrb8=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=BzmL+DuY54JKw/N2BwdDNMZ05Nw8ECzy3JgfJcioF/9qs2faRE/aexhEg3bgDT8z3 ZmcnNS8jRMXvS9UxsNHRvW3GJYj+iysikVLn73rAsitrtn6DAqJsgnb4hgpvePQiHz xEdzHY/ihvbmeTTGf2OQ6IvXewotuRKpsWKwqO0aMsTwwa1xmb9n1ewOZdxdj4GjG0 Gt075Asp8YACLOB37IHZSFKrmM/f+NSFHRB/i0FbFMEwR5gb8rdQsIhSHbR8AyqgrW fzQVi23FyMw6eNe+m8i1iWuluyJw5REMQtRXu2xOa+6mzzydsCGAikM/QAo1LjNzoL 9eu/G3jJSNpXA== Date: Sun, 31 Dec 2023 12:43:01 -0800 Subject: [PATCH 4/4] docs: describe xfs directory tree online fsck From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404839541.1756140.14306767364283206225.stgit@frogsfrogsfrogs> In-Reply-To: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> References: <170404839471.1756140.4033459504904771587.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong I've added a scrubber that checks the directory tree structure and fixes them; describe this in the design documentation. Signed-off-by: Darrick J. Wong --- .../filesystems/xfs-online-fsck-design.rst | 121 ++++++++++++++++++++ 1 file changed, 121 insertions(+) diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst index 04e445d79f2d4..29e123189d303 100644 --- a/Documentation/filesystems/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs-online-fsck-design.rst @@ -4780,6 +4780,127 @@ This scan would have to be converted into a multi-pass scan: This code has not yet been constructed. +.. _dirtree: + +Case Study: Directory Tree Structure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As mentioned earlier, the filesystem directory tree is supposed to be a +directed acylic graph structure. +However, each node in this graph is a separate ``xfs_inode`` object with its +own locks, which makes validating the tree qualities difficult. +Fortunately, non-directories are allowed to have multiple parents and cannot +have children, so only directories need to be scanned. +Directories typically constitute 5-10% of the files in a filesystem, which +reduces the amount of work dramatically. + +If the directory tree could be frozen, it would be easy to discover cycles and +disconnected regions by running a depth (or breadth) first search downwards +from the root directory and marking a bitmap for each directory found. +At any point in the walk, trying to set an already set bit means there is a +cycle. +After the scan completes, XORing the marked inode bitmap with the inode +allocation bitmap reveals disconnected inodes. +However, one of online repair's design goals is to avoid locking the entire +filesystem unless it's absolutely necessary. +Directory tree updates can move subtrees across the scanner wavefront on a live +filesystem, so the bitmap algorithm cannot be applied. + +Directory parent pointers enable an incremental approach to validation of the +tree structure. +Instead of using one thread to scan the entire filesystem, multiple threads can +walk from individual subdirectories upwards towards the root. +For this to work, all directory entries and parent pointers must be internally +consistent, each directory entry must have a parent pointer, and the link +counts of all directories must be correct. +Each scanner thread must be able to take the IOLOCK of an alleged parent +directory while holding the IOLOCK of the child directory to prevent either +directory from being moved within the tree. +This was made possible only recently with locking changes in Linux 6.5. + +The scanning process uses a dirent hook to detect changes to the directories +mentioned in the scan data. +The scan works as follows: + +1. For each subdirectory in the filesystem, + + a. For each parent pointer of that subdirectory, + + 1. Create a path object for that parent pointer, and mark the + subdirectory inode number in the path object's bitmap. + + 2. Record the parent pointer name and inode number in a path structure. + + 3. If the alleged parent is the subdirectory being scrubbed, the path is + a cycle. + Mark the path for deletion and repeat step 1a with the next + subdirectory parent pointer. + + 4. Try to mark the alleged parent inode number in a bitmap in the path + object. + If the bit is already set, then there is a cycle in the directory + tree. + Mark the path as a cycle and repeat step 1a with the next subdirectory + parent pointer. + + 5. Load the alleged parent. + If the alleged parent is not a linked directory, abort the scan + because the parent pointer information is inconsistent. + + 6. For each parent pointer of this alleged ancestor directory, + + a. Record the parent pointer name and inode number in the path object + if no parent has been set for that level. + + b. If an ancestor has more than one parent, mark the path as corrupt. + Repeat step 1a with the next subdirectory parent pointer. + + c. Repeat steps 1a3-1a6 for the ancestor identified in step 1a6a. + This repeats until the directory tree root is reached or no parents + are found. + + 7. If the walk terminates at the root directory, mark the path as ok. + + 8. If the walk terminates without reaching the root, mark the path as + disconnected. + +2. If the directory entry update hook triggers, check all paths already found + by the scan. + If the entry matches part of a path, mark that path and the scan stale. + When the scanner thread sees that the scan has been marked stale, it deletes + all scan data and starts over. + +Repairing the directory tree works as follows: + +1. Walk each path of the target subdirectory. + + a. Corrupt paths and cycle paths are counted as suspect. + + b. Paths already marked for deletion are counted as bad. + + c. Paths that reached the root are counted as good. + +2. If the subdirectory is either the root directory or has zero link count, + delete all incoming directory entries in the immediate parents. + Repairs are complete. + +3. If the subdirectory has exactly one path, set the dotdot entry to the + parent and exit. + +4. If the subdirectory has at least one good path, delete all the other + incoming directory entries in the immediate parents. + +5. If the subdirectory has no good paths and more than one suspect path, delete + all the other incoming directory entries in the immediate parents. + +6. If the subdirectory has zero paths, attach it to the lost and found. + +The proposed patches are in the +`directory tree repair +`_ +series. + + .. _orphanage: The Orphanage