From patchwork Wed Apr 10 00:46:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13623338 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E765E19E for ; Wed, 10 Apr 2024 00:46:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710005; cv=none; b=ZUs4LTcifEIUem+/gtSikLf/hb83Iu5zeDvFoWfjwdX9wEbke+dg5wjduDI854SN57MLDv0gIoiwbwBiq5+kxO/obC57SXXdPxU8nQd9RHPDfPdwp4d3+PcNL1dq8EuxXqcE2Xx6aeNbxkf6KbN2SabaFoFY/7VvVBrLifhHlHA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710005; c=relaxed/simple; bh=2yhO6Qy8/BrvwcWpktYBOqs2nAyiFuWGb+Z/RdGZfug=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Jvulp7XMC4pSQ1ynOZ34/BeLezrHarCRvfLyIHCHCGFXCzU/1CXPWUeeSPBc2/huBBNfampE+jnp32Jdf7b/xUoSW2ZLc/A0zvhMLoEktmbwUty0fdU0OHNgBPld6ozd18ZfbEp7tFOYjZvG5NlZn/PD5tzWgH6wU66R/PAgZxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Ir+sYacA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Ir+sYacA" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5F102C433C7; Wed, 10 Apr 2024 00:46:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712710004; bh=2yhO6Qy8/BrvwcWpktYBOqs2nAyiFuWGb+Z/RdGZfug=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Ir+sYacAFy9A93Eak6KUI5R8B2pMtbRTX2HkHfbPVoBNdmO6NvVkVfaKJHFOGqV45 tDvlf6/8h8dOMlmbsNFGBYvYt+BN3homFpxuMijHPpmTXdF2alg0n2uuPCfYCER+YP Cc1LH1lo/lWPa3dEdpDvcrndkiR+025KVlkQCHqL7/PT2ochNG1emmx7fSv0+tFTRP 6mLRZksGloaERU9OdoWCixKABFzSyqSY4E9SGE5bqN1AJZoP7D9DBv96V9Fnaooo3E vZYBO1ZUDraIPE4jedjHUM0xoGSmpOE7riGZW4EU6ExfH8jZaL2qhjgWBnNJGVy+Kl WuZiTihgZNXdA== Date: Tue, 09 Apr 2024 17:46:43 -0700 Subject: [PATCH 1/4] docs: update the parent pointers documentation to the final version From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <171270967483.3631017.15423312733723466021.stgit@frogsfrogsfrogs> In-Reply-To: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> References: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we've decided on the ondisk format of parent pointers, update the documentation to reflect that. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../filesystems/xfs/xfs-online-fsck-design.rst | 94 +++++++++++--------- 1 file changed, 53 insertions(+), 41 deletions(-) diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 74a8e42c74bd0..1e3211d12247d 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -4465,10 +4465,10 @@ reconstruction of filesystem space metadata. The parent pointer feature, however, makes total directory reconstruction possible. -XFS parent pointers include the dirent name and location of the entry within -the parent directory. +XFS parent pointers contain the information needed to identify the +corresponding directory entry in the parent directory. In other words, child files use extended attributes to store pointers to -parents in the form ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``. +parents in the form ``(dirent_name) → (parent_inum, parent_gen)``. The directory checking process can be strengthened to ensure that the target of each dirent also contains a parent pointer pointing back to the dirent. Likewise, each parent pointer can be checked by ensuring that the target of @@ -4476,8 +4476,6 @@ each parent pointer is a directory and that it contains a dirent matching the parent pointer. Both online and offline repair can use this strategy. -**Note**: The ondisk format of parent pointers is not yet finalized. - +--------------------------------------------------------------------------+ | **Historical Sidebar**: | +--------------------------------------------------------------------------+ @@ -4519,8 +4517,58 @@ Both online and offline repair can use this strategy. | Chandan increased the maximum extent counts of both data and attribute | | forks, thereby ensuring that the extended attribute structure can grow | | to handle the maximum hardlink count of any file. | +| | +| For this second effort, the ondisk parent pointer format as originally | +| proposed was ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``. | +| The format was changed during development to eliminate the requirement | +| of repair tools needing to to ensure that the ``dirent_pos`` field | +| always matched when reconstructing a directory. | +| | +| There were a few other ways to have solved that problem: | +| | +| 1. The field could be designated advisory, since the other three values | +| are sufficient to find the entry in the parent. | +| However, this makes indexed key lookup impossible while repairs are | +| ongoing. | +| | +| 2. We could allow creating directory entries at specified offsets, which | +| solves the referential integrity problem but runs the risk that | +| dirent creation will fail due to conflicts with the free space in the | +| directory. | +| | +| These conflicts could be resolved by appending the directory entry | +| and amending the xattr code to support updating an xattr key and | +| reindexing the dabtree, though this would have to be performed with | +| the parent directory still locked. | +| | +| 3. Same as above, but remove the old parent pointer entry and add a new | +| one atomically. | +| | +| 4. Change the ondisk xattr format to | +| ``(parent_inum, name) → (parent_gen)``, which would provide the attr | +| name uniqueness that we require, without forcing repair code to | +| update the dirent position. | +| Unfortunately, this requires changes to the xattr code to support | +| attr names as long as 263 bytes. | +| | +| 5. Change the ondisk xattr format to ``(parent_inum, hash(name)) → | +| (name, parent_gen)``. | +| If the hash is sufficiently resistant to collisions (e.g. sha256) | +| then this should provide the attr name uniqueness that we require. | +| Names shorter than 247 bytes could be stored directly. | +| | +| 6. Change the ondisk xattr format to ``(dirent_name) → (parent_ino, | +| parent_gen)``. This format doesn't require any of the complicated | +| nested name hashing of the previous suggestions. However, it was | +| discovered that multiple hardlinks to the same inode with the same | +| filename caused performance problems with hashed xattr lookups, so | +| the parent inumber is now xor'd into the hash index. | +| | +| In the end, it was decided that solution #6 was the most compact and the | +| most performant. A new hash function was designed for parent pointers. | +--------------------------------------------------------------------------+ + Case Study: Repairing Directories with Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4569,42 +4617,6 @@ The proposed patchset is the `_ series. -**Unresolved Question**: How will repair ensure that the ``dirent_pos`` fields -match in the reconstructed directory? - -*Answer*: There are a few ways to solve this problem: - -1. The field could be designated advisory, since the other three values are - sufficient to find the entry in the parent. - However, this makes indexed key lookup impossible while repairs are ongoing. - -2. We could allow creating directory entries at specified offsets, which solves - the referential integrity problem but runs the risk that dirent creation - will fail due to conflicts with the free space in the directory. - - These conflicts could be resolved by appending the directory entry and - amending the xattr code to support updating an xattr key and reindexing the - dabtree, though this would have to be performed with the parent directory - still locked. - -3. Same as above, but remove the old parent pointer entry and add a new one - atomically. - -4. Change the ondisk xattr format to ``(parent_inum, name) → (parent_gen)``, - which would provide the attr name uniqueness that we require, without - forcing repair code to update the dirent position. - Unfortunately, this requires changes to the xattr code to support attr - names as long as 263 bytes. - -5. Change the ondisk xattr format to ``(parent_inum, hash(name)) → - (name, parent_gen)``. - If the hash is sufficiently resistant to collisions (e.g. sha256) then - this should provide the attr name uniqueness that we require. - Names shorter than 247 bytes could be stored directly. - -Discussion is ongoing under the `parent pointers patch deluge -`_. - Case Study: Repairing Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From patchwork Wed Apr 10 00:46:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13623339 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DF7437C for ; Wed, 10 Apr 2024 00:47:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710020; cv=none; b=PKQQ7seGp7tvp0a1QVaVyFkohcYrfBm5DkfUNGWqexWnxZonRuSn+XiSRUctOAZSi0PdKa7ApOj6dVTFa/5ACT8BERlpRBEYbRNnxnp1QjiUBJLiJA51LfLnRhbm2KRQVzttiZl/g+xJm99PP8ytYVHU52wP13lUV2FsaSex5Ic= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710020; c=relaxed/simple; bh=f39Aj0VqdtwQ2sV9EBkjTnDZVKUmes/PxQXub8TocYs=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ha4lj9IOfknVKVozoRLlQZ/XtVirLsBden3xyIGnnrGkoodbbnTaEDS9xkLulYK7QzzL81s3HmWC2j71ze9eYl/mQQiUPwSn8Sf9JADg3X/vHFU4Ri9tQOiHps7nbgZhBSE/zUCvTG9L+mv7mjs5L0cjJYUbB+Ht9gHKQLQdCd0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QExDbpZa; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QExDbpZa" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1FB10C43390; Wed, 10 Apr 2024 00:47:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712710020; bh=f39Aj0VqdtwQ2sV9EBkjTnDZVKUmes/PxQXub8TocYs=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=QExDbpZalgKVhmzfGwgGsGIN0N72/FQ1VOgBl7rq+Uwkvwyz1nzxKd/6zEzapxthf nFtTP1o5zSo+RLkGjCcSuF+UxTRHhjfGz0bODwp8MLa/pf1mr3fydNQwsO/FyhGJA4 zIpmWkXyo/SKoDjTMpfnRk1j0OP/4RsNYoSsuYtmGnxGDkDffhiq0G56J1zvCQ3OUz mJT4fBSEVj5fj5Z5KFVq4SqfG7i4Rr4yq/6x2UeCXFst3Y1f9j9FZiBI0YV/CvkYgG rRUlInZNKN+EwW/0R4H0whmd0V6/1PMyCRJ5xIKFVptFXVohRZoDrTKucyMahy2AMb ln89/IflIqrLw== Date: Tue, 09 Apr 2024 17:46:59 -0700 Subject: [PATCH 2/4] docs: update online directory and parent pointer repair sections From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <171270967499.3631017.12364220130145297814.stgit@frogsfrogsfrogs> In-Reply-To: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> References: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Update the case studies of online directory and parent pointer reconstruction to reflect what they actually do in the final version. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../filesystems/xfs/xfs-online-fsck-design.rst | 55 +++++++++++--------- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 1e3211d12247d..1ea4e59c9cdbd 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -4576,8 +4576,9 @@ Directory rebuilding uses a :ref:`coordinated inode scan ` and a :ref:`directory entry live update hook ` as follows: 1. Set up a temporary directory for generating the new directory structure, - an xfblob for storing entry names, and an xfarray for stashing directory - updates. + an xfblob for storing entry names, and an xfarray for stashing the fixed + size fields involved in a directory update: ``(child inumber, add vs. + remove, name cookie, ftype)``. 2. Set up an inode scanner and hook into the directory entry code to receive updates on directory operations. @@ -4586,35 +4587,34 @@ a :ref:`directory entry live update hook ` as follows: pointer references the directory of interest. If so: - a. Stash an addname entry for this dirent in the xfarray for later. + a. Stash the parent pointer name and an addname entry for this dirent in the + xfblob and xfarray, respectively. - b. When finished scanning that file, flush the stashed updates to the - temporary directory. + b. When finished scanning that file or the kernel memory consumption exceeds + a threshold, flush the stashed updates to the temporary directory. 4. For each live directory update received via the hook, decide if the child has already been scanned. If so: - a. Stash an addname or removename entry for this dirent update in the - xfarray for later. + a. Stash the parent pointer name an addname or removename entry for this + dirent update in the xfblob and xfarray for later. We cannot write directly to the temporary directory because hook functions are not allowed to modify filesystem metadata. Instead, we stash updates in the xfarray and rely on the scanner thread to apply the stashed updates to the temporary directory. -5. When the scan is complete, atomically exchange the contents of the temporary +5. When the scan is complete, replay any stashed entries in the xfarray. + +6. When the scan is complete, atomically exchange the contents of the temporary directory and the directory being repaired. The temporary directory now contains the damaged directory structure. -6. Reap the temporary directory. - -7. Update the dirent position field of parent pointers as necessary. - This may require the queuing of a substantial number of xattr log intent - items. +7. Reap the temporary directory. The proposed patchset is the `parent pointers directory repair -`_ +`_ series. Case Study: Repairing Parent Pointers @@ -4624,8 +4624,9 @@ Online reconstruction of a file's parent pointer information works similarly to directory reconstruction: 1. Set up a temporary file for generating a new extended attribute structure, - an `xfblob` for storing parent pointer names, and an xfarray for - stashing parent pointer updates. + an xfblob for storing parent pointer names, and an xfarray for stashing the + fixed size fields involved in a parent pointer update: ``(parent inumber, + parent generation, add vs. remove, name cookie)``. 2. Set up an inode scanner and hook into the directory entry code to receive updates on directory operations. @@ -4634,34 +4635,36 @@ directory reconstruction: dirent references the file of interest. If so: - a. Stash an addpptr entry for this parent pointer in the xfblob and xfarray - for later. + a. Stash the dirent name and an addpptr entry for this parent pointer in the + xfblob and xfarray, respectively. - b. When finished scanning the directory, flush the stashed updates to the - temporary directory. + b. When finished scanning the directory or the kernel memory consumption + exceeds a threshold, flush the stashed updates to the temporary file. 4. For each live directory update received via the hook, decide if the parent has already been scanned. If so: - a. Stash an addpptr or removepptr entry for this dirent update in the - xfarray for later. + a. Stash the dirent name and an addpptr or removepptr entry for this dirent + update in the xfblob and xfarray for later. We cannot write parent pointers directly to the temporary file because hook functions are not allowed to modify filesystem metadata. Instead, we stash updates in the xfarray and rely on the scanner thread to apply the stashed parent pointer updates to the temporary file. -5. Copy all non-parent pointer extended attributes to the temporary file. +5. When the scan is complete, replay any stashed entries in the xfarray. -6. When the scan is complete, atomically exchange the mappings of the attribute +6. Copy all non-parent pointer extended attributes to the temporary file. + +7. When the scan is complete, atomically exchange the mappings of the attribute forks of the temporary file and the file being repaired. The temporary file now contains the damaged extended attribute structure. -7. Reap the temporary file. +8. Reap the temporary file. The proposed patchset is the `parent pointers repair -`_ +`_ series. Digression: Offline Checking of Parent Pointers From patchwork Wed Apr 10 00:47:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13623356 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42910387 for ; Wed, 10 Apr 2024 00:47:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710036; cv=none; b=g7U0j9AY8JWYa8nmL5T/YKXFht+n+MMs5H48j7vyLU6dLIQwaDQqD4whQFdja+um0GdQelH8IW8pJiKRCo2WFj3LxiJnHAkqz2h1EkF8nqN5IgwOkCswkLR34ezuP0Sxvx9+ArbqD3LoADFROQIUjUkiZrAt8CIbvoFNmizxLvQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710036; c=relaxed/simple; bh=2x1jxf3SYyYde24kWXsAKADu9B+pkZTSxSQYHtmKU9Q=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FjPxCH4v8FMTn0MV75q2ZdQ9pMQ/Q5INPx+oz27yT/xidRipR279y9q6NaCpc8gGe+Mu+sSyzA7+qtcy0cVLiQLjMlmQxW7fGx6eVgMpoPmAJyatkZ7tkcwBmhZVNj7lcpP6vTx5KPqNgIkT0fA2trUJ9ArSzEyiIhYs19ERQUA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=t/6lvKyL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="t/6lvKyL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3C90C433C7; Wed, 10 Apr 2024 00:47:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712710035; bh=2x1jxf3SYyYde24kWXsAKADu9B+pkZTSxSQYHtmKU9Q=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=t/6lvKyLwdvL/DllFzHFqFx7Up2sfN6+vEe1b0DrwWuRxC5I7+/eoIinYHgniwy6U IxvBsTvjjvFXLr9EWODZf9w0R5BN5TCBWt4pgzck/dsTaYagshphnt+MGqRSY68h1N e6NzZ+fNzsiSdu0F4YL5lOWy6HlsGvSG4zfQ4eTOrsz+wZD93fE+mpI62sXDgFPpCc 35DHHeINPRG7Kg4KqwSbERjcL8/1TN1QHC6P09McAeol6v737nFpSIdXJ9UlbXjeJp JrvOhFaLS82Y9qpXjvfymHsAGM8s8TQhB4gGdgC+xm8TOs5YQZXQOtDYEpkEsutOk9 s/Dbed66vJoEA== Date: Tue, 09 Apr 2024 17:47:15 -0700 Subject: [PATCH 3/4] docs: update offline parent pointer repair strategy From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <171270967515.3631017.5330745726828504817.stgit@frogsfrogsfrogs> In-Reply-To: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> References: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now update how xfs_repair checks and repairs parent pointer info. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../filesystems/xfs/xfs-online-fsck-design.rst | 81 +++++++++++++++----- 1 file changed, 60 insertions(+), 21 deletions(-) diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 1ea4e59c9cdbd..70e3e629d8b3f 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -4675,26 +4675,56 @@ files are erased long before directory tree connectivity checks are performed. Parent pointer checks are therefore a second pass to be added to the existing connectivity checks: -1. After the set of surviving files has been established (i.e. phase 6), +1. After the set of surviving files has been established (phase 6), walk the surviving directories of each AG in the filesystem. This is already performed as part of the connectivity checks. -2. For each directory entry found, record the name in an xfblob, and store - ``(child_ag_inum, parent_inum, parent_gen, dirent_pos)`` tuples in a - per-AG in-memory slab. +2. For each directory entry found, + + a. If the name has already been stored in the xfblob, then use that cookie + and skip the next step. + + b. Otherwise, record the name in an xfblob, and remember the xfblob cookie. + Unique mappings are critical for + + 1. Deduplicating names to reduce memory usage, and + + 2. Creating a stable sort key for the parent pointer indexes so that the + parent pointer validation described below will work. + + c. Store ``(child_ag_inum, parent_inum, parent_gen, name_hash, name_len, + name_cookie)`` tuples in a per-AG in-memory slab. The ``name_hash`` + referenced in this section is the regular directory entry name hash, not + the specialized one used for parent pointer xattrs. 3. For each AG in the filesystem, - a. Sort the per-AG tuples in order of child_ag_inum, parent_inum, and - dirent_pos. + a. Sort the per-AG tuple set in order of ``child_ag_inum``, ``parent_inum``, + ``name_hash``, and ``name_cookie``. + Having a single ``name_cookie`` for each ``name`` is critical for + handling the uncommon case of a directory containing multiple hardlinks + to the same file where all the names hash to the same value. b. For each inode in the AG, 1. Scan the inode for parent pointers. - Record the names in a per-file xfblob, and store ``(parent_inum, - parent_gen, dirent_pos)`` tuples in a per-file slab. + For each parent pointer found, - 2. Sort the per-file tuples in order of parent_inum, and dirent_pos. + a. Validate the ondisk parent pointer. + If validation fails, move on to the next parent pointer in the + file. + + b. If the name has already been stored in the xfblob, then use that + cookie and skip the next step. + + c. Record the name in a per-file xfblob, and remember the xfblob + cookie. + + d. Store ``(parent_inum, parent_gen, name_hash, name_len, + name_cookie)`` tuples in a per-file slab. + + 2. Sort the per-file tuples in order of ``parent_inum``, ``name_hash``, + and ``name_cookie``. 3. Position one slab cursor at the start of the inode's records in the per-AG tuple slab. @@ -4703,28 +4733,37 @@ connectivity checks: 4. Position a second slab cursor at the start of the per-file tuple slab. - 5. Iterate the two cursors in lockstep, comparing the parent_ino and - dirent_pos fields of the records under each cursor. + 5. Iterate the two cursors in lockstep, comparing the ``parent_ino``, + ``name_hash``, and ``name_cookie`` fields of the records under each + cursor: - a. Tuples in the per-AG list but not the per-file list are missing and - need to be written to the inode. + a. If the per-AG cursor is at a lower point in the keyspace than the + per-file cursor, then the per-AG cursor points to a missing parent + pointer. + Add the parent pointer to the inode and advance the per-AG + cursor. - b. Tuples in the per-file list but not the per-AG list are dangling - and need to be removed from the inode. + b. If the per-file cursor is at a lower point in the keyspace than + the per-AG cursor, then the per-file cursor points to a dangling + parent pointer. + Remove the parent pointer from the inode and advance the per-file + cursor. - c. For tuples in both lists, update the parent_gen and name components - of the parent pointer if necessary. + c. Otherwise, both cursors point at the same parent pointer. + Update the parent_gen component if necessary. + Advance both cursors. 4. Move on to examining link counts, as we do today. The proposed patchset is the `offline parent pointers repair -`_ +`_ series. -Rebuilding directories from parent pointers in offline repair is very -challenging because it currently uses a single-pass scan of the filesystem -during phase 3 to decide which files are corrupt enough to be zapped. +Rebuilding directories from parent pointers in offline repair would be very +challenging because xfs_repair currently uses two single-pass scans of the +filesystem during phases 3 and 4 to decide which files are corrupt enough to be +zapped. This scan would have to be converted into a multi-pass scan: 1. The first pass of the scan zaps corrupt inodes, forks, and attributes From patchwork Wed Apr 10 00:47:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13623357 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E676637C for ; Wed, 10 Apr 2024 00:47:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710052; cv=none; b=CwvCKuH9dpSQinNqeZhI4KyyhV2gXicRJMIjhTPKRdWRvyTPbH/xMREhFM+nS9aLe/OtX5RIJ7K9S/lzJwBBmgjirPjpW5v4qYs6DOI2Kefog1waG8MLpoS4GrYUMP689/hDfoKNk8Co26GdDN49QHhzz1QroK7FKMnblWCVX7w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712710052; c=relaxed/simple; bh=zQibxg6BKExJaRVKtPFQgezsT+jAW3a4r9JYPPwiRqc=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hsiPzWAmiMS1HBwyoAAUDi7lAbrfIwuXgqklz/xJ7+XkDxbXrb42FKDSBUYpfeZwuxziEmEJKQLaYK3MedP6AXZ1BcQT4uaTHhCzsaQbI5Vnk3jwe2aI+jheu/957LYzQ3KvFgR1xgzj5kZELQvCe83Aa90oGTUTnebwgejCelk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Z1fNU0Mz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Z1fNU0Mz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 676F1C433C7; Wed, 10 Apr 2024 00:47:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712710051; bh=zQibxg6BKExJaRVKtPFQgezsT+jAW3a4r9JYPPwiRqc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Z1fNU0MzqSgY7sjTWudx2IEglajal3jrgYVOozaHNrd2mHPS+n0lZRR8CUv/7JRF7 0KFtF5nnnVQx0k4H/EA853EknBM/MUBUfnZl75xF5HAlCIIy0IyK/P6cgIh4iOBYHk y1beXAcaupml/2zqWazY0tb4bosEFXKDzwA5MuI6u99xXUO6qNphIginxLatHGDjjF 2d7GYRnEkBg4uvluTp1EJbsMIgZ4gOGSdVWRRsowKQplWZ1IMAxeCvaQMkN3J02b4Z AHy59coa4gDOKd2tmqmI09nZZY+vPHy94WoVP1H22KGP4B4Wv3gzocbzbPJFlNa8fv aO6FdfUIJ0a4A== Date: Tue, 09 Apr 2024 17:47:30 -0700 Subject: [PATCH 4/4] docs: describe xfs directory tree online fsck From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <171270967532.3631017.5046025491953457287.stgit@frogsfrogsfrogs> In-Reply-To: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> References: <171270967457.3631017.1709831303627611754.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong I've added a scrubber that checks the directory tree structure and fixes them; describe this in the design documentation. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../filesystems/xfs/xfs-online-fsck-design.rst | 124 ++++++++++++++++++++ 1 file changed, 124 insertions(+) diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 70e3e629d8b3f..12aa638408304 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -4785,6 +4785,130 @@ This scan would have to be converted into a multi-pass scan: This code has not yet been constructed. +.. _dirtree: + +Case Study: Directory Tree Structure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As mentioned earlier, the filesystem directory tree is supposed to be a +directed acylic graph structure. +However, each node in this graph is a separate ``xfs_inode`` object with its +own locks, which makes validating the tree qualities difficult. +Fortunately, non-directories are allowed to have multiple parents and cannot +have children, so only directories need to be scanned. +Directories typically constitute 5-10% of the files in a filesystem, which +reduces the amount of work dramatically. + +If the directory tree could be frozen, it would be easy to discover cycles and +disconnected regions by running a depth (or breadth) first search downwards +from the root directory and marking a bitmap for each directory found. +At any point in the walk, trying to set an already set bit means there is a +cycle. +After the scan completes, XORing the marked inode bitmap with the inode +allocation bitmap reveals disconnected inodes. +However, one of online repair's design goals is to avoid locking the entire +filesystem unless it's absolutely necessary. +Directory tree updates can move subtrees across the scanner wavefront on a live +filesystem, so the bitmap algorithm cannot be applied. + +Directory parent pointers enable an incremental approach to validation of the +tree structure. +Instead of using one thread to scan the entire filesystem, multiple threads can +walk from individual subdirectories upwards towards the root. +For this to work, all directory entries and parent pointers must be internally +consistent, each directory entry must have a parent pointer, and the link +counts of all directories must be correct. +Each scanner thread must be able to take the IOLOCK of an alleged parent +directory while holding the IOLOCK of the child directory to prevent either +directory from being moved within the tree. +This is not possible since the VFS does not take the IOLOCK of a child +subdirectory when moving that subdirectory, so instead the scanner stabilizes +the parent -> child relationship by taking the ILOCKs and installing a dirent +update hook to detect changes. + +The scanning process uses a dirent hook to detect changes to the directories +mentioned in the scan data. +The scan works as follows: + +1. For each subdirectory in the filesystem, + + a. For each parent pointer of that subdirectory, + + 1. Create a path object for that parent pointer, and mark the + subdirectory inode number in the path object's bitmap. + + 2. Record the parent pointer name and inode number in a path structure. + + 3. If the alleged parent is the subdirectory being scrubbed, the path is + a cycle. + Mark the path for deletion and repeat step 1a with the next + subdirectory parent pointer. + + 4. Try to mark the alleged parent inode number in a bitmap in the path + object. + If the bit is already set, then there is a cycle in the directory + tree. + Mark the path as a cycle and repeat step 1a with the next subdirectory + parent pointer. + + 5. Load the alleged parent. + If the alleged parent is not a linked directory, abort the scan + because the parent pointer information is inconsistent. + + 6. For each parent pointer of this alleged ancestor directory, + + a. Record the parent pointer name and inode number in the path object + if no parent has been set for that level. + + b. If an ancestor has more than one parent, mark the path as corrupt. + Repeat step 1a with the next subdirectory parent pointer. + + c. Repeat steps 1a3-1a6 for the ancestor identified in step 1a6a. + This repeats until the directory tree root is reached or no parents + are found. + + 7. If the walk terminates at the root directory, mark the path as ok. + + 8. If the walk terminates without reaching the root, mark the path as + disconnected. + +2. If the directory entry update hook triggers, check all paths already found + by the scan. + If the entry matches part of a path, mark that path and the scan stale. + When the scanner thread sees that the scan has been marked stale, it deletes + all scan data and starts over. + +Repairing the directory tree works as follows: + +1. Walk each path of the target subdirectory. + + a. Corrupt paths and cycle paths are counted as suspect. + + b. Paths already marked for deletion are counted as bad. + + c. Paths that reached the root are counted as good. + +2. If the subdirectory is either the root directory or has zero link count, + delete all incoming directory entries in the immediate parents. + Repairs are complete. + +3. If the subdirectory has exactly one path, set the dotdot entry to the + parent and exit. + +4. If the subdirectory has at least one good path, delete all the other + incoming directory entries in the immediate parents. + +5. If the subdirectory has no good paths and more than one suspect path, delete + all the other incoming directory entries in the immediate parents. + +6. If the subdirectory has zero paths, attach it to the lost and found. + +The proposed patches are in the +`directory tree repair +`_ +series. + + .. _orphanage: The Orphanage