diff mbox series

[v2] btrfs-progs: convert-ext2: insert a dummy inode item before inode ref

Message ID 6e1e07ad53a9e716be28e4d505042a50c1676254.1705134953.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs-progs: convert-ext2: insert a dummy inode item before inode ref | expand

Commit Message

Qu Wenruo Jan. 13, 2024, 8:37 a.m. UTC
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:

  Create btrfs metadata
  corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
  leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
  leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
  fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
  chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
          item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
                  index 171 namelen 51 name: [FILENAME1]
          item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
                  index 103 namelen 51 name: [FILENAME2]

[CAUSE]
When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
along with the INODE_REF of that inode.

This leads to above stray INODE_REFs, and trigger the tree-checker.

This can only happen for large fs, as for most cases we have all these
modified tree blocks cached, thus tree-checker won't be triggered.
But when the tree block cache is not hit, and we have to read from disk,
then such behavior can lead to above tree-checker error.

[FIX]
Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
be updated when iterating the child inode of the directory.

Issue: #731
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/mode-common.h   | 15 ---------------
 common/utils.h        | 16 ++++++++++++++++
 convert/source-ext2.c | 30 ++++++++++++++++++++----------
 convert/source-fs.c   | 20 ++++++++++++++++++++
 4 files changed, 56 insertions(+), 25 deletions(-)

---
Changelog:
v2:
- Initialized dummy inodes' mode/generation/transid
  As the mode can still trigger tree-checker warnings.

--
2.43.0

Comments

David Sterba Jan. 16, 2024, 6:47 p.m. UTC | #1
On Sat, Jan 13, 2024 at 07:07:06PM +1030, Qu Wenruo wrote:
> [BUG]
> There is a report about failed btrfs-convert, which shows the following
> error:
> 
>   Create btrfs metadata
>   corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
>   leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
>   leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
>   fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
>   chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
>           item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
>                   index 171 namelen 51 name: [FILENAME1]
>           item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
>                   index 103 namelen 51 name: [FILENAME2]
> 
> [CAUSE]
> When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
> along with the INODE_REF of that inode.
> 
> This leads to above stray INODE_REFs, and trigger the tree-checker.
> 
> This can only happen for large fs, as for most cases we have all these
> modified tree blocks cached, thus tree-checker won't be triggered.
> But when the tree block cache is not hit, and we have to read from disk,
> then such behavior can lead to above tree-checker error.
> 
> [FIX]
> Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
> be updated when iterating the child inode of the directory.
> 
> Issue: #731
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Thanks, the cached data are uncovering some bugs, I wonder if
https://github.com/kdave/btrfs-progs/issues/349 could be also caused by
that.

> ---
>  check/mode-common.h   | 15 ---------------
>  common/utils.h        | 16 ++++++++++++++++
>  convert/source-ext2.c | 30 ++++++++++++++++++++----------
>  convert/source-fs.c   | 20 ++++++++++++++++++++
>  4 files changed, 56 insertions(+), 25 deletions(-)
> 
> ---
> Changelog:
> v2:
> - Initialized dummy inodes' mode/generation/transid
>   As the mode can still trigger tree-checker warnings.
> 
> diff --git a/check/mode-common.h b/check/mode-common.h
> index 894bbbb8141b..80672e51e870 100644
> --- a/check/mode-common.h
> +++ b/check/mode-common.h
> @@ -167,21 +167,6 @@ static inline bool is_valid_imode(u32 imode)
> 
>  int recow_extent_buffer(struct btrfs_root *root, struct extent_buffer *eb);
> 
> -static inline u32 btrfs_type_to_imode(u8 type)
> -{
> -	static u32 imode_by_btrfs_type[] = {
> -		[BTRFS_FT_REG_FILE]	= S_IFREG,
> -		[BTRFS_FT_DIR]		= S_IFDIR,
> -		[BTRFS_FT_CHRDEV]	= S_IFCHR,
> -		[BTRFS_FT_BLKDEV]	= S_IFBLK,
> -		[BTRFS_FT_FIFO]		= S_IFIFO,
> -		[BTRFS_FT_SOCK]		= S_IFSOCK,
> -		[BTRFS_FT_SYMLINK]	= S_IFLNK,
> -	};
> -
> -	return imode_by_btrfs_type[(type)];
> -}

Why did you move this helper to utils.h? Here it's available for
anything that needs it. Mkfs and convert share some code, no style
problem to cross include from each other. Also moving it to utils.h is
going the opposite way, it's a header that's a default if there's no
better place. Lot of code has been factored out of it.
David Sterba Jan. 16, 2024, 8:06 p.m. UTC | #2
On Tue, Jan 16, 2024 at 07:47:38PM +0100, David Sterba wrote:
> On Sat, Jan 13, 2024 at 07:07:06PM +1030, Qu Wenruo wrote:
> > -static inline u32 btrfs_type_to_imode(u8 type)
> > -{
> > -	static u32 imode_by_btrfs_type[] = {
> > -		[BTRFS_FT_REG_FILE]	= S_IFREG,
> > -		[BTRFS_FT_DIR]		= S_IFDIR,
> > -		[BTRFS_FT_CHRDEV]	= S_IFCHR,
> > -		[BTRFS_FT_BLKDEV]	= S_IFBLK,
> > -		[BTRFS_FT_FIFO]		= S_IFIFO,
> > -		[BTRFS_FT_SOCK]		= S_IFSOCK,
> > -		[BTRFS_FT_SYMLINK]	= S_IFLNK,
> > -	};
> > -
> > -	return imode_by_btrfs_type[(type)];
> > -}
> 
> Why did you move this helper to utils.h? Here it's available for
> anything that needs it. Mkfs and convert share some code, no style
> problem to cross include from each other. Also moving it to utils.h is
> going the opposite way, it's a header that's a default if there's no
> better place. Lot of code has been factored out of it.

Ok so I confused mkfs and check, the header is from check/ and is not
clean for inclusion in convert due to task_ctx conflict. So I'll keep it
but noted for a future cleanup.
Qu Wenruo Jan. 16, 2024, 8:13 p.m. UTC | #3
On 2024/1/17 05:17, David Sterba wrote:
> On Sat, Jan 13, 2024 at 07:07:06PM +1030, Qu Wenruo wrote:
>> [BUG]
>> There is a report about failed btrfs-convert, which shows the following
>> error:
>>
>>    Create btrfs metadata
>>    corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
>>    leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
>>    leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
>>    fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
>>    chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
>>            item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
>>                    index 171 namelen 51 name: [FILENAME1]
>>            item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
>>                    index 103 namelen 51 name: [FILENAME2]
>>
>> [CAUSE]
>> When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
>> along with the INODE_REF of that inode.
>>
>> This leads to above stray INODE_REFs, and trigger the tree-checker.
>>
>> This can only happen for large fs, as for most cases we have all these
>> modified tree blocks cached, thus tree-checker won't be triggered.
>> But when the tree block cache is not hit, and we have to read from disk,
>> then such behavior can lead to above tree-checker error.
>>
>> [FIX]
>> Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
>> be updated when iterating the child inode of the directory.
>>
>> Issue: #731
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Thanks, the cached data are uncovering some bugs, I wonder if
> https://github.com/kdave/btrfs-progs/issues/349 could be also caused by
> that.

Unfortunately the csum is not the same problem at all.

I don't have any clue yet, but can take sometime to look into it since
there is a reproducer.

>
>> ---
>>   check/mode-common.h   | 15 ---------------
>>   common/utils.h        | 16 ++++++++++++++++
>>   convert/source-ext2.c | 30 ++++++++++++++++++++----------
>>   convert/source-fs.c   | 20 ++++++++++++++++++++
>>   4 files changed, 56 insertions(+), 25 deletions(-)
>>
>> ---
>> Changelog:
>> v2:
>> - Initialized dummy inodes' mode/generation/transid
>>    As the mode can still trigger tree-checker warnings.
>>
>> diff --git a/check/mode-common.h b/check/mode-common.h
>> index 894bbbb8141b..80672e51e870 100644
>> --- a/check/mode-common.h
>> +++ b/check/mode-common.h
>> @@ -167,21 +167,6 @@ static inline bool is_valid_imode(u32 imode)
>>
>>   int recow_extent_buffer(struct btrfs_root *root, struct extent_buffer *eb);
>>
>> -static inline u32 btrfs_type_to_imode(u8 type)
>> -{
>> -	static u32 imode_by_btrfs_type[] = {
>> -		[BTRFS_FT_REG_FILE]	= S_IFREG,
>> -		[BTRFS_FT_DIR]		= S_IFDIR,
>> -		[BTRFS_FT_CHRDEV]	= S_IFCHR,
>> -		[BTRFS_FT_BLKDEV]	= S_IFBLK,
>> -		[BTRFS_FT_FIFO]		= S_IFIFO,
>> -		[BTRFS_FT_SOCK]		= S_IFSOCK,
>> -		[BTRFS_FT_SYMLINK]	= S_IFLNK,
>> -	};
>> -
>> -	return imode_by_btrfs_type[(type)];
>> -}
>
> Why did you move this helper to utils.h? Here it's available for
> anything that needs it. Mkfs and convert share some code, no style
> problem to cross include from each other. Also moving it to utils.h is
> going the opposite way, it's a header that's a default if there's no
> better place. Lot of code has been factored out of it.

OK, my initial problem is about including headers from check/, but since
it's not a problem then I'm totally fine.

Would update the patch and reflect that.

Thanks,
Qu
>
David Sterba Jan. 17, 2024, 12:56 a.m. UTC | #4
On Wed, Jan 17, 2024 at 06:43:50AM +1030, Qu Wenruo wrote:
> 
> 
> On 2024/1/17 05:17, David Sterba wrote:
> > On Sat, Jan 13, 2024 at 07:07:06PM +1030, Qu Wenruo wrote:
> >> [BUG]
> >> There is a report about failed btrfs-convert, which shows the following
> >> error:
> >>
> >>    Create btrfs metadata
> >>    corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
> >>    leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
> >>    leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
> >>    fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
> >>    chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
> >>            item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
> >>                    index 171 namelen 51 name: [FILENAME1]
> >>            item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
> >>                    index 103 namelen 51 name: [FILENAME2]
> >>
> >> [CAUSE]
> >> When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
> >> along with the INODE_REF of that inode.
> >>
> >> This leads to above stray INODE_REFs, and trigger the tree-checker.
> >>
> >> This can only happen for large fs, as for most cases we have all these
> >> modified tree blocks cached, thus tree-checker won't be triggered.
> >> But when the tree block cache is not hit, and we have to read from disk,
> >> then such behavior can lead to above tree-checker error.
> >>
> >> [FIX]
> >> Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
> >> be updated when iterating the child inode of the directory.
> >>
> >> Issue: #731
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >
> > Thanks, the cached data are uncovering some bugs, I wonder if
> > https://github.com/kdave/btrfs-progs/issues/349 could be also caused by
> > that.
> 
> Unfortunately the csum is not the same problem at all.
> 
> I don't have any clue yet, but can take sometime to look into it since
> there is a reproducer.
> 
> >
> >> ---
> >>   check/mode-common.h   | 15 ---------------
> >>   common/utils.h        | 16 ++++++++++++++++
> >>   convert/source-ext2.c | 30 ++++++++++++++++++++----------
> >>   convert/source-fs.c   | 20 ++++++++++++++++++++
> >>   4 files changed, 56 insertions(+), 25 deletions(-)
> >>
> >> ---
> >> Changelog:
> >> v2:
> >> - Initialized dummy inodes' mode/generation/transid
> >>    As the mode can still trigger tree-checker warnings.
> >>
> >> diff --git a/check/mode-common.h b/check/mode-common.h
> >> index 894bbbb8141b..80672e51e870 100644
> >> --- a/check/mode-common.h
> >> +++ b/check/mode-common.h
> >> @@ -167,21 +167,6 @@ static inline bool is_valid_imode(u32 imode)
> >>
> >>   int recow_extent_buffer(struct btrfs_root *root, struct extent_buffer *eb);
> >>
> >> -static inline u32 btrfs_type_to_imode(u8 type)
> >> -{
> >> -	static u32 imode_by_btrfs_type[] = {
> >> -		[BTRFS_FT_REG_FILE]	= S_IFREG,
> >> -		[BTRFS_FT_DIR]		= S_IFDIR,
> >> -		[BTRFS_FT_CHRDEV]	= S_IFCHR,
> >> -		[BTRFS_FT_BLKDEV]	= S_IFBLK,
> >> -		[BTRFS_FT_FIFO]		= S_IFIFO,
> >> -		[BTRFS_FT_SOCK]		= S_IFSOCK,
> >> -		[BTRFS_FT_SYMLINK]	= S_IFLNK,
> >> -	};
> >> -
> >> -	return imode_by_btrfs_type[(type)];
> >> -}
> >
> > Why did you move this helper to utils.h? Here it's available for
> > anything that needs it. Mkfs and convert share some code, no style
> > problem to cross include from each other. Also moving it to utils.h is
> > going the opposite way, it's a header that's a default if there's no
> > better place. Lot of code has been factored out of it.
> 
> OK, my initial problem is about including headers from check/, but since
> it's not a problem then I'm totally fine.
> 
> Would update the patch and reflect that.

No need to, I've added it to devel already, thanks.
diff mbox series

Patch

diff --git a/check/mode-common.h b/check/mode-common.h
index 894bbbb8141b..80672e51e870 100644
--- a/check/mode-common.h
+++ b/check/mode-common.h
@@ -167,21 +167,6 @@  static inline bool is_valid_imode(u32 imode)

 int recow_extent_buffer(struct btrfs_root *root, struct extent_buffer *eb);

-static inline u32 btrfs_type_to_imode(u8 type)
-{
-	static u32 imode_by_btrfs_type[] = {
-		[BTRFS_FT_REG_FILE]	= S_IFREG,
-		[BTRFS_FT_DIR]		= S_IFDIR,
-		[BTRFS_FT_CHRDEV]	= S_IFCHR,
-		[BTRFS_FT_BLKDEV]	= S_IFBLK,
-		[BTRFS_FT_FIFO]		= S_IFIFO,
-		[BTRFS_FT_SOCK]		= S_IFSOCK,
-		[BTRFS_FT_SYMLINK]	= S_IFLNK,
-	};
-
-	return imode_by_btrfs_type[(type)];
-}
-
 int get_extent_item_generation(u64 bytenr, u64 *gen_ret);

 /*
diff --git a/common/utils.h b/common/utils.h
index e267814b08a8..dcd817144f9c 100644
--- a/common/utils.h
+++ b/common/utils.h
@@ -22,6 +22,7 @@ 
 #include "kerncompat.h"
 #include <stdbool.h>
 #include <stddef.h>
+#include <fcntl.h>
 #include "kernel-lib/list.h"
 #include "kernel-shared/volumes.h"
 #include "common/fsfeatures.h"
@@ -40,6 +41,21 @@  enum exclusive_operation {
 	BTRFS_EXCLOP_UNKNOWN = -1,
 };

+static inline u32 btrfs_type_to_imode(u8 type)
+{
+	static u32 imode_by_btrfs_type[] = {
+		[BTRFS_FT_REG_FILE]	= S_IFREG,
+		[BTRFS_FT_DIR]		= S_IFDIR,
+		[BTRFS_FT_CHRDEV]	= S_IFCHR,
+		[BTRFS_FT_BLKDEV]	= S_IFBLK,
+		[BTRFS_FT_FIFO]		= S_IFIFO,
+		[BTRFS_FT_SOCK]		= S_IFSOCK,
+		[BTRFS_FT_SYMLINK]	= S_IFLNK,
+	};
+
+	return imode_by_btrfs_type[(type)];
+}
+
 /* 2 for "0x", 2 for each byte, plus nul */
 #define BTRFS_CSUM_STRING_LEN		(2 + 2 * BTRFS_CSUM_SIZE + 1)
 void btrfs_format_csum(u16 csum_type, const u8 *data, char *output);
diff --git a/convert/source-ext2.c b/convert/source-ext2.c
index 7e93b0901489..f56d79734715 100644
--- a/convert/source-ext2.c
+++ b/convert/source-ext2.c
@@ -857,6 +857,10 @@  static int ext2_copy_single_inode(struct btrfs_trans_handle *trans,
 	struct btrfs_key inode_key;
 	struct btrfs_path path = { 0 };

+	inode_key.objectid = objectid;
+	inode_key.type = BTRFS_INODE_ITEM_KEY;
+	inode_key.offset = 0;
+
 	if (ext2_inode->i_links_count == 0)
 		return 0;

@@ -878,13 +882,23 @@  static int ext2_copy_single_inode(struct btrfs_trans_handle *trans,
 	ext2_convert_inode_flags(&btrfs_inode, ext2_inode);

 	/*
-	 * The inode item must be inserted before any file extents/dir items/xattrs,
-	 * or we may trigger tree-checker. (File extents/dir items/xattrs require
-	 * the previous item has the same key objectid).
+	 * The inode may already be created (with dummy contents), in that
+	 * case we don't need to do anything yet.
+	 * The inode item would be updated at the end anyway.
 	 */
-	ret = btrfs_insert_inode(trans, root, objectid, &btrfs_inode);
-	if (ret < 0)
-		return ret;
+	ret = btrfs_lookup_inode(trans, root, &path, &inode_key, 1);
+	btrfs_release_path(&path);
+	if (ret > 0) {
+		/*
+		 * No inode item yet, the inode item must be inserted before
+		 * any file extents/dir items/xattrs, or we may trigger
+		 * tree-checker. (File extents/dir items/xattrs require the
+		 * previous item has the same key objectid).
+		 */
+		ret = btrfs_insert_inode(trans, root, objectid, &btrfs_inode);
+		if (ret < 0)
+			return ret;
+	}

 	switch (ext2_inode->i_mode & S_IFMT) {
 	case S_IFREG:
@@ -917,10 +931,6 @@  static int ext2_copy_single_inode(struct btrfs_trans_handle *trans,
 	 * Update the inode item, as above insert never updates the inode's
 	 * nbytes and size.
 	 */
-	inode_key.objectid = objectid;
-	inode_key.type = BTRFS_INODE_ITEM_KEY;
-	inode_key.offset = 0;
-
 	ret = btrfs_lookup_inode(trans, root, &path, &inode_key, 1);
 	if (ret > 0)
 		ret = -ENOENT;
diff --git a/convert/source-fs.c b/convert/source-fs.c
index fe1ff7d0d795..66561438866e 100644
--- a/convert/source-fs.c
+++ b/convert/source-fs.c
@@ -23,6 +23,8 @@ 
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/volumes.h"
+#include "kernel-shared/transaction.h"
+#include "common/utils.h"
 #include "common/internal.h"
 #include "common/messages.h"
 #include "common/extent-cache.h"
@@ -183,6 +185,7 @@  int convert_insert_dirent(struct btrfs_trans_handle *trans,
 {
 	int ret;
 	u64 inode_size;
+	struct btrfs_inode_item dummy_iitem = { 0 };
 	struct btrfs_key location = {
 		.objectid = objectid,
 		.offset = 0,
@@ -193,6 +196,23 @@  int convert_insert_dirent(struct btrfs_trans_handle *trans,
 				    dir, &location, file_type, index_cnt);
 	if (ret)
 		return ret;
+
+	btrfs_set_stack_inode_mode(&dummy_iitem, btrfs_type_to_imode(file_type));
+	btrfs_set_stack_inode_generation(&dummy_iitem, trans->transid);
+	btrfs_set_stack_inode_transid(&dummy_iitem, trans->transid);
+	/*
+	 * We must have an INOTE_ITEM before INODE_REF, or tree-checker won't
+	 * be happy.
+	 * The content of the INODE_ITEM would be properly updated when iterating
+	 * that child inode, but we should still try to make it as valid as
+	 * possible, or we may still trigger some tree checker.
+	 */
+	ret = btrfs_insert_inode(trans, root, objectid, &dummy_iitem);
+	/* The inode item is already there, just skip it. */
+	if (ret == -EEXIST)
+		ret = 0;
+	if (ret < 0)
+		return ret;
 	ret = btrfs_insert_inode_ref(trans, root, name, name_len,
 				     objectid, dir, index_cnt);
 	if (ret)