[1/1] ext4: fix crash on BUG_ON in ext4_alloc_group_tables

Message ID	20240925143325.518508-2-aleksandr.mikhalitsyn@canonical.com (mailing list archive)
State	New
Headers	show Received: from smtp-relay-internal-0.canonical.com (smtp-relay-internal-0.canonical.com [185.125.188.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6D0D43158 for <linux-fsdevel@vger.kernel.org>; Wed, 25 Sep 2024 14:34:15 +0000 (UTC) From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> To: tytso@mit.edu Cc: stable@vger.kernel.org, Andreas Dilger <adilger.kernel@dilger.ca>, Jan Kara <jack@suse.cz>, Baokun Li <libaokun1@huawei.com>, =?utf-8?q?St?= =?utf-8?q?=C3=A9phane_Graber?= <stgraber@stgraber.org>, Christian Brauner <brauner@kernel.org>, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Wesley Hershberger <wesley.hershberger@canonical.com>, Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Subject: [PATCH 1/1] ext4: fix crash on BUG_ON in ext4_alloc_group_tables Date: Wed, 25 Sep 2024 16:33:24 +0200 Message-Id: <20240925143325.518508-2-aleksandr.mikhalitsyn@canonical.com> In-Reply-To: <20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com> References: <20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	ext4: fix crash on BUG_ON in ext4_alloc_group_tables \| expand [0/1] ext4: fix crash on BUG_ON in ext4_alloc_group_tables [1/1] ext4: fix crash on BUG_ON in ext4_alloc_group_tables

Aleksandr Mikhalitsyn Sept. 25, 2024, 2:33 p.m. UTC

[   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
[   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
[   33.888740] ------------[ cut here ]------------
[   33.888742] kernel BUG at fs/ext4/resize.c:324!
[   33.889075] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   33.889503] CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
[   33.890039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   33.890705] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
[   33.891063] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
[   33.892701] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
[   33.893081] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
[   33.893639] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
[   33.894197] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
[   33.894755] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
[   33.895317] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
[   33.895877] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
[   33.896524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.896954] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
[   33.897516] Call Trace:
[   33.897638]  <TASK>
[   33.897728]  ? show_regs+0x6d/0x80
[   33.897942]  ? die+0x3c/0xa0
[   33.898106]  ? do_trap+0xe5/0x110
[   33.898311]  ? do_error_trap+0x6e/0x90
[   33.898555]  ? ext4_resize_fs+0x1212/0x12d0
[   33.898844]  ? exc_invalid_op+0x57/0x80
[   33.899101]  ? ext4_resize_fs+0x1212/0x12d0
[   33.899387]  ? asm_exc_invalid_op+0x1f/0x30
[   33.899675]  ? ext4_resize_fs+0x1212/0x12d0
[   33.899961]  ? ext4_resize_fs+0x745/0x12d0
[   33.900239]  __ext4_ioctl+0x4e0/0x1800
[   33.900489]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.900832]  ? putname+0x5b/0x70
[   33.901028]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.901374]  ? do_sys_openat2+0x87/0xd0
[   33.901632]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.901981]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.902324]  ? __x64_sys_openat+0x59/0xa0
[   33.902595]  ext4_ioctl+0x12/0x20
[   33.902802]  ? ext4_ioctl+0x12/0x20
[   33.903031]  __x64_sys_ioctl+0x99/0xd0
[   33.903277]  x64_sys_call+0x1206/0x20d0
[   33.903534]  do_syscall_64+0x72/0x110
[   33.903771]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.904115]  ? irqentry_exit+0x3f/0x50
[   33.904362]  ? srso_alias_return_thunk+0x5/0xfbef5
[   33.904707]  ? exc_page_fault+0x1aa/0x7b0
[   33.904979]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   33.905349] RIP: 0033:0x7f46efe3294f
[   33.905579] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   33.907321] RSP: 002b:00007ffe9b8833a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   33.907926] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f46efe3294f
[   33.908487] RDX: 00007ffe9b8834a0 RSI: 0000000040086610 RDI: 0000000000000004
[   33.909046] RBP: 00005630a4a0b0e0 R08: 0000000000000000 R09: 00007ffe9b8832d7
[   33.909605] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[   33.910165] R13: 00005630a4a0c580 R14: 00005630a4a10400 R15: 0000000000000000
[   33.910740]  </TASK>
[   33.910837] Modules linked in:
[   33.911049] ---[ end trace 0000000000000000 ]---
[   33.911428] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
[   33.911810] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
[   33.913928] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
[   33.914313] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
[   33.914909] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
[   33.915482] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
[   33.916258] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
[   33.917027] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
[   33.917884] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
[   33.918818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.919322] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
[   44.072293] ------------[ cut here ]------------

Cc: stable@vger.kernel.org # v6.8+
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Stéphane Graber <stgraber@stgraber.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: <linux-ext4@vger.kernel.org>
Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 fs/ext4/resize.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Jan Kara Sept. 25, 2024, 3:57 p.m. UTC | #1

On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
> [   33.888740] ------------[ cut here ]------------
> [   33.888742] kernel BUG at fs/ext4/resize.c:324!

Ah, I was staring at this for a while before I understood what's going on
(it would be great to explain this in the changelog BTW).  As far as I
understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
- 1) which then confuses things. I think that was not really intended and
instead of fixing up ext4_alloc_group_tables() we should really change
the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
flexbg size. Baokun?

								Honza


> [   33.889075] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [   33.889503] CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
> [   33.890039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [   33.890705] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
> [   33.891063] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
> [   33.892701] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
> [   33.893081] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
> [   33.893639] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
> [   33.894197] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
> [   33.894755] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
> [   33.895317] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
> [   33.895877] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
> [   33.896524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   33.896954] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
> [   33.897516] Call Trace:
> [   33.897638]  <TASK>
> [   33.897728]  ? show_regs+0x6d/0x80
> [   33.897942]  ? die+0x3c/0xa0
> [   33.898106]  ? do_trap+0xe5/0x110
> [   33.898311]  ? do_error_trap+0x6e/0x90
> [   33.898555]  ? ext4_resize_fs+0x1212/0x12d0
> [   33.898844]  ? exc_invalid_op+0x57/0x80
> [   33.899101]  ? ext4_resize_fs+0x1212/0x12d0
> [   33.899387]  ? asm_exc_invalid_op+0x1f/0x30
> [   33.899675]  ? ext4_resize_fs+0x1212/0x12d0
> [   33.899961]  ? ext4_resize_fs+0x745/0x12d0
> [   33.900239]  __ext4_ioctl+0x4e0/0x1800
> [   33.900489]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.900832]  ? putname+0x5b/0x70
> [   33.901028]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.901374]  ? do_sys_openat2+0x87/0xd0
> [   33.901632]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.901981]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.902324]  ? __x64_sys_openat+0x59/0xa0
> [   33.902595]  ext4_ioctl+0x12/0x20
> [   33.902802]  ? ext4_ioctl+0x12/0x20
> [   33.903031]  __x64_sys_ioctl+0x99/0xd0
> [   33.903277]  x64_sys_call+0x1206/0x20d0
> [   33.903534]  do_syscall_64+0x72/0x110
> [   33.903771]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.904115]  ? irqentry_exit+0x3f/0x50
> [   33.904362]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   33.904707]  ? exc_page_fault+0x1aa/0x7b0
> [   33.904979]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   33.905349] RIP: 0033:0x7f46efe3294f
> [   33.905579] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [   33.907321] RSP: 002b:00007ffe9b8833a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [   33.907926] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f46efe3294f
> [   33.908487] RDX: 00007ffe9b8834a0 RSI: 0000000040086610 RDI: 0000000000000004
> [   33.909046] RBP: 00005630a4a0b0e0 R08: 0000000000000000 R09: 00007ffe9b8832d7
> [   33.909605] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
> [   33.910165] R13: 00005630a4a0c580 R14: 00005630a4a10400 R15: 0000000000000000
> [   33.910740]  </TASK>
> [   33.910837] Modules linked in:
> [   33.911049] ---[ end trace 0000000000000000 ]---
> [   33.911428] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
> [   33.911810] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
> [   33.913928] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
> [   33.914313] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
> [   33.914909] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
> [   33.915482] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
> [   33.916258] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
> [   33.917027] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
> [   33.917884] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
> [   33.918818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   33.919322] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
> [   44.072293] ------------[ cut here ]------------
> 
> Cc: stable@vger.kernel.org # v6.8+
> Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Baokun Li <libaokun1@huawei.com>
> Cc: Stéphane Graber <stgraber@stgraber.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Cc: <linux-fsdevel@vger.kernel.org>
> Cc: <linux-ext4@vger.kernel.org>
> Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
> Reported-by: Stéphane Graber <stgraber@stgraber.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  fs/ext4/resize.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index e04eb08b9060..c057a7867363 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -300,8 +300,7 @@ static void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
>   * block group.
>   */
>  static int ext4_alloc_group_tables(struct super_block *sb,
> -				struct ext4_new_flex_group_data *flex_gd,
> -				unsigned int flexbg_size)
> +				struct ext4_new_flex_group_data *flex_gd)
>  {
>  	struct ext4_new_group_data *group_data = flex_gd->groups;
>  	ext4_fsblk_t start_blk;
> @@ -313,7 +312,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
>  	ext4_group_t group;
>  	ext4_group_t last_group;
>  	unsigned overhead;
> -	__u16 uninit_mask = (flexbg_size > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
> +	__u16 uninit_mask = (flex_gd->resize_bg > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
>  	int i;
>  
>  	BUG_ON(flex_gd->count == 0 || group_data == NULL);
> @@ -321,8 +320,8 @@ static int ext4_alloc_group_tables(struct super_block *sb,
>  	src_group = group_data[0].group;
>  	last_group  = src_group + flex_gd->count - 1;
>  
> -	BUG_ON((flexbg_size > 1) && ((src_group & ~(flexbg_size - 1)) !=
> -	       (last_group & ~(flexbg_size - 1))));
> +	BUG_ON((flex_gd->resize_bg > 1) && ((src_group & ~(flex_gd->resize_bg - 1)) !=
> +	       (last_group & ~(flex_gd->resize_bg - 1))));
>  next_group:
>  	group = group_data[0].group;
>  	if (src_group >= group_data[0].group + flex_gd->count)
> @@ -403,7 +402,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
>  
>  		printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
>  		       "%u groups, flexbg size is %u:\n", flex_gd->count,
> -		       flexbg_size);
> +		       flex_gd->resize_bg);
>  
>  		for (i = 0; i < flex_gd->count; i++) {
>  			ext4_debug(
> @@ -2158,7 +2157,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
>  					 ext4_blocks_count(es));
>  			last_update_time = jiffies;
>  		}
> -		if (ext4_alloc_group_tables(sb, flex_gd, flexbg_size) != 0)
> +		if (ext4_alloc_group_tables(sb, flex_gd) != 0)
>  			break;
>  		err = ext4_flex_group_add(sb, resize_inode, flex_gd);
>  		if (unlikely(err))
> -- 
> 2.34.1
>

Aleksandr Mikhalitsyn Sept. 25, 2024, 4:17 p.m. UTC | #2

On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
> > [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
> > [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
> > [   33.888740] ------------[ cut here ]------------
> > [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>
> Ah, I was staring at this for a while before I understood what's going on
> (it would be great to explain this in the changelog BTW).  As far as I
> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
> - 1) which then confuses things. I think that was not really intended and

Hi Jan,

First of all, thanks for your reaction/review on this one ;-)

You are absolutely right, have just checked with our reproducer and
this modification:

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..530a918f0cab 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
*alloc_flex_gd(unsigned int flexbg_size,
                flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
                                              fls(n_group - last_group));

+       BUG_ON(flex_gd->resize_bg > flexbg_size);
+
        flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
                                        sizeof(struct ext4_new_group_data),
                                        GFP_NOFS);

and yes, it crashes on this BUG_ON. So it looks like instead of making
flex_gd->resize_bg to be smaller
than flexbg_size in most cases we can actually have an opposite effect
here. I guess we really need to fix alloc_flex_gd() too.

> instead of fixing up ext4_alloc_group_tables() we should really change
> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
> flexbg size. Baokun?

At the same time, if I understand the code right, as we can have
flex_gd->resize_bg != flexbg_size after
5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
flex bg") and
665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
we should always refer to flex_gd->resize_bg value which means that
ext4_alloc_group_tables() fix is needed too.
Am I correct in my understanding?

>
>                                                                 Honza

Kind regards,
Alex

>
>
> > [   33.889075] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [   33.889503] CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
> > [   33.890039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > [   33.890705] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
> > [   33.891063] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
> > [   33.892701] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
> > [   33.893081] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
> > [   33.893639] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
> > [   33.894197] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
> > [   33.894755] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
> > [   33.895317] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
> > [   33.895877] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
> > [   33.896524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   33.896954] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
> > [   33.897516] Call Trace:
> > [   33.897638]  <TASK>
> > [   33.897728]  ? show_regs+0x6d/0x80
> > [   33.897942]  ? die+0x3c/0xa0
> > [   33.898106]  ? do_trap+0xe5/0x110
> > [   33.898311]  ? do_error_trap+0x6e/0x90
> > [   33.898555]  ? ext4_resize_fs+0x1212/0x12d0
> > [   33.898844]  ? exc_invalid_op+0x57/0x80
> > [   33.899101]  ? ext4_resize_fs+0x1212/0x12d0
> > [   33.899387]  ? asm_exc_invalid_op+0x1f/0x30
> > [   33.899675]  ? ext4_resize_fs+0x1212/0x12d0
> > [   33.899961]  ? ext4_resize_fs+0x745/0x12d0
> > [   33.900239]  __ext4_ioctl+0x4e0/0x1800
> > [   33.900489]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.900832]  ? putname+0x5b/0x70
> > [   33.901028]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.901374]  ? do_sys_openat2+0x87/0xd0
> > [   33.901632]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.901981]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.902324]  ? __x64_sys_openat+0x59/0xa0
> > [   33.902595]  ext4_ioctl+0x12/0x20
> > [   33.902802]  ? ext4_ioctl+0x12/0x20
> > [   33.903031]  __x64_sys_ioctl+0x99/0xd0
> > [   33.903277]  x64_sys_call+0x1206/0x20d0
> > [   33.903534]  do_syscall_64+0x72/0x110
> > [   33.903771]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.904115]  ? irqentry_exit+0x3f/0x50
> > [   33.904362]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [   33.904707]  ? exc_page_fault+0x1aa/0x7b0
> > [   33.904979]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [   33.905349] RIP: 0033:0x7f46efe3294f
> > [   33.905579] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> > [   33.907321] RSP: 002b:00007ffe9b8833a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [   33.907926] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f46efe3294f
> > [   33.908487] RDX: 00007ffe9b8834a0 RSI: 0000000040086610 RDI: 0000000000000004
> > [   33.909046] RBP: 00005630a4a0b0e0 R08: 0000000000000000 R09: 00007ffe9b8832d7
> > [   33.909605] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
> > [   33.910165] R13: 00005630a4a0c580 R14: 00005630a4a10400 R15: 0000000000000000
> > [   33.910740]  </TASK>
> > [   33.910837] Modules linked in:
> > [   33.911049] ---[ end trace 0000000000000000 ]---
> > [   33.911428] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
> > [   33.911810] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
> > [   33.913928] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
> > [   33.914313] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
> > [   33.914909] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
> > [   33.915482] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
> > [   33.916258] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
> > [   33.917027] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
> > [   33.917884] FS:  00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
> > [   33.918818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   33.919322] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
> > [   44.072293] ------------[ cut here ]------------
> >
> > Cc: stable@vger.kernel.org # v6.8+
> > Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
> > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Baokun Li <libaokun1@huawei.com>
> > Cc: Stéphane Graber <stgraber@stgraber.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: <linux-kernel@vger.kernel.org>
> > Cc: <linux-fsdevel@vger.kernel.org>
> > Cc: <linux-ext4@vger.kernel.org>
> > Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
> > Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
> > Reported-by: Stéphane Graber <stgraber@stgraber.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> >  fs/ext4/resize.c | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> > index e04eb08b9060..c057a7867363 100644
> > --- a/fs/ext4/resize.c
> > +++ b/fs/ext4/resize.c
> > @@ -300,8 +300,7 @@ static void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
> >   * block group.
> >   */
> >  static int ext4_alloc_group_tables(struct super_block *sb,
> > -                             struct ext4_new_flex_group_data *flex_gd,
> > -                             unsigned int flexbg_size)
> > +                             struct ext4_new_flex_group_data *flex_gd)
> >  {
> >       struct ext4_new_group_data *group_data = flex_gd->groups;
> >       ext4_fsblk_t start_blk;
> > @@ -313,7 +312,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
> >       ext4_group_t group;
> >       ext4_group_t last_group;
> >       unsigned overhead;
> > -     __u16 uninit_mask = (flexbg_size > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
> > +     __u16 uninit_mask = (flex_gd->resize_bg > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
> >       int i;
> >
> >       BUG_ON(flex_gd->count == 0 || group_data == NULL);
> > @@ -321,8 +320,8 @@ static int ext4_alloc_group_tables(struct super_block *sb,
> >       src_group = group_data[0].group;
> >       last_group  = src_group + flex_gd->count - 1;
> >
> > -     BUG_ON((flexbg_size > 1) && ((src_group & ~(flexbg_size - 1)) !=
> > -            (last_group & ~(flexbg_size - 1))));
> > +     BUG_ON((flex_gd->resize_bg > 1) && ((src_group & ~(flex_gd->resize_bg - 1)) !=
> > +            (last_group & ~(flex_gd->resize_bg - 1))));
> >  next_group:
> >       group = group_data[0].group;
> >       if (src_group >= group_data[0].group + flex_gd->count)
> > @@ -403,7 +402,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
> >
> >               printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
> >                      "%u groups, flexbg size is %u:\n", flex_gd->count,
> > -                    flexbg_size);
> > +                    flex_gd->resize_bg);
> >
> >               for (i = 0; i < flex_gd->count; i++) {
> >                       ext4_debug(
> > @@ -2158,7 +2157,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
> >                                        ext4_blocks_count(es));
> >                       last_update_time = jiffies;
> >               }
> > -             if (ext4_alloc_group_tables(sb, flex_gd, flexbg_size) != 0)
> > +             if (ext4_alloc_group_tables(sb, flex_gd) != 0)
> >                       break;
> >               err = ext4_flex_group_add(sb, resize_inode, flex_gd);
> >               if (unlikely(err))
> > --
> > 2.34.1
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Baokun Li Sept. 26, 2024, 8:28 a.m. UTC | #3

On 2024/9/25 23:57, Jan Kara wrote:
> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>> [   33.888740] ------------[ cut here ]------------
>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
> Ah, I was staring at this for a while before I understood what's going on
> (it would be great to explain this in the changelog BTW).  As far as I
> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
> - 1) which then confuses things. I think that was not really intended and
> instead of fixing up ext4_alloc_group_tables() we should really change
> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
> flexbg size. Baokun?
>
> 								Honza

Hi Honza,

Your analysis is absolutely correct. It's a bug!
Thank you for locating this issue！
An extra 1 should not be added when calculating resize_bg in 
alloc_flex_gd().


Hi Aleksandr,

Could you help test if the following changes work?


Thanks,
Baokun

---

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..1f01a7632149 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -253,10 +253,12 @@ static struct ext4_new_flex_group_data 
*alloc_flex_gd(unsigned int flexbg_size,
         /* Avoid allocating large 'groups' array if not needed */
         last_group = o_group | (flex_gd->resize_bg - 1);
         if (n_group <= last_group)
-               flex_gd->resize_bg = 1 << fls(n_group - o_group + 1);
+               flex_gd->resize_bg = 1 << fls(n_group - o_group);
         else if (n_group - last_group < flex_gd->resize_bg)
-               flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
+               flex_gd->resize_bg = 1 << max(fls(last_group - o_group),
                                               fls(n_group - last_group));

         flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
                                         sizeof(struct ext4_new_group_data),

Baokun Li Sept. 26, 2024, 8:50 a.m. UTC | #4

On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:
> On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@suse.cz> wrote:
>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>>> [   33.888740] ------------[ cut here ]------------
>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>> Ah, I was staring at this for a while before I understood what's going on
>> (it would be great to explain this in the changelog BTW).  As far as I
>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
>> - 1) which then confuses things. I think that was not really intended and
> Hi Jan,
>
> First of all, thanks for your reaction/review on this one ;-)
>
> You are absolutely right, have just checked with our reproducer and
> this modification:
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index e04eb08b9060..530a918f0cab 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
> *alloc_flex_gd(unsigned int flexbg_size,
>                  flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
>                                                fls(n_group - last_group));
>
> +       BUG_ON(flex_gd->resize_bg > flexbg_size);
> +
>          flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
>                                          sizeof(struct ext4_new_group_data),
>                                          GFP_NOFS);
>
> and yes, it crashes on this BUG_ON. So it looks like instead of making
> flex_gd->resize_bg to be smaller
> than flexbg_size in most cases we can actually have an opposite effect
> here. I guess we really need to fix alloc_flex_gd() too.
>
>> instead of fixing up ext4_alloc_group_tables() we should really change
>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
>> flexbg size. Baokun?
> At the same time, if I understand the code right, as we can have
> flex_gd->resize_bg != flexbg_size after
> 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
> flex bg") and
> 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
> we should always refer to flex_gd->resize_bg value which means that
> ext4_alloc_group_tables() fix is needed too.
> Am I correct in my understanding?

Hi Alex,

These two are not exactly equivalent.

The flex_gd->resize_bg is only used to determine how many block groups we
allocate memory to, i.e., the maximum number of block groups per resize.
And the flexbg_size is used to make some judgement on flexible block
groups, for example, the BUG_ON triggered in the issue is to make sure
src_group and last_group must be in the same flexible block group.


Regards,
Baokun

Aleksandr Mikhalitsyn Sept. 26, 2024, 9:16 a.m. UTC | #5

On Thu, Sep 26, 2024 at 10:29 AM Baokun Li <libaokun1@huawei.com> wrote:
>
> On 2024/9/25 23:57, Jan Kara wrote:
> > On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
> >> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
> >> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
> >> [   33.888740] ------------[ cut here ]------------
> >> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
> > Ah, I was staring at this for a while before I understood what's going on
> > (it would be great to explain this in the changelog BTW).  As far as I
> > understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
> > in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
> > flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
> > - 1) which then confuses things. I think that was not really intended and
> > instead of fixing up ext4_alloc_group_tables() we should really change
> > the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
> > flexbg size. Baokun?
> >
> >                                                               Honza
>
> Hi Honza,
>
> Your analysis is absolutely correct. It's a bug!
> Thank you for locating this issue！
> An extra 1 should not be added when calculating resize_bg in
> alloc_flex_gd().
>
>
> Hi Aleksandr,

Hi Baokun,

>
> Could you help test if the following changes work?

I can confirm that this patch helps.

Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Kind regards,
Alex

>
>
> Thanks,
> Baokun
>
> ---
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index e04eb08b9060..1f01a7632149 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -253,10 +253,12 @@ static struct ext4_new_flex_group_data
> *alloc_flex_gd(unsigned int flexbg_size,
>          /* Avoid allocating large 'groups' array if not needed */
>          last_group = o_group | (flex_gd->resize_bg - 1);
>          if (n_group <= last_group)
> -               flex_gd->resize_bg = 1 << fls(n_group - o_group + 1);
> +               flex_gd->resize_bg = 1 << fls(n_group - o_group);
>          else if (n_group - last_group < flex_gd->resize_bg)
> -               flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
> +               flex_gd->resize_bg = 1 << max(fls(last_group - o_group),
>                                                fls(n_group - last_group));
>
>          flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
>                                          sizeof(struct ext4_new_group_data),
>

Aleksandr Mikhalitsyn Sept. 26, 2024, 9:23 a.m. UTC | #6

On Thu, Sep 26, 2024 at 10:50 AM Baokun Li <libaokun1@huawei.com> wrote:
>
> On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:
> > On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@suse.cz> wrote:
> >> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
> >>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
> >>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
> >>> [   33.888740] ------------[ cut here ]------------
> >>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
> >> Ah, I was staring at this for a while before I understood what's going on
> >> (it would be great to explain this in the changelog BTW).  As far as I
> >> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
> >> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
> >> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
> >> - 1) which then confuses things. I think that was not really intended and
> > Hi Jan,
> >
> > First of all, thanks for your reaction/review on this one ;-)
> >
> > You are absolutely right, have just checked with our reproducer and
> > this modification:
> >
> > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> > index e04eb08b9060..530a918f0cab 100644
> > --- a/fs/ext4/resize.c
> > +++ b/fs/ext4/resize.c
> > @@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
> > *alloc_flex_gd(unsigned int flexbg_size,
> >                  flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
> >                                                fls(n_group - last_group));
> >
> > +       BUG_ON(flex_gd->resize_bg > flexbg_size);
> > +
> >          flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
> >                                          sizeof(struct ext4_new_group_data),
> >                                          GFP_NOFS);
> >
> > and yes, it crashes on this BUG_ON. So it looks like instead of making
> > flex_gd->resize_bg to be smaller
> > than flexbg_size in most cases we can actually have an opposite effect
> > here. I guess we really need to fix alloc_flex_gd() too.
> >
> >> instead of fixing up ext4_alloc_group_tables() we should really change
> >> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
> >> flexbg size. Baokun?
> > At the same time, if I understand the code right, as we can have
> > flex_gd->resize_bg != flexbg_size after
> > 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
> > flex bg") and
> > 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
> > we should always refer to flex_gd->resize_bg value which means that
> > ext4_alloc_group_tables() fix is needed too.
> > Am I correct in my understanding?
>
> Hi Alex,

Hi Baokun,

>
> These two are not exactly equivalent.
>
> The flex_gd->resize_bg is only used to determine how many block groups we
> allocate memory to, i.e., the maximum number of block groups per resize.
> And the flexbg_size is used to make some judgement on flexible block
> groups, for example, the BUG_ON triggered in the issue is to make sure
> src_group and last_group must be in the same flexible block group.

Huge thanks for explaining this!

Then I guess it's better if you send a patch with your fix.
Feel free to add my Tested-by tag.

Question to you and Jan. Do you guys think that it makes sense to try
to create a minimal reproducer for this problem without Incus/LXD involved?
(only e2fsprogs, lvm tools, etc)

I guess this test can be put in the xfstests test suite, right?

Kind regards,
Alex

>
>
> Regards,
> Baokun
>

Baokun Li Sept. 26, 2024, 9:28 a.m. UTC | #7

On 2024/9/26 17:16, Aleksandr Mikhalitsyn wrote:
> On Thu, Sep 26, 2024 at 10:29 AM Baokun Li <libaokun1@huawei.com> wrote:
>> On 2024/9/25 23:57, Jan Kara wrote:
>>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>>>> [   33.888740] ------------[ cut here ]------------
>>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>>> Ah, I was staring at this for a while before I understood what's going on
>>> (it would be great to explain this in the changelog BTW).  As far as I
>>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
>>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
>>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
>>> - 1) which then confuses things. I think that was not really intended and
>>> instead of fixing up ext4_alloc_group_tables() we should really change
>>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
>>> flexbg size. Baokun?
>>>
>>>                                                                Honza
>> Hi Honza,
>>
>> Your analysis is absolutely correct. It's a bug!
>> Thank you for locating this issue！
>> An extra 1 should not be added when calculating resize_bg in
>> alloc_flex_gd().
>>
>>
>> Hi Aleksandr,
> Hi Baokun,
>
>> Could you help test if the following changes work?
> I can confirm that this patch helps.
>
> Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>
> Kind regards,
> Alex

Thank you for the test!


Cheers,
Baokun
>>
>> Thanks,
>> Baokun
>>
>> ---
>>
>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>> index e04eb08b9060..1f01a7632149 100644
>> --- a/fs/ext4/resize.c
>> +++ b/fs/ext4/resize.c
>> @@ -253,10 +253,12 @@ static struct ext4_new_flex_group_data
>> *alloc_flex_gd(unsigned int flexbg_size,
>>           /* Avoid allocating large 'groups' array if not needed */
>>           last_group = o_group | (flex_gd->resize_bg - 1);
>>           if (n_group <= last_group)
>> -               flex_gd->resize_bg = 1 << fls(n_group - o_group + 1);
>> +               flex_gd->resize_bg = 1 << fls(n_group - o_group);
>>           else if (n_group - last_group < flex_gd->resize_bg)
>> -               flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
>> +               flex_gd->resize_bg = 1 << max(fls(last_group - o_group),
>>                                                 fls(n_group - last_group));
>>
>>           flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
>>                                           sizeof(struct ext4_new_group_data),
>>

Baokun Li Sept. 26, 2024, 9:40 a.m. UTC | #8

On 2024/9/26 17:23, Aleksandr Mikhalitsyn wrote:
> On Thu, Sep 26, 2024 at 10:50 AM Baokun Li <libaokun1@huawei.com> wrote:
>> On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:
>>> On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@suse.cz> wrote:
>>>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>>>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>>>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>>>>> [   33.888740] ------------[ cut here ]------------
>>>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>>>> Ah, I was staring at this for a while before I understood what's going on
>>>> (it would be great to explain this in the changelog BTW).  As far as I
>>>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
>>>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
>>>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
>>>> - 1) which then confuses things. I think that was not really intended and
>>> Hi Jan,
>>>
>>> First of all, thanks for your reaction/review on this one ;-)
>>>
>>> You are absolutely right, have just checked with our reproducer and
>>> this modification:
>>>
>>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>>> index e04eb08b9060..530a918f0cab 100644
>>> --- a/fs/ext4/resize.c
>>> +++ b/fs/ext4/resize.c
>>> @@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
>>> *alloc_flex_gd(unsigned int flexbg_size,
>>>                   flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
>>>                                                 fls(n_group - last_group));
>>>
>>> +       BUG_ON(flex_gd->resize_bg > flexbg_size);
>>> +
>>>           flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
>>>                                           sizeof(struct ext4_new_group_data),
>>>                                           GFP_NOFS);
>>>
>>> and yes, it crashes on this BUG_ON. So it looks like instead of making
>>> flex_gd->resize_bg to be smaller
>>> than flexbg_size in most cases we can actually have an opposite effect
>>> here. I guess we really need to fix alloc_flex_gd() too.
>>>
>>>> instead of fixing up ext4_alloc_group_tables() we should really change
>>>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
>>>> flexbg size. Baokun?
>>> At the same time, if I understand the code right, as we can have
>>> flex_gd->resize_bg != flexbg_size after
>>> 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
>>> flex bg") and
>>> 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
>>> we should always refer to flex_gd->resize_bg value which means that
>>> ext4_alloc_group_tables() fix is needed too.
>>> Am I correct in my understanding?
>> Hi Alex,
> Hi Baokun,
>
>> These two are not exactly equivalent.
>>
>> The flex_gd->resize_bg is only used to determine how many block groups we
>> allocate memory to, i.e., the maximum number of block groups per resize.
>> And the flexbg_size is used to make some judgement on flexible block
>> groups, for example, the BUG_ON triggered in the issue is to make sure
>> src_group and last_group must be in the same flexible block group.
> Huge thanks for explaining this!
>
> Then I guess it's better if you send a patch with your fix.
> Feel free to add my Tested-by tag.
Okay, I'll send a patch later.
>
> Question to you and Jan. Do you guys think that it makes sense to try
> to create a minimal reproducer for this problem without Incus/LXD involved?
> (only e2fsprogs, lvm tools, etc)
>
> I guess this test can be put in the xfstests test suite, right?
>
> Kind regards,
> Alex
I think it makes sense, and it's good to have more use cases to look
around some corners. If you have an idea, let it go.

Regards, Baokun

Aleksandr Mikhalitsyn Sept. 26, 2024, 11:32 a.m. UTC | #9

On Thu, Sep 26, 2024 at 11:40 AM Baokun Li <libaokun1@huawei.com> wrote:
>
> On 2024/9/26 17:23, Aleksandr Mikhalitsyn wrote:
> > On Thu, Sep 26, 2024 at 10:50 AM Baokun Li <libaokun1@huawei.com> wrote:
> >> On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:
> >>> On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@suse.cz> wrote:
> >>>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
> >>>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
> >>>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
> >>>>> [   33.888740] ------------[ cut here ]------------
> >>>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
> >>>> Ah, I was staring at this for a while before I understood what's going on
> >>>> (it would be great to explain this in the changelog BTW).  As far as I
> >>>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
> >>>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
> >>>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
> >>>> - 1) which then confuses things. I think that was not really intended and
> >>> Hi Jan,
> >>>
> >>> First of all, thanks for your reaction/review on this one ;-)
> >>>
> >>> You are absolutely right, have just checked with our reproducer and
> >>> this modification:
> >>>
> >>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> >>> index e04eb08b9060..530a918f0cab 100644
> >>> --- a/fs/ext4/resize.c
> >>> +++ b/fs/ext4/resize.c
> >>> @@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data
> >>> *alloc_flex_gd(unsigned int flexbg_size,
> >>>                   flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
> >>>                                                 fls(n_group - last_group));
> >>>
> >>> +       BUG_ON(flex_gd->resize_bg > flexbg_size);
> >>> +
> >>>           flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
> >>>                                           sizeof(struct ext4_new_group_data),
> >>>                                           GFP_NOFS);
> >>>
> >>> and yes, it crashes on this BUG_ON. So it looks like instead of making
> >>> flex_gd->resize_bg to be smaller
> >>> than flexbg_size in most cases we can actually have an opposite effect
> >>> here. I guess we really need to fix alloc_flex_gd() too.
> >>>
> >>>> instead of fixing up ext4_alloc_group_tables() we should really change
> >>>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
> >>>> flexbg size. Baokun?
> >>> At the same time, if I understand the code right, as we can have
> >>> flex_gd->resize_bg != flexbg_size after
> >>> 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized
> >>> flex bg") and
> >>> 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
> >>> we should always refer to flex_gd->resize_bg value which means that
> >>> ext4_alloc_group_tables() fix is needed too.
> >>> Am I correct in my understanding?
> >> Hi Alex,
> > Hi Baokun,
> >
> >> These two are not exactly equivalent.
> >>
> >> The flex_gd->resize_bg is only used to determine how many block groups we
> >> allocate memory to, i.e., the maximum number of block groups per resize.
> >> And the flexbg_size is used to make some judgement on flexible block
> >> groups, for example, the BUG_ON triggered in the issue is to make sure
> >> src_group and last_group must be in the same flexible block group.
> > Huge thanks for explaining this!
> >
> > Then I guess it's better if you send a patch with your fix.
> > Feel free to add my Tested-by tag.
> Okay, I'll send a patch later.
> >
> > Question to you and Jan. Do you guys think that it makes sense to try
> > to create a minimal reproducer for this problem without Incus/LXD involved?
> > (only e2fsprogs, lvm tools, etc)
> >
> > I guess this test can be put in the xfstests test suite, right?
> >
> > Kind regards,
> > Alex
> I think it makes sense, and it's good to have more use cases to look
> around some corners. If you have an idea, let it go.

Minimal reproducer:

mkdir -p /tmp/ext4_crash/mnt
EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
rm -f $EXT4_CRASH_IMG
truncate $EXT4_CRASH_IMG --size 25MiB
EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
$EXT4_CRASH_IMG)
mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
truncate $EXT4_CRASH_IMG --size 3GiB
losetup -c $EXT4_CRASH_DEV
resize2fs $EXT4_CRASH_DEV

>
> Regards, Baokun
>

Baokun Li Sept. 26, 2024, 1:58 p.m. UTC | #10

On 2024/9/26 19:32, Aleksandr Mikhalitsyn wrote:
>>> Question to you and Jan. Do you guys think that it makes sense to try
>>> to create a minimal reproducer for this problem without Incus/LXD involved?
>>> (only e2fsprogs, lvm tools, etc)
>>>
>>> I guess this test can be put in the xfstests test suite, right?
>>>
>>> Kind regards,
>>> Alex
>> I think it makes sense, and it's good to have more use cases to look
>> around some corners. If you have an idea, let it go.
> Minimal reproducer:
>
> mkdir -p /tmp/ext4_crash/mnt
> EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
> rm -f $EXT4_CRASH_IMG
> truncate $EXT4_CRASH_IMG --size 25MiB
> EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
> $EXT4_CRASH_IMG)
> mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
> mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
> truncate $EXT4_CRASH_IMG --size 3GiB
> losetup -c $EXT4_CRASH_DEV
> resize2fs $EXT4_CRASH_DEV
>
Hi Alex,

This replicator didn't replicate the issue in my VM, so I took a deeper
look. The reproduction of the problem requires the following:

o_group = flexbg_size * 2 * n;
o_size = (o_group + 1) * group_size;
n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)

Take n=1,flexbg_size=16 as an example:
                                                  last:47
|----------------|----------------|o---------------|--------------n-|
                                   old:32 >>>           new:62

Thus the replicator can be simplified as:

img=test.img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 264M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 504M

Aleksandr Mikhalitsyn Sept. 26, 2024, 2:19 p.m. UTC | #11

On Thu, Sep 26, 2024 at 3:58 PM Baokun Li <libaokun1@huawei.com> wrote:
>
> On 2024/9/26 19:32, Aleksandr Mikhalitsyn wrote:
> >>> Question to you and Jan. Do you guys think that it makes sense to try
> >>> to create a minimal reproducer for this problem without Incus/LXD involved?
> >>> (only e2fsprogs, lvm tools, etc)
> >>>
> >>> I guess this test can be put in the xfstests test suite, right?
> >>>
> >>> Kind regards,
> >>> Alex
> >> I think it makes sense, and it's good to have more use cases to look
> >> around some corners. If you have an idea, let it go.
> > Minimal reproducer:
> >
> > mkdir -p /tmp/ext4_crash/mnt
> > EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
> > rm -f $EXT4_CRASH_IMG
> > truncate $EXT4_CRASH_IMG --size 25MiB
> > EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
> > $EXT4_CRASH_IMG)
> > mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
> > mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
> > truncate $EXT4_CRASH_IMG --size 3GiB
> > losetup -c $EXT4_CRASH_DEV
> > resize2fs $EXT4_CRASH_DEV
> >
> Hi Alex,
>
> This replicator didn't replicate the issue in my VM, so I took a deeper
> look. The reproduction of the problem requires the following:

That's weird. Have just tried once again and it reproduces the issue:

root@ubuntu:/home/ubuntu# mkdir -p /tmp/ext4_crash/mnt
EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
rm -f $EXT4_CRASH_IMG
truncate $EXT4_CRASH_IMG --size 25MiB
EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
$EXT4_CRASH_IMG)
mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
truncate $EXT4_CRASH_IMG --size 3GiB
losetup -c $EXT4_CRASH_DEV
resize2fs $EXT4_CRASH_DEV
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 6400 4k blocks and 6400 inodes

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

resize2fs 1.47.0 (5-Feb-2023)
Filesystem at /dev/loop4 is mounted on /tmp/ext4_crash/mnt; on-line
resizing required
old_desc_blocks = 1, new_desc_blocks = 1
Segmentation fault

My kernel's commit hash is 684a64bf32b6e488004e0ad7f0d7e922798f65b6

Maybe it somehow depends on the resize2fs version?

Kind regards,
Alex

>
> o_group = flexbg_size * 2 * n;
> o_size = (o_group + 1) * group_size;
> n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
>
> Take n=1,flexbg_size=16 as an example:
>                                                   last:47
> |----------------|----------------|o---------------|--------------n-|
>                                    old:32 >>>           new:62
>
> Thus the replicator can be simplified as:
>
> img=test.img
> truncate -s 600M $img
> mkfs.ext4 -F $img -b 1024 -G 16 264M
> dev=`losetup -f --show $img`
> mkdir -p /tmp/test
> mount $dev /tmp/test
> resize2fs $dev 504M
>
>
> --
> Cheers,
> Baokun
>

Baokun Li Sept. 26, 2024, 2:35 p.m. UTC | #12

On 2024/9/26 22:19, Aleksandr Mikhalitsyn wrote:
> On Thu, Sep 26, 2024 at 3:58 PM Baokun Li <libaokun1@huawei.com> wrote:
>> On 2024/9/26 19:32, Aleksandr Mikhalitsyn wrote:
>>>>> Question to you and Jan. Do you guys think that it makes sense to try
>>>>> to create a minimal reproducer for this problem without Incus/LXD involved?
>>>>> (only e2fsprogs, lvm tools, etc)
>>>>>
>>>>> I guess this test can be put in the xfstests test suite, right?
>>>>>
>>>>> Kind regards,
>>>>> Alex
>>>> I think it makes sense, and it's good to have more use cases to look
>>>> around some corners. If you have an idea, let it go.
>>> Minimal reproducer:
>>>
>>> mkdir -p /tmp/ext4_crash/mnt
>>> EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
>>> rm -f $EXT4_CRASH_IMG
>>> truncate $EXT4_CRASH_IMG --size 25MiB
>>> EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
>>> $EXT4_CRASH_IMG)
>>> mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
>>> mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
>>> truncate $EXT4_CRASH_IMG --size 3GiB
>>> losetup -c $EXT4_CRASH_DEV
>>> resize2fs $EXT4_CRASH_DEV
>>>
>> Hi Alex,
>>
>> This replicator didn't replicate the issue in my VM, so I took a deeper
>> look. The reproduction of the problem requires the following:
> That's weird. Have just tried once again and it reproduces the issue:
>
> root@ubuntu:/home/ubuntu# mkdir -p /tmp/ext4_crash/mnt
> EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
> rm -f $EXT4_CRASH_IMG
> truncate $EXT4_CRASH_IMG --size 25MiB
> EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show
> $EXT4_CRASH_IMG)
> mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
> mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
> truncate $EXT4_CRASH_IMG --size 3GiB
> losetup -c $EXT4_CRASH_DEV
> resize2fs $EXT4_CRASH_DEV
> mke2fs 1.47.0 (5-Feb-2023)
> Creating filesystem with 6400 4k blocks and 6400 inodes
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (1024 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> resize2fs 1.47.0 (5-Feb-2023)
> Filesystem at /dev/loop4 is mounted on /tmp/ext4_crash/mnt; on-line
> resizing required
> old_desc_blocks = 1, new_desc_blocks = 1
I can see why, on my side I mkfsed a 25M sized disk, and the ext4
block size is 1024 by default, whereas on your side it's 4096.
I set the block size to 4096 and it also reproduced the issue.

Thanks for your feedback!


Cheers,
Baokun

Eric Sandeen Sept. 26, 2024, 4:04 p.m. UTC | #13

On 9/26/24 3:28 AM, Baokun Li wrote:
> On 2024/9/25 23:57, Jan Kara wrote:
>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>>> [   33.888740] ------------[ cut here ]------------
>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>> Ah, I was staring at this for a while before I understood what's going on
>> (it would be great to explain this in the changelog BTW).  As far as I
>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
>> - 1) which then confuses things. I think that was not really intended and
>> instead of fixing up ext4_alloc_group_tables() we should really change
>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
>> flexbg size. Baokun?
>>
>>                                 Honza
> 
> Hi Honza,
> 
> Your analysis is absolutely correct. It's a bug!
> Thank you for locating this issue！
> An extra 1 should not be added when calculating resize_bg in alloc_flex_gd().
> 
> 
> Hi Aleksandr,
> 
> Could you help test if the following changes work?
> 

I just got an internal bug report for this as well, and I can also confirm that
the patch fixes my testcase, thanks! Feel free to add:

Tested-by: Eric Sandeen <sandeen@redhat.com>

I had been trying to debug this and it felt like an off by one but I didn't get
to a solution in time. ;) 

Can you explain what the 2 cases under

/* Avoid allocating large 'groups' array if not needed */

are doing? I *think* the first 'if' is trying not to over-allocate for the last
batch of block groups that get added during a resize. What is the "else if" case
doing?

Thanks,
-Eric

> Thanks,
> Baokun
> 
> ---
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index e04eb08b9060..1f01a7632149 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -253,10 +253,12 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size,
>         /* Avoid allocating large 'groups' array if not needed */
>         last_group = o_group | (flex_gd->resize_bg - 1);
>         if (n_group <= last_group)
> -               flex_gd->resize_bg = 1 << fls(n_group - o_group + 1);
> +               flex_gd->resize_bg = 1 << fls(n_group - o_group);
>         else if (n_group - last_group < flex_gd->resize_bg)
> -               flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
> +               flex_gd->resize_bg = 1 << max(fls(last_group - o_group),
>                                               fls(n_group - last_group));
> 
>         flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
>                                         sizeof(struct ext4_new_group_data),
> 
>

Eric Sandeen Sept. 26, 2024, 4:29 p.m. UTC | #14

On 9/26/24 11:04 AM, Eric Sandeen wrote:

> Can you explain what the 2 cases under
> 
> /* Avoid allocating large 'groups' array if not needed */
> 
> are doing? I *think* the first 'if' is trying not to over-allocate for the last
> batch of block groups that get added during a resize. What is the "else if" case
> doing?

(or maybe I had that backwards)

Incidentally, the offending commit that this fixes (665d3e0af4d35ac) got turned
into CVE-2023-52622, so it's quite likely that distros have backported the broken
commit as part of the CVE game.

So the followup fix looks a bit urgent to me.

-Eric

Baokun Li Sept. 27, 2024, 1:43 a.m. UTC | #15

On 2024/9/27 0:04, Eric Sandeen wrote:
> On 9/26/24 3:28 AM, Baokun Li wrote:
>> On 2024/9/25 23:57, Jan Kara wrote:
>>> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
>>>> [   33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
>>>> [   33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
>>>> [   33.888740] ------------[ cut here ]------------
>>>> [   33.888742] kernel BUG at fs/ext4/resize.c:324!
>>> Ah, I was staring at this for a while before I understood what's going on
>>> (it would be great to explain this in the changelog BTW).  As far as I
>>> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation
>>> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than
>>> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
>>> - 1) which then confuses things. I think that was not really intended and
>>> instead of fixing up ext4_alloc_group_tables() we should really change
>>> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds
>>> flexbg size. Baokun?
>>>
>>>                                  Honza
>> Hi Honza,
>>
>> Your analysis is absolutely correct. It's a bug!
>> Thank you for locating this issue！
>> An extra 1 should not be added when calculating resize_bg in alloc_flex_gd().
>>
>>
>> Hi Aleksandr,
>>
>> Could you help test if the following changes work?
>>
> I just got an internal bug report for this as well, and I can also confirm that
> the patch fixes my testcase, thanks! Feel free to add:
>
> Tested-by: Eric Sandeen <sandeen@redhat.com>
>
> I had been trying to debug this and it felt like an off by one but I didn't get
> to a solution in time. ;)
>
> Can you explain what the 2 cases under
>
> /* Avoid allocating large 'groups' array if not needed */
>
> are doing? I *think* the first 'if' is trying not to over-allocate for the last
> batch of block groups that get added during a resize. What is the "else if" case
> doing?
>
> Thanks,
> -Eric
>
The ext4 online resize started out by allocating the memory needed
for an entire flexible block group regardless of the number of block
groups to be added.

This led to a warning being triggered in the mm module when the
userland set a very large flexbg_size (perhaps they wanted to put
all the group metadata in the head of the disk), allocating more
memory than the MAX_ORDER size. Commit 5d1935ac02ca5a
("ext4: avoid online resizing failures due to oversized flex bg") fixes
this problem with resize_bg.

Normally, when online resizing, we may not need to allocate the
memory required for an entire flexible block group. So if o_group
and n_group are in the same flexible block group or if o_group
and n_group are in neighbouring flexible block groups, we
check if less memory can be allocated. This is what was done in
commit 665d3e0af4d3 ("ext4: reduce unnecessary memory
allocation in alloc_flex_gd()").

Regards,
Baokun

Baokun Li Sept. 27, 2024, 1:51 a.m. UTC | #16

On 2024/9/27 0:29, Eric Sandeen wrote:
> On 9/26/24 11:04 AM, Eric Sandeen wrote:
>
>   
>> Can you explain what the 2 cases under
>>
>> /* Avoid allocating large 'groups' array if not needed */
>>
>> are doing? I *think* the first 'if' is trying not to over-allocate for the last
>> batch of block groups that get added during a resize. What is the "else if" case
>> doing?
> (or maybe I had that backwards)
>
> Incidentally, the offending commit that this fixes (665d3e0af4d35ac) got turned
> into CVE-2023-52622, so it's quite likely that distros have backported the broken
> commit as part of the CVE game.
The commit to fix CVE-2023-52622 is commit 5d1935ac02ca5a
("ext4: avoid online resizing failures due to oversized flex bg").
This commit does not address the off by one issue above.
>
> So the followup fix looks a bit urgent to me.
>
> -Eric
Okay, I'll send out the fix patch today.

Regards,
Baokun
.

Eric Sandeen Sept. 27, 2024, 2:39 a.m. UTC | #17

On 9/26/24 8:51 PM, Baokun Li wrote:
> On 2024/9/27 0:29, Eric Sandeen wrote:
>> On 9/26/24 11:04 AM, Eric Sandeen wrote:
>>
>>  
>>> Can you explain what the 2 cases under
>>>
>>> /* Avoid allocating large 'groups' array if not needed */
>>>
>>> are doing? I *think* the first 'if' is trying not to over-allocate for the last
>>> batch of block groups that get added during a resize. What is the "else if" case
>>> doing?
>> (or maybe I had that backwards)
>>
>> Incidentally, the offending commit that this fixes (665d3e0af4d35ac) got turned
>> into CVE-2023-52622, so it's quite likely that distros have backported the broken
>> commit as part of the CVE game.
> The commit to fix CVE-2023-52622 is commit 5d1935ac02ca5a
> ("ext4: avoid online resizing failures due to oversized flex bg").

I'm sorry - you're right. 665d3e0af4d35ac was part of the original
series that included 5d1935ac02ca5a, but it was not the fix.

> This commit does not address the off by one issue above.

Agreed.

>>
>> So the followup fix looks a bit urgent to me.
>>
>> -Eric
> Okay, I'll send out the fix patch today.

thanks :)

-Eric

> 
> Regards,
> Baokun
> .
>

[1/1] ext4: fix crash on BUG_ON in ext4_alloc_group_tables

Commit Message

Comments

Patch