diff mbox series

btrfs: do not ASSERT() if the newly created subvolume already got read

Message ID 6a80cb4b32af89787dadee728310e5e2ca85343f.1705741883.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: do not ASSERT() if the newly created subvolume already got read | expand

Commit Message

Qu Wenruo Jan. 20, 2024, 9:11 a.m. UTC
[BUG]
There is a syzbot crash, triggered by the ASSERT() during subvolume
creation:

 assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/disk-io.c:1319!
 invalid opcode: 0000 [#1] PREEMPT SMP KASAN
 RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60
  <TASK>
  btrfs_get_new_fs_root+0xd3/0xf0
  create_subvol+0xd02/0x1650
  btrfs_mksubvol+0xe95/0x12b0
  __btrfs_ioctl_snap_create+0x2f9/0x4f0
  btrfs_ioctl_snap_create+0x16b/0x200
  btrfs_ioctl+0x35f0/0x5cf0
  __x64_sys_ioctl+0x19d/0x210
  do_syscall_64+0x3f/0xe0
  entry_SYSCALL_64_after_hwframe+0x63/0x6b
 ---[ end trace 0000000000000000 ]---

[CAUSE]
During create_subvol(), after inserting root item for the newly created
subvolume, we would trigger btrfs_get_new_fs_root() to get the
btrfs_root of that subvolume.

The idea here is, we have preallocated an anonymous device number for
the subvolume, thus we can assign it to the new subvolume.

But there is really nothing preventing things like backref walk to read
the new subvolume.
If that happens before we call btrfs_get_new_fs_root(), the subvolume
would be read out, with a new anonymous device number assigned already.

In that case, we would trigger ASSERT(), as we really expect no one to
read out that subvolume (which is not yet accessible from the fs).
But things like backref walk is still possible to trigger the read on
the subvolume.

Thus our assumption on the ASSERT() is not correct in the first place.

[FIX]
Fix it by removing the ASSERT(), and just free the @anon_dev, reset it
to 0, and continue.

If the subvolume tree is read out by something else, it should have
already get a new anon_dev assigned thus we only need to free the
preallocated one.

Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Filipe Manana Jan. 22, 2024, 11:02 a.m. UTC | #1
On Sat, Jan 20, 2024 at 9:12 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> There is a syzbot crash, triggered by the ASSERT() during subvolume
> creation:
>
>  assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319
>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/disk-io.c:1319!
>  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>  RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60
>   <TASK>
>   btrfs_get_new_fs_root+0xd3/0xf0
>   create_subvol+0xd02/0x1650
>   btrfs_mksubvol+0xe95/0x12b0
>   __btrfs_ioctl_snap_create+0x2f9/0x4f0
>   btrfs_ioctl_snap_create+0x16b/0x200
>   btrfs_ioctl+0x35f0/0x5cf0
>   __x64_sys_ioctl+0x19d/0x210
>   do_syscall_64+0x3f/0xe0
>   entry_SYSCALL_64_after_hwframe+0x63/0x6b
>  ---[ end trace 0000000000000000 ]---
>
> [CAUSE]
> During create_subvol(), after inserting root item for the newly created
> subvolume, we would trigger btrfs_get_new_fs_root() to get the
> btrfs_root of that subvolume.
>
> The idea here is, we have preallocated an anonymous device number for
> the subvolume, thus we can assign it to the new subvolume.
>
> But there is really nothing preventing things like backref walk to read
> the new subvolume.
> If that happens before we call btrfs_get_new_fs_root(), the subvolume
> would be read out, with a new anonymous device number assigned already.
>
> In that case, we would trigger ASSERT(), as we really expect no one to
> read out that subvolume (which is not yet accessible from the fs).
> But things like backref walk is still possible to trigger the read on
> the subvolume.
>
> Thus our assumption on the ASSERT() is not correct in the first place.
>
> [FIX]
> Fix it by removing the ASSERT(), and just free the @anon_dev, reset it
> to 0, and continue.
>
> If the subvolume tree is read out by something else, it should have
> already get a new anon_dev assigned thus we only need to free the
> preallocated one.
>
> Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
> Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 57be7dd44da5..5d5faa844408 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1336,8 +1336,17 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>  again:
>         root = btrfs_lookup_fs_root(fs_info, objectid);
>         if (root) {
> -               /* Shouldn't get preallocated anon_dev for cached roots */
> -               ASSERT(!anon_dev);
> +               /*
> +                * Some other caller may have read out the newly inserted
> +                * subvolume already. (For things like backref walk etc).

Btw, whole sentences are not put inside parenthesis.
Should be something like:

... subvolume already (for things like backref walking, etc).

Apart from that:

Reviewed-by: Filipe Manana <fdmanana@suse.com>


> +                * Not that common but still possible.
> +                * In that case, we just need to free the anon_dev.
> +                */
> +               if (unlikely(anon_dev)) {
> +                       free_anon_bdev(anon_dev);
> +                       anon_dev = 0;
> +               }
> +
>                 if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
>                         btrfs_put_root(root);
>                         return ERR_PTR(-ENOENT);
> --
> 2.43.0
>
>
David Sterba Jan. 23, 2024, 8:03 p.m. UTC | #2
On Sat, Jan 20, 2024 at 07:41:28PM +1030, Qu Wenruo wrote:
> [BUG]
> There is a syzbot crash, triggered by the ASSERT() during subvolume
> creation:
> 
>  assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319
>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/disk-io.c:1319!
>  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>  RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60
>   <TASK>
>   btrfs_get_new_fs_root+0xd3/0xf0
>   create_subvol+0xd02/0x1650
>   btrfs_mksubvol+0xe95/0x12b0
>   __btrfs_ioctl_snap_create+0x2f9/0x4f0
>   btrfs_ioctl_snap_create+0x16b/0x200
>   btrfs_ioctl+0x35f0/0x5cf0
>   __x64_sys_ioctl+0x19d/0x210
>   do_syscall_64+0x3f/0xe0
>   entry_SYSCALL_64_after_hwframe+0x63/0x6b
>  ---[ end trace 0000000000000000 ]---
> 
> [CAUSE]
> During create_subvol(), after inserting root item for the newly created
> subvolume, we would trigger btrfs_get_new_fs_root() to get the
> btrfs_root of that subvolume.
> 
> The idea here is, we have preallocated an anonymous device number for
> the subvolume, thus we can assign it to the new subvolume.
> 
> But there is really nothing preventing things like backref walk to read
> the new subvolume.
> If that happens before we call btrfs_get_new_fs_root(), the subvolume
> would be read out, with a new anonymous device number assigned already.
> 
> In that case, we would trigger ASSERT(), as we really expect no one to
> read out that subvolume (which is not yet accessible from the fs).
> But things like backref walk is still possible to trigger the read on
> the subvolume.
> 
> Thus our assumption on the ASSERT() is not correct in the first place.
> 
> [FIX]
> Fix it by removing the ASSERT(), and just free the @anon_dev, reset it
> to 0, and continue.
> 
> If the subvolume tree is read out by something else, it should have
> already get a new anon_dev assigned thus we only need to free the
> preallocated one.
> 
> Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
> Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 57be7dd44da5..5d5faa844408 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1336,8 +1336,17 @@  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 again:
 	root = btrfs_lookup_fs_root(fs_info, objectid);
 	if (root) {
-		/* Shouldn't get preallocated anon_dev for cached roots */
-		ASSERT(!anon_dev);
+		/*
+		 * Some other caller may have read out the newly inserted
+		 * subvolume already. (For things like backref walk etc).
+		 * Not that common but still possible.
+		 * In that case, we just need to free the anon_dev.
+		 */
+		if (unlikely(anon_dev)) {
+			free_anon_bdev(anon_dev);
+			anon_dev = 0;
+		}
+
 		if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
 			btrfs_put_root(root);
 			return ERR_PTR(-ENOENT);