diff mbox series

btrfs: fix double free of anonymous device after snapshot creation failure

Message ID 2b3c3fd6b5a6b9f9a7aa39cd343b233a11495bce.1708707655.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix double free of anonymous device after snapshot creation failure | expand

Commit Message

Filipe Manana Feb. 23, 2024, 5:03 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When creating a snapshot we may do a double free of an anonymous device
in case there's an error comitting the transaction. The second free may
result in freeing an anonymous device number that was allocated by some
other subsystem in the kernel or another btrfs filesystem.

The steps that lead to this:

1) At ioctl.c:create_snapshot() we allocate an anonymous device number
   and assign it to pending_snapshot->anon_dev;

2) Then we call btrfs_commit_transaction() and end up at
   transaction.c:create_pending_snapshot();

3) There we call btrfs_get_new_fs_root() and pass it the anonymous device
   number stored in pending_snapshot->anon_dev;

4) btrfs_get_new_fs_root() frees that anonymous device number because
   btrfs_lookup_fs_root() returned a root - someone else did a lookup
   of the new root already, which could some task doing backref walking;

5) After that some error happens in the transaction commit path, and at
   ioctl.c:create_snapshot() we jump to the 'fail' label, and after
   that we free again the same anonymous device number, which in the
   meanwhile may have been reallocated somewhere else, because
   pending_snapshot->anon_dev still has the same value as in step 1.

Recently syzbot ran into this and reported the following trace:

  ------------[ cut here ]------------
  ida_free called for id=51 which is not allocated.
  WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525
  Modules linked in:
  CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
  RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525
  Code: 10 42 80 3c 28 (...)
  RSP: 0018:ffffc90015a67300 EFLAGS: 00010246
  RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000
  RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000
  RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4
  R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246
  R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246
  FS:  00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346
   create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837
   create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931
   btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404
   create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848
   btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998
   btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044
   __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306
   btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393
   btrfs_ioctl+0xa74/0xd40
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:871 [inline]
   __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857
   do_syscall_64+0xfb/0x240
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7fca3e67dda9
  Code: 28 00 00 00 (...)
  RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9
  RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003
  RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658
   </TASK>

Where we get an explicit message where we attempt to free an anonymous
device number that is not currently allocated. It happens in a different
code path from the example below, at btrfs_get_root_ref(), so this change
may not fix the case triggered by syzbot.

To fix at least the code path from the example above, change
btrfs_get_root_ref() and its callers to receive a dev_t pointer argument
for the anonymous device number, so that in case it frees the number, it
also resets it to 0, so that up in the call chain we don't attempt to do
the double free.

Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/
Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c     | 22 +++++++++++-----------
 fs/btrfs/disk-io.h     |  2 +-
 fs/btrfs/ioctl.c       |  2 +-
 fs/btrfs/transaction.c |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

Comments

David Sterba Feb. 26, 2024, 8:26 p.m. UTC | #1
On Fri, Feb 23, 2024 at 05:03:19PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When creating a snapshot we may do a double free of an anonymous device
> in case there's an error comitting the transaction. The second free may
> result in freeing an anonymous device number that was allocated by some
> other subsystem in the kernel or another btrfs filesystem.
> 
> The steps that lead to this:
> 
> 1) At ioctl.c:create_snapshot() we allocate an anonymous device number
>    and assign it to pending_snapshot->anon_dev;
> 
> 2) Then we call btrfs_commit_transaction() and end up at
>    transaction.c:create_pending_snapshot();
> 
> 3) There we call btrfs_get_new_fs_root() and pass it the anonymous device
>    number stored in pending_snapshot->anon_dev;
> 
> 4) btrfs_get_new_fs_root() frees that anonymous device number because
>    btrfs_lookup_fs_root() returned a root - someone else did a lookup
>    of the new root already, which could some task doing backref walking;
> 
> 5) After that some error happens in the transaction commit path, and at
>    ioctl.c:create_snapshot() we jump to the 'fail' label, and after
>    that we free again the same anonymous device number, which in the
>    meanwhile may have been reallocated somewhere else, because
>    pending_snapshot->anon_dev still has the same value as in step 1.
> 
> Recently syzbot ran into this and reported the following trace:
> 
>   ------------[ cut here ]------------
>   ida_free called for id=51 which is not allocated.
>   WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525
>   Modules linked in:
>   CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
>   RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525
>   Code: 10 42 80 3c 28 (...)
>   RSP: 0018:ffffc90015a67300 EFLAGS: 00010246
>   RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000
>   RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000
>   RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4
>   R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246
>   R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246
>   FS:  00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0
>   Call Trace:
>    <TASK>
>    btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346
>    create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837
>    create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931
>    btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404
>    create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848
>    btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998
>    btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044
>    __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306
>    btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393
>    btrfs_ioctl+0xa74/0xd40
>    vfs_ioctl fs/ioctl.c:51 [inline]
>    __do_sys_ioctl fs/ioctl.c:871 [inline]
>    __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857
>    do_syscall_64+0xfb/0x240
>    entry_SYSCALL_64_after_hwframe+0x6f/0x77
>   RIP: 0033:0x7fca3e67dda9
>   Code: 28 00 00 00 (...)
>   RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>   RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9
>   RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003
>   RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000
>   R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>   R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658
>    </TASK>
> 
> Where we get an explicit message where we attempt to free an anonymous
> device number that is not currently allocated. It happens in a different
> code path from the example below, at btrfs_get_root_ref(), so this change
> may not fix the case triggered by syzbot.
> 
> To fix at least the code path from the example above, change
> btrfs_get_root_ref() and its callers to receive a dev_t pointer argument
> for the anonymous device number, so that in case it frees the number, it
> also resets it to 0, so that up in the call chain we don't attempt to do
> the double free.
> 
> Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/

The link is fine as one can go directly to the report, for the syzbot to
auto-close the report I think it's most reliable to add

Reported-by: syzbot+623a623cfed57f422be1@syzkaller.appspotmail.com

> Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read")

This has been backported up to 5.10 so we'll need

CC: stable@vger.kernel.org # 5.10+

> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>
Filipe Manana Feb. 26, 2024, 11:25 p.m. UTC | #2
On Mon, Feb 26, 2024 at 8:27 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, Feb 23, 2024 at 05:03:19PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When creating a snapshot we may do a double free of an anonymous device
> > in case there's an error comitting the transaction. The second free may
> > result in freeing an anonymous device number that was allocated by some
> > other subsystem in the kernel or another btrfs filesystem.
> >
> > The steps that lead to this:
> >
> > 1) At ioctl.c:create_snapshot() we allocate an anonymous device number
> >    and assign it to pending_snapshot->anon_dev;
> >
> > 2) Then we call btrfs_commit_transaction() and end up at
> >    transaction.c:create_pending_snapshot();
> >
> > 3) There we call btrfs_get_new_fs_root() and pass it the anonymous device
> >    number stored in pending_snapshot->anon_dev;
> >
> > 4) btrfs_get_new_fs_root() frees that anonymous device number because
> >    btrfs_lookup_fs_root() returned a root - someone else did a lookup
> >    of the new root already, which could some task doing backref walking;
> >
> > 5) After that some error happens in the transaction commit path, and at
> >    ioctl.c:create_snapshot() we jump to the 'fail' label, and after
> >    that we free again the same anonymous device number, which in the
> >    meanwhile may have been reallocated somewhere else, because
> >    pending_snapshot->anon_dev still has the same value as in step 1.
> >
> > Recently syzbot ran into this and reported the following trace:
> >
> >   ------------[ cut here ]------------
> >   ida_free called for id=51 which is not allocated.
> >   WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525
> >   Modules linked in:
> >   CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0
> >   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
> >   RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525
> >   Code: 10 42 80 3c 28 (...)
> >   RSP: 0018:ffffc90015a67300 EFLAGS: 00010246
> >   RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000
> >   RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000
> >   RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4
> >   R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246
> >   R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246
> >   FS:  00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0
> >   Call Trace:
> >    <TASK>
> >    btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346
> >    create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837
> >    create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931
> >    btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404
> >    create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848
> >    btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998
> >    btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044
> >    __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306
> >    btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393
> >    btrfs_ioctl+0xa74/0xd40
> >    vfs_ioctl fs/ioctl.c:51 [inline]
> >    __do_sys_ioctl fs/ioctl.c:871 [inline]
> >    __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857
> >    do_syscall_64+0xfb/0x240
> >    entry_SYSCALL_64_after_hwframe+0x6f/0x77
> >   RIP: 0033:0x7fca3e67dda9
> >   Code: 28 00 00 00 (...)
> >   RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> >   RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9
> >   RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003
> >   RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000
> >   R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >   R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658
> >    </TASK>
> >
> > Where we get an explicit message where we attempt to free an anonymous
> > device number that is not currently allocated. It happens in a different
> > code path from the example below, at btrfs_get_root_ref(), so this change
> > may not fix the case triggered by syzbot.
> >
> > To fix at least the code path from the example above, change
> > btrfs_get_root_ref() and its callers to receive a dev_t pointer argument
> > for the anonymous device number, so that in case it frees the number, it
> > also resets it to 0, so that up in the call chain we don't attempt to do
> > the double free.
> >
> > Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/
>
> The link is fine as one can go directly to the report, for the syzbot to
> auto-close the report I think it's most reliable to add
>
> Reported-by: syzbot+623a623cfed57f422be1@syzkaller.appspotmail.com

So I omitted that intentionally because, as I say in the changelog, I
don't think it's the same scenario as reported by syzbot.
There we attempt the double free in a different place from the scenario I found.

Since I mentioned that case, which I believe might be different, I
left only the Link tag and not the Reported-by.

>
> > Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read")
>
> This has been backported up to 5.10 so we'll need
>
> CC: stable@vger.kernel.org # 5.10+
>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: David Sterba <dsterba@suse.com>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a2e45ed6ef14..3df5477d48a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1308,12 +1308,12 @@  void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
  *
  * @objectid:	root id
  * @anon_dev:	preallocated anonymous block device number for new roots,
- * 		pass 0 for new allocation.
+ *		pass NULL for a new allocation.
  * @check_ref:	whether to check root item references, If true, return -ENOENT
  *		for orphan roots
  */
 static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
-					     u64 objectid, dev_t anon_dev,
+					     u64 objectid, dev_t *anon_dev,
 					     bool check_ref)
 {
 	struct btrfs_root *root;
@@ -1343,9 +1343,9 @@  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 		 * that common but still possible.  In that case, we just need
 		 * to free the anon_dev.
 		 */
-		if (unlikely(anon_dev)) {
-			free_anon_bdev(anon_dev);
-			anon_dev = 0;
+		if (unlikely(anon_dev && *anon_dev)) {
+			free_anon_bdev(*anon_dev);
+			*anon_dev = 0;
 		}
 
 		if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
@@ -1367,7 +1367,7 @@  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 		goto fail;
 	}
 
-	ret = btrfs_init_fs_root(root, anon_dev);
+	ret = btrfs_init_fs_root(root, anon_dev ? *anon_dev : 0);
 	if (ret)
 		goto fail;
 
@@ -1403,7 +1403,7 @@  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 	 * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
 	 * and once again by our caller.
 	 */
-	if (anon_dev)
+	if (anon_dev && *anon_dev)
 		root->anon_dev = 0;
 	btrfs_put_root(root);
 	return ERR_PTR(ret);
@@ -1419,7 +1419,7 @@  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref)
 {
-	return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
+	return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
 }
 
 /*
@@ -1427,11 +1427,11 @@  struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
  * the anonymous block device id
  *
  * @objectid:	tree objectid
- * @anon_dev:	if zero, allocate a new anonymous block device or use the
- *		parameter value
+ * @anon_dev:	if NULL, allocate a new anonymous block device or use the
+ *		parameter value if not NULL
  */
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev)
+					 u64 objectid, dev_t *anon_dev)
 {
 	return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
 }
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 375f62ae3709..76eb53fe7a11 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -73,7 +73,7 @@  void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref);
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev);
+					 u64 objectid, dev_t *anon_dev);
 struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
 						 struct btrfs_path *path,
 						 u64 objectid);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8b80fbea1e72..29e2b8e23363 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -731,7 +731,7 @@  static noinline int create_subvol(struct mnt_idmap *idmap,
 	free_extent_buffer(leaf);
 	leaf = NULL;
 
-	new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
+	new_root = btrfs_get_new_fs_root(fs_info, objectid, &anon_dev);
 	if (IS_ERR(new_root)) {
 		ret = PTR_ERR(new_root);
 		btrfs_abort_transaction(trans, ret);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ea080ec2f927..31ac5a04cc02 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1832,7 +1832,7 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	}
 
 	key.offset = (u64)-1;
-	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
+	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
 		pending->snap = NULL;