diff mbox series

[RESEND] btrfs: set trans->drity in btrfs_commit_transaction

Message ID 20200117135751.42036-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series [RESEND] btrfs: set trans->drity in btrfs_commit_transaction | expand

Commit Message

Josef Bacik Jan. 17, 2020, 1:57 p.m. UTC
If we abort a transaction we have the following sequence

if (!trans->dirty && list_empty(&trans->new_bgs))
	return;
WRITE_ONCE(trans->transaction->aborted, err);

The idea being if we didn't modify anything with our trans handle then
we don't really need to abort the whole transaction, maybe the other
trans handles are fine and we can carry on.

However in the case of create_snapshot we add a pending_snapshot object
to our transaction and then commit the transaction.  We don't actually
modify anything.  sync() behaves the same way, attach to an existing
transaction and commit it.  This means that if we have an IO error in
the right places we could abort the committing transaction with our
trans->dirty being not set and thus not set transaction->aborted.

This is a problem because in the create_snapshot() case we depend on
pending->error being set to something, or btrfs_commit_transaction
returning an error.

If we are not the trans handle that gets to commit the transaction, and
we're waiting on the commit to happen we get our return value from
cur_trans->aborted.  If this was not set to anything because sync() hit
an error in the transaction commit before it could modify anything then
cur_trans->aborted would be 0.  Thus we'd return 0 from
btrfs_commit_transaction() in create_snapshot.

This is a problem because we then try to do things with
pending_snapshot->snap, which will be NULL because we didn't create the
snapshot, and then we'll get a NULL pointer dereference like the
following

"BUG: kernel NULL pointer dereference, address: 00000000000001f0"
RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330
Call Trace:
 ? btrfs_mksubvol.isra.31+0x3f2/0x510
 btrfs_mksubvol.isra.31+0x4bc/0x510
 ? __sb_start_write+0xfa/0x200
 ? mnt_want_write_file+0x24/0x50
 btrfs_ioctl_snap_create_transid+0x16c/0x1a0
 btrfs_ioctl_snap_create_v2+0x11e/0x1a0
 btrfs_ioctl+0x1534/0x2c10
 ? free_debug_processing+0x262/0x2a3
 do_vfs_ioctl+0xa6/0x6b0
 ? do_sys_open+0x188/0x220
 ? syscall_trace_enter+0x1f8/0x330
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4a/0x1b0

In order to fix this we need to make sure anybody who calls
commit_transaction has trans->dirty set so that they properly set the
trans->transaction->aborted value properly so any waiters know bad
things happened.

This was found while I was running generic/475 with my modified
fsstress, it reproduced within a few runs.  I ran with this patch all
night and didn't see the problem again.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
- Nothing has changed, simply rebased onto misc-next

 fs/btrfs/transaction.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

David Sterba Jan. 21, 2020, 1:26 p.m. UTC | #1
On Fri, Jan 17, 2020 at 08:57:51AM -0500, Josef Bacik wrote:
> If we abort a transaction we have the following sequence
> 
> if (!trans->dirty && list_empty(&trans->new_bgs))
> 	return;
> WRITE_ONCE(trans->transaction->aborted, err);
> 
> The idea being if we didn't modify anything with our trans handle then
> we don't really need to abort the whole transaction, maybe the other
> trans handles are fine and we can carry on.
> 
> However in the case of create_snapshot we add a pending_snapshot object
> to our transaction and then commit the transaction.  We don't actually
> modify anything.  sync() behaves the same way, attach to an existing
> transaction and commit it.  This means that if we have an IO error in
> the right places we could abort the committing transaction with our
> trans->dirty being not set and thus not set transaction->aborted.
> 
> This is a problem because in the create_snapshot() case we depend on
> pending->error being set to something, or btrfs_commit_transaction
> returning an error.
> 
> If we are not the trans handle that gets to commit the transaction, and
> we're waiting on the commit to happen we get our return value from
> cur_trans->aborted.  If this was not set to anything because sync() hit
> an error in the transaction commit before it could modify anything then
> cur_trans->aborted would be 0.  Thus we'd return 0 from
> btrfs_commit_transaction() in create_snapshot.
> 
> This is a problem because we then try to do things with
> pending_snapshot->snap, which will be NULL because we didn't create the
> snapshot, and then we'll get a NULL pointer dereference like the
> following
> 
> "BUG: kernel NULL pointer dereference, address: 00000000000001f0"
> RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330
> Call Trace:
>  ? btrfs_mksubvol.isra.31+0x3f2/0x510
>  btrfs_mksubvol.isra.31+0x4bc/0x510
>  ? __sb_start_write+0xfa/0x200
>  ? mnt_want_write_file+0x24/0x50
>  btrfs_ioctl_snap_create_transid+0x16c/0x1a0
>  btrfs_ioctl_snap_create_v2+0x11e/0x1a0
>  btrfs_ioctl+0x1534/0x2c10
>  ? free_debug_processing+0x262/0x2a3
>  do_vfs_ioctl+0xa6/0x6b0
>  ? do_sys_open+0x188/0x220
>  ? syscall_trace_enter+0x1f8/0x330
>  ksys_ioctl+0x60/0x90
>  __x64_sys_ioctl+0x16/0x20
>  do_syscall_64+0x4a/0x1b0
> 
> In order to fix this we need to make sure anybody who calls
> commit_transaction has trans->dirty set so that they properly set the
> trans->transaction->aborted value properly so any waiters know bad
> things happened.
> 
> This was found while I was running generic/475 with my modified
> fsstress, it reproduced within a few runs.  I ran with this patch all
> night and didn't see the problem again.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 27a535d4bb4f..a519083461c7 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2017,6 +2017,14 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	ASSERT(refcount_read(&trans->use_count) == 1);
 
+	/*
+	 * Some places just start a transaction to commit it.  We need to make
+	 * sure that if this commit fails that the abort code actually marks the
+	 * transaction as failed, so set trans->dirty to make the abort code do
+	 * the right thing.
+	 */
+	trans->dirty = true;
+
 	/* Stop the commit early if ->aborted is set */
 	if (unlikely(READ_ONCE(cur_trans->aborted))) {
 		ret = cur_trans->aborted;