diff mbox

Btrfs: place ordered operations on a per transaction list

Message ID 51220EE1.6090607@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Miao Xie Feb. 18, 2013, 11:22 a.m. UTC
On wed, 13 Feb 2013 11:13:22 -0500, Josef Bacik wrote:
> Miao made the ordered operations stuff run async, which introduced a
> deadlock where we could get somebody (sync) racing in and committing the
> transaction while a commit was already happening.  The new committer would
> try and flush ordered operations which would hang waiting for the commit to
> finish because it is done asynchronously and no longer inherits the callers
> trans handle.  To fix this we need to make the ordered operations list a per
> transaction list.  We can get new inodes added to the ordered operation list
> by truncating them and then having another process writing to them, so this
> makes it so that anybody trying to add an ordered operation _must_ start a
> transaction in order to add itself to the list, which will keep new inodes
> from getting added to the ordered operations list after we start committing.
> This should fix the deadlock and also keeps us from doing a lot more work
> than we need to during commit.  Thanks,

Firstly, thanks to deal with the bug which was introduced by my patch.

But comparing with this fix method, I prefer the following one because:
- we won't worry the similar problem if we add more work during commit
  in the future.
- it is unnecessary to get a new handle and commit it if the transaction
  is under the commit.

Thanks
Miao

Comments

Josef Bacik Feb. 18, 2013, 3:33 p.m. UTC | #1
On Mon, Feb 18, 2013 at 04:22:09AM -0700, Miao Xie wrote:
> On wed, 13 Feb 2013 11:13:22 -0500, Josef Bacik wrote:
> > Miao made the ordered operations stuff run async, which introduced a
> > deadlock where we could get somebody (sync) racing in and committing the
> > transaction while a commit was already happening.  The new committer would
> > try and flush ordered operations which would hang waiting for the commit to
> > finish because it is done asynchronously and no longer inherits the callers
> > trans handle.  To fix this we need to make the ordered operations list a per
> > transaction list.  We can get new inodes added to the ordered operation list
> > by truncating them and then having another process writing to them, so this
> > makes it so that anybody trying to add an ordered operation _must_ start a
> > transaction in order to add itself to the list, which will keep new inodes
> > from getting added to the ordered operations list after we start committing.
> > This should fix the deadlock and also keeps us from doing a lot more work
> > than we need to during commit.  Thanks,
> 
> Firstly, thanks to deal with the bug which was introduced by my patch.
> 
> But comparing with this fix method, I prefer the following one because:
> - we won't worry the similar problem if we add more work during commit
>   in the future.
> - it is unnecessary to get a new handle and commit it if the transaction
>   is under the commit.

Mine has the benefit of not making a committing transaction flush more stuff
that it doesn't need to, so I think I'll keep mine as well, but I agree yours is
good for the attach case as well.  So can you send this along properly with a
signed off and such and we can have our cake and eat it too.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index fc03aa6..c449cb5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -277,7 +277,8 @@  static void wait_current_trans(struct btrfs_root *root)
 	}
 }
 
-static int may_wait_transaction(struct btrfs_root *root, int type)
+static int may_wait_transaction(struct btrfs_root *root, int type,
+				bool is_joined)
 {
 	if (root->fs_info->log_root_recovering)
 		return 0;
@@ -285,6 +286,14 @@  static int may_wait_transaction(struct btrfs_root *root, int type)
 	if (type == TRANS_USERSPACE)
 		return 1;
 
+	/*
+	 * If we are ATTACH, it means we just want to catch the current
+	 * transaction and commit it. So if someone is committing the
+	 * current transaction now, it is very glad to wait it.
+	 */
+	if (is_joined && type == TRANS_ATTACH)
+		return 1;
+
 	if (type == TRANS_START &&
 	    !atomic_read(&root->fs_info->open_ioctl_trans))
 		return 1;
@@ -355,7 +364,7 @@  again:
 	if (type < TRANS_JOIN_NOLOCK)
 		sb_start_intwrite(root->fs_info->sb);
 
-	if (may_wait_transaction(root, type))
+	if (may_wait_transaction(root, type, false))
 		wait_current_trans(root);
 
 	do {
@@ -383,16 +392,26 @@  again:
 	h->block_rsv = NULL;
 	h->orig_rsv = NULL;
 	h->aborted = 0;
-	h->qgroup_reserved = qgroup_reserved;
+	h->qgroup_reserved = 0;
 	h->delayed_ref_elem.seq = 0;
 	h->type = type;
 	INIT_LIST_HEAD(&h->qgroup_ref_list);
 	INIT_LIST_HEAD(&h->new_bgs);
 
 	smp_mb();
-	if (cur_trans->blocked && may_wait_transaction(root, type)) {
-		btrfs_commit_transaction(h, root);
-		goto again;
+	if (cur_trans->blocked && may_wait_transaction(root, type, true)) {
+		if (cur_trans->in_commit) {
+			btrfs_end_transaction(h, root);
+			wait_current_trans(root);
+		} else {
+			btrfs_commit_transaction(h, root);
+		}
+		if (unlikely(type == TRANS_ATTACH)) {
+			ret = -ENOENT;
+			goto alloc_fail;
+		} else {
+			goto again;
+		}
 	}
 
 	if (num_bytes) {
@@ -401,6 +420,7 @@  again:
 		h->block_rsv = &root->fs_info->trans_block_rsv;
 		h->bytes_reserved = num_bytes;
 	}
+	h->qgroup_reserved = qgroup_reserved;
 
 got_it:
 	btrfs_record_root_in_trans(h, root);