diff mbox series

[RFC,1/1] ocfs2: fix a deadlock caused by commit 6f1b228529ae49b0

Message ID 1642658834-18657-1-git-send-email-gautham.ananthakrishna@oracle.com (mailing list archive)
State New, archived
Headers show
Series [RFC,1/1] ocfs2: fix a deadlock caused by commit 6f1b228529ae49b0 | expand

Commit Message

Gautham Ananthakrishna Jan. 20, 2022, 6:07 a.m. UTC
commit 6f1b228529ae49b0f85ab89bcdb6c365df401558 caused a deadlock
which was uncovered by our internal tests. The deadlock is as foollows:

Task 1:
A1) jbd2_journal_commit_transaction()
A2)         spin_lock(&jh->b_state_lock);
A3)         __jbd2_journal_remove_checkpoint()
A4)                         jbd2_journal_put_journal_head()
A5)                                         jbd_lock_bh_journal_head()

Task 2:
B1) ocfs2_test_bg_bit_allocatable()
B2)         jbd_lock_bh_journal_head()
B3)         spin_lock(&jh->b_state_lock);

A1->A2->A3->A4->B1->B2->B3(blocked)->A5(blocked)

Now cause process 2 has the jbd lock, it doesn’t let process 1 to continue after A5.
Process 2 now is waiting for b_state_lock to be released by process 1.

This patch resolves the deadlock.

Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
---
 fs/ocfs2/suballoc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Joseph Qi Jan. 20, 2022, 8:06 a.m. UTC | #1
On 1/20/22 2:07 PM, Gautham Ananthakrishna wrote:
> commit 6f1b228529ae49b0f85ab89bcdb6c365df401558 caused a deadlock
> which was uncovered by our internal tests. The deadlock is as foollows:

typo, s/foollows/follows
> 
> Task 1:
> A1) jbd2_journal_commit_transaction()
> A2)         spin_lock(&jh->b_state_lock);
> A3)         __jbd2_journal_remove_checkpoint()
> A4)                         jbd2_journal_put_journal_head()
> A5)                                         jbd_lock_bh_journal_head()
> 
> Task 2:
> B1) ocfs2_test_bg_bit_allocatable()
> B2)         jbd_lock_bh_journal_head()
> B3)         spin_lock(&jh->b_state_lock);
> 
> A1->A2->A3->A4->B1->B2->B3(blocked)->A5(blocked)
> 
> Now cause process 2 has the jbd lock, it doesn’t let process 1 to continue after A5.
> Process 2 now is waiting for b_state_lock to be released by process 1.
> 
> This patch resolves the deadlock.
> 
> Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
> ---
>  fs/ocfs2/suballoc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 481017e..fd995870 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -1259,7 +1259,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
>  	if (!buffer_jbd(bg_bh))
>  		return 1;
>  
> -	jbd_lock_bh_journal_head(bg_bh);
> +	jbd2_journal_grab_journal_head(bg_bh);

Seems we have to check the returned 'jh' here.
If NULL, we can finish the bit test earlier, else we can use this 'jh' directly.
And the following check without bh lock is unsafe.

Thanks,
Joseph

>  	if (buffer_jbd(bg_bh)) {
>  		jh = bh2jh(bg_bh);
>  		spin_lock(&jh->b_state_lock);
> @@ -1270,7 +1270,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
>  			ret = 1;
>  		spin_unlock(&jh->b_state_lock);
>  	}
> -	jbd_unlock_bh_journal_head(bg_bh);
> +	jbd2_journal_put_journal_head(bg_bh);
>  
>  	return ret;
>  }
diff mbox series

Patch

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 481017e..fd995870 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1259,7 +1259,7 @@  static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
 	if (!buffer_jbd(bg_bh))
 		return 1;
 
-	jbd_lock_bh_journal_head(bg_bh);
+	jbd2_journal_grab_journal_head(bg_bh);
 	if (buffer_jbd(bg_bh)) {
 		jh = bh2jh(bg_bh);
 		spin_lock(&jh->b_state_lock);
@@ -1270,7 +1270,7 @@  static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
 			ret = 1;
 		spin_unlock(&jh->b_state_lock);
 	}
-	jbd_unlock_bh_journal_head(bg_bh);
+	jbd2_journal_put_journal_head(bg_bh);
 
 	return ret;
 }