Message ID | 1642658834-18657-1-git-send-email-gautham.ananthakrishna@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,1/1] ocfs2: fix a deadlock caused by commit 6f1b228529ae49b0 | expand |
On 1/20/22 2:07 PM, Gautham Ananthakrishna wrote: > commit 6f1b228529ae49b0f85ab89bcdb6c365df401558 caused a deadlock > which was uncovered by our internal tests. The deadlock is as foollows: typo, s/foollows/follows > > Task 1: > A1) jbd2_journal_commit_transaction() > A2) spin_lock(&jh->b_state_lock); > A3) __jbd2_journal_remove_checkpoint() > A4) jbd2_journal_put_journal_head() > A5) jbd_lock_bh_journal_head() > > Task 2: > B1) ocfs2_test_bg_bit_allocatable() > B2) jbd_lock_bh_journal_head() > B3) spin_lock(&jh->b_state_lock); > > A1->A2->A3->A4->B1->B2->B3(blocked)->A5(blocked) > > Now cause process 2 has the jbd lock, it doesn’t let process 1 to continue after A5. > Process 2 now is waiting for b_state_lock to be released by process 1. > > This patch resolves the deadlock. > > Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> > --- > fs/ocfs2/suballoc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c > index 481017e..fd995870 100644 > --- a/fs/ocfs2/suballoc.c > +++ b/fs/ocfs2/suballoc.c > @@ -1259,7 +1259,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh, > if (!buffer_jbd(bg_bh)) > return 1; > > - jbd_lock_bh_journal_head(bg_bh); > + jbd2_journal_grab_journal_head(bg_bh); Seems we have to check the returned 'jh' here. If NULL, we can finish the bit test earlier, else we can use this 'jh' directly. And the following check without bh lock is unsafe. Thanks, Joseph > if (buffer_jbd(bg_bh)) { > jh = bh2jh(bg_bh); > spin_lock(&jh->b_state_lock); > @@ -1270,7 +1270,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh, > ret = 1; > spin_unlock(&jh->b_state_lock); > } > - jbd_unlock_bh_journal_head(bg_bh); > + jbd2_journal_put_journal_head(bg_bh); > > return ret; > }
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c index 481017e..fd995870 100644 --- a/fs/ocfs2/suballoc.c +++ b/fs/ocfs2/suballoc.c @@ -1259,7 +1259,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh, if (!buffer_jbd(bg_bh)) return 1; - jbd_lock_bh_journal_head(bg_bh); + jbd2_journal_grab_journal_head(bg_bh); if (buffer_jbd(bg_bh)) { jh = bh2jh(bg_bh); spin_lock(&jh->b_state_lock); @@ -1270,7 +1270,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh, ret = 1; spin_unlock(&jh->b_state_lock); } - jbd_unlock_bh_journal_head(bg_bh); + jbd2_journal_put_journal_head(bg_bh); return ret; }
commit 6f1b228529ae49b0f85ab89bcdb6c365df401558 caused a deadlock which was uncovered by our internal tests. The deadlock is as foollows: Task 1: A1) jbd2_journal_commit_transaction() A2) spin_lock(&jh->b_state_lock); A3) __jbd2_journal_remove_checkpoint() A4) jbd2_journal_put_journal_head() A5) jbd_lock_bh_journal_head() Task 2: B1) ocfs2_test_bg_bit_allocatable() B2) jbd_lock_bh_journal_head() B3) spin_lock(&jh->b_state_lock); A1->A2->A3->A4->B1->B2->B3(blocked)->A5(blocked) Now cause process 2 has the jbd lock, it doesn’t let process 1 to continue after A5. Process 2 now is waiting for b_state_lock to be released by process 1. This patch resolves the deadlock. Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> --- fs/ocfs2/suballoc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)