[1/1] ocfs2: race between umount and unfinished remastering during recovery

Message ID 1405630840-7071-1-git-send-email-tariq.x.saeed@oracle.com (mailing list archive)
State New, archived
Commit Message

Tariq Saeed July 17, 2014, 9 p.m. UTC
Orabug: 19074140

When umount is issued during recovery on the new master that
has not finished remastering locks, it triggers BUG() in
dlm_send_mig_lockres_msg().  Here is the situation:

1) node A has a lock on resource X mastered by node B.

2) node B dies ->  node A sets recovering flag for res X

3) Node C becomes the new master for resources owned by the
   dead node and is remastering locks of the dead node but
   has not finished the remastering process yet.

4) umount is issued on node C.

5) While processing the umount, node C ignores the unfinished
   recovery and attempts to migrate resource X to node A.

6) node A finds res X in DLM_LOCK_RES_RECOVERING state, considers
   it a logic error and sends back -EFAULT.

7) node C asserts BUG() upon seeing the -EFAULT response from node A.

The fix is to delay migrating res X until remastering is finished,
at which point the recovering flag will be cleared on both A and C.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
---
 fs/ocfs2/dlm/dlmmaster.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)
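
For readers following the scenario above, here is a minimal userspace
sketch of the check this patch adds. It is not the kernel code: the
toy_* names, the struct layout and the flag values are all assumptions
made for illustration, and it models only the three tests visible in
the hunk (MIGRATING, RECOVERING, ownership), not the rest of
dlm_is_lockres_migrateable().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values (assumed); the real ones live in the ocfs2 DLM headers. */
#define TOY_RES_RECOVERING 0x0002u
#define TOY_RES_MIGRATING  0x0020u

/* Hypothetical, stripped-down stand-in for a lock resource. */
struct toy_lockres {
	uint16_t state;   /* bitmask of TOY_RES_* flags */
	uint8_t owner;    /* node number of the current master */
};

/* Returns 1 if the resource may be migrated now, 0 if migration must wait. */
static int toy_is_migrateable(const struct toy_lockres *res, uint8_t self)
{
	if (res->state & TOY_RES_MIGRATING)
		return 0;

	/* The new check: never migrate while remastering is still in progress. */
	if (res->state & TOY_RES_RECOVERING)
		return 0;

	if (res->owner != self)
		return 0;

	return 1;
}

/* Replays steps 3-5 on node C: umount arrives before remastering is done. */
static void toy_replay_umount_on_node_c(void)
{
	enum { NODE_C = 2 };
	struct toy_lockres x = { .state = TOY_RES_RECOVERING, .owner = NODE_C };

	/* Recovery unfinished: migration is refused, so no message reaches node A. */
	assert(toy_is_migrateable(&x, NODE_C) == 0);

	/* Remastering finished: the flag is cleared and migration may proceed. */
	x.state = (uint16_t)(x.state & ~TOY_RES_RECOVERING);
	assert(toy_is_migrateable(&x, NODE_C) == 1);
}
```

Calling toy_replay_umount_on_node_c() from a test main() exercises both
halves of the scenario. Returning 0 rather than an error matches the
intent stated above: the resource is skipped for now and can be
migrated once the recovering flag has been cleared.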

Comments

Tariq Saeed Aug. 6, 2014, 10 p.m. UTC | #1
Hi Andrew,
This patch was submitted  on Jul 17. Is it in the review queue or has it 
been reviewed and a decision made?
Thanks
-Tariq Saeed

Andrew Morton Aug. 6, 2014, 10:04 p.m. UTC | #2
On Wed, 06 Aug 2014 15:00:12 -0700 Tariq Saeed <tariq.x.saeed@oracle.com> wrote:

> Hi Andrew,
> This patch was submitted  on Jul 17. Is it in the review queue or has it 
> been reviewed and a decision made?

ocfs2-race-between-umount-and-unfinished-remastering-during-recovery.patch
is in my to-send-to-Linus-today pile.

That pile is pretty small :(

ocfs2-correctly-check-the-return-value-of-ocfs2_search_extent_list.patch
ocfs2-remove-convertion-of-total_backoff-in-dlm_join_domain.patch
ocfs2-race-between-umount-and-unfinished-remastering-during-recovery.patch
fs-ocfs2-slot_mapc-replace-countsize-kzalloc-by-kcalloc.patch

Most of the OCFS2 patches in -mm need more review/ack/test/etc,
methinks.

Patch

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 82abf0c..3ec906e 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2405,6 +2405,10 @@  static int dlm_is_lockres_migrateable(struct dlm_ctxt *dlm,
 	if (res->state & DLM_LOCK_RES_MIGRATING)
 		return 0;
 
+	/* delay migration when the lockres is in RECOVERING state */
+	if (res->state & DLM_LOCK_RES_RECOVERING)
+		return 0;
+
 	if (res->owner != dlm->node_num)
 		return 0;