Message ID | 20140319211004.23C7F31C1E1@corp2gmr1-1.hot.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Mar 19, 2014 at 02:10:03PM -0700, Andrew Morton wrote:
> From: jiangyiwen <jiangyiwen@huawei.com>
> Subject: ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
>
> The following case may lead to endless loop during umount.
>
> node A           node B               node C            node D
> umount volume,
> migrate lockres1
> to B
>                                                         want to lock lockres1,
>                                                         send
>                                                         MASTER_REQUEST_MSG
>                                                         to C
>                                       init block mle
>                  send
>                  MIGRATE_REQUEST_MSG
>                  to C
>                                       find a block
>                                       mle, and then
>                                       return
>                                       DLM_MIGRATE_RESPONSE_MASTERY_REF
>                                       to B
>                  set C in refmap
>                                       umount successfully
>                  try to umount, endless
>                  loop occurs when migrate
>                  lockres1 since C is in
>                  refmap
>
> So we can fix this endless loop case by only returning
> DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
> MIGRATE_REQUEST_MSG.
>
> [akpm@linux-foundation.org: coding-style fixes]
> Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
> Cc: Mark Fasheh <mfasheh@suse.com>
> Cc: Joel Becker <jlbec@evilplan.org>
> Cc: Xue jiufei <xuejiufei@huawei.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Ok, I _think_ I got this race condition, and the patch itself seems sane.
How was this bug hit, and how much testing did you do with this patch? I
ask because dlm changes can sometimes have unintended effects and I really
don't understand that particular code well enough right now to tell with
100% certainty we didn't mess something else up.

Actually, I'm going to CC Sunil in the hopes he can look at this.
	--Mark

> ---
>
>  fs/ocfs2/dlm/dlmmaster.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount fs/ocfs2/dlm/dlmmaster.c
> --- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount
> +++ a/fs/ocfs2/dlm/dlmmaster.c
> @@ -3084,11 +3084,15 @@ static int dlm_add_migration_mle(struct
>  				/* remove it so that only one mle will be found */
>  				__dlm_unlink_mle(dlm, tmp);
>  				__dlm_mle_detach_hb_events(dlm, tmp);
> -				ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
> -				mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
> -						"telling master to get ref for cleared out mle "
> -						"during migration\n", dlm->name, namelen, name,
> -						master, new_master);
> +				if (tmp->type == DLM_MLE_MASTER) {
> +					ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
> +					mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
> +							"telling master to get ref "
> +							"for cleared out mle during "
> +							"migration\n", dlm->name,
> +							namelen, name, master,
> +							new_master);
> +				}
>  			}
>  			spin_unlock(&tmp->spinlock);
>  		}
> _

--
Mark Fasheh
diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount fs/ocfs2/dlm/dlmmaster.c
--- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -3084,11 +3084,15 @@ static int dlm_add_migration_mle(struct
 				/* remove it so that only one mle will be found */
 				__dlm_unlink_mle(dlm, tmp);
 				__dlm_mle_detach_hb_events(dlm, tmp);
-				ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
-				mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
-						"telling master to get ref for cleared out mle "
-						"during migration\n", dlm->name, namelen, name,
-						master, new_master);
+				if (tmp->type == DLM_MLE_MASTER) {
+					ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
+					mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
+							"telling master to get ref "
+							"for cleared out mle during "
+							"migration\n", dlm->name,
+							namelen, name, master,
+							new_master);
+				}
 			}
 			spin_unlock(&tmp->spinlock);
 		}
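
For readers following the race described in the changelog, the behavioural change in the patch above can be sketched in isolation. This is a minimal userspace model, not kernel code: the enum names, the response values, the bitmask refmap, the node number and the helper functions are all hypothetical stand-ins chosen for illustration, and the "umount can finish" check is a deliberate simplification of the condition the changelog describes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's mle types and response codes;
 * names and values are illustrative, not copied from the o2dlm headers. */
enum mle_type { MLE_BLOCK, MLE_MASTER };
enum migrate_response { RESPONSE_NONE = 0, RESPONSE_MASTERY_REF = 1 };

/* Pre-patch behaviour in the patched branch: whatever mle the migration
 * mle displaces, the responder asks the requester to take a ref. */
static enum migrate_response respond_before_patch(enum mle_type displaced)
{
	(void)displaced;
	return RESPONSE_MASTERY_REF;
}

/* Post-patch behaviour: only a displaced mastery mle asks for a ref. */
static enum migrate_response respond_after_patch(enum mle_type displaced)
{
	return displaced == MLE_MASTER ? RESPONSE_MASTERY_REF : RESPONSE_NONE;
}

/* Requester side, as described in the changelog: a MASTERY_REF reply makes
 * the requester record the responder in its refmap (modelled here as a
 * plain bitmask indexed by a made-up node number). */
static unsigned int apply_response(unsigned int refmap, unsigned int node,
				   enum migrate_response r)
{
	if (r == RESPONSE_MASTERY_REF)
		refmap |= 1u << node;
	return refmap;
}

/* Toy model of the umount condition: migrating the lock resource away
 * cannot complete while another node is still recorded in the refmap. */
static bool umount_can_finish(unsigned int refmap)
{
	return refmap == 0;
}
```

In this model, node C holding only a block mle corresponds to `MLE_BLOCK`. With `respond_before_patch` the requester ends up with C's bit set and `umount_can_finish` stays false, which mirrors the endless loop on node B. With `respond_after_patch` the refmap stays empty for a block mle, while a genuine mastery mle still produces the ref.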