[5/8] ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless loop during umount

Message ID 20140319211004.23C7F31C1E1@corp2gmr1-1.hot.corp.google.com (mailing list archive)
State New, archived

Commit Message

Andrew Morton March 19, 2014, 9:10 p.m. UTC
From: jiangyiwen <jiangyiwen@huawei.com>
Subject: ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless loop during umount

The following case may lead to an endless loop during umount.

node A         node B               node C       node D
umount volume,
migrate lockres1
to B
                                                 want to lock lockres1,
                                                 send
                                                 MASTER_REQUEST_MSG
                                                 to C
                                    init block mle
               send
               MIGRATE_REQUEST_MSG
               to C
                                    find a block
                                    mle, and then
                                    return
                                    DLM_MIGRATE_RESPONSE_MASTERY_REF
                                    to B
               set C in refmap
                                    umount successfully
               try to umount, endless
               loop occurs when migrating
               lockres1 since C is in
               refmap

So we can fix this endless loop case by returning
DLM_MIGRATE_RESPONSE_MASTERY_REF only when the mle found on receiving
MIGRATE_REQUEST_MSG is a master mle (DLM_MLE_MASTER), not a block mle.
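
The scenario and the fix can be sketched as a small user-space model. This
is only an illustration under stated assumptions, not the kernel code: every
name prefixed with model_ or MODEL_, the NODE_* ids, the numeric values, and
the "patched" flag are hypothetical stand-ins. The case where the existing
mle is a migration mle is handled separately in the kernel and is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mle types and the response code. */
enum model_mle_type { MODEL_MLE_BLOCK, MODEL_MLE_MASTER };
#define MODEL_RESPONSE_NONE        0
#define MODEL_RESPONSE_MASTERY_REF 1

/* Hypothetical node ids matching the columns of the timeline above. */
enum { NODE_A, NODE_B, NODE_C, NODE_D };

/*
 * Node C's side: the reply to MIGRATE_REQUEST_MSG after an existing
 * block or master mle for the lock resource was found and unlinked.
 */
static int model_migrate_response(enum model_mle_type found, bool patched)
{
	if (!patched)
		return MODEL_RESPONSE_MASTERY_REF; /* old: any cleared-out mle */
	if (found == MODEL_MLE_MASTER)
		return MODEL_RESPONSE_MASTERY_REF; /* new: master mle only */
	return MODEL_RESPONSE_NONE;
}

/* Node B's side: set the sender's refmap bit only when asked to. */
static uint32_t model_apply_response(uint32_t refmap, int sender, int response)
{
	assert(sender >= 0 && sender < 32); /* keep the shift well defined */
	if (response == MODEL_RESPONSE_MASTERY_REF)
		refmap |= UINT32_C(1) << sender;
	return refmap;
}

/*
 * Replay the timeline: C holds only a block mle (created for D's
 * MASTER_REQUEST_MSG) when B's MIGRATE_REQUEST_MSG arrives.
 * Returns B's refmap for lockres1 after C's reply is processed.
 */
static uint32_t model_scenario_refmap(bool patched)
{
	int response = model_migrate_response(MODEL_MLE_BLOCK, patched);

	return model_apply_response(0, NODE_C, response);
}
```

In this model, model_scenario_refmap(false) leaves the bit for node C set on
node B, which is the stale reference that keeps B's umount looping once C
has gone. model_scenario_refmap(true) leaves the refmap empty, while a real
master mle on C still produces the mastery reference.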

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Xue jiufei <xuejiufei@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmmaster.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Mark Fasheh March 31, 2014, 2:23 a.m. UTC | #1
On Wed, Mar 19, 2014 at 02:10:03PM -0700, Andrew Morton wrote:
> From: jiangyiwen <jiangyiwen@huawei.com>
> Subject: ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
> 
> The following case may lead to endless loop during umount.
> 
> node A         node B               node C       node D
> umount volume,
> migrate lockres1
> to B
>                                                  want to lock lockres1,
>                                                  send
>                                                  MASTER_REQUEST_MSG
>                                                  to C
>                                     init block mle
>                send
>                MIGRATE_REQUEST_MSG
>                to C
>                                     find a block
>                                     mle, and then
>                                     return
>                                     DLM_MIGRATE_RESPONSE_MASTERY_REF
>                                     to B
>                set C in refmap
>                                     umount successfully
>                try to umount, endless
>                loop occurs when migrate
>                lockres1 since C is in
>                refmap
> 
> So we can fix this endless loop case by only returning
> DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
> MIGRATE_REQUEST_MSG.
> 
> [akpm@linux-foundation.org: coding-style fixes]
> Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
> Cc: Mark Fasheh <mfasheh@suse.com>
> Cc: Joel Becker <jlbec@evilplan.org>
> Cc: Xue jiufei <xuejiufei@huawei.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Ok, I _think_ I got this race condition, and the patch itself seems sane.

How was this bug hit, and how much testing did you do with this patch? I ask
because dlm changes can sometimes have unintended effects and I really don't
understand that particular code well enough right now to tell with 100%
certainty we didn't mess something else up.

Actually, I'm going to CC Sunil in the hopes he can look at this.
	--Mark


> ---
> 
>  fs/ocfs2/dlm/dlmmaster.c |   14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount fs/ocfs2/dlm/dlmmaster.c
> --- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount
> +++ a/fs/ocfs2/dlm/dlmmaster.c
> @@ -3084,11 +3084,15 @@ static int dlm_add_migration_mle(struct
>  			/* remove it so that only one mle will be found */
>  			__dlm_unlink_mle(dlm, tmp);
>  			__dlm_mle_detach_hb_events(dlm, tmp);
> -			ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
> -			mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
> -			    "telling master to get ref for cleared out mle "
> -			    "during migration\n", dlm->name, namelen, name,
> -			    master, new_master);
> +			if (tmp->type == DLM_MLE_MASTER) {
> +				ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
> +				mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
> +						"telling master to get ref "
> +						"for cleared out mle during "
> +						"migration\n", dlm->name,
> +						namelen, name, master,
> +						new_master);
> +			}
>  		}
>  		spin_unlock(&tmp->spinlock);
>  	}
> _
--
Mark Fasheh

Patch

diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount fs/ocfs2/dlm/dlmmaster.c
--- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -3084,11 +3084,15 @@  static int dlm_add_migration_mle(struct
 			/* remove it so that only one mle will be found */
 			__dlm_unlink_mle(dlm, tmp);
 			__dlm_mle_detach_hb_events(dlm, tmp);
-			ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
-			mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
-			    "telling master to get ref for cleared out mle "
-			    "during migration\n", dlm->name, namelen, name,
-			    master, new_master);
+			if (tmp->type == DLM_MLE_MASTER) {
+				ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
+				mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
+						"telling master to get ref "
+						"for cleared out mle during "
+						"migration\n", dlm->name,
+						namelen, name, master,
+						new_master);
+			}
 		}
 		spin_unlock(&tmp->spinlock);
 	}