From patchwork Tue Jan 19 03:03:19 2016
X-Patchwork-Submitter: Junxiao Bi
X-Patchwork-Id: 8068191
From: Junxiao Bi
To: xuejiufei, Joseph Qi
Cc: "ocfs2-devel@oss.oracle.com"
Subject: Re: [Ocfs2-devel] ocfs2: A race between refmap setting and clearing
Date: Tue, 19 Jan 2016 11:03:19 +0800
Message-ID: <569DA777.4000909@oracle.com>
In-Reply-To: <569C8F39.6090802@huawei.com>
References: <56931785.2090603@huawei.com> <56947B02.7030503@oracle.com>
 <5694A857.5020906@huawei.com> <5695BA7D.90009@oracle.com>
 <5695ECE6.7050709@huawei.com> <5695F616.9040105@oracle.com>
 <569609BE.8050802@huawei.com> <569C69E4.6030500@oracle.com>
 <569C8F39.6090802@huawei.com>
Hi Jiufei & Joseph,

On 01/18/2016 03:07 PM, xuejiufei wrote:
> On 2016/1/18 12:28, Junxiao Bi wrote:
>> On 01/13/2016 04:24 PM, Joseph Qi wrote:
>>> Hi Junxiao,
>>>
>>> On 2016/1/13 15:00, Junxiao Bi wrote:
>>>> On 01/13/2016 02:21 PM, xuejiufei wrote:
>>>>> Hi Junxiao,
>>>>> I have not describe the issue clearly.
>>>>>
>>>>> Node 1                               Node 2 (master)
>>>>> dlmlock
>>>>> dlm_do_master_request
>>>>>                                      dlm_master_request_handler
>>>>>                                      -> dlm_lockres_set_refmap_bit
>>>>> dlmlock succeed
>>>>> dlmunlock succeed
>>>>>
>>>>> dlm_purge_lockres
>>>>>                                      dlm_deref_handler
>>>>>                                      -> find lock resource is in
>>>>>                                         DLM_LOCK_RES_SETREF_INPROG state,
>>>>>                                         so dispatch a deref work
>>>>> dlm_purge_lockres succeed.
>>>>>
>>>>> call dlmlock again
>>>>> dlm_do_master_request
>>>>>                                      dlm_master_request_handler
>>>>>                                      -> dlm_lockres_set_refmap_bit
>>>>>
>>>>>                                      deref work trigger, call
>>>>>                                      dlm_lockres_clear_refmap_bit
>>>>>                                      to clear Node 1 from refmap
>>>>>
>>>>>                                      dlm_purge_lockres succeed
>>>>>
>>>>> dlm_send_remote_lock_request
>>>>> return DLM_IVLOCKID because
>>>>> the lockres is not exist
>>>> More clear now. Thank you.
>>>> This is a very complicated race. I didn't have a good solution to fix it
>>>> now. Your fix looks work, but I am afraid if we keep going fix this
>>>> kinds of races case by case, we will make dlm harder to understand and
>>>> easy to involve bugs, maybe we should think about refactor dlm.
>>>>
>>> Agree. IMO, the root cause is bit op cannot handle such a case.
>>> I wonder if we have to change it to refcount, which may require a much
>>> bigger refactoring.
>> one bit for each node seems reasonable, as lockres is per node. I think
>> the cause is the dis-order of set/clear, i am trying to see whether they
>> can be made happen in order.
>>
> Agree. The solution 1) in my first mail is going to add a new message to
> keep the order of set and clear. Other nodes can purge the lock resource
> only after the refmap on master is cleared.

I am taking another approach: delete deref_lockres_worker and make the
refmap deref happen directly in the deref handler. This requires finding
the out-of-order part and fixing it. As far as I can see, the disorder
can only happen if do_assert_master sets the refmap bit after
deref_lockres_handler has cleared it. To prevent that, we must make sure
MASTERY_REF is not returned to the owner after the purge. SETREF_INPROG
is designed to do exactly this: it is set by the assert master handler,
and the purge waits for it to be cleared before deref'ing the lockres.
There are two cases where SETREF_INPROG is not yet set at purge time:

1. The assert master message has not arrived when purge() runs.
This happens when the lockres owner already exists when a node asks for
the lockres. The owner is learned from the master request message and
dlm_get_lock_resource() returns. Soon dlmlock/dlmunlock is done, and the
purge is done. Then the assert master message arrives, but since the MLE
has already been deleted in dlm_get_lock_resource(), the handler will not
return MASTERY_REF to the owner to reset the refmap. So there is no
problem in this case.

2. The assert master handler sets SETREF_INPROG too late.
SETREF_INPROG is set only after the lockres owner has been updated and
dlm->spinlock has been released. That lets dlm_get_lock_resource() wake
up too early, so the purge may happen before SETREF_INPROG is set. This
is a bug and can be fixed by the following patch.

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9477d6e1de37..83cd65b128d0 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1965,6 +1965,7 @@ ok:
 	if (res) {
 		int wake = 0;
 		spin_lock(&res->spinlock);
+		res->state |= DLM_LOCK_RES_SETREF_INPROG;
 		if (mle->type == DLM_MLE_MIGRATION) {
 			mlog(0, "finishing off migration of lockres %.*s, "
 			     "from %u to %u\n",
@@ -2029,12 +2030,8 @@ ok:
 
 done:
 	ret = 0;
-	if (res) {
-		spin_lock(&res->spinlock);
-		res->state |= DLM_LOCK_RES_SETREF_INPROG;
-		spin_unlock(&res->spinlock);
+	if (res)
 		*ret_data = (void *)res;
-	}
 	dlm_put(dlm);
 	if (master_request) {
 		mlog(0, "need to tell master to reassert\n");

Besides this patch, revert commit f3f854648de64c4b6f13f6f13113bc9525c621e5
("ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres").
Can this fix the issue?

Thanks,
Junxiao.

>
> Thanks
> Jiufei
>
>> Thanks,
>> Junxiao.
>>>
>>> Thanks,
>>> Joseph
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>>> BUG if the lockres is $RECOVERY
>>>>>
>>>>> On 2016/1/13 10:46, Junxiao Bi wrote:
>>>>>> On 01/12/2016 03:16 PM, xuejiufei wrote:
>>>>>>> Hi, Junxiao
>>>>>>>
>>>>>>> On 2016/1/12 12:03, Junxiao Bi wrote:
>>>>>>>> Hi Jiufei,
>>>>>>>>
>>>>>>>> On 01/11/2016 10:46 AM, xuejiufei wrote:
>>>>>>>>> Hi all,
>>>>>>>>> We have found a race between refmap setting and clearing which
>>>>>>>>> will cause the lock resource on master is freed before other nodes
>>>>>>>>> purge it.
>>>>>>>>>
>>>>>>>>> Node 1                               Node 2 (master)
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                      dlm_master_request_handler
>>>>>>>>>                                      -> dlm_lockres_set_refmap_bit
>>>>>>>>> call dlm_purge_lockres after unlock
>>>>>>>>>                                      dlm_deref_handler
>>>>>>>>>                                      -> find lock resource is in
>>>>>>>>>                                         DLM_LOCK_RES_SETREF_INPROG state,
>>>>>>>>>                                         so dispatch a deref work
>>>>>>>>> dlm_purge_lockres succeed.
>>>>>>>>>
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                      dlm_master_request_handler
>>>>>>>>>                                      -> dlm_lockres_set_refmap_bit
>>>>>>>>>
>>>>>>>>>                                      deref work trigger, call
>>>>>>>>>                                      dlm_lockres_clear_refmap_bit
>>>>>>>>>                                      to clear Node 1 from refmap
>>>>>>>>>
>>>>>>>>> Now Node 2 can purge the lock resource but the owner of lock resource
>>>>>>>>> on Node 1 is still Node 2 which may trigger BUG if the lock resource
>>>>>>>>> is $RECOVERY or other problems.
>>>>>>>>>
>>>>>>>>> We have discussed 2 solutions:
>>>>>>>>> 1) The master return error to Node 1 if the DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set. Node 1 will not retry and master send another message to Node 1
>>>>>>>>> after clearing the refmap. Node 1 can purge the lock resource after the
>>>>>>>>> refmap on master is cleared.
>>>>>>>>> 2) The master return error to Node 1 if the DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set, and Node 1 will retry to deref the lockres.
>>>>>>>>>
>>>>>>>>> Does anybody has better ideas?
>>>>>>>>>
>>>>>>>> dlm_purge_lockres() will wait to drop ref until
>>>>>>>> DLM_LOCK_RES_SETREF_INPROG cleared. So if set this flag when find the
>>>>>>>> master during doing master request. And then this flag was cleared when
>>>>>>>> receiving assert master message, can this fix the issue?
>>>>>>>>
>>>>>>> I don't think this can fix. Before doing master request, the lock
>>>>>>> resource is already purged. The master should clear the refmap before
>>>>>>> client purge it.
>>>>>> inflight_locks is increased in dlm_get_lock_resource() which will stop
>>>>>> lockres purged? Set DLM_LOCK_RES_SETREF_INPROG when found lockres owner
>>>>>> during master request, then this will stop lockres purged after unlock?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiufei
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Junxiao.
>>>>>>>>> Thanks,
>>>>>>>>> --Jiufei
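The ordering that the patch earlier in this mail relies on can likewise be
modeled in userspace. The sketch below (pthreads; build with gcc -pthread)
uses illustrative names only (lockres_demo, assert_master, lock_unlock_purge
and the SETREF_INPROG value are not the kernel's symbols). It shows the
assumed invariant: once SETREF_INPROG is raised in the same critical section
that publishes the owner, a requester that has seen the owner cannot purge
and send DEREF until the flag is cleared, i.e. until after the master has
been told to keep its refmap reference.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define SETREF_INPROG 0x1	/* models DLM_LOCK_RES_SETREF_INPROG */

struct lockres_demo {
	pthread_mutex_t lock;		/* models res->spinlock        */
	pthread_cond_t  cond;
	unsigned int    state;		/* models res->state           */
	bool            owner_known;	/* models res->owner being set */
};

/* Assert-master side: the flag is raised in the same critical section that
 * publishes the owner, so the requester cannot observe the owner without
 * also observing SETREF_INPROG. */
static void *assert_master(void *arg)
{
	struct lockres_demo *res = arg;

	pthread_mutex_lock(&res->lock);
	res->state |= SETREF_INPROG;	/* what the patch moves earlier */
	res->owner_known = true;
	pthread_cond_broadcast(&res->cond);
	pthread_mutex_unlock(&res->lock);

	/* ... the MASTERY_REF response telling the master to keep its refmap
	 * reference would be sent here ... */

	pthread_mutex_lock(&res->lock);
	res->state &= ~SETREF_INPROG;	/* only now may a deref be sent */
	pthread_cond_broadcast(&res->cond);
	pthread_mutex_unlock(&res->lock);
	return NULL;
}

/* Requester side: learn the owner, lock/unlock, then purge.  The purge must
 * not send DEREF while SETREF_INPROG is still set. */
static void *lock_unlock_purge(void *arg)
{
	struct lockres_demo *res = arg;

	pthread_mutex_lock(&res->lock);
	while (!res->owner_known)
		pthread_cond_wait(&res->cond, &res->lock);
	/* dlmlock/dlmunlock would happen here */
	while (res->state & SETREF_INPROG)	/* purge waits for the clear */
		pthread_cond_wait(&res->cond, &res->lock);
	pthread_mutex_unlock(&res->lock);
	printf("purge: safe to send DEREF now\n");
	return NULL;
}

int main(void)
{
	struct lockres_demo res = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.cond = PTHREAD_COND_INITIALIZER,
	};
	pthread_t a, b;

	pthread_create(&a, NULL, lock_unlock_purge, &res);
	pthread_create(&b, NULL, assert_master, &res);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}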