
fix deadlock caused by ocfs2_defrag_extent

Message ID 20180827080121.31145-1-lchen@suse.com (mailing list archive)
State New, archived
Series fix deadlock caused by ocfs2_defrag_extent

Commit Message

Larry Chen Aug. 27, 2018, 8:01 a.m. UTC
ocfs2_defrag_extent may fall into a deadlock.

ocfs2_ioctl_move_extents
  ocfs2_move_extents
    ocfs2_defrag_extent
      ocfs2_lock_allocators_move_extents
        ocfs2_reserve_clusters
          inode_lock GLOBAL_BITMAP_SYSTEM_INODE
      __ocfs2_flush_truncate_log
        inode_lock GLOBAL_BITMAP_SYSTEM_INODE

As the backtrace above shows, ocfs2_reserve_clusters will call inode_lock
on the global bitmap if the local allocator does not have sufficient
clusters. Once the global bitmap can meet the demand,
ocfs2_reserve_clusters will return success with the global bitmap still
locked.

After ocfs2_reserve_clusters, if the truncate log is full,
__ocfs2_flush_truncate_log will definitely deadlock, because it needs to
inode_lock the global bitmap, which has already been locked.

To fix this bug, remove the code that locks the global allocator from
ocfs2_lock_allocators_move_extents, and move the removed code to after
__ocfs2_flush_truncate_log.

ocfs2_lock_allocators_move_extents is referenced in two places: one is
here, and the other does not need the data allocator context, so this
patch does not affect the other caller.



Signed-off-by: Larry Chen <lchen@suse.com>
---
 fs/ocfs2/move_extents.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

Joseph Qi Aug. 28, 2018, 1:02 a.m. UTC | #1
Hi Larry,

On 18/8/27 16:01, Larry Chen wrote:
> ocfs2_defrag_extent may fall into dead lock.
> 
> ocfs2_ioctl_move_extents
>   ocfs2_ioctl_move_extents	
>     ocfs2_move_extents
>       ocfs2_defrag_extent
>         ocfs2_lock_allocators_move_extents  
> 
>           ocfs2_reserve_clusters
>             inode_lock GLOBAL_BITMAP_SYSTEM_INODE
> 
> 	  __ocfs2_flush_truncate_log
>             inode_lock GLOBAL_BITMAP_SYSTEM_INODE
> 
> As back trace shows above, ocfs2_reserve_clusters will call inode_lock
> against the global bitmap if local allocator has not sufficient cluters. 
> Once global bitmap could meet the demand, ocfs2_reserve_cluster will 
> return success with global bitmap locked.
> 
> After ocfs2_reserve_cluster, if truncate log is full, 
> __ocfs2_flush_truncate_log will definitely fall into dead lock
> because it needs to inode_lock global bitmap, which has
> already been locked.
> 
> To fix this bug, we could remove from ocfs2_lock_allocators_move_extents 
> the code which intends to lock global allocator, and put the removed
> code after __ocfs2_flush_truncate_log
> 
> The ocfs2_lock_allocators_move_extents has been refered by 2 places, one
> is here, the other does not need the data allocator context, which means
> this patch does not affect the caller so far.
> 
After this change, data_ac in ocfs2_lock_allocators_move_extents() is
no longer used. So I'd prefer we also clean it up.

Thanks,
Joseph

> 
> 
> Signed-off-by: Larry Chen <lchen@suse.com>
> ---
>  fs/ocfs2/move_extents.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
> index 7eb3b0a6347e..064dedf40d74 100644
> --- a/fs/ocfs2/move_extents.c
> +++ b/fs/ocfs2/move_extents.c
> @@ -192,13 +192,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode,
>  		goto out;
>  	}
>  
> -	if (data_ac) {
> -		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
> -		if (ret) {
> -			mlog_errno(ret);
> -			goto out;
> -		}
> -	}
>  
>  	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
>  
> @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
>  		}
>  	}
>  
> +	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
> +	if (ret) {
> +		mlog_errno(ret);
> +		goto out_unlock_mutex;
> +	}
> +
>  	handle = ocfs2_start_trans(osb, credits);
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
>
Larry Chen Aug. 28, 2018, 1:37 a.m. UTC | #2
Hi Joseph,

Thanks for your review, I'll remove it later.

Thanks,
Larry

On 08/28/2018 09:02 AM, Joseph Qi wrote:
> Hi Larry,
> 
> On 18/8/27 16:01, Larry Chen wrote:
>> ocfs2_defrag_extent may fall into dead lock.
>>
>> ocfs2_ioctl_move_extents
>>    ocfs2_ioctl_move_extents	
>>      ocfs2_move_extents
>>        ocfs2_defrag_extent
>>          ocfs2_lock_allocators_move_extents
>>
>>            ocfs2_reserve_clusters
>>              inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>
>> 	  __ocfs2_flush_truncate_log
>>              inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>
>> As back trace shows above, ocfs2_reserve_clusters will call inode_lock
>> against the global bitmap if local allocator has not sufficient cluters.
>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will
>> return success with global bitmap locked.
>>
>> After ocfs2_reserve_cluster, if truncate log is full,
>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>> because it needs to inode_lock global bitmap, which has
>> already been locked.
>>
>> To fix this bug, we could remove from ocfs2_lock_allocators_move_extents
>> the code which intends to lock global allocator, and put the removed
>> code after __ocfs2_flush_truncate_log
>>
>> The ocfs2_lock_allocators_move_extents has been refered by 2 places, one
>> is here, the other does not need the data allocator context, which means
>> this patch does not affect the caller so far.
>>
> After this change, data_ac in ocfs2_lock_allocators_move_extents() is
> no longer used. So I'd prefer we also clean it up.
> 
> Thanks,
> Joseph
> 
>>
>>
>> Signed-off-by: Larry Chen <lchen@suse.com>
>> ---
>>   fs/ocfs2/move_extents.c | 13 ++++++-------
>>   1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>> index 7eb3b0a6347e..064dedf40d74 100644
>> --- a/fs/ocfs2/move_extents.c
>> +++ b/fs/ocfs2/move_extents.c
>> @@ -192,13 +192,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode,
>>   		goto out;
>>   	}
>>   
>> -	if (data_ac) {
>> -		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
>> -		if (ret) {
>> -			mlog_errno(ret);
>> -			goto out;
>> -		}
>> -	}
>>   
>>   	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
>>   
>> @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
>>   		}
>>   	}
>>   
>> +	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
>> +	if (ret) {
>> +		mlog_errno(ret);
>> +		goto out_unlock_mutex;
>> +	}
>> +
>>   	handle = ocfs2_start_trans(osb, credits);
>>   	if (IS_ERR(handle)) {
>>   		ret = PTR_ERR(handle);
>>
>
Changwei Ge Aug. 28, 2018, 2 a.m. UTC | #3
Hi Larry,

I'd like to propose another way to fix this issue, which might be easier.
Actually, ocfs2_reserve_clusters() will internally flush the truncate log
if there are not enough clusters left.
So how about just removing __ocfs2_flush_truncate_log() from
ocfs2_defrag_extent()?

Thanks,
Changwei


On 2018/8/27 16:01, Larry Chen wrote:
> ocfs2_defrag_extent may fall into dead lock.
>
> ocfs2_ioctl_move_extents
>    ocfs2_ioctl_move_extents	
>      ocfs2_move_extents
>        ocfs2_defrag_extent
>          ocfs2_lock_allocators_move_extents
>
>            ocfs2_reserve_clusters
>              inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>
> 	  __ocfs2_flush_truncate_log
>              inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>
> As back trace shows above, ocfs2_reserve_clusters will call inode_lock
> against the global bitmap if local allocator has not sufficient cluters.
> Once global bitmap could meet the demand, ocfs2_reserve_cluster will
> return success with global bitmap locked.
>
> After ocfs2_reserve_cluster, if truncate log is full,
> __ocfs2_flush_truncate_log will definitely fall into dead lock
> because it needs to inode_lock global bitmap, which has
> already been locked.
>
> To fix this bug, we could remove from ocfs2_lock_allocators_move_extents
> the code which intends to lock global allocator, and put the removed
> code after __ocfs2_flush_truncate_log
>
> The ocfs2_lock_allocators_move_extents has been refered by 2 places, one
> is here, the other does not need the data allocator context, which means
> this patch does not affect the caller so far.
>
>
>
> Signed-off-by: Larry Chen <lchen@suse.com>
> ---
>   fs/ocfs2/move_extents.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
> index 7eb3b0a6347e..064dedf40d74 100644
> --- a/fs/ocfs2/move_extents.c
> +++ b/fs/ocfs2/move_extents.c
> @@ -192,13 +192,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode,
>   		goto out;
>   	}
>   
> -	if (data_ac) {
> -		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
> -		if (ret) {
> -			mlog_errno(ret);
> -			goto out;
> -		}
> -	}
>   
>   	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
>   
> @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
>   		}
>   	}
>   
> +	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
> +	if (ret) {
> +		mlog_errno(ret);
> +		goto out_unlock_mutex;
> +	}
> +
>   	handle = ocfs2_start_trans(osb, credits);
>   	if (IS_ERR(handle)) {
>   		ret = PTR_ERR(handle);
Changwei Ge Aug. 30, 2018, 2:05 a.m. UTC | #4
Hi Larry,

Sorry for this late reply.


On 2018/8/28 12:46, Larry Chen wrote:
> Hi Changwei,
>
> I think there might be some logic error.
>
> Imagine that we remove __ocfs2_flush_truncate_log() in
> ocfs2_defrag_extent().
>
> In ocfs2_reserve_clusters might skip freeing truncate log if there is 
> enough clusters in local. Then __ocfs2_move_extent might return error 
> because ocfs2_truncate_log_append might fail.
I think changing ocfs2 infrastructure methods the way your patch does is
not a good idea.
So how about this way?
->lock truncate log inode
->__ocfs2_flush_truncate_log()
->ocfs2_lock_allocators_move_extents()
->unlock truncate log inode

Thanks
Changwei

>
> Thanks,
> Larry
>
>
> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>> Hi Larry,
>>
>> I'd like to propose another way to fix this issue, which might be
>> easier. :-)
>> Actually, ocfs2_reserve_clusters() inside will flush truncate log if
>> there is no enough clusters left.
>> So how about just remove __ocfs2_flush_truncate_log() in
>> ocfs2_defrag_extent()?
>>
>> Thanks,
>> Changwei
>>
>> On 2018/8/27 16:01, Larry Chen wrote:
>>> ocfs2_defrag_extent may fall into dead lock.
>>>
>>> ocfs2_ioctl_move_extents
>>>     ocfs2_ioctl_move_extents
>>>       ocfs2_move_extents
>>>         ocfs2_defrag_extent
>>>           ocfs2_lock_allocators_move_extents
>>>
>>>             ocfs2_reserve_clusters
>>>               inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>
>>>       __ocfs2_flush_truncate_log
>>>               inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>
>>> As back trace shows above, ocfs2_reserve_clusters will call inode_lock
>>> against the global bitmap if local allocator has not sufficient 
>>> cluters.
>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will
>>> return success with global bitmap locked.
>>>
>>> After ocfs2_reserve_cluster, if truncate log is full,
>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>> because it needs to inode_lock global bitmap, which has
>>> already been locked.
>>>
>>> To fix this bug, we could remove from 
>>> ocfs2_lock_allocators_move_extents
>>> the code which intends to lock global allocator, and put the removed
>>> code after __ocfs2_flush_truncate_log
>>>
>>> The ocfs2_lock_allocators_move_extents has been refered by 2 places, 
>>> one
>>> is here, the other does not need the data allocator context, which 
>>> means
>>> this patch does not affect the caller so far.
>>>
>>>
>>>
>>> Signed-off-by: Larry Chen <lchen@suse.com>
>>> ---
>>>    fs/ocfs2/move_extents.c | 13 ++++++-------
>>>    1 file changed, 6 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>> index 7eb3b0a6347e..064dedf40d74 100644
>>> --- a/fs/ocfs2/move_extents.c
>>> +++ b/fs/ocfs2/move_extents.c
>>> @@ -192,13 +192,6 @@ static int 
>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>            goto out;
>>>        }
>>>    -    if (data_ac) {
>>> -        ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
>>> -        if (ret) {
>>> -            mlog_errno(ret);
>>> -            goto out;
>>> -        }
>>> -    }
>>>           *credits += ocfs2_calc_extend_credits(osb->sb, 
>>> et->et_root_el);
>>>    @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct 
>>> ocfs2_move_extents_context *context,
>>>            }
>>>        }
>>>    +    ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
>>> +    if (ret) {
>>> +        mlog_errno(ret);
>>> +        goto out_unlock_mutex;
>>> +    }
>>> +
>>>        handle = ocfs2_start_trans(osb, credits);
>>>        if (IS_ERR(handle)) {
>>>            ret = PTR_ERR(handle);
>>
>
Larry Chen Aug. 30, 2018, 6:26 a.m. UTC | #5
Hi Changwei,

Maybe we need more investigation.

The following is your fix:
lock truncate log inode
    __ocfs2_flush_truncate_log()
    ocfs2_lock_allocators_move_extents()
unlock truncate log inode

The lock actions will then happen in the following order:
lock(truncate_log inode)
	lock(sub allocator)
	lock(local_alloc) or lock(global_bitmap)

I'm not sure whether there is another code path that tries to take the
same locks in a different order, which may cause deadlocks.

Indeed, ocfs2 involves too many locks; I would like to reduce the
deadlock risk to the maximum extent.

Another way is to add a new argument to __ocfs2_flush_truncate_log which
indicates whether the global bitmap needs to be locked.


Thanks,
Larry





On 08/30/2018 10:05 AM, Changwei Ge wrote:
> Hi Larry,
> 
> Sorry for this late reply.
> 
> 
> On 2018/8/28 12:46, Larry Chen wrote:
>> Hi Changwei,
>>
>> I think there might be some logic error.
>>
>> Imagine that we remove __ocfs2_flush_truncate_log() in
>> ocfs2_defrag_extent().
>>
>> In ocfs2_reserve_clusters might skip freeing truncate log if there is
>> enough clusters in local. Then __ocfs2_move_extent might return error
>> because ocfs2_truncate_log_append might fail.
> I think changing ocfs2 infrastructure methods like your way is not a
> good idea.
> So how about this way?
> ->lock truncate log inode
> ->__ocfs2_flush_truncate_log()
> ->ocfs2_lock_allocators_move_extents()
> ->unlock truncate log inode
> 
> Thanks
> Changwei
> 
>>
>> Thanks,
>> Larry
>>
>>
>> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>>> Hi Larry,
>>>
>>> I'd like to propose another way to fix this issue, which might be
>>> easier. :-)
>>> Actually, ocfs2_reserve_clusters() inside will flush truncate log if
>>> there is no enough clusters left.
>>> So how about just remove __ocfs2_flush_truncate_log() in
>>> ocfs2_defrag_extent()?
>>>
>>> Thanks,
>>> Changwei
>>>
>>> On 2018/8/27 16:01, Larry Chen wrote:
>>>> ocfs2_defrag_extent may fall into dead lock.
>>>>
>>>> ocfs2_ioctl_move_extents
>>>>      ocfs2_ioctl_move_extents
>>>>        ocfs2_move_extents
>>>>          ocfs2_defrag_extent
>>>>            ocfs2_lock_allocators_move_extents
>>>>
>>>>              ocfs2_reserve_clusters
>>>>                inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>>        __ocfs2_flush_truncate_log
>>>>                inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>> As back trace shows above, ocfs2_reserve_clusters will call inode_lock
>>>> against the global bitmap if local allocator has not sufficient
>>>> cluters.
>>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will
>>>> return success with global bitmap locked.
>>>>
>>>> After ocfs2_reserve_cluster, if truncate log is full,
>>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>>> because it needs to inode_lock global bitmap, which has
>>>> already been locked.
>>>>
>>>> To fix this bug, we could remove from
>>>> ocfs2_lock_allocators_move_extents
>>>> the code which intends to lock global allocator, and put the removed
>>>> code after __ocfs2_flush_truncate_log
>>>>
>>>> The ocfs2_lock_allocators_move_extents has been refered by 2 places,
>>>> one
>>>> is here, the other does not need the data allocator context, which
>>>> means
>>>> this patch does not affect the caller so far.
>>>>
>>>>
>>>>
>>>> Signed-off-by: Larry Chen <lchen@suse.com>
>>>> ---
>>>>     fs/ocfs2/move_extents.c | 13 ++++++-------
>>>>     1 file changed, 6 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>>> index 7eb3b0a6347e..064dedf40d74 100644
>>>> --- a/fs/ocfs2/move_extents.c
>>>> +++ b/fs/ocfs2/move_extents.c
>>>> @@ -192,13 +192,6 @@ static int
>>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>>             goto out;
>>>>         }
>>>>     -    if (data_ac) {
>>>> -        ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
>>>> -        if (ret) {
>>>> -            mlog_errno(ret);
>>>> -            goto out;
>>>> -        }
>>>> -    }
>>>>            *credits += ocfs2_calc_extend_credits(osb->sb,
>>>> et->et_root_el);
>>>>     @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct
>>>> ocfs2_move_extents_context *context,
>>>>             }
>>>>         }
>>>>     +    ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
>>>> +    if (ret) {
>>>> +        mlog_errno(ret);
>>>> +        goto out_unlock_mutex;
>>>> +    }
>>>> +
>>>>         handle = ocfs2_start_trans(osb, credits);
>>>>         if (IS_ERR(handle)) {
>>>>             ret = PTR_ERR(handle);
>>>
>>
>
Changwei Ge Aug. 30, 2018, 7:52 a.m. UTC | #6
Hi Larry,


On 2018/8/30 14:26, Larry Chen wrote:
> Hi Changwei,
>
> Maybe we need more investigation.
>
> The following is your fix:
> lock truncate log inode
>     __ocfs2_flush_truncate_log()
>     ocfs2_lock_allocators_move_extents()
>         unlock truncate log inode
>
> The lock action will happen like following:
> lock(truncate_inode)
>     lock(sub allocat)
>     lock(local_alloc) or lock(global_bitmap)

I don't think we have to worry much about mixed ordering of the cluster
lock and the inode mutex, since a node that has already been granted the
cluster lock will succeed directly instead of waiting for itself.

>
> I'm not sure if there is another code path that tries to get the same 
> locks but in different order, which may causes dead locks.
>
> Indeed ocfs2 involves too many locks, I would like to reduce the 
> deadlock risk at max extent.
>
> Another way is to add an new argument for __ocfs2_flush_truncate which 
> indicates whether global bitmap is needed to be locked.

Sounds like a feasible way :)

Thanks,
Changwei

>
>
> Thanks,
> Larry
>
>
>
>
>
> On 08/30/2018 10:05 AM, Changwei Ge wrote:
>> Hi Larry,
>>
>> Sorry for this late reply.
>>
>>
>> On 2018/8/28 12:46, Larry Chen wrote:
>>> Hi Changwei,
>>>
>>> I think there might be some logic error.
>>>
>>> Imagine that we remove __ocfs2_flush_truncate_log() in
>>> ocfs2_defrag_extent().
>>>
>>> In ocfs2_reserve_clusters might skip freeing truncate log if there is
>>> enough clusters in local. Then __ocfs2_move_extent might return error
>>> because ocfs2_truncate_log_append might fail.
>> I think changing ocfs2 infrastructure methods like your way is not a
>> good idea.
>> So how about this way?
>> ->lock truncate log inode
>> ->__ocfs2_flush_truncate_log()
>> ->ocfs2_lock_allocators_move_extents()
>> ->unlock truncate log inode
>>
>> Thanks
>> Changwei
>>
>>>
>>> Thanks,
>>> Larry
>>>
>>>
>>> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>>>> Hi Larry,
>>>>
>>>> I'd like to propose another way to fix this issue, which might be
>>>> easier. :-)
>>>> Actually, ocfs2_reserve_clusters() inside will flush truncate log if
>>>> there is no enough clusters left.
>>>> So how about just remove __ocfs2_flush_truncate_log() in
>>>> ocfs2_defrag_extent()?
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>> On 2018/8/27 16:01, Larry Chen wrote:
>>>>> ocfs2_defrag_extent may fall into dead lock.
>>>>>
>>>>> ocfs2_ioctl_move_extents
>>>>>      ocfs2_ioctl_move_extents
>>>>>        ocfs2_move_extents
>>>>>          ocfs2_defrag_extent
>>>>>            ocfs2_lock_allocators_move_extents
>>>>>
>>>>>              ocfs2_reserve_clusters
>>>>>                inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>>
>>>>>        __ocfs2_flush_truncate_log
>>>>>                inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>>
>>>>> As back trace shows above, ocfs2_reserve_clusters will call 
>>>>> inode_lock
>>>>> against the global bitmap if local allocator has not sufficient
>>>>> cluters.
>>>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will
>>>>> return success with global bitmap locked.
>>>>>
>>>>> After ocfs2_reserve_cluster, if truncate log is full,
>>>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>>>> because it needs to inode_lock global bitmap, which has
>>>>> already been locked.
>>>>>
>>>>> To fix this bug, we could remove from
>>>>> ocfs2_lock_allocators_move_extents
>>>>> the code which intends to lock global allocator, and put the removed
>>>>> code after __ocfs2_flush_truncate_log
>>>>>
>>>>> The ocfs2_lock_allocators_move_extents has been refered by 2 places,
>>>>> one
>>>>> is here, the other does not need the data allocator context, which
>>>>> means
>>>>> this patch does not affect the caller so far.
>>>>>
>>>>>
>>>>>
>>>>> Signed-off-by: Larry Chen <lchen@suse.com>
>>>>> ---
>>>>>     fs/ocfs2/move_extents.c | 13 ++++++-------
>>>>>     1 file changed, 6 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>>>> index 7eb3b0a6347e..064dedf40d74 100644
>>>>> --- a/fs/ocfs2/move_extents.c
>>>>> +++ b/fs/ocfs2/move_extents.c
>>>>> @@ -192,13 +192,6 @@ static int
>>>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>>>             goto out;
>>>>>         }
>>>>>     -    if (data_ac) {
>>>>> -        ret = ocfs2_reserve_clusters(osb, clusters_to_move, 
>>>>> data_ac);
>>>>> -        if (ret) {
>>>>> -            mlog_errno(ret);
>>>>> -            goto out;
>>>>> -        }
>>>>> -    }
>>>>>            *credits += ocfs2_calc_extend_credits(osb->sb,
>>>>> et->et_root_el);
>>>>>     @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct
>>>>> ocfs2_move_extents_context *context,
>>>>>             }
>>>>>         }
>>>>>     +    ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
>>>>> +    if (ret) {
>>>>> +        mlog_errno(ret);
>>>>> +        goto out_unlock_mutex;
>>>>> +    }
>>>>> +
>>>>>         handle = ocfs2_start_trans(osb, credits);
>>>>>         if (IS_ERR(handle)) {
>>>>>             ret = PTR_ERR(handle);
>>>>
>>>
>>
>
Larry Chen Aug. 30, 2018, 8:24 a.m. UTC | #7
Hi Changwei,

On 08/30/2018 03:52 PM, Changwei Ge wrote:
> Hi Larry,
> 
> 
> On 2018/8/30 14:26, Larry Chen wrote:
>> Hi Changwei,
>>
>> Maybe we need more investigation.
>>
>> The following is your fix:
>> lock truncate log inode
>>      __ocfs2_flush_truncate_log()
>>      ocfs2_lock_allocators_move_extents()
>>          unlock truncate log inode
>>
>> The lock action will happen like following:
>> lock(truncate_inode)
>>      lock(sub allocat)
>>      lock(local_alloc) or lock(global_bitmap)
> 
> I don't think we have to worry much about mixed order  of cluster lock
> and inode mutex since cluster lock granted node will directly succeed
> instead of waiting for itself.
> 
>>
>> I'm not sure if there is another code path that tries to get the same
>> locks but in different order, which may causes dead locks.

Yeah, I use "lock" to mean both inode_lock and ocfs2_inode_lock.
As too many types of locks and inode locks are involved, I cannot
guarantee that there is no logic error.

>>
>> Indeed ocfs2 involves too many locks, I would like to reduce the
>> deadlock risk at max extent.
>>
>> Another way is to add an new argument for __ocfs2_flush_truncate which
>> indicates whether global bitmap is needed to be locked.
> 
> Sounds a feasible way :)

Haha, I also prefer this way :)
I'll send another patch and run test cases in my environment.


Thanks,
Larry
Larry Chen Sept. 2, 2018, 9:09 a.m. UTC | #8
Hi Changwei,

>>>
>>> Another way is to add an new argument for __ocfs2_flush_truncate which
>>> indicates whether global bitmap is needed to be locked.
>>
>> Sounds a feasible way :)
> 

I tried this way, but it still causes a deadlock. Here are the backtraces.
The root cause is that ocfs2 sometimes relies on a work queue to flush
the truncate log.

The worker that processes the work queue is ocfs2_truncate_log_worker,
which directly calls ocfs2_flush_truncate_log. Its lock order is the
truncate log first, then the global bitmap, which conflicts with
ocfs2_defrag_extent.


 #0 [ffffb6c000a03c50] __schedule at ffffffff966c47bf
 #1 [ffffb6c000a03ce0] schedule at ffffffff966c4e08
 #2 [ffffb6c000a03ce8] rwsem_down_write_failed at ffffffff966c82b3
 #3 [ffffb6c000a03d80] schedule at ffffffff966c4e08
 #4 [ffffb6c000a03d88] call_rwsem_down_write_failed at ffffffff963a6e03
 #5 [ffffb6c000a03dc8] down_write at ffffffff966c7410
		try global bitmap
 #6 [ffffb6c000a03dd0] __ocfs2_flush_truncate_log at ffffffffc0749145 [ocfs2]
		got truncate log
 #7 [ffffb6c000a03e68] ocfs2_flush_truncate_log at ffffffffc0749afd [ocfs2]
 #8 [ffffb6c000a03e80] ocfs2_truncate_log_worker at ffffffffc0749b29 [ocfs2]
 #9 [ffffb6c000a03e98] process_one_work at ffffffff9609e64a
#10 [ffffb6c000a03ed8] worker_thread at ffffffff9609e88b
#11 [ffffb6c000a03f10] kthread at ffffffff960a470a
#12 [ffffb6c000a03f50] ret_from_fork at ffffffff96800235



[<ffffffff963a6e03>] call_rwsem_down_write_failed+0x13/0x20
	try truncate log
	got global bm
[<ffffffffc07906f3>] ocfs2_defrag_extent+0x1d3/0x680 [ocfs2]
[<ffffffffc07919f9>] ocfs2_move_extents+0x329/0x740 [ocfs2]
[<ffffffffc0791f43>] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
[<ffffffffc0773a83>] ocfs2_ioctl+0x253/0x640 [ocfs2]
[<ffffffff96256480>] do_vfs_ioctl+0x90/0x5f0
[<ffffffff96256a54>] SyS_ioctl+0x74/0x80
[<ffffffff96003ae4>] do_syscall_64+0x74/0x140
[<ffffffff9680009a>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff

Patch

diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
index 7eb3b0a6347e..064dedf40d74 100644
--- a/fs/ocfs2/move_extents.c
+++ b/fs/ocfs2/move_extents.c
@@ -192,13 +192,6 @@  static int ocfs2_lock_allocators_move_extents(struct inode *inode,
 		goto out;
 	}
 
-	if (data_ac) {
-		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
-		if (ret) {
-			mlog_errno(ret);
-			goto out;
-		}
-	}
 
 	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
 
@@ -283,6 +276,12 @@  static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
 		}
 	}
 
+	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
+	if (ret) {
+		mlog_errno(ret);
+		goto out_unlock_mutex;
+	}
+
 	handle = ocfs2_start_trans(osb, credits);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);