ocfs2: should wait dio before inode lock in ocfs2_setattr()

Message ID 59F2E6F2.7090803@huawei.com
State New

Commit Message

zhendong chen Oct. 27, 2017, 7:57 a.m. UTC
We should wait for DIO requests to finish before taking the inode lock in
ocfs2_setattr(); otherwise the following deadlock can happen:
process 1                  process 2                    process 3
truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
ocfs2_setattr
 ocfs2_inode_lock_tracker
  ocfs2_inode_lock_full
 inode_dio_wait
  __inode_dio_wait
  -->waiting for all dio
  requests to finish
                                                        dlm_proxy_ast_handler
                                                         dlm_do_local_bast
                                                          ocfs2_blocking_ast
                                                           ocfs2_generic_handle_bast
                                                            set OCFS2_LOCK_BLOCKED flag
                        dio_end_io
                         dio_bio_end_aio
                          dio_complete
                           ocfs2_dio_end_io
                            ocfs2_dio_end_io_write
                             ocfs2_inode_lock
                              __ocfs2_cluster_lock
                               ocfs2_wait_for_mask
                               -->waiting for OCFS2_LOCK_BLOCKED
                               flag to be cleared, i.e. waiting
                               for 'process 1' to release the inode lock
                           inode_dio_end
                           -->i_dio_count would be decremented here,
                           but this is never reached, hence the deadlock.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>

---
 fs/ocfs2/file.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
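The trace above can be replayed with a small deterministic model. This is a hypothetical sketch, not ocfs2 code: `struct model`, `simulate_truncate()` and the step functions are invented for illustration, and the model assumes (simplifying the DLM considerably) that a downconvert clears BLOCKED as soon as the local EX holder count drops to zero. It runs the three processes round-robin in the order of the trace (process 1, then the bast, then the DIO completion) and reports a deadlock when every unfinished process is waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (not ocfs2 code) of one node's inode lock state. */
struct model {
	int ex_holders;      /* local holders of the EX cluster lock */
	bool blocked;        /* stands in for OCFS2_LOCK_BLOCKED */
	int dio_count;       /* stands in for inode->i_dio_count */
	bool wait_dio_first; /* true: the patched ordering */
};

/* A step returns false when it has to wait and must be retried later. */
typedef bool (*step_fn)(struct model *m, int pc);

struct proc {
	int pc;     /* index of the next step to run */
	int nsteps; /* total number of steps */
	step_fn step;
};

/* Process 1: ocfs2_setattr() truncating the file (lock, DIO wait, unlock). */
static bool truncate_step(struct model *m, int pc)
{
	int lock_pc = m->wait_dio_first ? 1 : 0;
	int wait_pc = m->wait_dio_first ? 0 : 1;

	if (pc == lock_pc) {
		if (m->blocked)
			return false;  /* cannot take a BLOCKED lock */
		m->ex_holders++;
	} else if (pc == wait_pc) {
		if (m->dio_count > 0)
			return false;  /* inode_dio_wait() */
	} else {
		m->ex_holders--;   /* unlock */
	}
	return true;
}

/* Process 2: ocfs2_dio_end_io_write() locking, then inode_dio_end(). */
static bool dio_end_step(struct model *m, int pc)
{
	if (pc == 0) {
		if (m->blocked)
			return false;  /* ocfs2_wait_for_mask() on BLOCKED */
		m->ex_holders++;
	} else {
		m->ex_holders--;   /* unlock ... */
		m->dio_count--;    /* ... then inode_dio_end() */
	}
	return true;
}

/* Process 3: the bast sets BLOCKED; a downconvert clears it once idle. */
static bool bast_step(struct model *m, int pc)
{
	if (pc == 0) {
		m->blocked = true;
	} else {
		if (m->ex_holders > 0)
			return false;  /* downconvert waits for local holders */
		m->blocked = false;
	}
	return true;
}

/* Returns true if all processes finish, false if they deadlock. */
static bool simulate_truncate(bool wait_dio_first)
{
	struct model m = { 0, false, 1, wait_dio_first }; /* one DIO in flight */
	struct proc p[] = {
		{ 0, 3, truncate_step }, /* process 1 */
		{ 0, 2, bast_step },     /* process 3 */
		{ 0, 2, dio_end_step },  /* process 2 */
	};
	const size_t n = sizeof(p) / sizeof(p[0]);

	for (;;) {
		bool all_done = true;
		bool progressed = false;

		for (size_t i = 0; i < n; i++) {
			if (p[i].pc == p[i].nsteps)
				continue;
			all_done = false;
			if (p[i].step(&m, p[i].pc)) {
				p[i].pc++;
				progressed = true;
			}
		}
		if (all_done)
			return true;   /* every process finished */
		if (!progressed)
			return false;  /* everyone is waiting: deadlock */
	}
}
```

With this interleaving, `simulate_truncate(false)` (the original ordering) reports a deadlock, while `simulate_truncate(true)` (waiting for DIO before taking the lock) lets all three processes finish.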

Comments

Changwei Ge Oct. 27, 2017, 10:21 a.m. UTC | #1
Hi Alex,

Thanks for reporting.
I think I get your point. You mean that a lock resource (say A) is used
to protect metadata changes among nodes in the cluster.

Unfortunately, it was marked as BLOCKED while it was granted in EX mode,
and the lock can't be unblocked because it has one or more ::ex_holders.
Furthermore, since process 1 is waiting for all in-flight DIO to complete,
it won't give up its ownership of lock resource A.

Thus, hang, right?

From reviewing the code, I agree that this hang can indeed occur.

But with your patch, how can you guarantee that no more bios will be issued
from other nodes in the cluster?

Also, I'm cc'ing the ocfs2 maintainers on this patch.

Thanks,
Changwei

zhendong chen Oct. 28, 2017, 2:34 a.m. UTC | #2
Hi Changwei,

Thanks for your reply.

On 2017/10/27 18:21, Changwei Ge wrote:
> Thus, hang, right?

Yes, I'm glad you can understand this.

> But with your patch, how can you guarantee that no more bios will be issued
> from other nodes in the cluster?
> 

First, inode_lock() in do_truncate() prevents another bio from being issued
from this node. Furthermore, ocfs2_rw_lock() and ocfs2_inode_lock() in
ocfs2_setattr() guarantee that no more bios will be issued from the other
nodes in the cluster.

Thanks,
Alex
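Alex's answer rests on one invariant: inode_dio_wait() is only a stable barrier if no new DIO can start after it returns. The toy model below is a hypothetical sketch of the local half of that argument (names such as `struct dio_gate` and `dio_try_submit()` are invented, not kernel APIs): a boolean stands in for inode_lock() held by do_truncate(), and the cross-node exclusion provided by ocfs2_rw_lock() and ocfs2_inode_lock() is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state for one node; not ocfs2 or VFS code. */
struct dio_gate {
	bool gate_held; /* stands in for inode_lock() held by do_truncate() */
	int in_flight;  /* stands in for inode->i_dio_count */
};

/* New DIO may only be submitted while the gate is not held. */
static bool dio_try_submit(struct dio_gate *g)
{
	if (g->gate_held)
		return false; /* the submitter would block on the gate */
	g->in_flight++;
	return true;
}

/* Completion side: the equivalent of inode_dio_end(). */
static void dio_finish(struct dio_gate *g)
{
	if (g->in_flight > 0)
		g->in_flight--;
}

/*
 * The drain is stable only if the count is zero *and* the gate is held,
 * because then nothing can push the count back up before the truncate
 * goes on to take its cluster locks.
 */
static bool drain_is_stable(const struct dio_gate *g)
{
	return g->gate_held && g->in_flight == 0;
}
```

In use: submit one DIO, take the gate, let the DIO finish, and from then on the drained state cannot be undone by a new local submission.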

Changwei Ge Oct. 28, 2017, 8:12 a.m. UTC | #3
Hi Alex,
On 2017/10/28 10:35, alex chen wrote:
> First, inode_lock() in do_truncate() prevents another bio from being issued
> from this node. Furthermore, ocfs2_rw_lock() and ocfs2_inode_lock() in
> ocfs2_setattr() guarantee that no more bios will be issued from the other
> nodes in the cluster.

Thanks for the detailed explanation.
I think this patch makes sense.
Acked.

Thanks,
Changwei

>> On 2017/10/27 16:01, alex chen wrote:
>>> Signed-off-by: Alex Chen <alex.chen@huawei.com>
>>> Reviewed-by: Jun Piao <piaojun@huawei.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>

Joseph Qi Oct. 30, 2017, 12:30 p.m. UTC | #4
On 17/10/27 15:57, alex chen wrote:
> Signed-off-by: Alex Chen <alex.chen@huawei.com>
> Reviewed-by: Jun Piao <piaojun@huawei.com>
> 
Redundant blank line here.

> ---
>  fs/ocfs2/file.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 6e41fc8..50e09a6 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1161,6 +1161,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  	}
>  	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
>  	if (size_change) {
> +
I'd prefer no blank line here. Also, please format the comment like:
/*
 * xxx
 */

Otherwise looks good to me. After you fix the above, feel free to add:
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>

> +		/* here we should wait dio to finish before inode lock
> +		 * to avoid a deadlock between ocfs2_setattr() and
> +		 * ocfs2_dio_end_io_write()
> +		 */
> +		inode_dio_wait(inode);
> +
>  		status = ocfs2_rw_lock(inode, 1);
>  		if (status < 0) {
>  			mlog_errno(status);
> @@ -1200,8 +1207,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>  		if (status)
>  			goto bail_unlock;
> 
> -		inode_dio_wait(inode);
> -
>  		if (i_size_read(inode) >= attr->ia_size) {
>  			if (ocfs2_should_order_data(inode)) {
>  				status = ocfs2_begin_ordered_truncate(inode,
>

Patch

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 6e41fc8..50e09a6 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1161,6 +1161,13 @@  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
+
+		/* here we should wait dio to finish before inode lock
+		 * to avoid a deadlock between ocfs2_setattr() and
+		 * ocfs2_dio_end_io_write()
+		 */
+		inode_dio_wait(inode);
+
 		status = ocfs2_rw_lock(inode, 1);
 		if (status < 0) {
 			mlog_errno(status);
@@ -1200,8 +1207,6 @@  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 		if (status)
 			goto bail_unlock;

-		inode_dio_wait(inode);
-
 		if (i_size_read(inode) >= attr->ia_size) {
 			if (ocfs2_should_order_data(inode)) {
 				status = ocfs2_begin_ordered_truncate(inode,