diff mbox series

[v3] ceph: always try to uninline inline data when opening files

Message ID 20220506153843.515915-1-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v3] ceph: always try to uninline inline data when opening files | expand

Commit Message

Xiubo Li May 6, 2022, 3:38 p.m. UTC
This will help reduce possible deadlock while holding Fcr to use
getattr for read case.

Usually we shouldn't use getattr to fetch inline data after getting
Fcr caps, because it can cause deadlock. The solution is try uniline
the inline data when opening files, thanks David Howells' previous
work on uninlining the inline data work.

It was caused from one possible call path:
  ceph_filemap_fault()-->
     ceph_get_caps(Fcr);
     filemap_fault()-->
        do_sync_mmap_readahead()-->
           page_cache_ra_order()-->
              read_pages()-->
                 aops->readahead()-->
                    netfs_readahead()-->
                       netfs_begin_read()-->
                          netfs_rreq_submit_slice()-->
                             netfs_read_from_server()-->
                                netfs_ops->issue_read()-->
                                   ceph_netfs_issue_read()-->
                                      ceph_netfs_issue_op_inline()-->
                                         getattr()
      ceph_pu_caps_ref(Fcr);

This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
get 'Frwcb' caps from both auth and replica MDSes.

But if the getattr is sent to any MDS, the MDS needs to do Locker
transition to LOCK_MIX first and then to LOCK_SYNC. But when
transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
caps back, but the kclient already holding it and waiting the MDS
to finish.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/file.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Jeff Layton May 6, 2022, 4:33 p.m. UTC | #1
On Fri, 2022-05-06 at 23:38 +0800, Xiubo Li wrote:
> This will help reduce possible deadlock while holding Fcr to use
> getattr for read case.
> 


Ok, so I guess the situation here is that if we can't uninline the file,
then we just take our chances with the deadlock possibilities? I think
we have to consider to still be a problem then.

We've been aware that the inlining code is problematic for a long time.
Maybe it's just time to rip this stuff out. Still, it'd be good to have
it in working state before we do that, in case we need to revert it for
some reason...

> Usually we shouldn't use getattr to fetch inline data after getting
> Fcr caps, because it can cause deadlock. The solution is try uniline
> the inline data when opening files, thanks David Howells' previous
> work on uninlining the inline data work.
> 
> It was caused from one possible call path:
>   ceph_filemap_fault()-->
>      ceph_get_caps(Fcr);
>      filemap_fault()-->
>         do_sync_mmap_readahead()-->
>            page_cache_ra_order()-->
>               read_pages()-->
>                  aops->readahead()-->
>                     netfs_readahead()-->
>                        netfs_begin_read()-->
>                           netfs_rreq_submit_slice()-->
>                              netfs_read_from_server()-->
>                                 netfs_ops->issue_read()-->
>                                    ceph_netfs_issue_read()-->
>                                       ceph_netfs_issue_op_inline()-->
>                                          getattr()
>       ceph_pu_caps_ref(Fcr);
> 
> This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
> the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
> get 'Frwcb' caps from both auth and replica MDSes.
> 
> But if the getattr is sent to any MDS, the MDS needs to do Locker
> transition to LOCK_MIX first and then to LOCK_SYNC. But when
> transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
> caps back, but the kclient already holding it and waiting the MDS
> to finish.
> 

My hat's is off to you who can comprehend the LOCK_* state transitions!
I don't quite grok the problem fully, but I'll take your word for it.

The page should already be uptodate if we hold Fc, AFAICT, so this is
only a problem for the case where we can get Fr but not Fc.

Is there any way we could ensure that we send the getattr to the same
MDS that we got the Fr caps from? Presumably it can satisfy the request
without a revoking anything at that point.

I'll have to think about this a bit more.

> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/file.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 8c8226c0feac..5d5386c7ef01 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -241,11 +241,16 @@ static int ceph_init_file_info(struct inode *inode, struct file *file,
>  	INIT_LIST_HEAD(&fi->rw_contexts);
>  	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
>  
> -	if ((file->f_mode & FMODE_WRITE) &&
> -	    ci->i_inline_version != CEPH_INLINE_NONE) {
> -		ret = ceph_uninline_data(file);
> -		if (ret < 0)
> +	if (ci->i_inline_version != CEPH_INLINE_NONE) {
> +		ret = ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR);
> +		if (!ret) {
> +			ret = ceph_uninline_data(file);
> +			/* Ignore the error for readonly case */
> +			if (ret < 0 && (file->f_mode & FMODE_WRITE))
> +				goto error;
> +		} else if (ret != -EPERM) {
>  			goto error;
> +		}
>  	}
>  
>  	return 0;


Maybe instead of doing this, we should
Xiubo Li May 7, 2022, 12:52 a.m. UTC | #2
On 5/7/22 12:33 AM, Jeff Layton wrote:
> On Fri, 2022-05-06 at 23:38 +0800, Xiubo Li wrote:
>> This will help reduce possible deadlock while holding Fcr to use
>> getattr for read case.
>>
>
> Ok, so I guess the situation here is that if we can't uninline the file,
> then we just take our chances with the deadlock possibilities? I think
> we have to consider to still be a problem then.

Right.

>
> We've been aware that the inlining code is problematic for a long time.
> Maybe it's just time to rip this stuff out. Still, it'd be good to have
> it in working state before we do that, in case we need to revert it for
> some reason...
>
Okay, sounds good.


>> Usually we shouldn't use getattr to fetch inline data after getting
>> Fcr caps, because it can cause deadlock. The solution is try uniline
>> the inline data when opening files, thanks David Howells' previous
>> work on uninlining the inline data work.
>>
>> It was caused from one possible call path:
>>    ceph_filemap_fault()-->
>>       ceph_get_caps(Fcr);
>>       filemap_fault()-->
>>          do_sync_mmap_readahead()-->
>>             page_cache_ra_order()-->
>>                read_pages()-->
>>                   aops->readahead()-->
>>                      netfs_readahead()-->
>>                         netfs_begin_read()-->
>>                            netfs_rreq_submit_slice()-->
>>                               netfs_read_from_server()-->
>>                                  netfs_ops->issue_read()-->
>>                                     ceph_netfs_issue_read()-->
>>                                        ceph_netfs_issue_op_inline()-->
>>                                           getattr()
>>        ceph_pu_caps_ref(Fcr);
>>
>> This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
>> the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
>> get 'Frwcb' caps from both auth and replica MDSes.
>>
>> But if the getattr is sent to any MDS, the MDS needs to do Locker
>> transition to LOCK_MIX first and then to LOCK_SYNC. But when
>> transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
>> caps back, but the kclient already holding it and waiting the MDS
>> to finish.
>>
> My hat's is off to you who can comprehend the LOCK_* state transitions!
> I don't quite grok the problem fully, but I'll take your word for it.

I have confirmed this with Zheng too. And we already have several 
similar fixes about this before.


>
> The page should already be uptodate if we hold Fc, AFAICT, so this is
> only a problem for the case where we can get Fr but not Fc.
>
> Is there any way we could ensure that we send the getattr to the same
> MDS that we got the Fr caps from? Presumably it can satisfy the request
> without a revoking anything at that point.

Days ago I sent one patch to do this in [1], but I am not very sure this 
could cover all possible cases.

[1] 
https://github.com/ceph/ceph-client/commit/c99bb1ee6ce7a92b4720afd7208944ab182697de

--Xiubo


> I'll have to think about this a bit more.
>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/file.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 8c8226c0feac..5d5386c7ef01 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -241,11 +241,16 @@ static int ceph_init_file_info(struct inode *inode, struct file *file,
>>   	INIT_LIST_HEAD(&fi->rw_contexts);
>>   	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
>>   
>> -	if ((file->f_mode & FMODE_WRITE) &&
>> -	    ci->i_inline_version != CEPH_INLINE_NONE) {
>> -		ret = ceph_uninline_data(file);
>> -		if (ret < 0)
>> +	if (ci->i_inline_version != CEPH_INLINE_NONE) {
>> +		ret = ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR);
>> +		if (!ret) {
>> +			ret = ceph_uninline_data(file);
>> +			/* Ignore the error for readonly case */
>> +			if (ret < 0 && (file->f_mode & FMODE_WRITE))
>> +				goto error;
>> +		} else if (ret != -EPERM) {
>>   			goto error;
>> +		}
>>   	}
>>   
>>   	return 0;
>
> Maybe instead of doing this, we should
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8c8226c0feac..5d5386c7ef01 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -241,11 +241,16 @@  static int ceph_init_file_info(struct inode *inode, struct file *file,
 	INIT_LIST_HEAD(&fi->rw_contexts);
 	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
 
-	if ((file->f_mode & FMODE_WRITE) &&
-	    ci->i_inline_version != CEPH_INLINE_NONE) {
-		ret = ceph_uninline_data(file);
-		if (ret < 0)
+	if (ci->i_inline_version != CEPH_INLINE_NONE) {
+		ret = ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR);
+		if (!ret) {
+			ret = ceph_uninline_data(file);
+			/* Ignore the error for readonly case */
+			if (ret < 0 && (file->f_mode & FMODE_WRITE))
+				goto error;
+		} else if (ret != -EPERM) {
 			goto error;
+		}
 	}
 
 	return 0;