diff mbox series

[v2] ceph: always try to uninline inline data when opening files

Message ID 20220506143052.479799-1-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] ceph: always try to uninline inline data when opening files | expand

Commit Message

Xiubo Li May 6, 2022, 2:30 p.m. UTC
This will help reduce possible deadlock while holding Fcr to use
getattr for read case.

Usually we shouldn't use getattr to fetch inline data after getting
Fcr caps, because it can cause deadlock. The solution is try uniline
the inline data when opening files, thanks David Howells' previous
work on uninlining the inline data work.

It was caused from one possible call path:
  ceph_filemap_fault()-->
     ceph_get_caps(Fcr);
     filemap_fault()-->
        do_sync_mmap_readahead()-->
           page_cache_ra_order()-->
              read_pages()-->
                 aops->readahead()-->
                    netfs_readahead()-->
                       netfs_begin_read()-->
                          netfs_rreq_submit_slice()-->
                             netfs_read_from_server()-->
                                netfs_ops->issue_read()-->
                                   ceph_netfs_issue_read()-->
                                      ceph_netfs_issue_op_inline()-->
                                         getattr()
      ceph_pu_caps_ref(Fcr);

This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
get 'Frwcb' caps from both auth and replica MDSes.

But if the getattr is sent to any MDS, the MDS needs to do Locker
transition to LOCK_MIX first and then to LOCK_SYNC. But when
transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
caps back, but the kclient already holding it and waiting the MDS
to finish.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/file.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Jeff Layton May 6, 2022, 2:52 p.m. UTC | #1
On Fri, 2022-05-06 at 22:30 +0800, Xiubo Li wrote:
> This will help reduce possible deadlock while holding Fcr to use
> getattr for read case.
> 
> Usually we shouldn't use getattr to fetch inline data after getting
> Fcr caps, because it can cause deadlock. The solution is try uniline
> the inline data when opening files, thanks David Howells' previous
> work on uninlining the inline data work.
> 
> It was caused from one possible call path:
>   ceph_filemap_fault()-->
>      ceph_get_caps(Fcr);
>      filemap_fault()-->
>         do_sync_mmap_readahead()-->
>            page_cache_ra_order()-->
>               read_pages()-->
>                  aops->readahead()-->
>                     netfs_readahead()-->
>                        netfs_begin_read()-->
>                           netfs_rreq_submit_slice()-->
>                              netfs_read_from_server()-->
>                                 netfs_ops->issue_read()-->
>                                    ceph_netfs_issue_read()-->
>                                       ceph_netfs_issue_op_inline()-->
>                                          getattr()
>       ceph_pu_caps_ref(Fcr);
> 
> This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
> the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
> get 'Frwcb' caps from both auth and replica MDSes.
> 
> But if the getattr is sent to any MDS, the MDS needs to do Locker
> transition to LOCK_MIX first and then to LOCK_SYNC. But when
> transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
> caps back, but the kclient already holding it and waiting the MDS
> to finish.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/file.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 8c8226c0feac..09327ef5a26d 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -241,11 +241,12 @@ static int ceph_init_file_info(struct inode *inode, struct file *file,
>  	INIT_LIST_HEAD(&fi->rw_contexts);
>  	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
>  
> -	if ((file->f_mode & FMODE_WRITE) &&
> -	    ci->i_inline_version != CEPH_INLINE_NONE) {
> -		ret = ceph_uninline_data(file);
> -		if (ret < 0)
> -			goto error;
> +	if (ci->i_inline_version != CEPH_INLINE_NONE) {
> +		if (!ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR)) {
> +			ret = ceph_uninline_data(file);
> +			if (ret < 0)
> +				goto error;
> +		}
>  	}
>  
>  	return 0;

Note that this may be considered a regression in some cases. If a client
doesn't have write permissions to the pool at all, then previously it'd
be able to read files, but now it wouldn't.

Those are not terribly common configurations as I understand, but we
should be aware of it if we go this route.

Beyond that though, this patch ignores errors from ceph_pool_perm_check
and doesn't uninline the file if one is returned. Is that the behavior
you want here?
Xiubo Li May 6, 2022, 3:16 p.m. UTC | #2
On 5/6/22 10:52 PM, Jeff Layton wrote:
> On Fri, 2022-05-06 at 22:30 +0800, Xiubo Li wrote:
>> This will help reduce possible deadlock while holding Fcr to use
>> getattr for read case.
>>
>> Usually we shouldn't use getattr to fetch inline data after getting
>> Fcr caps, because it can cause deadlock. The solution is try uniline
>> the inline data when opening files, thanks David Howells' previous
>> work on uninlining the inline data work.
>>
>> It was caused from one possible call path:
>>    ceph_filemap_fault()-->
>>       ceph_get_caps(Fcr);
>>       filemap_fault()-->
>>          do_sync_mmap_readahead()-->
>>             page_cache_ra_order()-->
>>                read_pages()-->
>>                   aops->readahead()-->
>>                      netfs_readahead()-->
>>                         netfs_begin_read()-->
>>                            netfs_rreq_submit_slice()-->
>>                               netfs_read_from_server()-->
>>                                  netfs_ops->issue_read()-->
>>                                     ceph_netfs_issue_read()-->
>>                                        ceph_netfs_issue_op_inline()-->
>>                                           getattr()
>>        ceph_pu_caps_ref(Fcr);
>>
>> This because if the Locker state is LOCK_EXEC_MIX for auth MDS, and
>> the replica MDSes' lock state is LOCK_LOCK. Then the kclient could
>> get 'Frwcb' caps from both auth and replica MDSes.
>>
>> But if the getattr is sent to any MDS, the MDS needs to do Locker
>> transition to LOCK_MIX first and then to LOCK_SYNC. But when
>> transfering to LOCK_MIX state the MDS Locker need to revoke the Fcb
>> caps back, but the kclient already holding it and waiting the MDS
>> to finish.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/file.c | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 8c8226c0feac..09327ef5a26d 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -241,11 +241,12 @@ static int ceph_init_file_info(struct inode *inode, struct file *file,
>>   	INIT_LIST_HEAD(&fi->rw_contexts);
>>   	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
>>   
>> -	if ((file->f_mode & FMODE_WRITE) &&
>> -	    ci->i_inline_version != CEPH_INLINE_NONE) {
>> -		ret = ceph_uninline_data(file);
>> -		if (ret < 0)
>> -			goto error;
>> +	if (ci->i_inline_version != CEPH_INLINE_NONE) {
>> +		if (!ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR)) {
>> +			ret = ceph_uninline_data(file);
>> +			if (ret < 0)
>> +				goto error;
>> +		}
>>   	}
>>   
>>   	return 0;
> Note that this may be considered a regression in some cases. If a client
> doesn't have write permissions to the pool at all, then previously it'd
> be able to read files, but now it wouldn't.

Only when the pool has write permission will it try to uninline the 
inline data. Or we will skip it.

Maybe we should ignore the error in read case ?


> Those are not terribly common configurations as I understand, but we
> should be aware of it if we go this route.
>
> Beyond that though, this patch ignores errors from ceph_pool_perm_check
> and doesn't uninline the file if one is returned. Is that the behavior
> you want here?

Yeah, I ignored them on purpose.  Even uninlining the inline_data fails 
shall we continue IMO.
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8c8226c0feac..09327ef5a26d 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -241,11 +241,12 @@  static int ceph_init_file_info(struct inode *inode, struct file *file,
 	INIT_LIST_HEAD(&fi->rw_contexts);
 	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
 
-	if ((file->f_mode & FMODE_WRITE) &&
-	    ci->i_inline_version != CEPH_INLINE_NONE) {
-		ret = ceph_uninline_data(file);
-		if (ret < 0)
-			goto error;
+	if (ci->i_inline_version != CEPH_INLINE_NONE) {
+		if (!ceph_pool_perm_check(inode, CEPH_CAP_FILE_WR)) {
+			ret = ceph_uninline_data(file);
+			if (ret < 0)
+				goto error;
+		}
 	}
 
 	return 0;