
[2/2] ceph: fix coherency issue when truncating file size for fscrypt

Message ID 20220407144112.8455-3-xiubli@redhat.com
State New, archived
Series ceph: misc fix size truncate for fscrypt

Commit Message

Xiubo Li April 7, 2022, 2:41 p.m. UTC
From: Xiubo Li <xiubli@redhat.com>

When truncating the file size, the MDS helps update the last
encrypted block. While this is in progress, we need to make sure the
client won't fill the pagecache.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jeff Layton April 7, 2022, 3:33 p.m. UTC | #1
On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> When truncating the file size the MDS will help update the last
> encrypted block, and during this we need to make sure the client
> won't fill the pagecaches.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/inode.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index f4059d73edd5..cc1829ab497d 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		req->r_num_caps = 1;
>  		req->r_stamp = attr->ia_ctime;
>  		if (fill_fscrypt) {
> +			filemap_invalidate_lock(inode->i_mapping);
>  			err = fill_fscrypt_truncate(inode, req, attr);
> -			if (err)
> +			if (err) {
> +				filemap_invalidate_unlock(inode->i_mapping);
>  				goto out;
> +			}
>  		}
>  
>  		/*
> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		 * it.
>  		 */
>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> +		if (fill_fscrypt)
> +			filemap_invalidate_unlock(inode->i_mapping);
>  		if (err == -EAGAIN && truncate_retry--) {
>  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>  			     inode, err, ceph_cap_string(dirtied), mask);

Looks reasonable. Is there any reason we shouldn't do this in the non-
encrypted case too? I suppose it doesn't make as much difference in that
case.

I'll plan to pull this and the other patch into the wip-fscrypt branch.
Should I just fold them into your earlier patches?

Jeff Layton April 7, 2022, 3:38 p.m. UTC | #2
On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > [...]
> 
> Looks reasonable. Is there any reason we shouldn't do this in the non-
> encrypted case too? I suppose it doesn't make as much difference in that
> case.
> 
> I'll plan to pull this and the other patch into the wip-fscrypt branch.
> Should I just fold them into your earlier patches?

OTOH...do we really need this? I'm not sure I understand the race you're
trying to prevent. Can you lay it out for me?

Thanks,

Xiubo Li April 7, 2022, 7:14 p.m. UTC | #3
On 4/7/22 11:38 PM, Jeff Layton wrote:
> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>> [...]
>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>> encrypted case too? I suppose it doesn't make as much difference in that
>> case.

We only need this in the encrypted case, which does a read-modify-write
(RMW) of the last block.
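To make "RMW of the last block" concrete: when the new size is not block-aligned, the block that straddles the new EOF has to be rewritten with everything past EOF zeroed. The sketch below is a userspace model of only the offset arithmetic and the "modify" step. The 4 KiB block size is an assumption held in a local constant (not the kernel's own definition), decryption and re-encryption around the modify step are omitted, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fscrypt block size for this sketch (4 KiB). */
#define TOY_FSCRYPT_BLOCK_SIZE 4096u

/* File offset of the block that contains the new EOF. */
static uint64_t toy_last_block_start(uint64_t new_size)
{
	return new_size & ~(uint64_t)(TOY_FSCRYPT_BLOCK_SIZE - 1);
}

/* Number of bytes at the start of that block that survive the truncate. */
static size_t toy_last_block_keep(uint64_t new_size)
{
	return (size_t)(new_size & (TOY_FSCRYPT_BLOCK_SIZE - 1));
}

/*
 * "Modify" step of the RMW, applied to a plaintext copy of the last block
 * (decrypt before and encrypt after are omitted). Returns 0 when the new
 * size is block-aligned, i.e. when no RMW is needed at all.
 */
static int toy_zero_tail(unsigned char *block, uint64_t new_size)
{
	size_t keep = toy_last_block_keep(new_size);

	if (keep == 0)
		return 0;
	memset(block + keep, 0, TOY_FSCRYPT_BLOCK_SIZE - keep);
	return 1;
}
```

For example, truncating to 10000 bytes touches the block starting at 8192 and keeps its first 1808 bytes, while truncating to 8192 needs no RMW.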


>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>> Should I just fold them into your earlier patches?
Yeah, certainly.
> OTOH...do we really need this? I'm not sure I understand the race you're
> trying to prevent. Can you lay it out for me?

My concern is that a page fault could still happen during the RMW of the
last block, because the page fault handler does nothing to prevent it.

We should block page faults while the RMW is in progress.
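A userspace model of the race described above, assuming the fault path only populates a missing page while holding the invalidate lock shared (the generic filemap_fault() does this in recent kernels; whether ceph's fault handler always reaches that path is not checked here). A pthread rwlock stands in for mapping->invalidate_lock, and all toy_* names are hypothetical. While the truncate side holds the lock exclusively, a fault on another thread cannot fill the page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model; invalidate_lock stands in for mapping->invalidate_lock. */
struct toy_mapping {
	pthread_rwlock_t invalidate_lock;
	bool page_cached;	/* is the last block present in the "pagecache"? */
};

/*
 * Fault path that has to read in a missing page: it needs the lock shared.
 * trylock is used so the demo observes the exclusion instead of sleeping.
 * (A real pagecache insert has its own locking; this lock only excludes
 * truncate/invalidate.)
 */
static bool toy_fault_in(struct toy_mapping *m)
{
	if (pthread_rwlock_tryrdlock(&m->invalidate_lock) != 0)
		return false;		/* a real fault would wait here */
	m->page_cached = true;		/* "read" the block into the cache */
	pthread_rwlock_unlock(&m->invalidate_lock);
	return true;
}

struct toy_fault_args {
	struct toy_mapping *m;
	bool filled;
};

static void *toy_fault_thread(void *p)
{
	struct toy_fault_args *a = p;

	a->filled = toy_fault_in(a->m);
	return NULL;
}

/*
 * Run one fault on a separate thread. If 'during_rmw' is true, the calling
 * thread holds the lock exclusively (as the truncate path would for the
 * RMW) for the whole lifetime of the faulting thread.
 */
static bool toy_fault_during(struct toy_mapping *m, bool during_rmw)
{
	struct toy_fault_args a = { .m = m, .filled = false };
	pthread_t t;

	if (during_rmw)
		pthread_rwlock_wrlock(&m->invalidate_lock);
	if (pthread_create(&t, NULL, toy_fault_thread, &a) == 0)
		pthread_join(t, NULL);
	if (during_rmw)
		pthread_rwlock_unlock(&m->invalidate_lock);
	return a.filled;
}
```

One caveat the model hides: the fault path only needs the lock when the page is not already cached, so a page that is still in the pagecache could be mapped regardless of the lock.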

-- Xiubo

>
> Thanks,

Jeff Layton April 7, 2022, 8:32 p.m. UTC | #4
On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
> On 4/7/22 11:38 PM, Jeff Layton wrote:
> > On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> > > On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > > > [...]
> > > Looks reasonable. Is there any reason we shouldn't do this in the non-
> > > encrypted case too? I suppose it doesn't make as much difference in that
> > > case.
> 
> We only need this in encrypted case, which will do the RMW for the last 
> block.
> 
> 
> > > I'll plan to pull this and the other patch into the wip-fscrypt branch.
> > > Should I just fold them into your earlier patches?
> Yeah, certainly.
> > OTOH...do we really need this? I'm not sure I understand the race you're
> > trying to prevent. Can you lay it out for me?
> 
> I am thinking during the RMW for the last block, the page fault still 
> could happen because the page fault function doesn't prevent that.
> 
> And we should prevent it during the RMW is going on.
> 

Right, but the RMW is being done using an anonymous page, and at this
point in the process we haven't really touched the pagecache yet. That
doesn't happen until __ceph_do_pending_vmtruncate.

Most callers of filemap_invalidate_lock/_unlock are in the hole-punching
codepaths, and not so much in truncate. What outcome are you trying to
prevent with this? Can you lay out the potential race and why it would be
harmful?
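The point about the anonymous page can be modeled with two copies of the last block: the RMW works in a private buffer and updates only what goes to storage, while the client's pagecache copy keeps the old tail until the later pagecache truncation. This is a toy userspace sketch. The arrays and toy_* functions are hypothetical stand-ins, toy_pending_vmtruncate() only roughly plays the role of __ceph_do_pending_vmtruncate, and the 16-byte block is chosen just to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLK 16	/* tiny block size for the demo */

static unsigned char toy_pagecache[TOY_BLK];	/* client's cached last block */
static unsigned char toy_backing[TOY_BLK];	/* what ends up in storage */

/* RMW done in an anonymous buffer: the pagecache copy is never touched. */
static void toy_rmw_last_block(size_t new_size)
{
	unsigned char anon[TOY_BLK];

	memcpy(anon, toy_backing, TOY_BLK);
	memset(anon + new_size, 0, TOY_BLK - new_size);
	memcpy(toy_backing, anon, TOY_BLK);
}

/* Later step that finally truncates the cached copy. */
static void toy_pending_vmtruncate(size_t new_size)
{
	memset(toy_pagecache + new_size, 0, TOY_BLK - new_size);
}
```

Between the two calls, the storage copy already has the zeroed tail while the cached copy still holds the old bytes; whether that window is harmful is exactly the question being asked.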

Xiubo Li April 7, 2022, 11:58 p.m. UTC | #5
On 4/8/22 4:32 AM, Jeff Layton wrote:
> On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
>> On 4/7/22 11:38 PM, Jeff Layton wrote:
>>> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>>>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>>>> [...]
>>>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>>>> encrypted case too? I suppose it doesn't make as much difference in that
>>>> case.
>> We only need this in encrypted case, which will do the RMW for the last
>> block.
>>
>>
>>>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>>>> Should I just fold them into your earlier patches?
>> Yeah, certainly.
>>> OTOH...do we really need this? I'm not sure I understand the race you're
>>> trying to prevent. Can you lay it out for me?
>> I am thinking during the RMW for the last block, the page fault still
>> could happen because the page fault function doesn't prevent that.
>>
>> And we should prevent it during the RMW is going on.
>>
> Right, but the RMW is being done using an anonymous page, and at this
> point in the process we haven't really touched the pagecache yet. That
> doesn't happen until __ceph_do_pending_vmtruncate.
>
> Most of the callers for filemap_invalidate_lock/_unlock are in the hole
> punching codepaths, and not so much in truncate. What outcome are you
> trying to prevent with this? Can you lay out the potential race and why
> it would be harmful?

Yeah, I forgot to invalidate the mapping here. After writing back the
dirty pagecache, we should also invalidate the mapping and drop the
affected pages.

It should be:

    filemap_invalidate_lock(inode->i_mapping);
    write back the dirty pagecache;
    invalidate the mapping and drop the pages;
    do the RMW;
    filemap_invalidate_unlock(inode->i_mapping);
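The proposed ordering can be sketched in userspace as follows. The pthread rwlock models mapping->invalidate_lock; the single-block toy_file, the 16-byte block size, and the toy_* names are hypothetical simplifications, and the writeback, invalidation and RMW steps are plain memcpy/memset stand-ins rather than the real pagecache helpers. The property it shows is that dirty cached data is written back before the page is dropped, so the RMW starts from up-to-date contents, and the cached page is gone before the lock is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLK 16	/* tiny block size so the demo is easy to inspect */

/* Toy one-block file: backing store plus a single cached page. */
struct toy_file {
	pthread_rwlock_t invalidate_lock;	/* models mapping->invalidate_lock */
	unsigned char backing[TOY_BLK];
	unsigned char cache[TOY_BLK];
	bool cached;
	bool dirty;
	size_t size;
};

static void toy_truncate(struct toy_file *f, size_t new_size)
{
	unsigned char tmp[TOY_BLK];	/* private buffer for the RMW */

	assert(new_size <= TOY_BLK);
	pthread_rwlock_wrlock(&f->invalidate_lock);

	/* 1. Write back the dirty pagecache so no data is lost. */
	if (f->cached && f->dirty) {
		memcpy(f->backing, f->cache, TOY_BLK);
		f->dirty = false;
	}

	/* 2. Invalidate the mapping and drop the page. */
	f->cached = false;

	/* 3. RMW of the last block: read, zero past the new EOF, write. */
	memcpy(tmp, f->backing, TOY_BLK);
	memset(tmp + new_size, 0, TOY_BLK - new_size);
	memcpy(f->backing, tmp, TOY_BLK);
	f->size = new_size;

	pthread_rwlock_unlock(&f->invalidate_lock);
}
```

Swapping steps 1 and 2 in this model would silently discard the dirty byte, which is why the writeback has to come first.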


As you mentioned in another mail, other processes could be reading the
file through mmap at the same time. While we are truncating the size,
shouldn't such mmap reads be forced to trigger a page fault, and that
page fault wait until the size truncation finishes?

-- Xiubo

Patch

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index f4059d73edd5..cc1829ab497d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2647,9 +2647,12 @@  int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		req->r_num_caps = 1;
 		req->r_stamp = attr->ia_ctime;
 		if (fill_fscrypt) {
+			filemap_invalidate_lock(inode->i_mapping);
 			err = fill_fscrypt_truncate(inode, req, attr);
-			if (err)
+			if (err) {
+				filemap_invalidate_unlock(inode->i_mapping);
 				goto out;
+			}
 		}
 
 		/*
@@ -2660,6 +2663,8 @@  int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		 * it.
 		 */
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
+		if (fill_fscrypt)
+			filemap_invalidate_unlock(inode->i_mapping);
 		if (err == -EAGAIN && truncate_retry--) {
 			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
 			     inode, err, ceph_cap_string(dirtied), mask);
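Once the patch takes filemap_invalidate_lock, every exit has to release it: on the fill_fscrypt_truncate() error path before `goto out`, and after ceph_mdsc_do_request() otherwise. A minimal userspace sketch of that pairing, with hypothetical counting stand-ins for the lock and for the two ceph calls (none of the toy_* names exist in the kernel), checks that each path leaves the lock balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for filemap_invalidate_lock/unlock that count. */
static int lock_depth;
static void toy_invalidate_lock(void)   { lock_depth++; }
static void toy_invalidate_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Hypothetical stand-ins for fill_fscrypt_truncate()/ceph_mdsc_do_request(). */
static int toy_fill_truncate(int inject_err) { return inject_err; }
static int toy_do_request(int inject_err)    { return inject_err; }

static int toy_setattr(bool fill_fscrypt, int fill_err, int req_err)
{
	int err = 0;

	if (fill_fscrypt) {
		toy_invalidate_lock();
		err = toy_fill_truncate(fill_err);
		if (err) {
			toy_invalidate_unlock();	/* error path drops the lock */
			goto out;
		}
	}

	err = toy_do_request(req_err);
	if (fill_fscrypt)
		toy_invalidate_unlock();	/* normal path: after the MDS reply */
out:
	return err;
}
```

Because the unlock happens before the -EAGAIN retry check, a retry that re-enters this block would take the lock again cleanly. As written, the lock is held across the whole MDS round trip, so faults that need to read a page would wait that long.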