Message ID | 20220407144112.8455-3-xiubli@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | ceph: misc fix size truncate for fscrypt |
On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> When truncating the file size the MDS will help update the last
> encrypted block, and during this we need to make sure the client
> won't fill the pagecaches.
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/inode.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index f4059d73edd5..cc1829ab497d 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		req->r_num_caps = 1;
>  		req->r_stamp = attr->ia_ctime;
>  		if (fill_fscrypt) {
> +			filemap_invalidate_lock(inode->i_mapping);
>  			err = fill_fscrypt_truncate(inode, req, attr);
> -			if (err)
> +			if (err) {
> +				filemap_invalidate_unlock(inode->i_mapping);
>  				goto out;
> +			}
>  		}
>  
>  		/*
> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		 * it.
>  		 */
>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> +		if (fill_fscrypt)
> +			filemap_invalidate_unlock(inode->i_mapping);
>  		if (err == -EAGAIN && truncate_retry--) {
>  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>  			     inode, err, ceph_cap_string(dirtied), mask);

Looks reasonable. Is there any reason we shouldn't do this in the non-
encrypted case too? I suppose it doesn't make as much difference in that
case.

I'll plan to pull this and the other patch into the wip-fscrypt branch.
Should I just fold them into your earlier patches?
On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> >
> > When truncating the file size the MDS will help update the last
> > encrypted block, and during this we need to make sure the client
> > won't fill the pagecaches.
> >
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >  fs/ceph/inode.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index f4059d73edd5..cc1829ab497d 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> >  		req->r_num_caps = 1;
> >  		req->r_stamp = attr->ia_ctime;
> >  		if (fill_fscrypt) {
> > +			filemap_invalidate_lock(inode->i_mapping);
> >  			err = fill_fscrypt_truncate(inode, req, attr);
> > -			if (err)
> > +			if (err) {
> > +				filemap_invalidate_unlock(inode->i_mapping);
> >  				goto out;
> > +			}
> >  		}
> >  
> >  		/*
> > @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> >  		 * it.
> >  		 */
> >  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> > +		if (fill_fscrypt)
> > +			filemap_invalidate_unlock(inode->i_mapping);
> >  		if (err == -EAGAIN && truncate_retry--) {
> >  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> >  			     inode, err, ceph_cap_string(dirtied), mask);
>
> Looks reasonable. Is there any reason we shouldn't do this in the non-
> encrypted case too? I suppose it doesn't make as much difference in that
> case.
>
> I'll plan to pull this and the other patch into the wip-fscrypt branch.
> Should I just fold them into your earlier patches?

OTOH...do we really need this? I'm not sure I understand the race you're
trying to prevent. Can you lay it out for me?

Thanks,
On 4/7/22 11:38 PM, Jeff Layton wrote:
> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> When truncating the file size the MDS will help update the last
>>> encrypted block, and during this we need to make sure the client
>>> won't fill the pagecaches.
>>>
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>  fs/ceph/inode.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>>> index f4059d73edd5..cc1829ab497d 100644
>>> --- a/fs/ceph/inode.c
>>> +++ b/fs/ceph/inode.c
>>> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>  		req->r_num_caps = 1;
>>>  		req->r_stamp = attr->ia_ctime;
>>>  		if (fill_fscrypt) {
>>> +			filemap_invalidate_lock(inode->i_mapping);
>>>  			err = fill_fscrypt_truncate(inode, req, attr);
>>> -			if (err)
>>> +			if (err) {
>>> +				filemap_invalidate_unlock(inode->i_mapping);
>>>  				goto out;
>>> +			}
>>>  		}
>>>  
>>>  		/*
>>> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>  		 * it.
>>>  		 */
>>>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
>>> +		if (fill_fscrypt)
>>> +			filemap_invalidate_unlock(inode->i_mapping);
>>>  		if (err == -EAGAIN && truncate_retry--) {
>>>  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>>>  			     inode, err, ceph_cap_string(dirtied), mask);
>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>> encrypted case too? I suppose it doesn't make as much difference in that
>> case.

We only need this in encrypted case, which will do the RMW for the last
block.

>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>> Should I just fold them into your earlier patches?

Yeah, certainly.

> OTOH...do we really need this? I'm not sure I understand the race you're
> trying to prevent. Can you lay it out for me?

I am thinking during the RMW for the last block, the page fault still
could happen because the page fault function doesn't prevent that.

And we should prevent it during the RMW is going on.

-- Xiubo

>
> Thanks,
On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
> On 4/7/22 11:38 PM, Jeff Layton wrote:
> > On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> > > On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > >
> > > > When truncating the file size the MDS will help update the last
> > > > encrypted block, and during this we need to make sure the client
> > > > won't fill the pagecaches.
> > > >
> > > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > > ---
> > > >  fs/ceph/inode.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > > > index f4059d73edd5..cc1829ab497d 100644
> > > > --- a/fs/ceph/inode.c
> > > > +++ b/fs/ceph/inode.c
> > > > @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> > > >  		req->r_num_caps = 1;
> > > >  		req->r_stamp = attr->ia_ctime;
> > > >  		if (fill_fscrypt) {
> > > > +			filemap_invalidate_lock(inode->i_mapping);
> > > >  			err = fill_fscrypt_truncate(inode, req, attr);
> > > > -			if (err)
> > > > +			if (err) {
> > > > +				filemap_invalidate_unlock(inode->i_mapping);
> > > >  				goto out;
> > > > +			}
> > > >  		}
> > > >  
> > > >  		/*
> > > > @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> > > >  		 * it.
> > > >  		 */
> > > >  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> > > > +		if (fill_fscrypt)
> > > > +			filemap_invalidate_unlock(inode->i_mapping);
> > > >  		if (err == -EAGAIN && truncate_retry--) {
> > > >  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> > > >  			     inode, err, ceph_cap_string(dirtied), mask);
> > > Looks reasonable. Is there any reason we shouldn't do this in the non-
> > > encrypted case too? I suppose it doesn't make as much difference in that
> > > case.
>
> We only need this in encrypted case, which will do the RMW for the last
> block.
>
>
> > > I'll plan to pull this and the other patch into the wip-fscrypt branch.
> > > Should I just fold them into your earlier patches?
> Yeah, certainly.
> > OTOH...do we really need this? I'm not sure I understand the race you're
> > trying to prevent. Can you lay it out for me?
>
> I am thinking during the RMW for the last block, the page fault still
> could happen because the page fault function doesn't prevent that.
>
> And we should prevent it during the RMW is going on.
>

Right, but the RMW is being done using an anonymous page, and at this
point in the process we haven't really touched the pagecache yet. That
doesn't happen until __ceph_do_pending_vmtruncate.

Most of the callers for filemap_invalidate_lock/_unlock are in the hole
punching codepaths, and not so much in truncate. What outcome are you
trying to prevent with this? Can you lay out the potential race and why
it would be harmful?
On 4/8/22 4:32 AM, Jeff Layton wrote:
> On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
>> On 4/7/22 11:38 PM, Jeff Layton wrote:
>>> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>>>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> When truncating the file size the MDS will help update the last
>>>>> encrypted block, and during this we need to make sure the client
>>>>> won't fill the pagecaches.
>>>>>
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>  fs/ceph/inode.c | 7 ++++++-
>>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>>>>> index f4059d73edd5..cc1829ab497d 100644
>>>>> --- a/fs/ceph/inode.c
>>>>> +++ b/fs/ceph/inode.c
>>>>> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>>>  		req->r_num_caps = 1;
>>>>>  		req->r_stamp = attr->ia_ctime;
>>>>>  		if (fill_fscrypt) {
>>>>> +			filemap_invalidate_lock(inode->i_mapping);
>>>>>  			err = fill_fscrypt_truncate(inode, req, attr);
>>>>> -			if (err)
>>>>> +			if (err) {
>>>>> +				filemap_invalidate_unlock(inode->i_mapping);
>>>>>  				goto out;
>>>>> +			}
>>>>>  		}
>>>>>  
>>>>>  		/*
>>>>> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>>>  		 * it.
>>>>>  		 */
>>>>>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
>>>>> +		if (fill_fscrypt)
>>>>> +			filemap_invalidate_unlock(inode->i_mapping);
>>>>>  		if (err == -EAGAIN && truncate_retry--) {
>>>>>  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>>>>>  			     inode, err, ceph_cap_string(dirtied), mask);
>>>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>>>> encrypted case too? I suppose it doesn't make as much difference in that
>>>> case.
>> We only need this in encrypted case, which will do the RMW for the last
>> block.
>>
>>
>>>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>>>> Should I just fold them into your earlier patches?
>> Yeah, certainly.
>>> OTOH...do we really need this? I'm not sure I understand the race you're
>>> trying to prevent. Can you lay it out for me?
>> I am thinking during the RMW for the last block, the page fault still
>> could happen because the page fault function doesn't prevent that.
>>
>> And we should prevent it during the RMW is going on.
>>
> Right, but the RMW is being done using an anonymous page, and at this
> point in the process we haven't really touched the pagecache yet. That
> doesn't happen until __ceph_do_pending_vmtruncate.
>
> Most of the callers for filemap_invalidate_lock/_unlock are in the hole
> punching codepaths, and not so much in truncate. What outcome are you
> trying to prevent with this? Can you lay out the potential race and why
> it would be harmful?

Yeah, here I forgot to invalidate the mapping. After writing the dirty
pagecache back we should invalidate the mapping and drop the related
page too. It should be:

    filemap_invalidate_lock(inode->i_mapping);
    write pagecache back;
    invalidate the mapping and drop the pages;
    do the RMW;
    filemap_invalidate_unlock(inode->i_mapping);

As you mentioned in another mail, other processes could do the map read
at the same time, and we should make sure that when we are truncating
the size, we should block map read to continue and just trigger a page
fault and the page fault should wait our truncate size finish ?

-- Xiubo
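Editor's sketch: the ordering Xiubo lists in his last reply (lock, write back, invalidate, RMW, unlock) can be modelled in plain userspace C. This is not kernel code and not the patch. Every `model_*` name, the `MODEL_BLOCK` size, and the boolean standing in for the exclusive `invalidate_lock` are assumptions made for illustration, and encryption is left out entirely. It only shows that a fault arriving mid-truncate has to wait, and that dirty cached data reaches the backing copy before the RMW runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK 16	/* hypothetical block size for the model */

/* Hypothetical single-block file: one cached copy, one backing copy. */
struct model_file {
	bool invalidate_locked;		/* stands in for the invalidate lock held exclusively */
	bool cached;			/* a page-cache copy exists */
	bool dirty;			/* the cached copy has unwritten changes */
	char cache[MODEL_BLOCK];
	char backing[MODEL_BLOCK];	/* what the "server" stores */
	size_t size;
};

static void model_init(struct model_file *f, const char *data)
{
	size_t n = strlen(data);

	if (n > MODEL_BLOCK)
		n = MODEL_BLOCK;
	memset(f, 0, sizeof(*f));
	memcpy(f->backing, data, n);
	f->size = n;
}

/* A fault may only fill the cache while the lock is free.
 * Returns false when the fault would have to wait. */
static bool model_fault(struct model_file *f)
{
	if (f->invalidate_locked)
		return false;
	if (!f->cached) {
		memcpy(f->cache, f->backing, MODEL_BLOCK);
		f->cached = true;
		f->dirty = false;
	}
	return true;
}

/* Store one byte through the cache, faulting the page in first. */
static bool model_store(struct model_file *f, size_t off, char c)
{
	if (off >= f->size || !model_fault(f))
		return false;
	f->cache[off] = c;
	f->dirty = true;
	return true;
}

/* Truncate in the order proposed in the thread.
 * Returns true if a fault attempted during the RMW was held off. */
static bool model_truncate(struct model_file *f, size_t new_size)
{
	char block[MODEL_BLOCK];
	bool fault_blocked;

	assert(new_size <= f->size);

	f->invalidate_locked = true;			/* lock */

	if (f->cached && f->dirty)			/* write pagecache back */
		memcpy(f->backing, f->cache, MODEL_BLOCK);
	f->dirty = false;
	f->cached = false;				/* invalidate, drop the page */

	memcpy(block, f->backing, MODEL_BLOCK);		/* RMW: read */
	memset(block + new_size, 0, MODEL_BLOCK - new_size);	/* RMW: modify */
	fault_blocked = !model_fault(f);		/* concurrent fault must wait */
	memcpy(f->backing, block, MODEL_BLOCK);		/* RMW: write */
	f->size = new_size;

	f->invalidate_locked = false;			/* unlock */
	return fault_blocked;
}
```

In this model a fault taken after `model_truncate()` returns refills the cache from the already-truncated backing copy, which is the outcome the proposed ordering is meant to guarantee. Whether the real client needs the lock at this point is exactly what the thread is still debating.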
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index f4059d73edd5..cc1829ab497d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		req->r_num_caps = 1;
 		req->r_stamp = attr->ia_ctime;
 		if (fill_fscrypt) {
+			filemap_invalidate_lock(inode->i_mapping);
 			err = fill_fscrypt_truncate(inode, req, attr);
-			if (err)
+			if (err) {
+				filemap_invalidate_unlock(inode->i_mapping);
 				goto out;
+			}
 		}
 
 		/*
@@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		 * it.
 		 */
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
+		if (fill_fscrypt)
+			filemap_invalidate_unlock(inode->i_mapping);
 		if (err == -EAGAIN && truncate_retry--) {
 			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
 			     inode, err, ceph_cap_string(dirtied), mask);
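Editor's sketch: the patch takes the lock in one place and drops it in two (the error branch, and after the MDS request returns), so the pairing is easy to get wrong once the -EAGAIN retry is involved. The userspace model below checks that pairing. It is not kernel code: all `model_*` names are hypothetical, the retry budget of 20 is only illustrative, and the jump back to a `retry` label on -EAGAIN is assumed from the "retry it!" message, since that part of the function is not shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mapping's invalidate lock. */
struct model_lock {
	int depth;		/* 1 while held, 0 otherwise */
	int acquisitions;	/* how many times it was taken */
};

static void model_lock_take(struct model_lock *l)
{
	assert(l->depth == 0);	/* taking it twice would deadlock */
	l->depth++;
	l->acquisitions++;
}

static void model_lock_drop(struct model_lock *l)
{
	assert(l->depth == 1);	/* dropping an unheld lock is a bug */
	l->depth--;
}

/*
 * Control flow of the patched region only.
 * fill_err:     result of the stand-in for fill_fscrypt_truncate()
 * eagain_count: how many times the stand-in request returns -EAGAIN
 */
static int model_setattr(struct model_lock *l, bool fill_fscrypt,
			 int fill_err, int eagain_count)
{
	int truncate_retry = 20;	/* illustrative retry budget */
	int err;

retry:
	if (fill_fscrypt) {
		model_lock_take(l);
		err = fill_err;
		if (err) {
			model_lock_drop(l);	/* error path must unlock */
			goto out;
		}
	}

	/* Stand-in for the synchronous request to the MDS. */
	err = eagain_count > 0 ? -EAGAIN : 0;
	if (eagain_count > 0)
		eagain_count--;

	if (fill_fscrypt)
		model_lock_drop(l);	/* unlock before deciding to retry */
	if (err == -EAGAIN && truncate_retry--)
		goto retry;		/* assumed retry jump */
out:
	return err;
}
```

Because the unlock happens before the -EAGAIN check, each retry starts with the lock free and takes it again, so the lock is never held across iterations in this model.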