[v5,0/8] ceph: size handling for the fscrypt

Message ID 20211103012232.14488-1-xiubli@redhat.com (mailing list archive)

Message

Xiubo Li Nov. 3, 2021, 1:22 a.m. UTC
From: Jeff Layton <jlayton@kernel.org>

This patch series is based on the "wip-fscrypt-fnames" branch in
repo https://github.com/ceph/ceph-client.git.

And I have picked up 5 patches from the "ceph-fscrypt-size-experimental"
branch in repo
https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git.

====

This approach is based on the discussion of V1 and V2: when truncating
to a smaller size that is not aligned to BLOCK SIZE, the client sends
the encrypted last block contents to the MDS along with the truncate
request.
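
The condition above can be sketched roughly as follows. This is only an
illustration, not code from the series: the helper names and the
4096-byte block size are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration only: a 4096-byte encryption block. */
#define SKETCH_BLOCK_SIZE 4096ULL

static_assert((SKETCH_BLOCK_SIZE & (SKETCH_BLOCK_SIZE - 1)) == 0,
	      "block size must be a power of two");

/*
 * Hypothetical helper: true when the truncate request has to carry the
 * encrypted last block, i.e. the file shrinks and the new size is not
 * aligned to the block size.
 */
bool sketch_needs_last_block(uint64_t old_size, uint64_t new_size)
{
	return new_size < old_size &&
	       (new_size & (SKETCH_BLOCK_SIZE - 1)) != 0;
}

/* Hypothetical helper: offset of the block containing the new EOF. */
uint64_t sketch_last_block_offset(uint64_t new_size)
{
	return new_size & ~(SKETCH_BLOCK_SIZE - 1);
}
```

For example, shrinking an 8192-byte file to 5000 bytes would need the
block at offset 4096, while shrinking it to exactly 4096 bytes would
not need any block contents.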

The MDS side patch is raised in PR
https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
previous great work in PR https://github.com/ceph/ceph/pull/41284.

The MDS will use filer.write_trunc(), which can update and truncate
the file in one shot, instead of filer.truncate().

This assumes the kclient won't support the inline data feature, which
will be removed soon. For more detail, please see:
https://tracker.ceph.com/issues/52916

Changed in V5:
- Rebase to "wip-fscrypt-fnames" branch in ceph-client.git repo.
- Pick up 5 patches from Jeff's "ceph-fscrypt-size-experimental" branch
  in linux.git repo.
- Add an "i_truncate_pagecache_size" member to the ceph_inode_info
  struct. It is used only on the kclient side to truncate the
  pagecache, because "i_truncate_size" is always aligned to BLOCK
  SIZE. In the fscrypt case we need to use the real size to truncate
  the pagecache.
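
A rough sketch of the relationship between the two sizes, for
illustration only. The struct and function names are hypothetical
stand-ins rather than the real ceph_inode_info fields, the 4096-byte
block size is an assumption, and "aligned" is assumed to mean rounded
up to the next block boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: a 4096-byte encryption block. */
#define SKETCH_BLOCK_SIZE 4096ULL

/* Hypothetical stand-in for the two members named in the changelog. */
struct sketch_truncate_sizes {
	uint64_t truncate_size;            /* aligned to the block size */
	uint64_t truncate_pagecache_size;  /* real, unaligned size */
};

/* Derive both values from the real size requested by the caller. */
struct sketch_truncate_sizes sketch_split_truncate_size(uint64_t real_size)
{
	struct sketch_truncate_sizes s;

	/* Guard against overflow in the round-up below. */
	assert(real_size <= UINT64_MAX - (SKETCH_BLOCK_SIZE - 1));

	s.truncate_size = (real_size + SKETCH_BLOCK_SIZE - 1) /
			  SKETCH_BLOCK_SIZE * SKETCH_BLOCK_SIZE;
	s.truncate_pagecache_size = real_size;
	return s;
}
```

With these assumptions, a truncate to 1025 bytes would give an aligned
size of 4096 while the pagecache would still be cut at 1025.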


Changed in V4:
- Retry the truncate request up to 20 times before failing it with
  -EAGAIN.
- Remove the "fill_last_block" label and move the code to else branch.
- Remove the #3 patch, which has already been sent out separately, in
  V3 series.
- Improve some comments in the code.
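
The bounded retry in the first V4 item could look roughly like the
sketch below. It is not the code from the patch: the callback type, the
function names, and the convention that a single attempt returns
-EAGAIN when it should be retried are all assumptions, as is the reading
of "20 times" as at most 20 attempts in total.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed reading of the changelog: at most 20 attempts in total. */
#define SKETCH_MAX_ATTEMPTS 20

/* Hypothetical callback: one truncate attempt, -EAGAIN means "retry". */
typedef int (*sketch_truncate_fn)(void *ctx);

/* Repeat the attempt until it stops asking for a retry or we give up. */
int sketch_truncate_with_retry(sketch_truncate_fn try_once, void *ctx)
{
	int attempt;

	assert(try_once != NULL);

	for (attempt = 0; attempt < SKETCH_MAX_ATTEMPTS; attempt++) {
		int ret = try_once(ctx);

		if (ret != -EAGAIN)
			return ret;
	}
	return -EAGAIN;
}

/*
 * Example callback for trying the loop out: ctx points to an int that
 * holds how many more times the attempt should fail with -EAGAIN.
 */
int sketch_fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

A callback that fails three times and then succeeds returns 0 from the
loop; one that would fail 25 times makes the loop give up with -EAGAIN
after 20 attempts.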

Changed in V3:
- Fix possible file corruption when another client updates the file
  just before the MDS acquires the xlock for the FILE lock.
- Flush the pagecache buffer before reading the last block when
  filling the truncate request.
- Some other minor fixes.



Jeff Layton (5):
  libceph: add CEPH_OSD_OP_ASSERT_VER support
  ceph: size handling for encrypted inodes in cap updates
  ceph: fscrypt_file field handling in MClientRequest messages
  ceph: get file size from fscrypt_file when present in inode traces
  ceph: handle fscrypt fields in cap messages from MDS

Xiubo Li (3):
  ceph: add __ceph_get_caps helper support
  ceph: add __ceph_sync_read helper support
  ceph: add truncate size handling support for fscrypt

 fs/ceph/caps.c                  | 136 ++++++++++++++----
 fs/ceph/crypto.h                |   4 +
 fs/ceph/dir.c                   |   3 +
 fs/ceph/file.c                  |  43 ++++--
 fs/ceph/inode.c                 | 236 +++++++++++++++++++++++++++++---
 fs/ceph/mds_client.c            |   9 +-
 fs/ceph/mds_client.h            |   2 +
 fs/ceph/super.h                 |  10 ++
 include/linux/ceph/crypto.h     |  28 ++++
 include/linux/ceph/osd_client.h |   6 +-
 include/linux/ceph/rados.h      |   4 +
 net/ceph/osd_client.c           |   5 +
 12 files changed, 427 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/ceph/crypto.h

Comments

Jeff Layton Nov. 3, 2021, 12:56 p.m. UTC | #1
On Wed, 2021-11-03 at 09:22 +0800, xiubli@redhat.com wrote:

Thanks Xiubo,

This looks like a great start. I set up an environment vs. a cephadm
cluster with your fscrypt changes, and started running xfstests against
it with test_dummy_encryption enabled. It got to generic/014 and the
test hung waiting on a SETATTR call to come back:

[root@client1 f3cf8b7a-38ec-11ec-a0e4-52540031ba78.client74208]# cat mdsc
89447	mds0	setattr	 #1000003b19c

Looking at the MDS that it was talking to, I see:

Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 31.627241 secs
Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : slow request 31.627240 seconds old, received at 2021-11-03T12:24:37.911553+0000: client_request(client.74208:89447 setattr size=102498304 #0x1000003b19c 2021-11-03T12:24:37.895292+0000 caller_uid=0, caller_gid=0{0,}) currently acquired locks
Nov 03 08:25:14 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 36.627323 secs
Nov 03 08:25:19 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 41.627389 secs

...and it still hasn't resolved.

I'll keep looking around a bit more, but I think there are still some
bugs in here. Let me know if you have thoughts as to what the issue is.

Thanks,
Xiubo Li Nov. 4, 2021, 3:24 a.m. UTC | #2
On 11/3/21 8:56 PM, Jeff Layton wrote:
> Thanks Xiubo,
>
> This looks like a great start. I set up an environment vs. a cephadm
> cluster with your fscrypt changes, and started running xfstests against
> it with test_dummy_encryption enabled. It got to generic/014 and the
> test hung waiting on a SETATTR call to come back:
>
> [root@client1 f3cf8b7a-38ec-11ec-a0e4-52540031ba78.client74208]# cat mdsc
> 89447	mds0	setattr	 #1000003b19c
>
> Looking at the MDS that it was talking to, I see:
>
> Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 31.627241 secs
> Nov 03 08:25:09 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : slow request 31.627240 seconds old, received at 2021-11-03T12:24:37.911553+0000: client_request(client.74208:89447 setattr size=102498304 #0x1000003b19c 2021-11-03T12:24:37.895292+0000 caller_uid=0, caller_gid=0{0,}) currently acquired locks
> Nov 03 08:25:14 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 36.627323 secs
> Nov 03 08:25:19 cephadm2 ceph-mds[3133]: log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 41.627389 secs
>
> ...and it still hasn't resolved.
>
> I'll keep looking around a bit more, but I think there are still some
> bugs in here. Let me know if you have thoughts as to what the issue is.

From the MDS-side log, it keeps retrying the truncate request:

2021-11-04T10:24:25.542+0800 149d48288700  1 -- v1:10.72.47.117:6814/424105754 <== osd.0 v1:10.72.47.117:6800/10035 249354 ==== osd_op_reply(358495 10000000ed7.00000016 [read 92872704~8] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 164+0+0 (unknown 4045992944 0 0) 0x55cd75169440 con 0x55cd7514dc00
2021-11-04T10:24:25.542+0800 149d46278700 10 MDSIOContextBase::complete: 24C_IO_MDC_ReadtruncFinish
2021-11-04T10:24:25.542+0800 149d46278700 10 MDSContext::complete: 24C_IO_MDC_ReadtruncFinish

It's a bug that triggers when hitting a file hole. I will fix it soon.

Thanks.

BRs


> Thanks,
Jeff Layton Nov. 5, 2021, 12:13 a.m. UTC | #3
On Wed, 2021-11-03 at 09:22 +0800, xiubli@redhat.com wrote:

Nice work, Xiubo. This looks good.

I've been testing it some today and it seems to work fine so far. I've
got a bit more testing that I want to do tomorrow, but this should
hopefully clear the way for us to finish the content encryption piece!

Many thanks!
Xiubo Li Nov. 5, 2021, 12:50 a.m. UTC | #4
On 11/5/21 8:13 AM, Jeff Layton wrote:
> Nice work, Xiubo. This looks good.
>
> I've been testing it some today and it seems to work fine so far.

Cool.


>   I've
> got a bit more testing that I want to do tomorrow,

At the same time I will test more.


> but this should
> hopefully clear the way for us to finish the content encryption piece!
Yeah, the experimental branch for the content encryption is not working
as well as the fname branch does; we may need more review and testing of it.

BRs

Xiubo

> Many thanks!
Jeff Layton Nov. 5, 2021, 11:15 a.m. UTC | #5
On Fri, 2021-11-05 at 08:50 +0800, Xiubo Li wrote:
> On 11/5/21 8:13 AM, Jeff Layton wrote:
> > Nice work, Xiubo. This looks good.
> > 
> > I've been testing it some today and it seems to work fine so far.
> 
> Cool.
> 
> 
> >   I've
> > got a bit more testing that I want to do tomorrow,
> 
> At the same time I will test more.
> 
> 
> > but this should
> > hopefully clear the way for us to finish the content encryption piece!
> Yeah, the experimental branch for the content encryption is not working 
> well as the fname branch does, we may need more review and testing about it.
> 

Definitely. That work is not at all complete yet. We need to make sure
the size handling is rock-solid before we add in content encryption
though. If we get the size handling wrong then it will probably just
manifest as data corruption once encryption is in play.

Heck, we may want to consider an fscrypt mode that just does no-op
encryption for testing this sort of thing.

On another note...one interesting thing with this patchset:

[jlayton@client1 scratch]$ ls -l /mnt/scratch/crypt
total 12
-rw-r--r--. 1 jlayton jlayton 1025 Nov  5 06:55 1025
-rw-r--r--. 1 jlayton jlayton 1024 Nov  5 06:54 1k
-rw-r--r--. 1 jlayton jlayton 2048 Nov  5 06:54 2k
-rw-r--r--. 1 jlayton jlayton 7168 Nov  5 06:55 7k
-rw-r--r--. 1 jlayton jlayton    4 Nov  5 06:54 foo

...but when the same client doesn't have the key, the real sizes are
still presented:

[jlayton@client1 ~]$ ls -l /mnt/scratch/crypt
total 12
-rw-r--r--. 1 jlayton jlayton    4 Nov  5 06:54 mmyetGFDwaf_PPqhm2ofMkNOFxBPFyrYJc_uif1vXL8
-rw-r--r--. 1 jlayton jlayton 1024 Nov  5 06:54 OGkEeGaqqLj7YVceGN5SkCF80et25ZkPUwdrd9nqtsg
-rw-r--r--. 1 jlayton jlayton 7168 Nov  5 06:55 RL6qlqBvpAkZEku3SKrTmGqTkJWkWjqM7KtPvYJBAf8
-rw-r--r--. 1 jlayton jlayton 1025 Nov  5 06:55 w1rCnxYQLJTbxHtZC2qtRnDdoIO9-vf_OlKjY0WcwH8
-rw-r--r--. 1 jlayton jlayton 2048 Nov  5 06:54 YcwUK3htDdBkSqJVMebaKgR5xLO6BXz-NpABPa-mUA

On a client that doesn't support fscrypt, the sizes show the rounded-up values (as expected):

[jlayton@client2 ~]$ ls -l /mnt/scratch/crypt/
total 24
-rw-r--r--. 1 jlayton jlayton 4096 Nov  5 06:54 mmyetGFDwaf_PPqhm2ofMkNOFxBPFyrYJc_uif1vXL8
-rw-r--r--. 1 jlayton jlayton 4096 Nov  5 06:54 OGkEeGaqqLj7YVceGN5SkCF80et25ZkPUwdrd9nqtsg
-rw-r--r--. 1 jlayton jlayton 8192 Nov  5 06:55 RL6qlqBvpAkZEku3SKrTmGqTkJWkWjqM7KtPvYJBAf8
-rw-r--r--. 1 jlayton jlayton 4096 Nov  5 06:55 w1rCnxYQLJTbxHtZC2qtRnDdoIO9-vf_OlKjY0WcwH8
-rw-r--r--. 1 jlayton jlayton 4096 Nov  5 06:54 YcwUK3htDdBkSqJVMebaKgR5xLO6BXz-NpABPa-mUAU

Question: should we present the rounded-up sizes to applications on
clients that support fscrypt but do not have the key?

I tend to think that that makes for better opsec, overall. Are there
reasons not to hide the real size when the user doesn't have the key?
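
To make the question concrete, here is a minimal sketch of the behaviour
being asked about. It is purely illustrative: the function name and the
has_key flag are hypothetical, and the 4096-byte block size is an
assumption taken from the rounded-up sizes in the client2 listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the rounded-up sizes shown in the listing above. */
#define SKETCH_BLOCK_SIZE 4096ULL

/*
 * Hypothetical policy: report the real size only when the key is
 * present; otherwise report the size rounded up to the block size.
 */
uint64_t sketch_reported_size(uint64_t real_size, bool has_key)
{
	if (has_key)
		return real_size;

	/* Guard against overflow in the round-up below. */
	assert(real_size <= UINT64_MAX - (SKETCH_BLOCK_SIZE - 1));

	return (real_size + SKETCH_BLOCK_SIZE - 1) /
	       SKETCH_BLOCK_SIZE * SKETCH_BLOCK_SIZE;
}
```

Under this policy the five files above (4, 1024, 1025, 2048 and 7168
bytes) would show up without the key as 4096, 4096, 4096, 4096 and 8192
bytes, matching what the non-fscrypt client reports.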