
[v4,0/4] ceph: size handling for the fscrypt

Message ID 20211101020447.75872-1-xiubli@redhat.com

Message

Xiubo Li Nov. 1, 2021, 2:04 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

This patch series is based on the "fscrypt_size_handling" branch in
https://github.com/lxbsz/linux.git, which is based on Jeff's
"ceph-fscrypt-content-experimental" branch in
https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git
with two upstream commits added, which should already be merged.

These two upstream commits should be dropped once Jeff rebases
his "ceph-fscrypt-content-experimental" branch onto the upstream code.

====

This approach is based on the discussion from V1 and V2: pass the
encrypted last block contents to the MDS along with the truncate
request.

The encrypted last block contents are sent to the MDS along with the
truncate request when truncating to a smaller size that is not aligned
to the fscrypt BLOCK SIZE.
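
For illustration, here is a minimal userspace sketch of that decision;
the constant and helper names below are assumptions made up for this
example, not the series' actual code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value; fscrypt encrypts file contents in 4K blocks. */
#define CEPH_FSCRYPT_BLOCK_SIZE 4096

/*
 * Hypothetical helper: decide whether a truncate must carry the
 * encrypted last block to the MDS. That is only needed when shrinking
 * to a size that lands in the middle of a block, so the MDS can
 * rewrite the partial trailing block together with the truncate.
 */
static bool truncate_needs_last_block(uint64_t old_size, uint64_t new_size)
{
	return new_size < old_size &&
	       (new_size % CEPH_FSCRYPT_BLOCK_SIZE) != 0;
}

int main(void)
{
	/* 10000 -> 5000 bytes: 5000 is mid-block, send the last block. */
	printf("%d\n", truncate_needs_last_block(10000, 5000)); /* 1 */
	/* 10000 -> 8192 bytes: block-aligned, a plain truncate suffices. */
	printf("%d\n", truncate_needs_last_block(10000, 8192)); /* 0 */
	return 0;
}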

The MDS side patch is raised in PR
https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
previous great work in PR https://github.com/ceph/ceph/pull/41284.

The MDS will use filer.write_trunc(), which can update and truncate
the file in one shot, instead of filer.truncate().

This assumes the kclient won't support the inline data feature, which
will be removed soon; for more detail please see:
https://tracker.ceph.com/issues/52916

Changed in V4:
- Retry the truncate request up to 20 times before failing it with
  -EAGAIN (see the sketch after this list).
- Remove the "fill_last_block" label and move the code into the else
  branch.
- Remove patch #3 from the V3 series; it has already been sent out
  separately.
- Improve some comments in the code.
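
A minimal sketch of the retry pattern from the first V4 item above;
every name in it is an assumption made up for the example, not taken
from the series:

#include <errno.h>
#include <stdio.h>

/* Assumed name for the V4 retry limit. */
#define MAX_TRUNC_RETRIES 20

/* Stub standing in for sending the truncate request to the MDS. */
static int send_truncate_request(void)
{
	static int calls;

	/* Pretend the MDS reports a race (-EAGAIN) twice, then succeeds. */
	return (++calls <= 2) ? -EAGAIN : 0;
}

static int truncate_with_retry(void)
{
	int attempts = 0;
	int ret;

	do {
		ret = send_truncate_request();
	} while (ret == -EAGAIN && ++attempts < MAX_TRUNC_RETRIES);

	/* Still -EAGAIN after the retry budget is spent: give up. */
	return ret;
}

int main(void)
{
	printf("result: %d\n", truncate_with_retry()); /* prints "result: 0" */
	return 0;
}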


Changed in V3:
- Fix possible file corruption when another client updates the file
  just before the MDS acquires the xlock on the FILE lock.
- Flush the pagecache before reading the last block when filling the
  truncate request.
- Some other minor fixes.

Xiubo Li (4):
  Revert "ceph: make client zero partial trailing block on truncate"
  ceph: add __ceph_get_caps helper support
  ceph: add __ceph_sync_read helper support
  ceph: add truncate size handling support for fscrypt

 fs/ceph/caps.c              |  21 ++--
 fs/ceph/file.c              |  44 +++++---
 fs/ceph/inode.c             | 203 ++++++++++++++++++++++++++++++------
 fs/ceph/super.h             |   6 +-
 include/linux/ceph/crypto.h |  28 +++++
 5 files changed, 251 insertions(+), 51 deletions(-)
 create mode 100644 include/linux/ceph/crypto.h

Comments

Jeffrey Layton Nov. 1, 2021, 10:27 a.m. UTC | #1
On Mon, 2021-11-01 at 10:04 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This patch series is based on the "fscrypt_size_handling" branch in
> https://github.com/lxbsz/linux.git, which is based on Jeff's
> "ceph-fscrypt-content-experimental" branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git
> with two upstream commits added, which should already be merged.
> 
> These two upstream commits should be dropped once Jeff rebases
> his "ceph-fscrypt-content-experimental" branch onto the upstream code.
> 

I don't think I was clear last time. I'd like for you to post the
_entire_ stack of patches that is based on top of
ceph-client/wip-fscrypt-fnames. wip-fscrypt-fnames is pretty stable at
this point, so I think it's a reasonable place for you to base your
work. That way you're not beginning with a revert.

Again, feel free to cherry-pick any of the patches in any of my other
branches for your series, but I'd like to see a complete series of
patches.

Thanks,
Jeff


> [...]
Jeffrey Layton Nov. 1, 2021, 5:07 p.m. UTC | #2
On Mon, 2021-11-01 at 06:27 -0400, Jeff Layton wrote:
> On Mon, 2021-11-01 at 10:04 +0800, xiubli@redhat.com wrote:
> > [...]
> 
> I don't think I was clear last time. I'd like for you to post the
> _entire_ stack of patches that is based on top of
> ceph-client/wip-fscrypt-fnames. wip-fscrypt-fnames is pretty stable at
> this point, so I think it's a reasonable place for you to base your
> work. That way you're not beginning with a revert.
> 
> Again, feel free to cherry-pick any of the patches in any of my other
> branches for your series, but I'd like to see a complete series of
> patches.
> 
> 

To be even more clear:

The main reason this patchset is not helpful is that the
ceph-fscrypt-content-experimental branch in my tree has bitrotted in the
face of other changes that have gone into the testing branch since it
was cut. Also, that branch had several patches that added in actual
encryption of the content, which we don't want to do at this point.

For the work you're doing, what I'd like to see is a patchset based on
top of the ceph-client/wip-fscrypt-fnames branch. That patchset should
make it so that when encryption is enabled, the size handling for the
inode is changed to use the new scheme we've added, but don't do any
actual content encryption yet. Feel free to pick any of the patches in
my trees that you need to do this, but it needs to be the whole series.

What we need to be able to do in this phase is validate that the size
handling is correct in both the encrypted and non-encrypted cases, but
without encrypting the data. Once that piece is working, then we should
be able to add in content encryption.


> 
> > [...]
Xiubo Li Nov. 2, 2021, 1:02 a.m. UTC | #3
On 11/2/21 1:07 AM, Jeff Layton wrote:
> On Mon, 2021-11-01 at 06:27 -0400, Jeff Layton wrote:
>> [...]
> To be even more clear:
>
> The main reason this patchset is not helpful is that the
> ceph-fscrypt-content-experimental branch in my tree has bitrotted in the
> face of other changes that have gone into the testing branch since it
> was cut. Also, that branch had several patches that added in actual
> encryption of the content, which we don't want to do at this point.
>
> For the work you're doing, what I'd like to see is a patchset based on
> top of the ceph-client/wip-fscrypt-fnames branch. That patchset should
> make it so that when encryption is enabled, the size handling for the
> inode is changed to use the new scheme we've added, but don't do any
> actual content encryption yet. Feel free to pick any of the patches in
> my trees that you need to do this, but it needs to be the whole series.
>
> What we need to be able to do in this phase is validate that the size
> handling is correct in both the encrypted and non-encrypted cases, but
> without encrypting the data. Once that piece is working, then we should
> be able to add in content encryption.

Okay, got it.

I will clean up the code and respin it later.

BRs

-- Xiubo


>
>>> [...]
Xiubo Li Nov. 2, 2021, 9:44 a.m. UTC | #4
On 11/1/21 6:27 PM, Jeff Layton wrote:
> On Mon, 2021-11-01 at 10:04 +0800, xiubli@redhat.com wrote:
>> [...]
> I don't think I was clear last time. I'd like for you to post the
> _entire_ stack of patches that is based on top of
> ceph-client/wip-fscrypt-fnames. wip-fscrypt-fnames is pretty stable at
> this point, so I think it's a reasonable place for you to base your
> work. That way you're not beginning with a revert.

Hi Jeff,

BTW, have you tested with the CONFIG_FS_ENCRYPTION option disabled for
the ceph-client/wip-fscrypt-fnames branch?

I tried it today and the kernel always crashes with the following
script. I have tried many times; the terminal running 'cat /proc/kmsg'
always gets stuck without printing any call trace.

# mkdir dir && echo "123" > dir/testfile

With CONFIG_FS_ENCRYPTION enabled, I haven't encountered any issue yet.

I am still debugging it.


BRs

--Xiubo


>
> Again, feel free to cherry-pick any of the patches in any of my other
> branches for your series, but I'd like to see a complete series of
> patches.
>
> Thanks,
> Jeff
>
>
>> ====
>>
>> This approach is based on the discussion from V1 and V2, which will
>> pass the encrypted last block contents to MDS along with the truncate
>> request.
>>
>> This will send the encrypted last block contents to MDS along with
>> the truncate request when truncating to a smaller size and at the
>> same time new size does not align to BLOCK SIZE.
>>
>> The MDS side patch is raised in PR
>> https://github.com/ceph/ceph/pull/43588, which is also based Jeff's
>> previous great work in PR https://github.com/ceph/ceph/pull/41284.
>>
>> The MDS will use the filer.write_trunc(), which could update and
>> truncate the file in one shot, instead of filer.truncate().
>>
>> This just assume kclient won't support the inline data feature, which
>> will be remove soon, more detail please see:
>> https://tracker.ceph.com/issues/52916
>>
>> Changed in V4:
>> - Retry the truncate request by 20 times before fail it with -EAGAIN.
>> - Remove the "fill_last_block" label and move the code to else branch.
>> - Remove the #3 patch, which has already been sent out separately, in
>>    V3 series.
>> - Improve some comments in the code.
>>
>>
>> Changed in V3:
>> - Fix possibly corrupting the file just before the MDS acquires the
>>    xlock for FILE lock, another client has updated it.
>> - Flush the pagecache buffer before reading the last block for the
>>    when filling the truncate request.
>> - Some other minore fixes.
>>
>> Xiubo Li (4):
>>    Revert "ceph: make client zero partial trailing block on truncate"
>>    ceph: add __ceph_get_caps helper support
>>    ceph: add __ceph_sync_read helper support
>>    ceph: add truncate size handling support for fscrypt
>>
>>   fs/ceph/caps.c              |  21 ++--
>>   fs/ceph/file.c              |  44 +++++---
>>   fs/ceph/inode.c             | 203 ++++++++++++++++++++++++++++++------
>>   fs/ceph/super.h             |   6 +-
>>   include/linux/ceph/crypto.h |  28 +++++
>>   5 files changed, 251 insertions(+), 51 deletions(-)
>>   create mode 100644 include/linux/ceph/crypto.h
>>
Jeffrey Layton Nov. 2, 2021, 10:52 a.m. UTC | #5
On Tue, 2021-11-02 at 17:44 +0800, Xiubo Li wrote:
> On 11/1/21 6:27 PM, Jeff Layton wrote:
> > [...]
> 
> Hi Jeff,
> 
> BTW, have you tested with the CONFIG_FS_ENCRYPTION option disabled for
> the ceph-client/wip-fscrypt-fnames branch?
> 
> I tried it today and the kernel always crashes with the following
> script. I have tried many times; the terminal running 'cat /proc/kmsg'
> always gets stuck without printing any call trace.
> 
> # mkdir dir && echo "123" > dir/testfile
> 
> With CONFIG_FS_ENCRYPTION enabled, I haven't encountered any issue yet.
> 
> I am still debugging it.
> 
> 


No, I hadn't noticed that, but I can reproduce it too. AFAICT, bash is
sitting in a pselect() call:

[jlayton@client1 ~]$ sudo cat /proc/1163/stack
[<0>] poll_schedule_timeout.constprop.0+0x53/0xa0
[<0>] do_select+0xb51/0xc70
[<0>] core_sys_select+0x2ac/0x620
[<0>] do_pselect.constprop.0+0x101/0x1b0
[<0>] __x64_sys_pselect6+0x9a/0xc0
[<0>] do_syscall_64+0x3b/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

After playing around a bit more, I saw this KASAN pop, which may be
related:

[ 1046.013880] ==================================================================
[ 1046.017053] BUG: KASAN: out-of-bounds in encode_cap_msg+0x76c/0xa80 [ceph]
[ 1046.019441] Read of size 18446744071716025685 at addr ffff8881011bf558 by task kworker/7:1/82
[ 1046.022243] 
[ 1046.022785] CPU: 7 PID: 82 Comm: kworker/7:1 Tainted: G            E     5.15.0-rc6+ #43
[ 1046.025421] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 1046.028159] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 1046.030111] Call Trace:
[ 1046.030983]  dump_stack_lvl+0x57/0x72
[ 1046.032177]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.033864]  print_address_description.constprop.0+0x1f/0x140
[ 1046.035807]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.037221]  ? encode_cap_msg+0x76c/0xa80 [ceph]
[ 1046.038680]  kasan_report.cold+0x7f/0x11b
[ 1046.039853]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.041317]  ? encode_cap_msg+0x76c/0xa80 [ceph]
[ 1046.042782]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.044168]  kasan_check_range+0xf5/0x1d0
[ 1046.045325]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.046679]  memcpy+0x20/0x60
[ 1046.047555]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.048930]  encode_cap_msg+0x76c/0xa80 [ceph]
[ 1046.050383]  ? ceph_kvmalloc+0xdd/0x110 [libceph]
[ 1046.051888]  ? ceph_msg_new2+0xf7/0x210 [libceph]
[ 1046.053395]  __send_cap+0x40/0x180 [ceph]
[ 1046.054696]  ceph_check_caps+0x5a2/0xc50 [ceph]
[ 1046.056482]  ? deref_stack_reg+0xb0/0xb0
[ 1046.057786]  ? ceph_con_workfn+0x224/0x8b0 [libceph]
[ 1046.059471]  ? __ceph_should_report_size+0x90/0x90 [ceph]
[ 1046.061190]  ? lock_is_held_type+0xe0/0x110
[ 1046.062453]  ? find_held_lock+0x85/0xa0
[ 1046.063684]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.065089]  ? lock_release+0x1c7/0x3e0
[ 1046.066225]  ? wait_for_completion+0x150/0x150
[ 1046.067570]  ? __ceph_caps_file_wanted+0x25a/0x380 [ceph]
[ 1046.069319]  handle_cap_grant+0x113c/0x13a0 [ceph]
[ 1046.070962]  ? ceph_kick_flushing_inode_caps+0x240/0x240 [ceph]
[ 1046.081699]  ? __cap_is_valid+0x82/0x100 [ceph]
[ 1046.091755]  ? rb_next+0x1e/0x80
[ 1046.096640]  ? __ceph_caps_issued+0xe0/0x130 [ceph]
[ 1046.101331]  ceph_handle_caps+0x10f9/0x2280 [ceph]
[ 1046.106003]  ? mds_dispatch+0x134/0x2470 [ceph]
[ 1046.110416]  ? ceph_remove_capsnap+0x90/0x90 [ceph]
[ 1046.114901]  ? __mutex_lock+0x180/0xc10
[ 1046.119178]  ? release_sock+0x1d/0xf0
[ 1046.123331]  ? mds_dispatch+0xaf/0x2470 [ceph]
[ 1046.127588]  ? __mutex_unlock_slowpath+0x105/0x3c0
[ 1046.131845]  mds_dispatch+0x6fb/0x2470 [ceph]
[ 1046.136002]  ? tcp_recvmsg+0xe0/0x2c0
[ 1046.140038]  ? ceph_mdsc_handle_mdsmap+0x3c0/0x3c0 [ceph]
[ 1046.144255]  ? wait_for_completion+0x150/0x150
[ 1046.148235]  ceph_con_process_message+0xd9/0x240 [libceph]
[ 1046.152387]  ? iov_iter_advance+0x8e/0x480
[ 1046.156239]  process_message+0xf/0x100 [libceph]
[ 1046.160219]  ceph_con_v2_try_read+0x1561/0x1b00 [libceph]
[ 1046.164317]  ? __handle_control+0x1730/0x1730 [libceph]
[ 1046.168345]  ? __lock_acquire+0x830/0x2c60
[ 1046.172183]  ? __mutex_lock+0x180/0xc10
[ 1046.175910]  ? ceph_con_workfn+0x41/0x8b0 [libceph]
[ 1046.179814]  ? lockdep_hardirqs_on_prepare+0x220/0x220
[ 1046.183688]  ? mutex_lock_io_nested+0xba0/0xba0
[ 1046.187559]  ? lock_release+0x3e0/0x3e0
[ 1046.191422]  ceph_con_workfn+0x224/0x8b0 [libceph]
[ 1046.195464]  process_one_work+0x4fd/0x9a0
[ 1046.199281]  ? pwq_dec_nr_in_flight+0x100/0x100
[ 1046.203075]  ? rwlock_bug.part.0+0x60/0x60
[ 1046.206787]  worker_thread+0x2d4/0x6e0
[ 1046.210488]  ? process_one_work+0x9a0/0x9a0
[ 1046.214254]  kthread+0x1e3/0x210
[ 1046.217911]  ? set_kthread_struct+0x80/0x80
[ 1046.221694]  ret_from_fork+0x22/0x30
[ 1046.225553] 
[ 1046.228927] The buggy address belongs to the page:
[ 1046.232690] page:000000001ee14099 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1011bf
[ 1046.237195] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1046.241352] raw: 0017ffffc0000000 ffffea0004046fc8 ffffea0004046fc8 0000000000000000
[ 1046.245998] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1046.250612] page dumped because: kasan: bad access detected
[ 1046.254948] 
[ 1046.258789] addr ffff8881011bf558 is located in stack of task kworker/7:1/82 at offset 296 in frame:
[ 1046.263501]  ceph_check_caps+0x0/0xc50 [ceph]
[ 1046.267766] 
[ 1046.271643] this frame has 3 objects:
[ 1046.275934]  [32, 36) 'implemented'
[ 1046.275941]  [48, 56) 'oldest_flush_tid'
[ 1046.280091]  [80, 352) 'arg'
[ 1046.284281] 
[ 1046.291847] Memory state around the buggy address:
[ 1046.295874]  ffff8881011bf400: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f2 f2 f2
[ 1046.300247]  ffff8881011bf480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1046.304752] >ffff8881011bf500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1046.309172]                                                     ^
[ 1046.313414]  ffff8881011bf580: 00 00 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 00 00
[ 1046.318113]  ffff8881011bf600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1046.322543] ==================================================================

I'll keep investigating too.
Jeffrey Layton Nov. 2, 2021, 11:29 a.m. UTC | #6
On Tue, 2021-11-02 at 06:52 -0400, Jeff Layton wrote:
> [...]
>
> After playing around a bit more, I saw this KASAN pop, which may be
> related:
> 
> [... KASAN report snipped; see the previous message ...]
> 
> I'll keep investigating too.

Found it -- this patch seems to fix it. I'll plan to roll it into the
earlier patch that caused the bug, and will push an updated branch to
wip-fscrypt-fnames.

Good catch!

--------------------------------------8<-------------------------------

[PATCH] SQUASH: fix cap encoding when fscrypt is disabled

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 6e9f4de883d1..80f521dd7254 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1312,11 +1312,16 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 	ceph_encode_64(&p, 0);
 	ceph_encode_64(&p, 0);
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
 	/* fscrypt_auth and fscrypt_file (version 12) */
 	ceph_encode_32(&p, arg->fscrypt_auth_len);
 	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
 	ceph_encode_32(&p, arg->fscrypt_file_len);
 	ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
+#else /* CONFIG_FS_ENCRYPTION */
+	ceph_encode_32(&p, 0);
+	ceph_encode_32(&p, 0);
+#endif /* CONFIG_FS_ENCRYPTION */
 }
 
 /*
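
For context, a userspace sketch of what the squash patch above
guarantees: on !CONFIG_FS_ENCRYPTION builds the version-12 fields are
still present on the wire, only with explicit zero lengths, instead of
being copied with a length that was never initialized in that
configuration (consistent with the huge "Read of size ..." in the
KASAN report). The helper below merely mimics the kernel's
ceph_encode_32() and skips its endianness handling:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's ceph_encode_32() helper. */
static void encode_32(uint8_t **p, uint32_t v)
{
	memcpy(*p, &v, sizeof(v));
	*p += sizeof(v);
}

int main(void)
{
	uint8_t buf[16];
	uint8_t *p = buf;

	/*
	 * A v12 cap message always carries fscrypt_auth_len and
	 * fscrypt_file_len; with fscrypt compiled out, the fix encodes
	 * explicit zeros so receivers can still parse the message.
	 */
	encode_32(&p, 0);	/* fscrypt_auth_len = 0, no payload follows */
	encode_32(&p, 0);	/* fscrypt_file_len = 0, no payload follows */

	printf("encoded %d bytes\n", (int)(p - buf));	/* 8 */
	return 0;
}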
Xiubo Li Nov. 2, 2021, 11:31 a.m. UTC | #7
On 11/2/21 6:52 PM, Jeff Layton wrote:
> [...]
>
> After playing around a bit more, I saw this KASAN pop, which may be
> related:
>
> [... KASAN report snipped; see the earlier message ...]

Cool.

This seems to be the issue I mentioned months ago, where not enough
space is allocated for the msg structure, but I can't remember in which
thread.



> I'll keep investigating too.
Xiubo Li Nov. 2, 2021, 12:13 p.m. UTC | #8
On 11/2/21 7:29 PM, Jeff Layton wrote:
> On Tue, 2021-11-02 at 06:52 -0400, Jeff Layton wrote:
>> [...]
> Found it -- this patch seems to fix it. I'll plan to roll it into the
> earlier patch that caused the bug, and will push an updated branch to
> wip-fscrypt-fnames.
>
> Good catch!
>
> --------------------------------------8<-------------------------------
>
> [PATCH] SQUASH: fix cap encoding when fscrypt is disabled
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/caps.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 6e9f4de883d1..80f521dd7254 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1312,11 +1312,16 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
>   	ceph_encode_64(&p, 0);
>   	ceph_encode_64(&p, 0);
>   
> +#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
>   	/* fscrypt_auth and fscrypt_file (version 12) */
>   	ceph_encode_32(&p, arg->fscrypt_auth_len);
>   	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
>   	ceph_encode_32(&p, arg->fscrypt_file_len);
>   	ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
> +#else /* CONFIG_FS_ENCRYPTION */
> +	ceph_encode_32(&p, 0);
> +	ceph_encode_32(&p, 0);
> +#endif /* CONFIG_FS_ENCRYPTION */
>   }
>   
>   /*

Yeah, this should be the root cause.