[0/2] ceph: misc fixes for the fscrypt truncate size handling

Message ID	20211108135012.79941-1-xiubli@redhat.com (mailing list archive)
Headers	show Return-Path: <ceph-devel-owner@kernel.org> From: xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, khiremat@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com> Subject: [PATCH 0/2] ceph: misc fixes for the fscrypt truncate size handling Date: Mon, 8 Nov 2021 21:50:10 +0800 Message-Id: <20211108135012.79941-1-xiubli@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ceph: misc fixes for the fscrypt truncate size handling \| expand [0/2] ceph: misc fixes for the fscrypt truncate size handling [1/2] ceph: fix possible crash and data corrupt bugs [2/2] ceph: there is no need to round up the sizes when new size is 0

Xiubo Li Nov. 8, 2021, 1:50 p.m. UTC

From: Xiubo Li <xiubli@redhat.com>

Hi Jeff,

The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.

Thanks.

Xiubo Li (2):
  ceph: fix possible crash and data corrupt bugs
  ceph: there is no need to round up the sizes when new size is 0

 fs/ceph/inode.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Xiubo Li Nov. 8, 2021, 2:10 p.m. UTC | #1

Hi Jeff,

After this fixing, there still has one bug and I am still looking at it:

[root@lxbceph1 xfstests]# ./check generic/075
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+

generic/075     [failed, exit status 1] - output mismatch (see 
/mnt/kcephfs/xfstests/results//generic/075.out.bad)
     --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
     +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 
21:20:49.741906997 +0800
     @@ -12,7 +12,4 @@
      -----------------------------------------------
      fsx.2 : -d -N numops -l filelen -S 0
      -----------------------------------------------
     -
     ------------------------------------------------
     -fsx.3 : -d -N numops -l filelen -S 0 -x
     ------------------------------------------------
     ...
     (Run 'diff -u tests/generic/075.out 
/mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the entire diff)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests


I checked the result outputs, it seems when truncating the size to a 
smaller sizeA and then to a bigger sizeB again, in theory those contents 
between sizeA and sizeB should be zeroed, but it didn't.

The last block updating is correct.

Any idea ?

Thanks.


On 11/8/21 9:50 PM, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> Hi Jeff,
>
> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
>
> Thanks.
>
> Xiubo Li (2):
>    ceph: fix possible crash and data corrupt bugs
>    ceph: there is no need to round up the sizes when new size is 0
>
>   fs/ceph/inode.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>

Jeffrey Layton Nov. 8, 2021, 2:26 p.m. UTC | #2

On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote:
> Hi Jeff,
> 
> After this fixing, there still has one bug and I am still looking at it:
> 
> [root@lxbceph1 xfstests]# ./check generic/075
> FSTYP         -- ceph
> PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+
> 
> generic/075     [failed, exit status 1] - output mismatch (see 
> /mnt/kcephfs/xfstests/results//generic/075.out.bad)
>      --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
>      +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 
> 21:20:49.741906997 +0800
>      @@ -12,7 +12,4 @@
>       -----------------------------------------------
>       fsx.2 : -d -N numops -l filelen -S 0
>       -----------------------------------------------
>      -
>      ------------------------------------------------
>      -fsx.3 : -d -N numops -l filelen -S 0 -x
>      ------------------------------------------------
>      ...
>      (Run 'diff -u tests/generic/075.out 
> /mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the entire diff)
> Ran: generic/075
> Failures: generic/075
> Failed 1 of 1 tests
> 
> 
> I checked the result outputs, it seems when truncating the size to a 
> smaller sizeA and then to a bigger sizeB again, in theory those contents 
> between sizeA and sizeB should be zeroed, but it didn't.
> 
> The last block updating is correct.
> 
> Any idea ?
> 


Yep, that's the one I saw (intermittently) too. I also saw some failures
around generic/029 and generic/030 that may be related. I haven't dug
down as far into the problem as you have though.

The nice thing about fsx is that it gives you a lot of info about what
it does. There is also a way to replay a series of ops too, so you may
want to try to see if you can make a reliable reproducer for this
problem.

Are these truncates running concurrently in different tasks? If so, then
we may need some mechanism to ensure that they are serialized vs. one
another.

> 
> On 11/8/21 9:50 PM, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > Hi Jeff,
> > 
> > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
> > 
> > Thanks.
> > 
> > Xiubo Li (2):
> >    ceph: fix possible crash and data corrupt bugs
> >    ceph: there is no need to round up the sizes when new size is 0
> > 
> >   fs/ceph/inode.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
>

Xiubo Li Nov. 8, 2021, 2:27 p.m. UTC | #3

Possibly we need to pick up some more patches or some code section from 
your experimental branch for the read/write.

On 11/8/21 10:10 PM, Xiubo Li wrote:
> Hi Jeff,
>
> After this fixing, there still has one bug and I am still looking at it:
>
> [root@lxbceph1 xfstests]# ./check generic/075
> FSTYP         -- ceph
> PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+
>
> generic/075     [failed, exit status 1] - output mismatch (see 
> /mnt/kcephfs/xfstests/results//generic/075.out.bad)
>     --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
>     +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 
> 21:20:49.741906997 +0800
>     @@ -12,7 +12,4 @@
>      -----------------------------------------------
>      fsx.2 : -d -N numops -l filelen -S 0
>      -----------------------------------------------
>     -
>     ------------------------------------------------
>     -fsx.3 : -d -N numops -l filelen -S 0 -x
>     ------------------------------------------------
>     ...
>     (Run 'diff -u tests/generic/075.out 
> /mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the entire 
> diff)
> Ran: generic/075
> Failures: generic/075
> Failed 1 of 1 tests
>
>
> I checked the result outputs, it seems when truncating the size to a 
> smaller sizeA and then to a bigger sizeB again, in theory those 
> contents between sizeA and sizeB should be zeroed, but it didn't.
>
> The last block updating is correct.
>
> Any idea ?
>
> Thanks.
>
>
> On 11/8/21 9:50 PM, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Hi Jeff,
>>
>> The #1 could be squashed to the previous "ceph: add truncate size 
>> handling support for fscrypt" commit.
>> The #2 could be squashed to the previous "ceph: fscrypt_file field 
>> handling in MClientRequest messages" commit.
>>
>> Thanks.
>>
>> Xiubo Li (2):
>>    ceph: fix possible crash and data corrupt bugs
>>    ceph: there is no need to round up the sizes when new size is 0
>>
>>   fs/ceph/inode.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>

Jeffrey Layton Nov. 8, 2021, 2:32 p.m. UTC | #4

On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Hi Jeff,
> 
> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
> 
> Thanks.
> 
> Xiubo Li (2):
>   ceph: fix possible crash and data corrupt bugs
>   ceph: there is no need to round up the sizes when new size is 0
> 
>  fs/ceph/inode.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 

Done. I folded these into the patches like you suggested above.

Xiubo Li Nov. 8, 2021, 2:41 p.m. UTC | #5

On 11/8/21 10:26 PM, Jeff Layton wrote:
> On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote:
>> Hi Jeff,
>>
>> After this fixing, there still has one bug and I am still looking at it:
>>
>> [root@lxbceph1 xfstests]# ./check generic/075
>> FSTYP         -- ceph
>> PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+
>>
>> generic/075     [failed, exit status 1] - output mismatch (see
>> /mnt/kcephfs/xfstests/results//generic/075.out.bad)
>>       --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
>>       +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08
>> 21:20:49.741906997 +0800
>>       @@ -12,7 +12,4 @@
>>        -----------------------------------------------
>>        fsx.2 : -d -N numops -l filelen -S 0
>>        -----------------------------------------------
>>       -
>>       ------------------------------------------------
>>       -fsx.3 : -d -N numops -l filelen -S 0 -x
>>       ------------------------------------------------
>>       ...
>>       (Run 'diff -u tests/generic/075.out
>> /mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the entire diff)
>> Ran: generic/075
>> Failures: generic/075
>> Failed 1 of 1 tests
>>
>>
>> I checked the result outputs, it seems when truncating the size to a
>> smaller sizeA and then to a bigger sizeB again, in theory those contents
>> between sizeA and sizeB should be zeroed, but it didn't.
>>
>> The last block updating is correct.
>>
>> Any idea ?
>>
>
> Yep, that's the one I saw (intermittently) too. I also saw some failures
> around generic/029 and generic/030 that may be related. I haven't dug
> down as far into the problem as you have though.
>
> The nice thing about fsx is that it gives you a lot of info about what
> it does. There is also a way to replay a series of ops too, so you may
> want to try to see if you can make a reliable reproducer for this
> problem.

I can reproduce this every time.


> Are these truncates running concurrently in different tasks?

No, from the "075.2.fsxlog" they are serialized. The truncates 
themselves worked well, but there also have some 
mapwrite/write/mapread/read between them. From the logs, I am sure that 
those none zeroed contents are from mapwrite/write. It seems the dirty 
pages are flushed to OSDs just after the truncates.


>   If so, then
> we may need some mechanism to ensure that they are serialized vs. one
> another.

The truncate will hold the 'inode->i_rwsem' lock too, so it won't allow 
the truncate/read/write to run in parallel. But I am not sure the mapwrite ?


>
>> On 11/8/21 9:50 PM, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> Hi Jeff,
>>>
>>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
>>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
>>>
>>> Thanks.
>>>
>>> Xiubo Li (2):
>>>     ceph: fix possible crash and data corrupt bugs
>>>     ceph: there is no need to round up the sizes when new size is 0
>>>
>>>    fs/ceph/inode.c | 8 +++++---
>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>

Xiubo Li Nov. 8, 2021, 2:44 p.m. UTC | #6

On 11/8/21 10:41 PM, Xiubo Li wrote:
>
> On 11/8/21 10:26 PM, Jeff Layton wrote:
>> On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote:
>>> Hi Jeff,
>>>
>>> After this fixing, there still has one bug and I am still looking at 
>>> it:
>>>
>>> [root@lxbceph1 xfstests]# ./check generic/075
>>> FSTYP         -- ceph
>>> PLATFORM      -- Linux/x86_64 lxbceph1 5.15.0-rc6+
>>>
>>> generic/075     [failed, exit status 1] - output mismatch (see
>>> /mnt/kcephfs/xfstests/results//generic/075.out.bad)
>>>       --- tests/generic/075.out    2021-11-08 20:54:07.456980801 +0800
>>>       +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08
>>> 21:20:49.741906997 +0800
>>>       @@ -12,7 +12,4 @@
>>>        -----------------------------------------------
>>>        fsx.2 : -d -N numops -l filelen -S 0
>>>        -----------------------------------------------
>>>       -
>>>       ------------------------------------------------
>>>       -fsx.3 : -d -N numops -l filelen -S 0 -x
>>>       ------------------------------------------------
>>>       ...
>>>       (Run 'diff -u tests/generic/075.out
>>> /mnt/kcephfs/xfstests/results//generic/075.out.bad'  to see the 
>>> entire diff)
>>> Ran: generic/075
>>> Failures: generic/075
>>> Failed 1 of 1 tests
>>>
>>>
>>> I checked the result outputs, it seems when truncating the size to a
>>> smaller sizeA and then to a bigger sizeB again, in theory those 
>>> contents
>>> between sizeA and sizeB should be zeroed, but it didn't.
>>>
>>> The last block updating is correct.
>>>
>>> Any idea ?
>>>
>>
>> Yep, that's the one I saw (intermittently) too. I also saw some failures
>> around generic/029 and generic/030 that may be related. I haven't dug
>> down as far into the problem as you have though.
>>
>> The nice thing about fsx is that it gives you a lot of info about what
>> it does. There is also a way to replay a series of ops too, so you may
>> want to try to see if you can make a reliable reproducer for this
>> problem.
>
> I can reproduce this every time.
>
>
>> Are these truncates running concurrently in different tasks?
>
> No, from the "075.2.fsxlog" they are serialized. The truncates 
> themselves worked well, but there also have some 
> mapwrite/write/mapread/read between them. From the logs, I am sure 
> that those none zeroed contents are from mapwrite/write. It seems the 
> dirty pages are flushed to OSDs just after the truncates.
>
>
>>   If so, then
>> we may need some mechanism to ensure that they are serialized vs. one
>> another.
>
> The truncate will hold the 'inode->i_rwsem' lock too, so it won't 
> allow the truncate/read/write to run in parallel. But I am not sure 
> the mapwrite ?
>
>
The logs segments from "075.2.fsxlog":

2317 3728 mapread    0x330312 thru   0x33a6b1        (0xa3a0 bytes)
2318 3730 punch      from 0x5ffc53 to 0x608516, (0x88c3 bytes)
2319 3731 write      0x6e9465 thru   0x6f8c04        (0xf7a0 bytes)
2320 3732 mapread    0x49f516 thru   0x49f570        (0x5b bytes)
2321 3733 write      0x72847b thru   0x733f14        (0xba9a bytes)
2322 3736 punch      from 0x2a90a0 to 0x2aa68e, (0x15ee bytes)
2323 3739 write      0x644a24 thru   0x64aa30        (0x600d bytes)
2324 3740 trunc      from 0x7aa4b0 to 0x9dbef3
2325 3741 mapread    0x5aa6bd thru   0x5b7246        (0xcb8a bytes)
2326 3742 trunc      from 0x9dbef3 to 0x718ae4
2327 3743 write      0x3ac9b0 thru   0x3aeee0        (0x2531 bytes)
2328 3744 read       0x6e171c thru   0x6f0fd6        (0xf8bb bytes)
2329 3747 trunc      from 0x718ae4 to 0x627ddb
2330 3748 mapread    0xe4e2c thru    0xf0bd5 (0xbdaa bytes)
2331 3752 write      0x71def1 thru   0x71e152        (0x262 bytes)
2332 3753 mapwrite   0x9eb0d8 thru   0x9f4ef7        (0x9e20 bytes)
2333 3754 mapwrite   0x7db56d thru   0x7e1278        (0x5d0c bytes)
2334 3755 punch      from 0x9368cb to 0x9437fb, (0xcf30 bytes)
2335 3757 write      0x366827 thru   0x3699ff        (0x31d9 bytes)
2336 3761 mapwrite   0x529471 thru   0x52b085        (0x1c15 bytes)
2337 3762 trunc      from 0x9f4ef8 to 0x86bfab
2338 3764 write      0x9c85b9 thru   0x9d0bdc        (0x8624 bytes)
2339 3765 mapread    0x11b451 thru   0x11fec5        (0x4a75 bytes)
2340 3766 write      0x5938cb thru   0x59e0d0        (0xa806 bytes)
2341 3767 read       0xe3063 thru    0xe8ee7 (0x5e85 bytes)
2342 3768 punch      from 0x859f3f to 0x8698ec, (0xf9ad bytes)
2343 3771 punch      from 0x86d188 to 0x86eef3, (0x1d6b bytes)
2344 3773 write      0x9f43c9 thru   0x9fffff        (0xbc37 bytes)
2345 3774 trunc      from 0xa00000 to 0x26d4b9
2346 3777 trunc      from 0x26d4b9 to 0x9c695f
2347 3783 trunc      from 0x9c695f to 0x9129ed
2348 3784 mapread    0x448402 thru   0x45074d        (0x834c bytes)
2349 READ BAD DATA: offset = 0x448402, size = 0x834c, fname = 075.2
2350 OFFSET  GOOD    BAD     RANGE
2351 0x448402        0x0000  0x74ea  0x00000
2352 operation# (mod 256) for the bad data may be 116
2353 0x448403        0x0000  0xea74  0x00001
2354 operation# (mod 256) for the bad data may be 116



>>
>>> On 11/8/21 9:50 PM, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> Hi Jeff,
>>>>
>>>> The #1 could be squashed to the previous "ceph: add truncate size 
>>>> handling support for fscrypt" commit.
>>>> The #2 could be squashed to the previous "ceph: fscrypt_file field 
>>>> handling in MClientRequest messages" commit.
>>>>
>>>> Thanks.
>>>>
>>>> Xiubo Li (2):
>>>>     ceph: fix possible crash and data corrupt bugs
>>>>     ceph: there is no need to round up the sizes when new size is 0
>>>>
>>>>    fs/ceph/inode.c | 8 +++++---
>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>

Jeffrey Layton Nov. 8, 2021, 6:36 p.m. UTC | #7

On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Hi Jeff,
> 
> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
> 
> Thanks.
> 
> Xiubo Li (2):
>   ceph: fix possible crash and data corrupt bugs
>   ceph: there is no need to round up the sizes when new size is 0
> 
>  fs/ceph/inode.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 

I'm running xfstests today with the current state of wip-fscrypt-size
(including the two patches above) with test_dummy_encryption enabled.
generic/030 failed. The expected output of this part of the test was:

wrote 5136912/5136912 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
==== Pre-Remount ===
00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58 
|XXXXXXXXXXXXXXXX|
*
004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59 
|YYYYYYYYYYYYYYYY|
*
004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00 
|YYYYYYYY........|
004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
|................|
*
004e7000
==== Post-Remount ==
00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58 
|XXXXXXXXXXXXXXXX|
*
004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59 
|YYYYYYYYYYYYYYYY|
*
004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00 
|YYYYYYYY........|
004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
|................|
*
004e7000

...but I got this instead:

wrote 5136912/5136912 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
==== Pre-Remount ===
00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58 
|XXXXXXXXXXXXXXXX|
*
00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
|................|
*
004e7000
==== Post-Remount ==
00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58 
|XXXXXXXXXXXXXXXX|
*
00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
|................|
*
004e7000


...I suspect this is related to the 075 failures, but I don't see it on
every attempt, which makes me think "race condition". I'll keep hunting.

Cheers,

Xiubo Li Nov. 9, 2021, 12:21 a.m. UTC | #8

On 11/9/21 2:36 AM, Jeff Layton wrote:
> On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Hi Jeff,
>>
>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
>>
>> Thanks.
>>
>> Xiubo Li (2):
>>    ceph: fix possible crash and data corrupt bugs
>>    ceph: there is no need to round up the sizes when new size is 0
>>
>>   fs/ceph/inode.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
> I'm running xfstests today with the current state of wip-fscrypt-size
> (including the two patches above) with test_dummy_encryption enabled.
> generic/030 failed. The expected output of this part of the test was:
>
> wrote 5136912/5136912 bytes at offset 0
> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ==== Pre-Remount ===
> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> |XXXXXXXXXXXXXXXX|
> *
> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> |YYYYYYYYYYYYYYYY|
> *
> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> |YYYYYYYY........|
> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 004e7000
> ==== Post-Remount ==
> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> |XXXXXXXXXXXXXXXX|
> *
> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> |YYYYYYYYYYYYYYYY|
> *
> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> |YYYYYYYY........|
> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 004e7000
>
> ...but I got this instead:
>
> wrote 5136912/5136912 bytes at offset 0
> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ==== Pre-Remount ===
> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> |XXXXXXXXXXXXXXXX|
> *
> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 004e7000
> ==== Post-Remount ==
> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> |XXXXXXXXXXXXXXXX|
> *
> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> |................|
> *
> 004e7000
>
>
> ...I suspect this is related to the 075 failures, but I don't see it on
> every attempt, which makes me think "race condition". I'll keep hunting.

Yeah, seems the same issue.


> Cheers,

Jeffrey Layton Nov. 9, 2021, 1:57 a.m. UTC | #9

On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote:
> On 11/9/21 2:36 AM, Jeff Layton wrote:
> > On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > Hi Jeff,
> > > 
> > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> > > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
> > > 
> > > Thanks.
> > > 
> > > Xiubo Li (2):
> > >    ceph: fix possible crash and data corrupt bugs
> > >    ceph: there is no need to round up the sizes when new size is 0
> > > 
> > >   fs/ceph/inode.c | 8 +++++---
> > >   1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > I'm running xfstests today with the current state of wip-fscrypt-size
> > (including the two patches above) with test_dummy_encryption enabled.
> > generic/030 failed. The expected output of this part of the test was:
> > 
> > wrote 5136912/5136912 bytes at offset 0
> > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > ==== Pre-Remount ===
> > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > XXXXXXXXXXXXXXXX|
> > *
> > 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> > > YYYYYYYYYYYYYYYY|
> > *
> > 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> > > YYYYYYYY........|
> > 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > ................|
> > *
> > 004e7000
> > ==== Post-Remount ==
> > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > XXXXXXXXXXXXXXXX|
> > *
> > 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> > > YYYYYYYYYYYYYYYY|
> > *
> > 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> > > YYYYYYYY........|
> > 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > ................|
> > *
> > 004e7000
> > 
> > ...but I got this instead:
> > 
> > wrote 5136912/5136912 bytes at offset 0
> > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > ==== Pre-Remount ===
> > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > XXXXXXXXXXXXXXXX|
> > *
> > 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > ................|
> > *
> > 004e7000
> > ==== Post-Remount ==
> > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > XXXXXXXXXXXXXXXX|
> > *
> > 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > ................|
> > *
> > 004e7000
> > 
> > 
> > ...I suspect this is related to the 075 failures, but I don't see it on
> > every attempt, which makes me think "race condition". I'll keep hunting.
> 
> Yeah, seems the same issue.
> 

I wonder if we're hitting the same problem that this patch was intended
to fix:

    057ba5b24532 ceph: Fix race between hole punch and page fault

It seems like the problem is very similar. It may be worth testing a
patch that takes the filemap_invalidate_lock across the on the wire
truncate and the vmtruncate.

Jeffrey Layton Nov. 9, 2021, 12:33 p.m. UTC | #10

On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote:
> On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote:
> > On 11/9/21 2:36 AM, Jeff Layton wrote:
> > > On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > > 
> > > > Hi Jeff,
> > > > 
> > > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
> > > > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
> > > > 
> > > > Thanks.
> > > > 
> > > > Xiubo Li (2):
> > > >    ceph: fix possible crash and data corrupt bugs
> > > >    ceph: there is no need to round up the sizes when new size is 0
> > > > 
> > > >   fs/ceph/inode.c | 8 +++++---
> > > >   1 file changed, 5 insertions(+), 3 deletions(-)
> > > > 
> > > I'm running xfstests today with the current state of wip-fscrypt-size
> > > (including the two patches above) with test_dummy_encryption enabled.
> > > generic/030 failed. The expected output of this part of the test was:
> > > 
> > > wrote 5136912/5136912 bytes at offset 0
> > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > > ==== Pre-Remount ===
> > > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > > XXXXXXXXXXXXXXXX|
> > > *
> > > 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> > > > YYYYYYYYYYYYYYYY|
> > > *
> > > 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> > > > YYYYYYYY........|
> > > 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > > ................|
> > > *
> > > 004e7000
> > > ==== Post-Remount ==
> > > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > > XXXXXXXXXXXXXXXX|
> > > *
> > > 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
> > > > YYYYYYYYYYYYYYYY|
> > > *
> > > 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
> > > > YYYYYYYY........|
> > > 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > > ................|
> > > *
> > > 004e7000
> > > 
> > > ...but I got this instead:
> > > 
> > > wrote 5136912/5136912 bytes at offset 0
> > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > > ==== Pre-Remount ===
> > > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > > XXXXXXXXXXXXXXXX|
> > > *
> > > 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > > ................|
> > > *
> > > 004e7000
> > > ==== Post-Remount ==
> > > 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
> > > > XXXXXXXXXXXXXXXX|
> > > *
> > > 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> > > > ................|
> > > *
> > > 004e7000
> > > 
> > > 
> > > ...I suspect this is related to the 075 failures, but I don't see it on
> > > every attempt, which makes me think "race condition". I'll keep hunting.
> > 
> > Yeah, seems the same issue.
> > 
> 
> I wonder if we're hitting the same problem that this patch was intended
> to fix:
> 
>     057ba5b24532 ceph: Fix race between hole punch and page fault
> 
> It seems like the problem is very similar. It may be worth testing a
> patch that takes the filemap_invalidate_lock across the on the wire
> truncate and the vmtruncate.


Nope. I tried a draft patch that make setattr hold this lock over the
MDS call and the vmtruncate and it didn't help.

Cheers,

Xiubo Li Nov. 15, 2021, 3:07 a.m. UTC | #11

On 11/9/21 8:33 PM, Jeff Layton wrote:
> On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote:
>> On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote:
>>> On 11/9/21 2:36 AM, Jeff Layton wrote:
>>>> On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
>>>>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Xiubo Li (2):
>>>>>     ceph: fix possible crash and data corrupt bugs
>>>>>     ceph: there is no need to round up the sizes when new size is 0
>>>>>
>>>>>    fs/ceph/inode.c | 8 +++++---
>>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>
>>>> I'm running xfstests today with the current state of wip-fscrypt-size
>>>> (including the two patches above) with test_dummy_encryption enabled.
>>>> generic/030 failed. The expected output of this part of the test was:
>>>>
>>>> wrote 5136912/5136912 bytes at offset 0
>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ==== Pre-Remount ===
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
>>>>> YYYYYYYYYYYYYYYY|
>>>> *
>>>> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
>>>>> YYYYYYYY........|
>>>> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>> ==== Post-Remount ==
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
>>>>> YYYYYYYYYYYYYYYY|
>>>> *
>>>> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
>>>>> YYYYYYYY........|
>>>> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>>
>>>> ...but I got this instead:
>>>>
>>>> wrote 5136912/5136912 bytes at offset 0
>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ==== Pre-Remount ===
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>> ==== Post-Remount ==
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>>
>>>>
>>>> ...I suspect this is related to the 075 failures, but I don't see it on
>>>> every attempt, which makes me think "race condition". I'll keep hunting.
>>> Yeah, seems the same issue.
>>>
>> I wonder if we're hitting the same problem that this patch was intended
>> to fix:
>>
>>      057ba5b24532 ceph: Fix race between hole punch and page fault
>>
>> It seems like the problem is very similar. It may be worth testing a
>> patch that takes the filemap_invalidate_lock across the on the wire
>> truncate and the vmtruncate.
>
> Nope. I tried a draft patch that make setattr hold this lock over the
> MDS call and the vmtruncate and it didn't help.
>
> Cheers,


This bug is very similar to the one I have fixed in:

        f4569b11c4eb ceph: return the real size read when it hits EOF

Which is for read case when truncating a file to a smaller size and then 
a bigger one immediately, the stale data will be kept.

Currently bug is in write case.

BRs

-- Xiubo

Xiubo Li Nov. 16, 2021, 4:56 a.m. UTC | #12

On 11/9/21 8:33 PM, Jeff Layton wrote:
> On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote:
>> On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote:
>>> On 11/9/21 2:36 AM, Jeff Layton wrote:
>>>> On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit.
>>>>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Xiubo Li (2):
>>>>>     ceph: fix possible crash and data corrupt bugs
>>>>>     ceph: there is no need to round up the sizes when new size is 0
>>>>>
>>>>>    fs/ceph/inode.c | 8 +++++---
>>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>
>>>> I'm running xfstests today with the current state of wip-fscrypt-size
>>>> (including the two patches above) with test_dummy_encryption enabled.
>>>> generic/030 failed. The expected output of this part of the test was:
>>>>
>>>> wrote 5136912/5136912 bytes at offset 0
>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ==== Pre-Remount ===
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
>>>>> YYYYYYYYYYYYYYYY|
>>>> *
>>>> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
>>>>> YYYYYYYY........|
>>>> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>> ==== Post-Remount ==
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 004e6210  59 59 59 59 59 59 59 59  59 59 59 59 59 59 59 59
>>>>> YYYYYYYYYYYYYYYY|
>>>> *
>>>> 004e6d00  59 59 59 59 59 59 59 59  00 00 00 00 00 00 00 00
>>>>> YYYYYYYY........|
>>>> 004e6d10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>>
>>>> ...but I got this instead:
>>>>
>>>> wrote 5136912/5136912 bytes at offset 0
>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ==== Pre-Remount ===
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>> ==== Post-Remount ==
>>>> 00000000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58
>>>>> XXXXXXXXXXXXXXXX|
>>>> *
>>>> 00400000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> ................|
>>>> *
>>>> 004e7000
>>>>
>>>>
>>>> ...I suspect this is related to the 075 failures, but I don't see it on
>>>> every attempt, which makes me think "race condition". I'll keep hunting.
>>> Yeah, seems the same issue.
>>>
>> I wonder if we're hitting the same problem that this patch was intended
>> to fix:
>>
>>      057ba5b24532 ceph: Fix race between hole punch and page fault
>>
>> It seems like the problem is very similar. It may be worth testing a
>> patch that takes the filemap_invalidate_lock across the on the wire
>> truncate and the vmtruncate.
>
> Nope. I tried a draft patch that make setattr hold this lock over the
> MDS call and the vmtruncate and it didn't help.

It's buggy by using the `filer.write_trunc()` caller in MDS side, it 
works well in case the truncate_from - truncate_size < 4MB, or it will 
keep the stale file contents without trimming or zeroing them.

I will push the MDS side PR to fix it.

Thanks

-- Xiubo


> Cheers,

Xiubo Li Nov. 16, 2021, 6:14 a.m. UTC | #13

On 11/16/21 12:56 PM, Xiubo Li wrote:
>
> On 11/9/21 8:33 PM, Jeff Layton wrote:
>> On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote:
>>>
[...]
>>>>> ...I suspect this is related to the 075 failures, but I don't see 
>>>>> it on
>>>>> every attempt, which makes me think "race condition". I'll keep 
>>>>> hunting.
>>>> Yeah, seems the same issue.
>>>>
>>> I wonder if we're hitting the same problem that this patch was intended
>>> to fix:
>>>
>>>      057ba5b24532 ceph: Fix race between hole punch and page fault
>>>
>>> It seems like the problem is very similar. It may be worth testing a
>>> patch that takes the filemap_invalidate_lock across the on the wire
>>> truncate and the vmtruncate.
>>
>> Nope. I tried a draft patch that make setattr hold this lock over the
>> MDS call and the vmtruncate and it didn't help.
>
> It's buggy by using the `filer.write_trunc()` caller in MDS side, it 
> works well in case the truncate_from - truncate_size < 4MB, or it will 
> keep the stale file contents without trimming or zeroing them.
>
> I will push the MDS side PR to fix it.
>
Have updated the MDS side PR.

Both the generic/030 and generic/075 tests passed, and I am still 
testing others.

BRs

-- Xiubo


> Thanks
>
> -- Xiubo
>
>
>> Cheers,

[0/2] ceph: misc fixes for the fscrypt truncate size handling

Message

Comments