Message ID | 20211108135012.79941-1-xiubli@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | ceph: misc fixes for the fscrypt truncate size handling | expand |
Hi Jeff, After this fixing, there still has one bug and I am still looking at it: [root@lxbceph1 xfstests]# ./check generic/075 FSTYP -- ceph PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+ generic/075 [failed, exit status 1] - output mismatch (see /mnt/kcephfs/xfstests/results//generic/075.out.bad) --- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800 +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 21:20:49.741906997 +0800 @@ -12,7 +12,4 @@ ----------------------------------------------- fsx.2 : -d -N numops -l filelen -S 0 ----------------------------------------------- - ------------------------------------------------ -fsx.3 : -d -N numops -l filelen -S 0 -x ------------------------------------------------ ... (Run 'diff -u tests/generic/075.out /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the entire diff) Ran: generic/075 Failures: generic/075 Failed 1 of 1 tests I checked the result outputs, it seems when truncating the size to a smaller sizeA and then to a bigger sizeB again, in theory those contents between sizeA and sizeB should be zeroed, but it didn't. The last block updating is correct. Any idea ? Thanks. On 11/8/21 9:50 PM, xiubli@redhat.com wrote: > From: Xiubo Li <xiubli@redhat.com> > > Hi Jeff, > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > Thanks. > > Xiubo Li (2): > ceph: fix possible crash and data corrupt bugs > ceph: there is no need to round up the sizes when new size is 0 > > fs/ceph/inode.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) >
On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote: > Hi Jeff, > > After this fixing, there still has one bug and I am still looking at it: > > [root@lxbceph1 xfstests]# ./check generic/075 > FSTYP -- ceph > PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+ > > generic/075 [failed, exit status 1] - output mismatch (see > /mnt/kcephfs/xfstests/results//generic/075.out.bad) > --- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800 > +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 > 21:20:49.741906997 +0800 > @@ -12,7 +12,4 @@ > ----------------------------------------------- > fsx.2 : -d -N numops -l filelen -S 0 > ----------------------------------------------- > - > ------------------------------------------------ > -fsx.3 : -d -N numops -l filelen -S 0 -x > ------------------------------------------------ > ... > (Run 'diff -u tests/generic/075.out > /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the entire diff) > Ran: generic/075 > Failures: generic/075 > Failed 1 of 1 tests > > > I checked the result outputs, it seems when truncating the size to a > smaller sizeA and then to a bigger sizeB again, in theory those contents > between sizeA and sizeB should be zeroed, but it didn't. > > The last block updating is correct. > > Any idea ? > Yep, that's the one I saw (intermittently) too. I also saw some failures around generic/029 and generic/030 that may be related. I haven't dug down as far into the problem as you have though. The nice thing about fsx is that it gives you a lot of info about what it does. There is also a way to replay a series of ops too, so you may want to try to see if you can make a reliable reproducer for this problem. Are these truncates running concurrently in different tasks? If so, then we may need some mechanism to ensure that they are serialized vs. one another. > > On 11/8/21 9:50 PM, xiubli@redhat.com wrote: > > From: Xiubo Li <xiubli@redhat.com> > > > > Hi Jeff, > > > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > > > Thanks. > > > > Xiubo Li (2): > > ceph: fix possible crash and data corrupt bugs > > ceph: there is no need to round up the sizes when new size is 0 > > > > fs/ceph/inode.c | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > >
Possibly we need to pick up some more patches or some code section from your experimental branch for the read/write. On 11/8/21 10:10 PM, Xiubo Li wrote: > Hi Jeff, > > After this fixing, there still has one bug and I am still looking at it: > > [root@lxbceph1 xfstests]# ./check generic/075 > FSTYP -- ceph > PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+ > > generic/075 [failed, exit status 1] - output mismatch (see > /mnt/kcephfs/xfstests/results//generic/075.out.bad) > --- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800 > +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 > 21:20:49.741906997 +0800 > @@ -12,7 +12,4 @@ > ----------------------------------------------- > fsx.2 : -d -N numops -l filelen -S 0 > ----------------------------------------------- > - > ------------------------------------------------ > -fsx.3 : -d -N numops -l filelen -S 0 -x > ------------------------------------------------ > ... > (Run 'diff -u tests/generic/075.out > /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the entire > diff) > Ran: generic/075 > Failures: generic/075 > Failed 1 of 1 tests > > > I checked the result outputs, it seems when truncating the size to a > smaller sizeA and then to a bigger sizeB again, in theory those > contents between sizeA and sizeB should be zeroed, but it didn't. > > The last block updating is correct. > > Any idea ? > > Thanks. > > > On 11/8/21 9:50 PM, xiubli@redhat.com wrote: >> From: Xiubo Li <xiubli@redhat.com> >> >> Hi Jeff, >> >> The #1 could be squashed to the previous "ceph: add truncate size >> handling support for fscrypt" commit. >> The #2 could be squashed to the previous "ceph: fscrypt_file field >> handling in MClientRequest messages" commit. >> >> Thanks. >> >> Xiubo Li (2): >> ceph: fix possible crash and data corrupt bugs >> ceph: there is no need to round up the sizes when new size is 0 >> >> fs/ceph/inode.c | 8 +++++--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >>
On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: > From: Xiubo Li <xiubli@redhat.com> > > Hi Jeff, > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > Thanks. > > Xiubo Li (2): > ceph: fix possible crash and data corrupt bugs > ceph: there is no need to round up the sizes when new size is 0 > > fs/ceph/inode.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > Done. I folded these into the patches like you suggested above.
On 11/8/21 10:26 PM, Jeff Layton wrote: > On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote: >> Hi Jeff, >> >> After this fixing, there still has one bug and I am still looking at it: >> >> [root@lxbceph1 xfstests]# ./check generic/075 >> FSTYP -- ceph >> PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+ >> >> generic/075 [failed, exit status 1] - output mismatch (see >> /mnt/kcephfs/xfstests/results//generic/075.out.bad) >> --- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800 >> +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 >> 21:20:49.741906997 +0800 >> @@ -12,7 +12,4 @@ >> ----------------------------------------------- >> fsx.2 : -d -N numops -l filelen -S 0 >> ----------------------------------------------- >> - >> ------------------------------------------------ >> -fsx.3 : -d -N numops -l filelen -S 0 -x >> ------------------------------------------------ >> ... >> (Run 'diff -u tests/generic/075.out >> /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the entire diff) >> Ran: generic/075 >> Failures: generic/075 >> Failed 1 of 1 tests >> >> >> I checked the result outputs, it seems when truncating the size to a >> smaller sizeA and then to a bigger sizeB again, in theory those contents >> between sizeA and sizeB should be zeroed, but it didn't. >> >> The last block updating is correct. >> >> Any idea ? >> > > Yep, that's the one I saw (intermittently) too. I also saw some failures > around generic/029 and generic/030 that may be related. I haven't dug > down as far into the problem as you have though. > > The nice thing about fsx is that it gives you a lot of info about what > it does. There is also a way to replay a series of ops too, so you may > want to try to see if you can make a reliable reproducer for this > problem. I can reproduce this every time. > Are these truncates running concurrently in different tasks? No, from the "075.2.fsxlog" they are serialized. The truncates themselves worked well, but there also have some mapwrite/write/mapread/read between them. From the logs, I am sure that those none zeroed contents are from mapwrite/write. It seems the dirty pages are flushed to OSDs just after the truncates. > If so, then > we may need some mechanism to ensure that they are serialized vs. one > another. The truncate will hold the 'inode->i_rwsem' lock too, so it won't allow the truncate/read/write to run in parallel. But I am not sure the mapwrite ? > >> On 11/8/21 9:50 PM, xiubli@redhat.com wrote: >>> From: Xiubo Li <xiubli@redhat.com> >>> >>> Hi Jeff, >>> >>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. >>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. >>> >>> Thanks. >>> >>> Xiubo Li (2): >>> ceph: fix possible crash and data corrupt bugs >>> ceph: there is no need to round up the sizes when new size is 0 >>> >>> fs/ceph/inode.c | 8 +++++--- >>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>
On 11/8/21 10:41 PM, Xiubo Li wrote: > > On 11/8/21 10:26 PM, Jeff Layton wrote: >> On Mon, 2021-11-08 at 22:10 +0800, Xiubo Li wrote: >>> Hi Jeff, >>> >>> After this fixing, there still has one bug and I am still looking at >>> it: >>> >>> [root@lxbceph1 xfstests]# ./check generic/075 >>> FSTYP -- ceph >>> PLATFORM -- Linux/x86_64 lxbceph1 5.15.0-rc6+ >>> >>> generic/075 [failed, exit status 1] - output mismatch (see >>> /mnt/kcephfs/xfstests/results//generic/075.out.bad) >>> --- tests/generic/075.out 2021-11-08 20:54:07.456980801 +0800 >>> +++ /mnt/kcephfs/xfstests/results//generic/075.out.bad 2021-11-08 >>> 21:20:49.741906997 +0800 >>> @@ -12,7 +12,4 @@ >>> ----------------------------------------------- >>> fsx.2 : -d -N numops -l filelen -S 0 >>> ----------------------------------------------- >>> - >>> ------------------------------------------------ >>> -fsx.3 : -d -N numops -l filelen -S 0 -x >>> ------------------------------------------------ >>> ... >>> (Run 'diff -u tests/generic/075.out >>> /mnt/kcephfs/xfstests/results//generic/075.out.bad' to see the >>> entire diff) >>> Ran: generic/075 >>> Failures: generic/075 >>> Failed 1 of 1 tests >>> >>> >>> I checked the result outputs, it seems when truncating the size to a >>> smaller sizeA and then to a bigger sizeB again, in theory those >>> contents >>> between sizeA and sizeB should be zeroed, but it didn't. >>> >>> The last block updating is correct. >>> >>> Any idea ? >>> >> >> Yep, that's the one I saw (intermittently) too. I also saw some failures >> around generic/029 and generic/030 that may be related. I haven't dug >> down as far into the problem as you have though. >> >> The nice thing about fsx is that it gives you a lot of info about what >> it does. There is also a way to replay a series of ops too, so you may >> want to try to see if you can make a reliable reproducer for this >> problem. > > I can reproduce this every time. > > >> Are these truncates running concurrently in different tasks? > > No, from the "075.2.fsxlog" they are serialized. The truncates > themselves worked well, but there also have some > mapwrite/write/mapread/read between them. From the logs, I am sure > that those none zeroed contents are from mapwrite/write. It seems the > dirty pages are flushed to OSDs just after the truncates. > > >> If so, then >> we may need some mechanism to ensure that they are serialized vs. one >> another. > > The truncate will hold the 'inode->i_rwsem' lock too, so it won't > allow the truncate/read/write to run in parallel. But I am not sure > the mapwrite ? > > The logs segments from "075.2.fsxlog": 2317 3728 mapread 0x330312 thru 0x33a6b1 (0xa3a0 bytes) 2318 3730 punch from 0x5ffc53 to 0x608516, (0x88c3 bytes) 2319 3731 write 0x6e9465 thru 0x6f8c04 (0xf7a0 bytes) 2320 3732 mapread 0x49f516 thru 0x49f570 (0x5b bytes) 2321 3733 write 0x72847b thru 0x733f14 (0xba9a bytes) 2322 3736 punch from 0x2a90a0 to 0x2aa68e, (0x15ee bytes) 2323 3739 write 0x644a24 thru 0x64aa30 (0x600d bytes) 2324 3740 trunc from 0x7aa4b0 to 0x9dbef3 2325 3741 mapread 0x5aa6bd thru 0x5b7246 (0xcb8a bytes) 2326 3742 trunc from 0x9dbef3 to 0x718ae4 2327 3743 write 0x3ac9b0 thru 0x3aeee0 (0x2531 bytes) 2328 3744 read 0x6e171c thru 0x6f0fd6 (0xf8bb bytes) 2329 3747 trunc from 0x718ae4 to 0x627ddb 2330 3748 mapread 0xe4e2c thru 0xf0bd5 (0xbdaa bytes) 2331 3752 write 0x71def1 thru 0x71e152 (0x262 bytes) 2332 3753 mapwrite 0x9eb0d8 thru 0x9f4ef7 (0x9e20 bytes) 2333 3754 mapwrite 0x7db56d thru 0x7e1278 (0x5d0c bytes) 2334 3755 punch from 0x9368cb to 0x9437fb, (0xcf30 bytes) 2335 3757 write 0x366827 thru 0x3699ff (0x31d9 bytes) 2336 3761 mapwrite 0x529471 thru 0x52b085 (0x1c15 bytes) 2337 3762 trunc from 0x9f4ef8 to 0x86bfab 2338 3764 write 0x9c85b9 thru 0x9d0bdc (0x8624 bytes) 2339 3765 mapread 0x11b451 thru 0x11fec5 (0x4a75 bytes) 2340 3766 write 0x5938cb thru 0x59e0d0 (0xa806 bytes) 2341 3767 read 0xe3063 thru 0xe8ee7 (0x5e85 bytes) 2342 3768 punch from 0x859f3f to 0x8698ec, (0xf9ad bytes) 2343 3771 punch from 0x86d188 to 0x86eef3, (0x1d6b bytes) 2344 3773 write 0x9f43c9 thru 0x9fffff (0xbc37 bytes) 2345 3774 trunc from 0xa00000 to 0x26d4b9 2346 3777 trunc from 0x26d4b9 to 0x9c695f 2347 3783 trunc from 0x9c695f to 0x9129ed 2348 3784 mapread 0x448402 thru 0x45074d (0x834c bytes) 2349 READ BAD DATA: offset = 0x448402, size = 0x834c, fname = 075.2 2350 OFFSET GOOD BAD RANGE 2351 0x448402 0x0000 0x74ea 0x00000 2352 operation# (mod 256) for the bad data may be 116 2353 0x448403 0x0000 0xea74 0x00001 2354 operation# (mod 256) for the bad data may be 116 >> >>> On 11/8/21 9:50 PM, xiubli@redhat.com wrote: >>>> From: Xiubo Li <xiubli@redhat.com> >>>> >>>> Hi Jeff, >>>> >>>> The #1 could be squashed to the previous "ceph: add truncate size >>>> handling support for fscrypt" commit. >>>> The #2 could be squashed to the previous "ceph: fscrypt_file field >>>> handling in MClientRequest messages" commit. >>>> >>>> Thanks. >>>> >>>> Xiubo Li (2): >>>> ceph: fix possible crash and data corrupt bugs >>>> ceph: there is no need to round up the sizes when new size is 0 >>>> >>>> fs/ceph/inode.c | 8 +++++--- >>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>
On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: > From: Xiubo Li <xiubli@redhat.com> > > Hi Jeff, > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > Thanks. > > Xiubo Li (2): > ceph: fix possible crash and data corrupt bugs > ceph: there is no need to round up the sizes when new size is 0 > > fs/ceph/inode.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > I'm running xfstests today with the current state of wip-fscrypt-size (including the two patches above) with test_dummy_encryption enabled. generic/030 failed. The expected output of this part of the test was: wrote 5136912/5136912 bytes at offset 0 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) ==== Pre-Remount === 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 |XXXXXXXXXXXXXXXX| * 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 |YYYYYYYYYYYYYYYY| * 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 |YYYYYYYY........| 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 004e7000 ==== Post-Remount == 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 |XXXXXXXXXXXXXXXX| * 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 |YYYYYYYYYYYYYYYY| * 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 |YYYYYYYY........| 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 004e7000 ...but I got this instead: wrote 5136912/5136912 bytes at offset 0 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) ==== Pre-Remount === 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 |XXXXXXXXXXXXXXXX| * 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 004e7000 ==== Post-Remount == 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 |XXXXXXXXXXXXXXXX| * 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 004e7000 ...I suspect this is related to the 075 failures, but I don't see it on every attempt, which makes me think "race condition". I'll keep hunting. Cheers,
On 11/9/21 2:36 AM, Jeff Layton wrote: > On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: >> From: Xiubo Li <xiubli@redhat.com> >> >> Hi Jeff, >> >> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. >> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. >> >> Thanks. >> >> Xiubo Li (2): >> ceph: fix possible crash and data corrupt bugs >> ceph: there is no need to round up the sizes when new size is 0 >> >> fs/ceph/inode.c | 8 +++++--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >> > I'm running xfstests today with the current state of wip-fscrypt-size > (including the two patches above) with test_dummy_encryption enabled. > generic/030 failed. The expected output of this part of the test was: > > wrote 5136912/5136912 bytes at offset 0 > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > ==== Pre-Remount === > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > |XXXXXXXXXXXXXXXX| > * > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > |YYYYYYYYYYYYYYYY| > * > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > |YYYYYYYY........| > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > |................| > * > 004e7000 > ==== Post-Remount == > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > |XXXXXXXXXXXXXXXX| > * > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > |YYYYYYYYYYYYYYYY| > * > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > |YYYYYYYY........| > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > |................| > * > 004e7000 > > ...but I got this instead: > > wrote 5136912/5136912 bytes at offset 0 > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > ==== Pre-Remount === > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > |XXXXXXXXXXXXXXXX| > * > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > |................| > * > 004e7000 > ==== Post-Remount == > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > |XXXXXXXXXXXXXXXX| > * > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > |................| > * > 004e7000 > > > ...I suspect this is related to the 075 failures, but I don't see it on > every attempt, which makes me think "race condition". I'll keep hunting. Yeah, seems the same issue. > Cheers,
On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote: > On 11/9/21 2:36 AM, Jeff Layton wrote: > > On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: > > > From: Xiubo Li <xiubli@redhat.com> > > > > > > Hi Jeff, > > > > > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > > > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > > > > > Thanks. > > > > > > Xiubo Li (2): > > > ceph: fix possible crash and data corrupt bugs > > > ceph: there is no need to round up the sizes when new size is 0 > > > > > > fs/ceph/inode.c | 8 +++++--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > I'm running xfstests today with the current state of wip-fscrypt-size > > (including the two patches above) with test_dummy_encryption enabled. > > generic/030 failed. The expected output of this part of the test was: > > > > wrote 5136912/5136912 bytes at offset 0 > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > ==== Pre-Remount === > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > XXXXXXXXXXXXXXXX| > > * > > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > > > YYYYYYYYYYYYYYYY| > > * > > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > > > YYYYYYYY........| > > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > ................| > > * > > 004e7000 > > ==== Post-Remount == > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > XXXXXXXXXXXXXXXX| > > * > > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > > > YYYYYYYYYYYYYYYY| > > * > > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > > > YYYYYYYY........| > > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > ................| > > * > > 004e7000 > > > > ...but I got this instead: > > > > wrote 5136912/5136912 bytes at offset 0 > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > ==== Pre-Remount === > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > XXXXXXXXXXXXXXXX| > > * > > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > ................| > > * > > 004e7000 > > ==== Post-Remount == > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > XXXXXXXXXXXXXXXX| > > * > > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > ................| > > * > > 004e7000 > > > > > > ...I suspect this is related to the 075 failures, but I don't see it on > > every attempt, which makes me think "race condition". I'll keep hunting. > > Yeah, seems the same issue. > I wonder if we're hitting the same problem that this patch was intended to fix: 057ba5b24532 ceph: Fix race between hole punch and page fault It seems like the problem is very similar. It may be worth testing a patch that takes the filemap_invalidate_lock across the on the wire truncate and the vmtruncate.
On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote: > On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote: > > On 11/9/21 2:36 AM, Jeff Layton wrote: > > > On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: > > > > From: Xiubo Li <xiubli@redhat.com> > > > > > > > > Hi Jeff, > > > > > > > > The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. > > > > The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. > > > > > > > > Thanks. > > > > > > > > Xiubo Li (2): > > > > ceph: fix possible crash and data corrupt bugs > > > > ceph: there is no need to round up the sizes when new size is 0 > > > > > > > > fs/ceph/inode.c | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > I'm running xfstests today with the current state of wip-fscrypt-size > > > (including the two patches above) with test_dummy_encryption enabled. > > > generic/030 failed. The expected output of this part of the test was: > > > > > > wrote 5136912/5136912 bytes at offset 0 > > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > > ==== Pre-Remount === > > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > > XXXXXXXXXXXXXXXX| > > > * > > > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > > > > YYYYYYYYYYYYYYYY| > > > * > > > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > > > > YYYYYYYY........| > > > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > ................| > > > * > > > 004e7000 > > > ==== Post-Remount == > > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > > XXXXXXXXXXXXXXXX| > > > * > > > 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 > > > > YYYYYYYYYYYYYYYY| > > > * > > > 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 > > > > YYYYYYYY........| > > > 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > ................| > > > * > > > 004e7000 > > > > > > ...but I got this instead: > > > > > > wrote 5136912/5136912 bytes at offset 0 > > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > > > ==== Pre-Remount === > > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > > XXXXXXXXXXXXXXXX| > > > * > > > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > ................| > > > * > > > 004e7000 > > > ==== Post-Remount == > > > 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 > > > > XXXXXXXXXXXXXXXX| > > > * > > > 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > ................| > > > * > > > 004e7000 > > > > > > > > > ...I suspect this is related to the 075 failures, but I don't see it on > > > every attempt, which makes me think "race condition". I'll keep hunting. > > > > Yeah, seems the same issue. > > > > I wonder if we're hitting the same problem that this patch was intended > to fix: > > 057ba5b24532 ceph: Fix race between hole punch and page fault > > It seems like the problem is very similar. It may be worth testing a > patch that takes the filemap_invalidate_lock across the on the wire > truncate and the vmtruncate. Nope. I tried a draft patch that make setattr hold this lock over the MDS call and the vmtruncate and it didn't help. Cheers,
On 11/9/21 8:33 PM, Jeff Layton wrote: > On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote: >> On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote: >>> On 11/9/21 2:36 AM, Jeff Layton wrote: >>>> On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: >>>>> From: Xiubo Li <xiubli@redhat.com> >>>>> >>>>> Hi Jeff, >>>>> >>>>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. >>>>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. >>>>> >>>>> Thanks. >>>>> >>>>> Xiubo Li (2): >>>>> ceph: fix possible crash and data corrupt bugs >>>>> ceph: there is no need to round up the sizes when new size is 0 >>>>> >>>>> fs/ceph/inode.c | 8 +++++--- >>>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>> >>>> I'm running xfstests today with the current state of wip-fscrypt-size >>>> (including the two patches above) with test_dummy_encryption enabled. >>>> generic/030 failed. The expected output of this part of the test was: >>>> >>>> wrote 5136912/5136912 bytes at offset 0 >>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>> ==== Pre-Remount === >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 >>>>> YYYYYYYYYYYYYYYY| >>>> * >>>> 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 >>>>> YYYYYYYY........| >>>> 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> ==== Post-Remount == >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 >>>>> YYYYYYYYYYYYYYYY| >>>> * >>>> 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 >>>>> YYYYYYYY........| >>>> 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> >>>> ...but I got this instead: >>>> >>>> wrote 5136912/5136912 bytes at offset 0 >>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>> ==== Pre-Remount === >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> ==== Post-Remount == >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> >>>> >>>> ...I suspect this is related to the 075 failures, but I don't see it on >>>> every attempt, which makes me think "race condition". I'll keep hunting. >>> Yeah, seems the same issue. >>> >> I wonder if we're hitting the same problem that this patch was intended >> to fix: >> >> 057ba5b24532 ceph: Fix race between hole punch and page fault >> >> It seems like the problem is very similar. It may be worth testing a >> patch that takes the filemap_invalidate_lock across the on the wire >> truncate and the vmtruncate. > > Nope. I tried a draft patch that make setattr hold this lock over the > MDS call and the vmtruncate and it didn't help. > > Cheers, This bug is very similar to the one I have fixed in: f4569b11c4eb ceph: return the real size read when it hits EOF Which is for read case when truncating a file to a smaller size and then a bigger one immediately, the stale data will be kept. Currently bug is in write case. BRs -- Xiubo
On 11/9/21 8:33 PM, Jeff Layton wrote: > On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote: >> On Tue, 2021-11-09 at 08:21 +0800, Xiubo Li wrote: >>> On 11/9/21 2:36 AM, Jeff Layton wrote: >>>> On Mon, 2021-11-08 at 21:50 +0800, xiubli@redhat.com wrote: >>>>> From: Xiubo Li <xiubli@redhat.com> >>>>> >>>>> Hi Jeff, >>>>> >>>>> The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. >>>>> The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. >>>>> >>>>> Thanks. >>>>> >>>>> Xiubo Li (2): >>>>> ceph: fix possible crash and data corrupt bugs >>>>> ceph: there is no need to round up the sizes when new size is 0 >>>>> >>>>> fs/ceph/inode.c | 8 +++++--- >>>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>> >>>> I'm running xfstests today with the current state of wip-fscrypt-size >>>> (including the two patches above) with test_dummy_encryption enabled. >>>> generic/030 failed. The expected output of this part of the test was: >>>> >>>> wrote 5136912/5136912 bytes at offset 0 >>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>> ==== Pre-Remount === >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 >>>>> YYYYYYYYYYYYYYYY| >>>> * >>>> 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 >>>>> YYYYYYYY........| >>>> 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> ==== Post-Remount == >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 004e6210 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 >>>>> YYYYYYYYYYYYYYYY| >>>> * >>>> 004e6d00 59 59 59 59 59 59 59 59 00 00 00 00 00 00 00 00 >>>>> YYYYYYYY........| >>>> 004e6d10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> >>>> ...but I got this instead: >>>> >>>> wrote 5136912/5136912 bytes at offset 0 >>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>> ==== Pre-Remount === >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> ==== Post-Remount == >>>> 00000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 >>>>> XXXXXXXXXXXXXXXX| >>>> * >>>> 00400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> ................| >>>> * >>>> 004e7000 >>>> >>>> >>>> ...I suspect this is related to the 075 failures, but I don't see it on >>>> every attempt, which makes me think "race condition". I'll keep hunting. >>> Yeah, seems the same issue. >>> >> I wonder if we're hitting the same problem that this patch was intended >> to fix: >> >> 057ba5b24532 ceph: Fix race between hole punch and page fault >> >> It seems like the problem is very similar. It may be worth testing a >> patch that takes the filemap_invalidate_lock across the on the wire >> truncate and the vmtruncate. > > Nope. I tried a draft patch that make setattr hold this lock over the > MDS call and the vmtruncate and it didn't help. It's buggy by using the `filer.write_trunc()` caller in MDS side, it works well in case the truncate_from - truncate_size < 4MB, or it will keep the stale file contents without trimming or zeroing them. I will push the MDS side PR to fix it. Thanks -- Xiubo > Cheers,
On 11/16/21 12:56 PM, Xiubo Li wrote: > > On 11/9/21 8:33 PM, Jeff Layton wrote: >> On Mon, 2021-11-08 at 20:57 -0500, Jeff Layton wrote: >>> [...] >>>>> ...I suspect this is related to the 075 failures, but I don't see >>>>> it on >>>>> every attempt, which makes me think "race condition". I'll keep >>>>> hunting. >>>> Yeah, seems the same issue. >>>> >>> I wonder if we're hitting the same problem that this patch was intended >>> to fix: >>> >>> 057ba5b24532 ceph: Fix race between hole punch and page fault >>> >>> It seems like the problem is very similar. It may be worth testing a >>> patch that takes the filemap_invalidate_lock across the on the wire >>> truncate and the vmtruncate. >> >> Nope. I tried a draft patch that make setattr hold this lock over the >> MDS call and the vmtruncate and it didn't help. > > It's buggy by using the `filer.write_trunc()` caller in MDS side, it > works well in case the truncate_from - truncate_size < 4MB, or it will > keep the stale file contents without trimming or zeroing them. > > I will push the MDS side PR to fix it. > Have updated the MDS side PR. Both the generic/030 and generic/075 tests passed, and I am still testing others. BRs -- Xiubo > Thanks > > -- Xiubo > > >> Cheers,
From: Xiubo Li <xiubli@redhat.com> Hi Jeff, The #1 could be squashed to the previous "ceph: add truncate size handling support for fscrypt" commit. The #2 could be squashed to the previous "ceph: fscrypt_file field handling in MClientRequest messages" commit. Thanks. Xiubo Li (2): ceph: fix possible crash and data corrupt bugs ceph: there is no need to round up the sizes when new size is 0 fs/ceph/inode.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)