Message ID | 75f4c780e8402a8f993cb987e85a31e4895f13de.1648730443.git.ritesh.list@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | generic: Add some tests around journal replay/recoveryloop |

On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> From: Ritesh Harjani <riteshh@linux.ibm.com>
>
> Add another falloc test entry which could hit a kernel bug
> with ext4 fast_commit feature w/o below kernel commit [1].
>
> <log>
> [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
>
> This happens when falloc -k size is huge which spans across more than
> 1 flex block group in ext4. This causes a bug in fast_commit replay
> code which is fixed by kernel commit at [1].
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> ---
>  tests/generic/468     | 8 ++++++++
>  tests/generic/468.out | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/tests/generic/468 b/tests/generic/468
> index 95752d3b..5e73cff9 100755
> --- a/tests/generic/468
> +++ b/tests/generic/468
> @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
>  _require_metadata_journaling $SCRATCH_DEV
>  _scratch_mount
>
> +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> +# This is mainly trying to test recovery operation in case where the data
> +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> +blocks=32768
> +blocksize=4096

Block size can change based on mkfs parameters. You should extract
this dynamically from the filesystem the test is being run on.

> +fact=18

What is "fact" supposed to mean?

Indeed, wouldn't this simply be better as something like:

larger_than_ext4_fg_size=$((32768 * $blksize * 18))

And then

>  testfile=$SCRATCH_MNT/testfile
>
>  # check inode metadata after shutdown
> @@ -85,6 +92,7 @@ for i in fsync fdatasync; do
>  	test_falloc $i "-k " 1024
>  	test_falloc $i "-k " 4096
>  	test_falloc $i "-k " 104857600
> +	test_falloc $i "-k " $(($blocks*$blocksize*$fact))

	test_falloc $i "-k " $larger_than_ext4_fg_size

And just scrub all the sizes from the golden output?

Cheers,

Dave.
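
The size calculation suggested above could be sketched roughly as follows. This is only an illustration, not the code that ended up in the test: `fs_block_size` and `fg_test_size` are hypothetical helper names, the block size is read with GNU `stat -f -c %S` as an assumed stand-in for whatever helper the test would really use, and the constants 32768 and 18 are simply carried over from the posted patch.

```shell
# Hypothetical helper (not from the patch): print the fundamental block
# size, in bytes, of the filesystem that contains the path in $1.
# Assumes GNU coreutils stat.
fs_block_size() {
	stat -f -c %S "$1"
}

# Hypothetical helper: number of bytes to falloc for a block size of $1,
# using the same multipliers (32768 blocks, factor 18) as the posted patch.
fg_test_size() {
	echo $((32768 * $1 * 18))
}

# Possible use inside the test, after _scratch_mount (untested sketch):
#   blksize=$(fs_block_size "$SCRATCH_MNT")
#   larger_than_ext4_fg_size=$(fg_test_size "$blksize")
#   test_falloc $i "-k " $larger_than_ext4_fg_size
```

With a 4096-byte block size this yields 2415919104, the value that appears in the posted golden output.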

On 22/04/04 09:28AM, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > From: Ritesh Harjani <riteshh@linux.ibm.com>
> >
> > Add another falloc test entry which could hit a kernel bug
> > with ext4 fast_commit feature w/o below kernel commit [1].
> >
> > <log>
> > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> >
> > This happens when falloc -k size is huge which spans across more than
> > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > code which is fixed by kernel commit at [1].
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> >
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > ---
> >  tests/generic/468     | 8 ++++++++
> >  tests/generic/468.out | 2 ++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/tests/generic/468 b/tests/generic/468
> > index 95752d3b..5e73cff9 100755
> > --- a/tests/generic/468
> > +++ b/tests/generic/468
> > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> >  _require_metadata_journaling $SCRATCH_DEV
> >  _scratch_mount
> >
> > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > +# This is mainly trying to test recovery operation in case where the data
> > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > +blocks=32768
> > +blocksize=4096
>
> Block size can change based on mkfs parameters. You should extract
> this dynamically from the filesystem the test is being run on.
>

Yes, but we still have kept just 4096 because, anything bigger than that like
65536 might require a bigger disk size itself to test. The overall size
requirement of the disk will then become ~36G (32768 * 65536 * 18)
Hence I went ahead with 4096 which is good enough for testing.

But sure, I will add a comment explaining why we have hardcoded it to 4096
so that others don't get confused. Larger than this size disk anyway doesn't get
tested much right?

> > +fact=18
>
> What is "fact" supposed to mean?
>
> Indeed, wouldn't this simply be better as something like:
>
> larger_than_ext4_fg_size=$((32768 * $blksize * 18))
>
> And then
>
> >  testfile=$SCRATCH_MNT/testfile
> >
> >  # check inode metadata after shutdown
> > @@ -85,6 +92,7 @@ for i in fsync fdatasync; do
> >  	test_falloc $i "-k " 1024
> >  	test_falloc $i "-k " 4096
> >  	test_falloc $i "-k " 104857600
> > +	test_falloc $i "-k " $(($blocks*$blocksize*$fact))
>
> 	test_falloc $i "-k " $larger_than_ext4_fg_size
>

Yes, looks good to me. Thanks for suggestion.

> And just scrub all the sizes from the golden output?
>

This won't be needed since I still would like to go with 4096 blocksize,
to avoid a large disk size requirement which anyway won't be tested much.

If this sounds good to you, I will fix rest of the changes as discussed in the
next revision.

-ritesh

On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > +# This is mainly trying to test recovery operation in case where the data
> > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > +blocks=32768
> > > +blocksize=4096
> >
> > Block size can change based on mkfs parameters. You should extract
> > this dynamically from the filesystem the test is being run on.
> >
>
> Yes, but we still have kept just 4096 because, anything bigger than that like
> 65536 might require a bigger disk size itself to test. The overall size
> requirement of the disk will then become ~36G (32768 * 65536 * 18)
> Hence I went ahead with 4096 which is good enough for testing.

What if the block size is *smaller*? For example, I run an ext4/1k
configuration (which is how I test block size < page size on x86 VM's :-).

> But sure, I will add a comment explaining why we have hardcoded it to 4096
> so that others don't get confused. Larger than this size disk anyway doesn't get
> tested much right?

At $WORK we use a 100GB disk by default when running xfstests, and I
wouldn't be surprised if there are other folks who might use larger
disk sizes.

Maybe test to see whether the scratch disk is too small for the given
parameters and if so skip the test using _notrun?

- Ted
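
The skip-if-too-small idea can be sketched as below. This is a hedged illustration only: `scratch_large_enough` is a hypothetical function written for this note, not an fstests helper, and a real test would call `_notrun` rather than just return a status. Reading the device size with `blockdev --getsize64` is an assumption shown only in the usage comment and is not exercised here.

```shell
# Hypothetical check: succeed when a device of $2 bytes is at least as
# large as the $1 bytes the falloc range needs; otherwise print the
# reason and return non-zero.
scratch_large_enough() {
	need_bytes=$1
	dev_bytes=$2
	if [ "$dev_bytes" -lt "$need_bytes" ]; then
		echo "scratch device too small: need $need_bytes bytes, have $dev_bytes"
		return 1
	fi
	return 0
}

# Possible use in a test (assumed, untested here):
#   dev_bytes=$(blockdev --getsize64 "$SCRATCH_DEV")
#   scratch_large_enough "$larger_than_ext4_fg_size" "$dev_bytes" >/dev/null ||
#       _notrun "scratch device too small for falloc size"
```

The comparison ignores filesystem metadata overhead, so a real check would want some headroom on top of the raw range size.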

On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> On 22/04/04 09:28AM, Dave Chinner wrote:
> > On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > > From: Ritesh Harjani <riteshh@linux.ibm.com>
> > >
> > > Add another falloc test entry which could hit a kernel bug
> > > with ext4 fast_commit feature w/o below kernel commit [1].
> > >
> > > <log>
> > > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> > >
> > > This happens when falloc -k size is huge which spans across more than
> > > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > > code which is fixed by kernel commit at [1].
> > >
> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> > >
> > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > ---
> > >  tests/generic/468     | 8 ++++++++
> > >  tests/generic/468.out | 2 ++
> > >  2 files changed, 10 insertions(+)
> > >
> > > diff --git a/tests/generic/468 b/tests/generic/468
> > > index 95752d3b..5e73cff9 100755
> > > --- a/tests/generic/468
> > > +++ b/tests/generic/468
> > > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> > >  _require_metadata_journaling $SCRATCH_DEV
> > >  _scratch_mount
> > >
> > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > +# This is mainly trying to test recovery operation in case where the data
> > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > +blocks=32768
> > > +blocksize=4096
> >
> > Block size can change based on mkfs parameters. You should extract
> > this dynamically from the filesystem the test is being run on.
> >
>
> Yes, but we still have kept just 4096 because, anything bigger than that like
> 65536 might require a bigger disk size itself to test. The overall size
> requirement of the disk will then become ~36G (32768 * 65536 * 18)
> Hence I went ahead with 4096 which is good enough for testing.

If the test setup doesn't have a disk large enough, then the test
should be skipped. That's what '_require_scratch_size' is for.

i.e. _require_scratch_size $larger_than_ext4_fg_size

Will do that check once we've calculated the size needed.

> But sure, I will add a comment explaining why we have hardcoded it to 4096
> so that others don't get confused. Larger than this size disk anyway doesn't get
> tested much right?

You shouldn't be constricting the test based on assumptions about
test configurations. If someone decides to test 64k block size, then
they can size their devices appropriately for the configuration they
want to test. If a 64kB block size filesystem can overrun the
on-disk structure and fail, then the test should exercise that and
fail appropriately.

Cheers,

Dave.

On 22/04/05 06:00PM, Theodore Ts'o wrote:
> On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > > +# This is mainly trying to test recovery operation in case where the data
> > > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > > +blocks=32768
> > > > +blocksize=4096
> > >
> > > Block size can change based on mkfs parameters. You should extract
> > > this dynamically from the filesystem the test is being run on.
> > >
> >
> > Yes, but we still have kept just 4096 because, anything bigger than that like
> > 65536 might require a bigger disk size itself to test. The overall size
> > requirement of the disk will then become ~36G (32768 * 65536 * 18)
> > Hence I went ahead with 4096 which is good enough for testing.
>
> What if the block size is *smaller*? For example, I run an ext4/1k
> configuration (which is how I test block size < page size on x86 VM's :-).

For 1k bs, this test can still reproduce the problem. Because the given size
will easily overflow the required number of blocks in 1K case.

>
> > But sure, I will add a comment explaining why we have hardcoded it to 4096
> > so that others don't get confused. Larger than this size disk anyway doesn't get
> > tested much right?
>
> At $WORK we use a 100GB disk by default when running xfstests, and I
> wouldn't be surprised if there are other folks who might use larger
> disk sizes.

Ohk, sure. Thanks for the info.

>
> Maybe test to see whether the scratch disk is too small for the given
> parameters and if so skip the test using _notrun?
>

Yes, I think I got the point. I will make the changes accordingly.

-ritesh
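
The 1k claim above can be checked with a little arithmetic. The sketch below assumes ext4's mkfs defaults, which the thread does not spell out: a block group holds 8 * blocksize blocks (one bitmap block's worth of bits), and a flex group spans 16 block groups, consistent with the 32768*4096*16 figure in the patch comment. `flex_group_bytes` is a hypothetical name used only here.

```shell
# Assumed ext4 defaults (not stated in the thread): blocks per group is
# 8 * blocksize, and 16 block groups form one flex group.
flex_group_bytes() {
	blksize=$1
	echo $((8 * blksize * blksize * 16))
}

# Size used by the posted patch, fixed at a 4096-byte block size.
falloc_bytes=$((32768 * 4096 * 18))

# Compare the fixed falloc size against the flex group size for the
# 1k and 4k block sizes discussed above.
for bs in 1024 4096; do
	fg=$(flex_group_bytes "$bs")
	if [ "$falloc_bytes" -gt "$fg" ]; then
		echo "bs=$bs: $falloc_bytes exceeds flex group size $fg"
	else
		echo "bs=$bs: $falloc_bytes fits within flex group size $fg"
	fi
done
```

Under those assumptions the fixed 2415919104-byte range is larger than one flex group at both block sizes: 134217728 bytes at 1k and 2147483648 bytes at 4k.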

On 22/04/06 02:05PM, Dave Chinner wrote:
> On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > On 22/04/04 09:28AM, Dave Chinner wrote:
> > > On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > > > From: Ritesh Harjani <riteshh@linux.ibm.com>
> > > >
> > > > Add another falloc test entry which could hit a kernel bug
> > > > with ext4 fast_commit feature w/o below kernel commit [1].
> > > >
> > > > <log>
> > > > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > > > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> > > >
> > > > This happens when falloc -k size is huge which spans across more than
> > > > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > > > code which is fixed by kernel commit at [1].
> > > >
> > > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> > > >
> > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > > ---
> > > >  tests/generic/468     | 8 ++++++++
> > > >  tests/generic/468.out | 2 ++
> > > >  2 files changed, 10 insertions(+)
> > > >
> > > > diff --git a/tests/generic/468 b/tests/generic/468
> > > > index 95752d3b..5e73cff9 100755
> > > > --- a/tests/generic/468
> > > > +++ b/tests/generic/468
> > > > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> > > >  _require_metadata_journaling $SCRATCH_DEV
> > > >  _scratch_mount
> > > >
> > > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > > +# This is mainly trying to test recovery operation in case where the data
> > > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > > +blocks=32768
> > > > +blocksize=4096
> > >
> > > Block size can change based on mkfs parameters. You should extract
> > > this dynamically from the filesystem the test is being run on.
> > >
> >
> > Yes, but we still have kept just 4096 because, anything bigger than that like
> > 65536 might require a bigger disk size itself to test. The overall size
> > requirement of the disk will then become ~36G (32768 * 65536 * 18)
> > Hence I went ahead with 4096 which is good enough for testing.
>
> If the test setup doesn't have a disk large enough, then the test
> should be skipped. That's what '_require_scratch_size' is for.
>
> i.e. _require_scratch_size $larger_than_ext4_fg_size
>
> Will do that check once we've calculated the size needed.

Sure.

>
> > But sure, I will add a comment explaining why we have hardcoded it to 4096
> > so that others don't get confused. Larger than this size disk anyway doesn't get
> > tested much right?
>
> You shouldn't be constricting the test based on assumptions about
> test configurations. If someone decides to test 64k block size, then
> they can size their devices appropriately for the configuration they
> want to test. If a 64kB block size filesystem can overrun the
> on-disk structure and fail, then the test should exercise that and
> fail appropriately.

Sure Dave. Got the point. I will try and make the changes, such that test
doesn't assume any particular user test configuration. And be generic as much
as possible so that we could hit the issue we are aiming via this test.

-ritesh

diff --git a/tests/generic/468 b/tests/generic/468
index 95752d3b..5e73cff9 100755
--- a/tests/generic/468
+++ b/tests/generic/468
@@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
 _require_metadata_journaling $SCRATCH_DEV
 _scratch_mount
 
+# blocksize and fact are used in the last case of the fsync/fdatasync test.
+# This is mainly trying to test recovery operation in case where the data
+# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
+blocks=32768
+blocksize=4096
+fact=18
+
 testfile=$SCRATCH_MNT/testfile
 
 # check inode metadata after shutdown
@@ -85,6 +92,7 @@ for i in fsync fdatasync; do
 	test_falloc $i "-k " 1024
 	test_falloc $i "-k " 4096
 	test_falloc $i "-k " 104857600
+	test_falloc $i "-k " $(($blocks*$blocksize*$fact))
 done
 
 status=0
diff --git a/tests/generic/468.out b/tests/generic/468.out
index b3a28d5e..a09cedb8 100644
--- a/tests/generic/468.out
+++ b/tests/generic/468.out
@@ -5,9 +5,11 @@ QA output created by 468
 ==== falloc -k 1024 test with fsync ====
 ==== falloc -k 4096 test with fsync ====
 ==== falloc -k 104857600 test with fsync ====
+==== falloc -k 2415919104 test with fsync ====
 ==== falloc 1024 test with fdatasync ====
 ==== falloc 4096 test with fdatasync ====
 ==== falloc 104857600 test with fdatasync ====
 ==== falloc -k 1024 test with fdatasync ====
 ==== falloc -k 4096 test with fdatasync ====
 ==== falloc -k 104857600 test with fdatasync ====
+==== falloc -k 2415919104 test with fdatasync ====
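
If the falloc size were derived from the filesystem block size, the banner lines in the golden output would vary between configurations, which is why scrubbing the sizes was suggested in the review. One possible filter is sketched below; `filter_falloc_size` and the `SIZE` placeholder are hypothetical choices for this illustration, not existing fstests helpers.

```shell
# Hypothetical filter: replace the numeric size in the test banner
# lines with a fixed placeholder so the golden output no longer
# depends on the block size. All other lines pass through unchanged.
filter_falloc_size() {
	sed -E 's/^(==== falloc( -k)?) [0-9]+ test with /\1 SIZE test with /'
}

# Possible use (untested in the real test harness):
#   echo "==== falloc -k 2415919104 test with fsync ====" | filter_falloc_size
```

The golden output would then carry `SIZE` in every banner instead of the literal byte counts shown in the diff.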