
[0/2] fstest changes for LBS

Message ID 20240122111751.449762-1-kernel@pankajraghav.com (mailing list archive)

Message

Pankaj Raghav (Samsung) Jan. 22, 2024, 11:17 a.m. UTC
From: Pankaj Raghav <p.raghav@samsung.com>

Some tests need to be adapted for LBS[1] based on the filesystem
blocksize. These are generic changes that make the tests use the
filesystem blocksize instead of assuming a fixed one.

There are some more generic test cases that fail because the logdev
size requirement changes with the filesystem blocksize. I will address
those in a separate series.

[1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
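
As an illustration of the kind of change (not the exact hunks in the two
patches, and assuming the usual fstests helpers such as _get_block_size
from common/rc), the adaptation looks roughly like this:

  # before: I/O size hard-coded, implicitly assuming a 4k filesystem block
  blksz=4096
  # after: derive it from the scratch filesystem so the test scales with LBS
  blksz=$(_get_block_size $SCRATCH_MNT)
  $XFS_IO_PROG -f -c "pwrite 0 $((32 * blksz))" $SCRATCH_MNT/file >> $seqres.full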

Pankaj Raghav (2):
  xfs/558: scale blk IO size based on the filesystem blksz
  xfs/161: adapt the test case for LBS filesystem

 tests/xfs/161 | 9 +++++++--
 tests/xfs/558 | 7 ++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)


base-commit: c46ca4d1f6c0c45f9a3ea18bc31ba5ae89e02c70

Comments

Dave Chinner Jan. 23, 2024, 12:25 a.m. UTC | #1
On Mon, Jan 22, 2024 at 12:17:49PM +0100, Pankaj Raghav (Samsung) wrote:
> From: Pankaj Raghav <p.raghav@samsung.com>
> 
> Some tests need to be adapted to for LBS[1] based on the filesystem
> blocksize. These are generic changes where it uses the filesystem
> blocksize instead of assuming it.
> 
> There are some more generic test cases that are failing due to logdev
> size requirement that changes with filesystem blocksize. I will address
> them in a separate series.
> 
> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
> 
> Pankaj Raghav (2):
>   xfs/558: scale blk IO size based on the filesystem blksz
>   xfs/161: adapt the test case for LBS filesystem

Do either of these fail and require fixing for a 64k page size
system running 64kB block size?

i.e. are these actual 64kB block size issues, or just issues with
the LBS patchset?

-Dave.
Pankaj Raghav Jan. 23, 2024, 8:52 a.m. UTC | #2
On 23/01/2024 01:25, Dave Chinner wrote:
> On Mon, Jan 22, 2024 at 12:17:49PM +0100, Pankaj Raghav (Samsung) wrote:
>> From: Pankaj Raghav <p.raghav@samsung.com>
>>
>> Some tests need to be adapted to for LBS[1] based on the filesystem
>> blocksize. These are generic changes where it uses the filesystem
>> blocksize instead of assuming it.
>>
>> There are some more generic test cases that are failing due to logdev
>> size requirement that changes with filesystem blocksize. I will address
>> them in a separate series.
>>
>> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
>>
>> Pankaj Raghav (2):
>>   xfs/558: scale blk IO size based on the filesystem blksz
>>   xfs/161: adapt the test case for LBS filesystem
> 
> Do either of these fail and require fixing for a 64k page size
> system running 64kB block size?
> 
> i.e. are these actual 64kB block size issues, or just issues with
> the LBS patchset?
> 

I had the same question in mind. Unfortunately, I don't have access to a 64k page size
machine at the moment. I will ask around to see if I can get access to one.

@Zorro I saw you posted a test report for 64k blocksize. Is it possible for you to
check whether these test cases (xfs/161, xfs/558) work in your setup with a 64k block size?
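
Something along these lines should exercise them with a 64k block size (the
devices and mount points are placeholders for whatever your config uses;
MKFS_OPTIONS is the standard fstests knob):

  export TEST_DEV=/dev/vdb1 TEST_DIR=/mnt/test          # placeholder devices
  export SCRATCH_DEV=/dev/vdb2 SCRATCH_MNT=/mnt/scratch
  MKFS_OPTIONS="-b size=65536" ./check xfs/161 xfs/558   # 64k fs block size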

CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
Zorro Lang Jan. 23, 2024, 1:43 p.m. UTC | #3
On Tue, Jan 23, 2024 at 09:52:39AM +0100, Pankaj Raghav wrote:
> On 23/01/2024 01:25, Dave Chinner wrote:
> > On Mon, Jan 22, 2024 at 12:17:49PM +0100, Pankaj Raghav (Samsung) wrote:
> >> From: Pankaj Raghav <p.raghav@samsung.com>
> >>
> >> Some tests need to be adapted to for LBS[1] based on the filesystem
> >> blocksize. These are generic changes where it uses the filesystem
> >> blocksize instead of assuming it.
> >>
> >> There are some more generic test cases that are failing due to logdev
> >> size requirement that changes with filesystem blocksize. I will address
> >> them in a separate series.
> >>
> >> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
> >>
> >> Pankaj Raghav (2):
> >>   xfs/558: scale blk IO size based on the filesystem blksz
> >>   xfs/161: adapt the test case for LBS filesystem
> > 
> > Do either of these fail and require fixing for a 64k page size
> > system running 64kB block size?
> > 
> > i.e. are these actual 64kB block size issues, or just issues with
> > the LBS patchset?
> > 
> 
> I had the same question in mind. Unfortunately, I don't have access to any 64k Page size
> machine at the moment. I will ask around if I can get access to it.
> 
> @Zorro I saw you posted a test report for 64k blocksize. Is it possible for you to
> see if these test cases(xfs/161, xfs/558) work in your setup with 64k block size?

Sure, I'll reserve a ppc64le machine and give it a try. But I remember there are more failed
cases on 64k blocksize xfs.

Thanks,
Zorro

> 
> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
>
Ritesh Harjani (IBM) Jan. 23, 2024, 3:35 p.m. UTC | #4
Pankaj Raghav <p.raghav@samsung.com> writes:

> On 23/01/2024 01:25, Dave Chinner wrote:
>> On Mon, Jan 22, 2024 at 12:17:49PM +0100, Pankaj Raghav (Samsung) wrote:
>>> From: Pankaj Raghav <p.raghav@samsung.com>
>>>
>>> Some tests need to be adapted to for LBS[1] based on the filesystem
>>> blocksize. These are generic changes where it uses the filesystem
>>> blocksize instead of assuming it.
>>>
>>> There are some more generic test cases that are failing due to logdev
>>> size requirement that changes with filesystem blocksize. I will address
>>> them in a separate series.
>>>
>>> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
>>>
>>> Pankaj Raghav (2):
>>>   xfs/558: scale blk IO size based on the filesystem blksz
>>>   xfs/161: adapt the test case for LBS filesystem
>> 
>> Do either of these fail and require fixing for a 64k page size
>> system running 64kB block size?
>> 
>> i.e. are these actual 64kB block size issues, or just issues with
>> the LBS patchset?
>> 
>
> I had the same question in mind. Unfortunately, I don't have access to any 64k Page size
> machine at the moment. I will ask around if I can get access to it.
>
> @Zorro I saw you posted a test report for 64k blocksize. Is it possible for you to
> see if these test cases(xfs/161, xfs/558) work in your setup with 64k block size?
>
> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.

Hi Pankaj,

So I tested this on Linux 6.6 on a Power8 qemu (which I had handy).
xfs/558 passed with both a 64k blocksize and a 4k blocksize on a 64k
pagesize system.
However, since quota on this system was v4.05, which does not support
the bigtime feature, xfs/161 could not be run.

xfs/161       [not run] quota: bigtime support not detected
xfs/558 7s ...  21s

I will collect this info on a different system with the latest kernel
and will update for xfs/161 too.

-ritesh
Ritesh Harjani (IBM) Jan. 23, 2024, 3:39 p.m. UTC | #5
Zorro Lang <zlang@redhat.com> writes:

> On Tue, Jan 23, 2024 at 09:52:39AM +0100, Pankaj Raghav wrote:
>> On 23/01/2024 01:25, Dave Chinner wrote:
>> > On Mon, Jan 22, 2024 at 12:17:49PM +0100, Pankaj Raghav (Samsung) wrote:
>> >> From: Pankaj Raghav <p.raghav@samsung.com>
>> >>
>> >> Some tests need to be adapted to for LBS[1] based on the filesystem
>> >> blocksize. These are generic changes where it uses the filesystem
>> >> blocksize instead of assuming it.
>> >>
>> >> There are some more generic test cases that are failing due to logdev
>> >> size requirement that changes with filesystem blocksize. I will address
>> >> them in a separate series.
>> >>
>> >> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
>> >>
>> >> Pankaj Raghav (2):
>> >>   xfs/558: scale blk IO size based on the filesystem blksz
>> >>   xfs/161: adapt the test case for LBS filesystem
>> > 
>> > Do either of these fail and require fixing for a 64k page size
>> > system running 64kB block size?
>> > 
>> > i.e. are these actual 64kB block size issues, or just issues with
>> > the LBS patchset?
>> > 
>> 
>> I had the same question in mind. Unfortunately, I don't have access to any 64k Page size
>> machine at the moment. I will ask around if I can get access to it.
>> 
>> @Zorro I saw you posted a test report for 64k blocksize. Is it possible for you to
>> see if these test cases(xfs/161, xfs/558) work in your setup with 64k block size?
>
> Sure, I'll reserve one ppc64le and give it a try. But I remember there're more failed
> cases on 64k blocksize xfs.
>

Please share the list of failed testcases with 64k bs xfs (if you have it handy).
IIRC, many of them could be due to the 64k bs itself, but yes, I can take a look and work on those.

Thanks!
-ritesh
Pankaj Raghav Jan. 23, 2024, 4:33 p.m. UTC | #6
>> @Zorro I saw you posted a test report for 64k blocksize. Is it possible for you to
>> see if these test cases(xfs/161, xfs/558) work in your setup with 64k block size?
> 
> Sure, I'll reserve one ppc64le and give it a try. But I remember there're more failed
> cases on 64k blocksize xfs.
> 

Thanks a lot, Zorro. I am also having issues with xfs/166 with LBS. I am not sure if this exists
on a 64k base page size system.

FYI, there are a lot of generic tests that are failing because the filesystem size is too small
to fit the log with a 64k block size. At least with LBS (I am not sure about a 64k base page size
system), these are the failures due to filesystem size:

generic/042, generic/081, generic/108, generic/455, generic/457, generic/482, generic/704,
generic/730, generic/731, shared/298.

For example in generic/042 with 64k block size:

max log size 388 smaller than min log size 2028, filesystem is too small
Usage: mkfs.xfs
/* blocksize */         [-b size=num]
/* config file */       [-c options=xxx]
/* metadata */          [-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,
                            inobtcount=0|1,bigtime=0|1]
/* data subvol */       [-d agcount=n,agsize=n,file,name=xxx,size=num,
                            (sunit=value,swidth=value|su=num,sw=num|noalign),
                            sectsize=num
/* force overwrite */   [-f]
/* inode size */        [-i perblock=n|size=num,maxpct=n,attr=0|1|2,
                            projid32bit=0|1,sparse=0|1,nrext64=0|1]
/* no discard */        [-K]
/* log subvol */        [-l agnum=n,internal,size=num,logdev=xxx,version=n
                            sunit=value|su=num,sectsize=num,lazy-count=0|1]
/* label */             [-L label (maximum 12 characters)]
/* naming */            [-n size=num,version=2|ci,ftype=0|1]
/* no-op info only */   [-N]
/* prototype file */    [-p fname]
/* quiet */             [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */        [-s size=num]
/* version */           [-V]
                        devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
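
As a rough sense of scale, assuming the sizes in the mkfs message are in
filesystem blocks:

  minimum log:                2028 blocks * 64 KiB/block ~= 127 MiB
  maximum log that fits here:  388 blocks * 64 KiB/block ~=  24 MiB

so a sized scratch filesystem that cannot hold a ~127 MiB log will fail
mkfs outright at a 64k block size.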

--
Pankaj
Pankaj Raghav Jan. 23, 2024, 4:40 p.m. UTC | #7
>> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
> 
> Hi Pankaj,
> 
> So I tested this on Linux 6.6 on Power8 qemu (which I had it handy).
> xfs/558 passed with both 64k blocksize & with 4k blocksize on a 64k
> pagesize system.

Thanks for testing it out. I will investigate this further and see why
I have this failure in LBS for 64k but not for 32k and 16k block sizes.

As this test also expects some invalidation during page cache writeback,
this might be an issue just with LBS and not for 64k page size machines.

I will probably also spend some time setting up a Power8 qemu to test these failures.

> However, since on this system the quota was v4.05, it does not support
> bigtime feature hence could not run xfs/161. 
> 
> xfs/161       [not run] quota: bigtime support not detected
> xfs/558 7s ...  21s
> 
> I will collect this info on a different system with latest kernel and
> will update for xfs/161 too.
> 

Sounds good! Thanks!

> -ritesh
Ritesh Harjani (IBM) Jan. 23, 2024, 7:42 p.m. UTC | #8
Pankaj Raghav <p.raghav@samsung.com> writes:

>>> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
>> 
>> Hi Pankaj,
>> 
>> So I tested this on Linux 6.6 on Power8 qemu (which I had it handy).
>> xfs/558 passed with both 64k blocksize & with 4k blocksize on a 64k
>> pagesize system.

Ok, so it looks like the testcase xfs/558 is failing on linux-next with
a 64k blocksize but passing with a 4k blocksize.
I thought it was passing on the earlier Linux 6.6 release, but I guess
those too were just some lucky runs. Here is the report -

linux-next: xfs/558 aggregate results across 11 runs: pass=2 (18.2%), fail=9 (81.8%)
v6.6: xfs/558 aggregate results across 11 runs: pass=5 (45.5%), fail=6 (54.5%)

So I guess I will spend some time analyzing why it fails.

Failure log
================
xfs/558 36s ... - output mismatch (see /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad)
    --- tests/xfs/558.out       2023-06-29 12:06:13.824276289 +0000
    +++ /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad       2024-01-23 18:54:56.613116520 +0000
    @@ -1,2 +1,3 @@
     QA output created by 558
    +Expected to hear about writeback iomap invalidations?
     Silence is golden
    ...
    (Run 'diff -u /root/xfstests-dev/tests/xfs/558.out /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad'  to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
      5c665e5b5af6 xfs: remove xfs_map_cow

-ritesh

>
> Thanks for testing it out. I will investigate this further, and see why
> I have this failure in LBS for 64k and not for 32k and 16k block sizes.
>
> As this test also expects some invalidation during the page cache writeback,
> this might an issue just with LBS and not for 64k page size machines.
>
> Probably I will also spend some time to set up a Power8 qemu to test these failures.
>
>> However, since on this system the quota was v4.05, it does not support
>> bigtime feature hence could not run xfs/161. 
>> 
>> xfs/161       [not run] quota: bigtime support not detected
>> xfs/558 7s ...  21s
>> 
>> I will collect this info on a different system with latest kernel and
>> will update for xfs/161 too.
>> 
>
> Sounds good! Thanks!
>
>> -ritesh
Pankaj Raghav Jan. 23, 2024, 8:21 p.m. UTC | #9
On 23/01/2024 20:42, Ritesh Harjani (IBM) wrote:
> Pankaj Raghav <p.raghav@samsung.com> writes:
> 
>>>> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
>>>
>>> Hi Pankaj,
>>>
>>> So I tested this on Linux 6.6 on Power8 qemu (which I had it handy).
>>> xfs/558 passed with both 64k blocksize & with 4k blocksize on a 64k
>>> pagesize system.
> 
> Ok, so it looks like the testcase xfs/558 is failing on linux-next with
> 64k blocksize but passing with 4k blocksize.
> It thought it was passing on my previous linux 6.6 release, but I guess
> those too were just some lucky runs. Here is the report -
> 
> linux-next: xfs/558 aggregate results across 11 runs: pass=2 (18.2%), fail=9 (81.8%)
> v6.6: xfs/558 aggregate results across 11 runs: pass=5 (45.5%), fail=6 (54.5%)
> 

Oh, thanks for reporting back!

I can confirm that it happens 100% of the time with my LBS patches enabled for 64k bs.

Let's see what Zorro reports back on real 64k hardware.

> So I guess, I will spend sometime analyzing why the failure.
> 

Could you try the patch I sent for xfs/558 and see if it works all the time?

The issue is that 'xfs_wb*iomap_invalid' does not get triggered when we have a larger
bs. I basically increased the blksz in the test based on the underlying bs.
Maybe there is a better solution than what I proposed, but it fixes the test.
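
A rough sketch of that kind of change (not necessarily the exact hunk in
the patch; _get_file_block_size is the fstests helper I am assuming here):

  # derive the I/O size from the fs block size instead of a fixed value,
  # so the xfs_wb*iomap_invalid tracepoints still fire with larger blocks
  file_blksz=$(_get_file_block_size "$SCRATCH_MNT")
  blksz=$(( 8 * file_blksz ))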


> Failure log
> ================
> xfs/558 36s ... - output mismatch (see /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad)
>     --- tests/xfs/558.out       2023-06-29 12:06:13.824276289 +0000
>     +++ /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad       2024-01-23 18:54:56.613116520 +0000
>     @@ -1,2 +1,3 @@
>      QA output created by 558
>     +Expected to hear about writeback iomap invalidations?
>      Silence is golden
>     ...
>     (Run 'diff -u /root/xfstests-dev/tests/xfs/558.out /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad'  to see the entire diff)
> 
> HINT: You _MAY_ be missing kernel fix:
>       5c665e5b5af6 xfs: remove xfs_map_cow
> 
> -ritesh
> 
>>
>> Thanks for testing it out. I will investigate this further, and see why
>> I have this failure in LBS for 64k and not for 32k and 16k block sizes.
>>
>> As this test also expects some invalidation during the page cache writeback,
>> this might an issue just with LBS and not for 64k page size machines.
>>
>> Probably I will also spend some time to set up a Power8 qemu to test these failures.
>>
>>> However, since on this system the quota was v4.05, it does not support
>>> bigtime feature hence could not run xfs/161. 
>>>
>>> xfs/161       [not run] quota: bigtime support not detected
>>> xfs/558 7s ...  21s
>>>
>>> I will collect this info on a different system with latest kernel and
>>> will update for xfs/161 too.
>>>
>>
>> Sounds good! Thanks!
>>
>>> -ritesh
Darrick J. Wong Jan. 24, 2024, 4:58 p.m. UTC | #10
On Tue, Jan 23, 2024 at 09:21:50PM +0100, Pankaj Raghav wrote:
> On 23/01/2024 20:42, Ritesh Harjani (IBM) wrote:
> > Pankaj Raghav <p.raghav@samsung.com> writes:
> > 
> >>>> CCing Ritesh as I saw him post a patch to fix a testcase for 64k block size.
> >>>
> >>> Hi Pankaj,
> >>>
> >>> So I tested this on Linux 6.6 on Power8 qemu (which I had it handy).
> >>> xfs/558 passed with both 64k blocksize & with 4k blocksize on a 64k
> >>> pagesize system.
> > 
> > Ok, so it looks like the testcase xfs/558 is failing on linux-next with
> > 64k blocksize but passing with 4k blocksize.
> > It thought it was passing on my previous linux 6.6 release, but I guess
> > those too were just some lucky runs. Here is the report -
> > 
> > linux-next: xfs/558 aggregate results across 11 runs: pass=2 (18.2%), fail=9 (81.8%)
> > v6.6: xfs/558 aggregate results across 11 runs: pass=5 (45.5%), fail=6 (54.5%)
> > 
> 
> Oh, thanks for reporting back!
> 
> I can confirm that it happens 100% of time with my LBS patch enabled for 64k bs.
> 
> Let's see what Zorro reports back on a real 64k hardware.
> 
> > So I guess, I will spend sometime analyzing why the failure.
> > 
> 
> Could you try the patch I sent for xfs/558 and see if it works all the time?
> 
> The issue is 'xfs_wb*iomap_invalid' not getting triggered when we have larger
> bs. I basically increased the blksz in the test based on the underlying bs.
> Maybe there is a better solution than what I proposed, but it fixes the test.

The only improvement I can think of would be to force-disable large
folios on the file being tested.  Large folios mess with testing because
the race depends on write and writeback needing to walk multiple pages.
Right now the pagecache only uses large folios if the IO patterns
consist of large IOs, but in theory that could change some day.

I suspect that the iomap tracepoint data and possibly
trace_mm_filemap_add_to_page_cache might help figure out what size
folios are actually in use during the invalidation test.
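
Something like the following is usually enough to capture both during a
test run (assuming tracefs is mounted at the standard /sys/kernel/tracing):

  cd /sys/kernel/tracing
  echo 1 > events/filemap/mm_filemap_add_to_page_cache/enable
  echo 1 > events/iomap/enable          # all iomap tracepoints
  echo > trace                          # clear the buffer
  # ... run xfs/558 ...
  cat trace > /tmp/xfs558.trace
  echo 0 > events/iomap/enable
  echo 0 > events/filemap/mm_filemap_add_to_page_cache/enable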

(Perhaps it's time for me to add a 64k bs VM to the test fleet.)

--D

> > Failure log
> > ================
> > xfs/558 36s ... - output mismatch (see /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad)
> >     --- tests/xfs/558.out       2023-06-29 12:06:13.824276289 +0000
> >     +++ /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad       2024-01-23 18:54:56.613116520 +0000
> >     @@ -1,2 +1,3 @@
> >      QA output created by 558
> >     +Expected to hear about writeback iomap invalidations?
> >      Silence is golden
> >     ...
> >     (Run 'diff -u /root/xfstests-dev/tests/xfs/558.out /root/xfstests-dev/results//xfs_64k_iomap/xfs/558.out.bad'  to see the entire diff)
> > 
> > HINT: You _MAY_ be missing kernel fix:
> >       5c665e5b5af6 xfs: remove xfs_map_cow
> > 
> > -ritesh
> > 
> >>
> >> Thanks for testing it out. I will investigate this further, and see why
> >> I have this failure in LBS for 64k and not for 32k and 16k block sizes.
> >>
> >> As this test also expects some invalidation during the page cache writeback,
> >> this might an issue just with LBS and not for 64k page size machines.
> >>
> >> Probably I will also spend some time to set up a Power8 qemu to test these failures.
> >>
> >>> However, since on this system the quota was v4.05, it does not support
> >>> bigtime feature hence could not run xfs/161. 
> >>>
> >>> xfs/161       [not run] quota: bigtime support not detected
> >>> xfs/558 7s ...  21s
> >>>
> >>> I will collect this info on a different system with latest kernel and
> >>> will update for xfs/161 too.
> >>>
> >>
> >> Sounds good! Thanks!
> >>
> >>> -ritesh
Pankaj Raghav Jan. 24, 2024, 9:06 p.m. UTC | #11
>> The issue is 'xfs_wb*iomap_invalid' not getting triggered when we have larger
>> bs. I basically increased the blksz in the test based on the underlying bs.
>> Maybe there is a better solution than what I proposed, but it fixes the test.
> 
> The only improvement I can think of would be to force-disable large
> folios on the file being tested.  Large folios mess with testing because
> the race depends on write and writeback needing to walk multiple pages.
> Right now the pagecache only institutes large folios if the IO patterns
> are large IOs, but in theory that could change some day.
> 

Hmm, so we would create something like a debug parameter to disable large folios while
the file is being tested?

The only issue is that the LBS work needs large folios to be enabled.

So I think the solution then is to add a debug parameter that disables large folios
for normal blocksizes (bs <= ps) while running the test, but to disable this test
altogether for LBS (bs > ps)?


> I suspect that the iomap tracepoint data and possibly
> trace_mm_filemap_add_to_page_cache might help figure out what size
> folios are actually in use during the invalidation test.
> 

Cool! I will see if I can do some analysis by adding trace_mm_filemap_add_to_page_cache
while running the test.

> (Perhaps it's time for me to add a 64k bs VM to the test fleet.)
> 

I confirmed with Chandan that Oracle OCI with Ampere supports 64k page sizes. We (Luis and I)
are also looking into running kdevops on XFS with a 64k page size and block size, as it might
be useful for the LBS work to cross-verify the failures.