[v2,00/10] xfs: stable fixes for v4.19.y

Message ID 20190204165427.23607-1-mcgrof@kernel.org


Luis Chamberlain Feb. 4, 2019, 4:54 p.m. UTC
Kernel stable team,

here is a v2 respin of my XFS stable patches for v4.19.y. The only
change in this series is adding the upstream commit to the commit log,
and I've now also Cc'd stable@vger.kernel.org as well. No other issues
were spotted or raised with this series.

Reviews, questions, or rants are greatly appreciated.

  Luis

Brian Foster (1):
  xfs: fix shared extent data corruption due to missing cow reservation

Carlos Maiolino (1):
  xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat

Christoph Hellwig (1):
  xfs: cancel COW blocks before swapext

Christophe JAILLET (1):
  xfs: Fix error code in 'xfs_ioc_getbmap()'

Darrick J. Wong (1):
  xfs: fix PAGE_MASK usage in xfs_free_file_space

Dave Chinner (3):
  xfs: fix overflow in xfs_attr3_leaf_verify
  xfs: fix transient reference count error in
    xfs_buf_resubmit_failed_buffers
  xfs: delalloc -> unwritten COW fork allocation can go wrong

Eric Sandeen (1):
  xfs: fix inverted return from xfs_btree_sblock_verify_crc

Ye Yin (1):
  fs/xfs: fix f_ffree value for statfs when project quota is set

 fs/xfs/libxfs/xfs_attr_leaf.c | 11 +++++++++--
 fs/xfs/libxfs/xfs_bmap.c      |  5 ++++-
 fs/xfs/libxfs/xfs_btree.c     |  2 +-
 fs/xfs/xfs_bmap_util.c        | 10 ++++++++--
 fs/xfs/xfs_buf_item.c         | 28 +++++++++++++++++++++-------
 fs/xfs/xfs_ioctl.c            |  2 +-
 fs/xfs/xfs_qm_bhv.c           |  2 +-
 fs/xfs/xfs_reflink.c          |  1 +
 fs/xfs/xfs_stats.c            |  2 +-
 9 files changed, 47 insertions(+), 16 deletions(-)

Comments

Amir Goldstein Feb. 5, 2019, 6:44 a.m. UTC | #1
On Mon, Feb 4, 2019 at 6:54 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> Kernel stable team,
>
> here is a v2 respin of my XFS stable patches for v4.19.y. The only
> change in this series is adding the upstream commit to the commit log,
> and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> were spotted or raised with this series.
>
> Reviews, questions, or rants are greatly appreciated.

Luis,

Thanks a lot for doing this work.

For the sake of people not following "oscheck", could you please
list the configurations you tested with the xfstests "auto" group?
Any expunged tests we should know about?

I went over the candidate patches and to me they all look like
stable-worthy patches; I have not identified any dependencies.

Original authors and reviewers are in the best position to verify
those assessments, so please, guys: if each one of you acks his
own patch, that shouldn't take a lot of anyone's time.

Specifically, repeating Luis's request from the v1 cover letter -
there are two patches by Dave ([6,7/10]) that originally come from
a 7-patch series of assorted fixes:
https://patchwork.kernel.org/cover/10689445/

Please confirm that those two patches do stand on their own.

Thanks,
Amir.


>
>   Luis
>
> Brian Foster (1):
>   xfs: fix shared extent data corruption due to missing cow reservation
>
> Carlos Maiolino (1):
>   xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
>
> Christoph Hellwig (1):
>   xfs: cancel COW blocks before swapext
>
> Christophe JAILLET (1):
>   xfs: Fix error code in 'xfs_ioc_getbmap()'
>
> Darrick J. Wong (1):
>   xfs: fix PAGE_MASK usage in xfs_free_file_space
>
> Dave Chinner (3):
>   xfs: fix overflow in xfs_attr3_leaf_verify
>   xfs: fix transient reference count error in
>     xfs_buf_resubmit_failed_buffers
>   xfs: delalloc -> unwritten COW fork allocation can go wrong
>
> Eric Sandeen (1):
>   xfs: fix inverted return from xfs_btree_sblock_verify_crc
>
> Ye Yin (1):
>   fs/xfs: fix f_ffree value for statfs when project quota is set
>
>  fs/xfs/libxfs/xfs_attr_leaf.c | 11 +++++++++--
>  fs/xfs/libxfs/xfs_bmap.c      |  5 ++++-
>  fs/xfs/libxfs/xfs_btree.c     |  2 +-
>  fs/xfs/xfs_bmap_util.c        | 10 ++++++++--
>  fs/xfs/xfs_buf_item.c         | 28 +++++++++++++++++++++-------
>  fs/xfs/xfs_ioctl.c            |  2 +-
>  fs/xfs/xfs_qm_bhv.c           |  2 +-
>  fs/xfs/xfs_reflink.c          |  1 +
>  fs/xfs/xfs_stats.c            |  2 +-
>  9 files changed, 47 insertions(+), 16 deletions(-)
>
> --
> 2.18.0
>
Dave Chinner Feb. 5, 2019, 10:06 p.m. UTC | #2
On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> Kernel stable team,
> 
> here is a v2 respin of my XFS stable patches for v4.19.y. The only
> change in this series is adding the upstream commit to the commit log,
> and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> were spotted or raised with this series.
> 
> Reviews, questions, or rants are greatly appreciated.

Test results?

The set of changes look fine themselves, but as always, the proof is
in the testing...

Cheers,

Dave.
Sasha Levin Feb. 6, 2019, 4:05 a.m. UTC | #3
On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
>On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
>> Kernel stable team,
>>
>> here is a v2 respin of my XFS stable patches for v4.19.y. The only
>> change in this series is adding the upstream commit to the commit log,
>> and I've now also Cc'd stable@vger.kernel.org as well. No other issues
>> were spotted or raised with this series.
>>
>> Reviews, questions, or rants are greatly appreciated.
>
>Test results?
>
>The set of changes look fine themselves, but as always, the proof is
>in the testing...

Luis noted on v1 that it passes through his oscheck test suite, and I
noted that I haven't seen any regression with the xfstests scripts I
have.

What sort of data are you looking for beyond "we didn't see a
regression"?

--
Thanks,
Sasha
Dave Chinner Feb. 6, 2019, 9:54 p.m. UTC | #4
On Tue, Feb 05, 2019 at 11:05:59PM -0500, Sasha Levin wrote:
> On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> >On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> >>Kernel stable team,
> >>
> >>here is a v2 respin of my XFS stable patches for v4.19.y. The only
> >>change in this series is adding the upstream commit to the commit log,
> >>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> >>were spotted or raised with this series.
> >>
> >>Reviews, questions, or rants are greatly appreciated.
> >
> >Test results?
> >
> >The set of changes look fine themselves, but as always, the proof is
> >in the testing...
> 
> Luis noted on v1 that it passes through his oscheck test suite, and I
> noted that I haven't seen any regression with the xfstests scripts I
> have.
> 
> What sort of data are you looking for beyond "we didn't see a
> regression"?

Nothing special, just a summary of what was tested so we have some
visibility of whether the testing covered the proposed changes
sufficiently.  i.e. something like:

	Patchset was run through ltp and the fstests "auto" group
	with the following configs:

	- mkfs/mount defaults
	- -m reflink=1,rmapbt=1
	- -b size=1k
	- -m crc=0
	....

	No new regressions were reported.


Really, all I'm looking for is a bit more context for the review
process - nobody remembers what configs other people test. However,
when reviewing a backport it's important to know whether a backported
fix for, say, a bug in the rmap code actually got exercised by the
tests on an rmap-enabled filesystem...

Cheers,

Dave.
Sasha Levin Feb. 8, 2019, 6:06 a.m. UTC | #5
On Thu, Feb 07, 2019 at 08:54:54AM +1100, Dave Chinner wrote:
>On Tue, Feb 05, 2019 at 11:05:59PM -0500, Sasha Levin wrote:
>> On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
>> >On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
>> >>Kernel stable team,
>> >>
>> >>here is a v2 respin of my XFS stable patches for v4.19.y. The only
>> >>change in this series is adding the upstream commit to the commit log,
>> >>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
>> >>were spotted or raised with this series.
>> >>
>> >>Reviews, questions, or rants are greatly appreciated.
>> >
>> >Test results?
>> >
>> >The set of changes look fine themselves, but as always, the proof is
>> >in the testing...
>>
>> Luis noted on v1 that it passes through his oscheck test suite, and I
>> noted that I haven't seen any regression with the xfstests scripts I
>> have.
>>
>> What sort of data are you looking for beyond "we didn't see a
>> regression"?
>
>Nothing special, just a summary of what was tested so we have some
>visibility of whether the testing covered the proposed changes
>sufficiently.  i.e. something like:
>
>	Patchset was run through ltp and the fstests "auto" group
>	with the following configs:
>
>	- mkfs/mount defaults
>	- -m reflink=1,rmapbt=1
>	- -b size=1k
>	- -m crc=0
>	....
>
>	No new regressions were reported.
>
>
>Really, all I'm looking for is a bit more context for the review
>process - nobody remembers what configs other people test. However,
>it's important in reviewing a backport to know whether a backport to
>a fix, say, a bug in the rmap code actually got exercised by the
>tests on an rmap enabled filesystem...

Sure! Below are the various configs this was run against. There were
multiple runs over 48+ hours and no regressions from a 4.14.17 baseline
were observed.

[default]
TEST_DEV=/dev/nvme0n1p1
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/nvme0n1p2"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)
MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/nvme0n1p3
FSTYP=xfs


[default]
TEST_DEV=/dev/nvme0n1p1
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/nvme0n1p2"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/nvme0n1p3
FSTYP=xfs


[default]
TEST_DEV=/dev/nvme0n1p1
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/nvme0n1p2"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/nvme0n1p3
FSTYP=xfs


[default]
TEST_DEV=/dev/nvme0n1p1
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/nvme0n1p2"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/nvme0n1p3
FSTYP=xfs


[default]
TEST_DEV=/dev/nvme0n1p1
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/nvme0n1p2"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/nvme0n1p3
FSTYP=xfs


[default_pmem]
TEST_DEV=/dev/pmem0
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/pmem1"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/pmem2
FSTYP=xfs


[default_pmem]
TEST_DEV=/dev/pmem0
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/pmem1"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/pmem2
FSTYP=xfs


[default_pmem]
TEST_DEV=/dev/pmem0
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/pmem1"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/pmem2
FSTYP=xfs


[default_pmem]
TEST_DEV=/dev/pmem0
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/pmem1"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/pmem2
FSTYP=xfs


[default_pmem]
TEST_DEV=/dev/pmem0
TEST_DIR=/media/test
SCRATCH_DEV_POOL="/dev/pmem1"
SCRATCH_MNT=/media/scratch
RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/pmem2
FSTYP=xfs


--
Thanks,
Sasha
Luis Chamberlain Feb. 8, 2019, 7:48 p.m. UTC | #6
On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> > Kernel stable team,
> > 
> > here is a v2 respin of my XFS stable patches for v4.19.y. The only
> > change in this series is adding the upstream commit to the commit log,
> > and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> > were spotted or raised with this series.
> > 
> > Reviews, questions, or rants are greatly appreciated.
> 
> Test results?
> 
> The set of changes look fine themselves, but as always, the proof is
> in the testing...

I first established a baseline for v4.19.18 with fstests, using a
series of different sections to test against. I annotated the
failures in an expunge list and then used that expunge list to confirm
no regressions -- that is, no failures if we skip the failures already
known for v4.19.18.

I use a section for each different configuration I test against. I
only test x86_64 for now, but am starting to create a baseline for
ppc64le.

The sections I use:

  * xfs
  * xfs_nocrc
  * xfs_nocrc_512
  * xfs_reflink
  * xfs_reflink_1024
  * xfs_logdev
  * xfs_realtimedev

The sections definitions for these are below:

[xfs]
MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/loop15
FSTYP=xfs

[xfs_nocrc]
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/loop15
FSTYP=xfs

[xfs_nocrc_512]
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/loop15
FSTYP=xfs

[xfs_reflink]
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/loop15
FSTYP=xfs

[xfs_reflink_1024]
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
USE_EXTERNAL=no
LOGWRITES_DEV=/dev/loop15
FSTYP=xfs

[xfs_logdev]
MKFS_OPTIONS="-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0 -lsize=1g"
SCRATCH_LOGDEV=/dev/loop15
USE_EXTERNAL=yes
FSTYP=xfs

[xfs_realtimedev]
MKFS_OPTIONS="-f -lsize=1g"
SCRATCH_LOGDEV=/dev/loop15
SCRATCH_RTDEV=/dev/loop14
USE_EXTERNAL=yes
FSTYP=xfs

These are listed in my example.config which oscheck copies over to
/var/lib/xfstests/config/$(hostname).config upon install if you don't
have one.

I didn't find any regressions against my tests.

The baseline is reflected on oscheck's expunge list per kernel release,
so in this case expunges/4.19.18. A file exists for each section which
tests are known to fail.
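Conceptually, an expunge list is just a skip list: the harness leaves out the tests named in the file when it runs fstests. A rough sketch of that filtering is below; the helper name is hypothetical, and trailing '#' comments are ignored the way the lists above use them:

```shell
#!/bin/sh
# Hypothetical helper: read test names on stdin and drop any that
# appear in the expunge list (trailing '#' comments and blank lines
# are ignored), mimicking what excluding an expunge file does.
filter_expunged() {
    expunge_file="$1"
    # Normalize the expunge list: strip comments and trailing spaces.
    keep=$(sed -e 's/#.*//' -e 's/[[:space:]]*$//' "$expunge_file" | grep -v '^$')
    while read -r t; do
        echo "$keep" | grep -qx "$t" || echo "$t"
    done
}
```

For example, feeding generic/091, generic/100, and generic/464 through the xfs.txt list above would leave only generic/100 to run.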

I'll put them below for completeness, but all of these files are
present in my oscheck repository [0], which is what I use to track
fstests failure baselines for upstream kernels:

$ cat expunges/4.19.18/xfs/unassigned/xfs.txt
generic/091
generic/263
generic/464 # after ~6 runs
generic/475 # after ~15 runs
generic/484
xfs/191-input-validation
xfs/278
xfs/451
xfs/495
xfs/499

$ cat expunges/4.19.18/xfs/unassigned/xfs_nocrc.txt
generic/091
generic/263
generic/464 # after ~39 runs
generic/475 # after ~5-10 runs
generic/484
xfs/191-input-validation
xfs/273
xfs/278
xfs/451
xfs/495
xfs/499

$ cat expunges/4.19.18/xfs/unassigned/xfs_nocrc_512.txt
generic/091
generic/263
generic/475 # after ~33 runs
generic/482 # after ~16 runs
generic/484
xfs/071
xfs/191-input-validation
xfs/273
xfs/278
xfs/451
xfs/495
xfs/499

$ cat expunges/4.19.18/xfs/unassigned/xfs_reflink.txt
generic/091
generic/263
generic/464 # after ~1 run
generic/475 # after ~5 runs
generic/484
xfs/191-input-validation
xfs/278
xfs/451
xfs/495
xfs/499

$ cat expunges/4.19.18/xfs/unassigned/xfs_reflink_1024.txt
generic/091
generic/263
generic/475 # after ~2 runs
generic/484
xfs/191-input-validation
xfs/278
xfs/451
xfs/495
xfs/499

The xfs_logdev and xfs_realtimedev sections use an external log, and
as I have noted before, it seems work is needed to rule out actual
failures there.

But for completeness, the tests which fstests says fail for these
sections are below:

$ cat expunges/4.19.18/xfs/unassigned/xfs_logdev.txt
generic/034
generic/039
generic/040
generic/041
generic/054
generic/055
generic/056
generic/057
generic/059
generic/065
generic/066
generic/073
generic/081
generic/090
generic/091
generic/101
generic/104
generic/106
generic/107
generic/177
generic/204
generic/207
generic/223
generic/260
generic/263
generic/311
generic/321
generic/322
generic/325
generic/335
generic/336
generic/341
generic/342
generic/343
generic/347
generic/348
generic/361
generic/376
generic/455
generic/459
generic/464 # fails after ~2 runs
generic/475 # fails after ~5 runs, crashes sometimes
generic/482
generic/483
generic/484
generic/489
generic/498
generic/500
generic/502
generic/510
generic/512
generic/520
shared/002
shared/298
xfs/030
xfs/033
xfs/045
xfs/070
xfs/137
xfs/138
xfs/191-input-validation
xfs/194
xfs/195
xfs/199
xfs/278
xfs/284
xfs/291
xfs/294
xfs/424
xfs/451
xfs/495
xfs/499

$ cat expunges/4.19.18/xfs/unassigned/xfs_realtimedev.txt
generic/034
generic/039
generic/040
generic/041
generic/054
generic/056
generic/057
generic/059
generic/065
generic/066
generic/073
generic/081
generic/090
generic/091
generic/101
generic/104
generic/106
generic/107
generic/177
generic/204
generic/207
generic/223
generic/260
generic/263
generic/311
generic/321
generic/322
generic/325
generic/335
generic/336
generic/341
generic/342
generic/343
generic/347
generic/348
generic/361
generic/376
generic/455
generic/459
generic/464 # fails after ~40 runs
generic/475 # fails, and sometimes crashes
generic/482
generic/483
generic/484
generic/489
generic/498
generic/500
generic/502
generic/510
generic/512
generic/520
shared/002
shared/298
xfs/002
xfs/030
xfs/033
xfs/068
xfs/070
xfs/137
xfs/138
xfs/191-input-validation
xfs/194
xfs/195
xfs/199
xfs/278
xfs/291
xfs/294
xfs/419
xfs/424
xfs/451
xfs/495
xfs/499

Perhaps worth noting, as it struck me as curious: I could not get
generic/464 to trigger on sections xfs_nocrc_512 and xfs_reflink_1024.

Although I don't have a full baseline for ppc64le, I did confirm that
backporting upstream commit 837514f7a4ca fixes the kernel.org bug
report [1] triggerable via generic/070 on ppc64le.

If you have any questions please let me know.

[0] https://gitlab.com/mcgrof/oscheck
[1] https://bugzilla.kernel.org/show_bug.cgi?id=201577

  Luis
Luis Chamberlain Feb. 8, 2019, 8:06 p.m. UTC | #7
On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> On Thu, Feb 07, 2019 at 08:54:54AM +1100, Dave Chinner wrote:
> > On Tue, Feb 05, 2019 at 11:05:59PM -0500, Sasha Levin wrote:
> > > On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> > > >On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> > > >>Kernel stable team,
> > > >>
> > > >>here is a v2 respin of my XFS stable patches for v4.19.y. The only
> > > >>change in this series is adding the upstream commit to the commit log,
> > > >>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> > > >>were spotted or raised with this series.
> > > >>
> > > >>Reviews, questions, or rants are greatly appreciated.
> > > >
> > > >Test results?
> > > >
> > > >The set of changes look fine themselves, but as always, the proof is
> > > >in the testing...
> > > 
> > > Luis noted on v1 that it passes through his oscheck test suite, and I
> > > noted that I haven't seen any regression with the xfstests scripts I
> > > have.
> > > 
> > > What sort of data are you looking for beyond "we didn't see a
> > > regression"?
> > 
> > Nothing special, just a summary of what was tested so we have some
> > visibility of whether the testing covered the proposed changes
> > sufficiently.  i.e. something like:
> > 
> > 	Patchset was run through ltp and the fstests "auto" group
> > 	with the following configs:
> > 
> > 	- mkfs/mount defaults
> > 	- -m reflink=1,rmapbt=1
> > 	- -b size=1k
> > 	- -m crc=0
> > 	....
> > 
> > 	No new regressions were reported.
> > 
> > 
> > Really, all I'm looking for is a bit more context for the review
> > process - nobody remembers what configs other people test. However,
> > it's important in reviewing a backport to know whether a backport to
> > a fix, say, a bug in the rmap code actually got exercised by the
> > tests on an rmap enabled filesystem...
> 
> Sure! Below are the various configs this was run against.

To be clear, that was Sasha's own effort. I just replied with my own
set of tests and results against the baseline to confirm no
regressions were found.

My tests run on 8-core KVM VMs with 8 GiB of RAM, using qcow2 images
which reside on an XFS partition mounted on nvme drives on the
hypervisor; the hypervisor runs CentOS 7, on 3.10.0-862.3.2.el7.x86_64.

For the guest I use different qcow2 images. One is 100 GiB and is used
to expose a disk to the guest on which to store the files used for the
SCRATCH_DEV_POOL. For the SCRATCH_DEV_POOL I use loopback devices,
backed by files created on the guest's own /media/truncated/ partition
on that 100 GiB disk. I end up with 8 loopback devices to test
with, then:

SCRATCH_DEV_POOL="/dev/loop5 /dev/loop6 /dev/loop6 /dev/loop7 /dev/loop8 /dev/loop9 /dev/loop10 /dev/loop11"

The loopback devices are set up using my oscheck's gendisks.sh script
(./gendisks.sh -d).
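For reference, the backing-file side of such a setup boils down to something like the sketch below. This is an assumed approximation, not the actual gendisks.sh; the sizes are made up, and the losetup step needs root:

```shell
#!/bin/sh
# Create sparse backing files for the scratch pool. Sparse files cost
# almost no disk space until the tests actually write to them.
create_backing_files() {
    dir="$1"; count="$2"; size="$3"
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$count" ]; do
        truncate -s "$size" "$dir/disk$i"
        i=$((i + 1))
    done
}

# Then, as root, attach each file to a free loop device, e.g.:
#   for f in /media/truncated/disk*; do losetup -f "$f"; done
```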

Since Sasha seems to have a system rigged for testing XFS, what I
could do is collaborate with Sasha to consolidate our sections for
testing, and have both of our systems run all the tests, so that at
least two different test systems confirm no regressions. That is, if
Sasha is up for that. Otherwise I'll continue with whatever rig I can
get my hands on each time I test.

I have an expunge list and he has his own; we need to consolidate
those as well with time.

Since some tests have a failure rate below 1 -- i.e. they don't fail
100% of the time -- I am considering adding a *spinner tester* for
each such test, which runs the test up to 1000 times and records when
it first fails. The assumption is that if a test survives 1000 runs,
we really shouldn't carry it as an expunge. If there is a better term
than failure rate let's use it; I'm just not familiar with one, though
I'm sure this nomenclature must exist.
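A spinner along those lines could be as simple as the sketch below. The command it runs is a placeholder -- a real harness would invoke the fstests check script for one test and would need more plumbing:

```shell
#!/bin/sh
# Run a command repeatedly and report the iteration of the first
# failure, to estimate a flaky test's failure rate. The command is a
# placeholder for something like an fstests single-test invocation.
spin() {
    max_runs="$1"; shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "first failure on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failures in $max_runs runs"
}
```

Usage would then look like `spin 1000 <run-one-test-command>`, with any test that survives all runs dropped from the expunge list.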

A curious thing I noted was that the ppc64le bug didn't actually fail
for me as a straightforward test. That is, I had to *first* manually
mkfs.xfs with the big block specification on the partition used for
TEST_DEV, and then also on the first device in SCRATCH_DEV_POOL. Only
after I did this and then ran the test could I trigger the failure,
with a 100% failure rate.

It has me wondering how many other tests may fail if we did the same.

  Luis
Dave Chinner Feb. 8, 2019, 9:29 p.m. UTC | #8
On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> On Thu, Feb 07, 2019 at 08:54:54AM +1100, Dave Chinner wrote:
> >On Tue, Feb 05, 2019 at 11:05:59PM -0500, Sasha Levin wrote:
> >>On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> >>>On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> >>>>Kernel stable team,
> >>>>
> >>>>here is a v2 respin of my XFS stable patches for v4.19.y. The only
> >>>>change in this series is adding the upstream commit to the commit log,
> >>>>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> >>>>were spotted or raised with this series.
> >>>>
> >>>>Reviews, questions, or rants are greatly appreciated.
> >>>
> >>>Test results?
> >>>
> >>>The set of changes look fine themselves, but as always, the proof is
> >>>in the testing...
> >>
> >>Luis noted on v1 that it passes through his oscheck test suite, and I
> >>noted that I haven't seen any regression with the xfstests scripts I
> >>have.
> >>
> >>What sort of data are you looking for beyond "we didn't see a
> >>regression"?
> >
> >Nothing special, just a summary of what was tested so we have some
> >visibility of whether the testing covered the proposed changes
> >sufficiently.  i.e. something like:
> >
> >	Patchset was run through ltp and the fstests "auto" group
> >	with the following configs:
> >
> >	- mkfs/mount defaults
> >	- -m reflink=1,rmapbt=1
> >	- -b size=1k
> >	- -m crc=0
> >	....
> >
> >	No new regressions were reported.
> >
> >
> >Really, all I'm looking for is a bit more context for the review
> >process - nobody remembers what configs other people test. However,
> >it's important in reviewing a backport to know whether a backport to
> >a fix, say, a bug in the rmap code actually got exercised by the
> >tests on an rmap enabled filesystem...
> 
> Sure! Below are the various configs this was run against. There were
> multiple runs over 48+ hours and no regressions from a 4.14.17 baseline
> were observed.

Thanks, Sasha. As an ongoing thing, I reckon a "grep _OPTIONS
<config_files>" (catches both mkfs and mount options) would be
sufficient as a summary of what was tested in the series
description...
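Concretely, that summary could be generated mechanically with a helper like the one below; the config file location shown in the comment is an assumption based on the oscheck layout mentioned elsewhere in this thread:

```shell
#!/bin/sh
# Print every *_OPTIONS line (mkfs and mount options) from the given
# fstests config files, prefixed with the file name -- roughly the
# per-config summary suggested above.
summarize_test_options() {
    for f in "$@"; do
        echo "== ${f##*/}"
        grep '_OPTIONS' "$f"
    done
}

# e.g.: summarize_test_options /var/lib/xfstests/configs/*.config
```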

Cheers,

Dave.
Dave Chinner Feb. 8, 2019, 9:32 p.m. UTC | #9
On Fri, Feb 08, 2019 at 11:48:29AM -0800, Luis Chamberlain wrote:
> On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> > On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> > > Kernel stable team,
> > > 
> > > here is a v2 respin of my XFS stable patches for v4.19.y. The only
> > > change in this series is adding the upstream commit to the commit log,
> > > and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> > > were spotted or raised with this series.
> > > 
> > > Reviews, questions, or rants are greatly appreciated.
> > 
> > Test results?
> > 
> > The set of changes look fine themselves, but as always, the proof is
> > in the testing...
> 
> I've first established a baseline for v4.19.18 with fstests using
> a series of different sections to test against. I annotated the
> failures on an expunge list and then use that expunge list to confirm
> no regressions -- no failures if we skip the failures already known for
> v4.19.18.
> 
> Each different configuration I test against I use a section for. I only
> test x86_64 for now but am starting to create a baseline for ppc64le.
> 
> The sections I use:
> 
>   * xfs
>   * xfs_nocrc
>   * xfs_nocrc_512
>   * xfs_reflink
>   * xfs_reflink_1024
>   * xfs_logdev
>   * xfs_realtimedev

Yup, that seems to cover most common things :)

> The xfs_logdev and xfs_realtimedev sections use an external log, and as
> I have noted before it seems works is needed to rule out an actual
> failure.

Yeah, there are many tests that don't work properly with external
devices, esp. RT devices. That's a less critical area to cover, but
it's still good to run it :)

Thanks, Luis!

-Dave.
Luis Chamberlain Feb. 8, 2019, 9:50 p.m. UTC | #10
On Sat, Feb 09, 2019 at 08:32:01AM +1100, Dave Chinner wrote:
> On Fri, Feb 08, 2019 at 11:48:29AM -0800, Luis Chamberlain wrote:
> > On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> > > On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> > > > Kernel stable team,
> > > > 
> > > > here is a v2 respin of my XFS stable patches for v4.19.y. The only
> > > > change in this series is adding the upstream commit to the commit log,
> > > > and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> > > > were spotted or raised with this series.
> > > > 
> > > > Reviews, questions, or rants are greatly appreciated.
> > > 
> > > Test results?
> > > 
> > > The set of changes look fine themselves, but as always, the proof is
> > > in the testing...
> > 
> > I've first established a baseline for v4.19.18 with fstests using
> > a series of different sections to test against. I annotated the
> > failures on an expunge list and then use that expunge list to confirm
> > no regressions -- no failures if we skip the failures already known for
> > v4.19.18.
> > 
> > Each different configuration I test against I use a section for. I only
> > test x86_64 for now but am starting to create a baseline for ppc64le.
> > 
> > The sections I use:
> > 
> >   * xfs
> >   * xfs_nocrc
> >   * xfs_nocrc_512
> >   * xfs_reflink
> >   * xfs_reflink_1024
> >   * xfs_logdev
> >   * xfs_realtimedev
> 
> Yup, that seems to cover most common things :)

To be clear, in the future I hope to also have a baseline for:

  * xfs_bigblock

But that is *currently* [0] only possible on the following architectures
with the respective kernel config:

aarch64:
CONFIG_ARM64_64K_PAGES=y

ppc64le:
CONFIG_PPC_64K_PAGES=y

[0] Someone is working on 64k pages on x86 I think?

  Luis
Luis Chamberlain Feb. 8, 2019, 10:17 p.m. UTC | #11
On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> Sure! Below are the various configs this was run against. There were
> multiple runs over 48+ hours and no regressions from a 4.14.17 baseline
> were observed.

In an effort to consolidate our sections:

> [default]
> TEST_DEV=/dev/nvme0n1p1
> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
> MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'

This matches my "xfs" section.

> USE_EXTERNAL=no
> LOGWRITES_DEV=/dev/nve0n1p3
> FSTYP=xfs
> 
> 
> [default]
> TEST_DEV=/dev/nvme0n1p1
> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
> MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'

This matches my "xfs_reflink"

> USE_EXTERNAL=no
> LOGWRITES_DEV=/dev/nvme0n1p3
> FSTYP=xfs
> 
> 
> [default]
> TEST_DEV=/dev/nvme0n1p1
> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
> MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'

This matches my "xfs_reflink_1024" section.

> USE_EXTERNAL=no
> LOGWRITES_DEV=/dev/nvme0n1p3
> FSTYP=xfs
> 
> 
> [default]
> TEST_DEV=/dev/nvme0n1p1
> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
> MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'

This matches my "xfs_nocrc" section.

> USE_EXTERNAL=no
> LOGWRITES_DEV=/dev/nvme0n1p3
> FSTYP=xfs
> 
> 
> [default]
> TEST_DEV=/dev/nvme0n1p1
> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
> MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'

This matches my "xfs_nocrc_512" section.

> USE_EXTERNAL=no
> LOGWRITES_DEV=/dev/nvme0n1p3
> FSTYP=xfs
> 
> 
> [default_pmem]
> TEST_DEV=/dev/pmem0

I'll have to add this to my framework. Have you found pmem
issues not present on other sections?

> TEST_DIR=/media/test
> SCRATCH_DEV_POOL="/dev/pmem1"
> SCRATCH_MNT=/media/scratch
> RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
> MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'

OK, so you just repeat the above options verbatim but for pmem.
Correct?

Any reason you don't name the sections with finer granularity?
It would help ensure that, when we each revise our tests, we can more
easily tell whether we're talking about apples, pears, or bananas.

FWIW, I now run two different bare metal hosts, each with a VM guest
per section above. One host I use for tracking stable, the other for
my own changes. This makes it harder for me to mess things up, and I
can re-test any time, fast.

I dedicate a VM guest to testing *one* section. I do this easily with
oscheck:

./oscheck.sh --test-section xfs_nocrc | tee log-xfs-4.19.18+

For instance, the above will just test the xfs_nocrc section. On
average each section takes about 1 hour to run.

I could run the tests on raw nvme and do away with the guests, but
that would lose some of the easy crash debugging the guests give me
over bare metal... but curious, how long do your tests take? How about
per section? Say just the default "xfs" section?

IIRC you also had your system on Hyper-V :) so maybe you can still
debug crashes easily.

  Luis
Sasha Levin Feb. 9, 2019, 5:53 p.m. UTC | #12
On Sat, Feb 09, 2019 at 08:29:21AM +1100, Dave Chinner wrote:
>On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
>> On Thu, Feb 07, 2019 at 08:54:54AM +1100, Dave Chinner wrote:
>> >On Tue, Feb 05, 2019 at 11:05:59PM -0500, Sasha Levin wrote:
>> >>On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
>> >>>On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
>> >>>>Kernel stable team,
>> >>>>
>> >>>>here is a v2 respin of my XFS stable patches for v4.19.y. The only
>> >>>>change in this series is adding the upstream commit to the commit log,
>> >>>>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
>> >>>>were spotted or raised with this series.
>> >>>>
>> >>>>Reviews, questions, or rants are greatly appreciated.
>> >>>
>> >>>Test results?
>> >>>
>> >>>The set of changes look fine themselves, but as always, the proof is
>> >>>in the testing...
>> >>
>> >>Luis noted on v1 that it passes through his oscheck test suite, and I
>> >>noted that I haven't seen any regression with the xfstests scripts I
>> >>have.
>> >>
>> >>What sort of data are you looking for beyond "we didn't see a
>> >>regression"?
>> >
>> >Nothing special, just a summary of what was tested so we have some
>> >visibility of whether the testing covered the proposed changes
>> >sufficiently.  i.e. something like:
>> >
>> >	Patchset was run through ltp and the fstests "auto" group
>> >	with the following configs:
>> >
>> >	- mkfs/mount defaults
>> >	- -m reflink=1,rmapbt=1
>> >	- -b size=1k
>> >	- -m crc=0
>> >	....
>> >
>> >	No new regressions were reported.
>> >
>> >
>> >Really, all I'm looking for is a bit more context for the review
>> >process - nobody remembers what configs other people test. However,
>> >it's important in reviewing a backport to know whether a backport to
>> >a fix, say, a bug in the rmap code actually got exercised by the
>> >tests on an rmap enabled filesystem...
>>
>> Sure! Below are the various configs this was run against. There were
>> multiple runs over 48+ hours and no regressions from a 4.14.17 baseline
>> were observed.
>
>Thanks, Sasha. As an ongoing thing, I reckon a "grep _OPTIONS
><config_files>" (catches both mkfs and mount options) would be
>sufficient as a summary of what was tested in the series
>description...

Will do.
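
A sketch of what generating that summary could look like, assuming the
per-section configs live in separate files (the paths and file names
here are assumptions, not the actual layout):

```shell
# Collect every *_OPTIONS line (mkfs and mount options) from a set of
# fstests config files, prefixed with the file each line came from.
mkdir -p /tmp/fstests-configs
cat > /tmp/fstests-configs/xfs_nocrc.config <<'EOF'
MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
EOF
cat > /tmp/fstests-configs/xfs_reflink.config <<'EOF'
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'
EOF
grep -H _OPTIONS /tmp/fstests-configs/*.config
```

The `-H` flag keeps the config file name in the output, which is what
makes the one-liner usable as a per-section summary.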

--
Thanks,
Sasha
Sasha Levin Feb. 9, 2019, 9:56 p.m. UTC | #13
On Fri, Feb 08, 2019 at 02:17:26PM -0800, Luis Chamberlain wrote:
>On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
>> Sure! Below are the various configs this was run against. There were
>> multiple runs over 48+ hours and no regressions from a 4.14.17 baseline
>> were observed.
>
>In an effort to consolidate our sections:
>
>> [default]
>> TEST_DEV=/dev/nvme0n1p1
>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
>> MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'
>
>This matches my "xfs" section.
>
>> USE_EXTERNAL=no
>> LOGWRITES_DEV=/dev/nvme0n1p3
>> FSTYP=xfs
>>
>>
>> [default]
>> TEST_DEV=/dev/nvme0n1p1
>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
>> MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1,'
>
>This matches my "xfs_reflink"
>
>> USE_EXTERNAL=no
>> LOGWRITES_DEV=/dev/nvme0n1p3
>> FSTYP=xfs
>>
>>
>> [default]
>> TEST_DEV=/dev/nvme0n1p1
>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
>> MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
>
>This matches my "xfs_reflink_1024" section.
>
>> USE_EXTERNAL=no
>> LOGWRITES_DEV=/dev/nvme0n1p3
>> FSTYP=xfs
>>
>>
>> [default]
>> TEST_DEV=/dev/nvme0n1p1
>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
>> MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
>
>This matches my "xfs_nocrc" section.
>
>> USE_EXTERNAL=no
>> LOGWRITES_DEV=/dev/nvme0n1p3
>> FSTYP=xfs
>>
>>
>> [default]
>> TEST_DEV=/dev/nvme0n1p1
>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/nvme0n1p2"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)
>> MKFS_OPTIONS='-f -m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'
>
>This matches my "xfs_nocrc_512" section.
>
>> USE_EXTERNAL=no
>> LOGWRITES_DEV=/dev/nvme0n1p3
>> FSTYP=xfs
>>
>>
>> [default_pmem]
>> TEST_DEV=/dev/pmem0
>
>I'll have to add this to my framework. Have you found pmem
>issues not present on other sections?

Originally I added this because the XFS folks suggested that pmem vs
block exercises very different code paths and we should be testing both
of them.

Looking at the baseline I have, it seems that there are differences
between the failing tests. For example, with "MKFS_OPTIONS='-f -m
crc=1,reflink=0,rmapbt=0, -i sparse=0'", generic/524 seems to fail on
pmem but not on block.

>> TEST_DIR=/media/test
>> SCRATCH_DEV_POOL="/dev/pmem1"
>> SCRATCH_MNT=/media/scratch
>> RESULT_BASE=$PWD/results/$HOST/$(uname -r)-pmem
>> MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'
>
>OK so you just repeat the above options verbatim but for pmem.
>Correct?

Right.

>Any reason you don't name the sections with finer granularity?
>It would help me in ensuring that when we revise both of our tests we
>can more easily tell whether we're talking about apples, pears, or bananas.

Nope, I'll happily rename them if there are "official" names for it :)

>FWIW, I run two different bare metal hosts now, and each has a VM guest
>per section above. One host I use for tracking stable, the other host for
>my changes. This ensures I don't mess things up easier and I can re-test
>any time fast.
>
>I dedicate a VM guest to test *one* section. I do this with oscheck
>easily:
>
>./oscheck.sh --test-section xfs_nocrc | tee log-xfs-4.19.18+
>
>For instance will just test xfs_nocrc section. On average each section
>takes about 1 hour to run.

We have a similar setup then. I just spawn the VM on azure for each
section and run them all in parallel that way.

I thought oscheck runs everything on a single VM; is there a built-in
mechanism to spawn a VM for each config? If so, I can add some code to
support Azure and we can use the same codebase.

>I could run the tests on raw nvme and do away with the guests, but
>that loses some of my ability to debug on crashes easily and out to
>baremetal.. but curious, how long do your tests takes? How about per
>section? Say just the default "xfs" section?

I think that the longest config takes about 5 hours, otherwise
everything tends to take about 2 hours.

I basically run these on "repeat" until I issue a stop order, so in a
timespan of 48 hours some configs run ~20 times and some only ~10.

>IIRC you also had your system on hyperV :) so maybe you can still debug
>easily on crashes.
>
>  Luis
Sasha Levin Feb. 10, 2019, 12:06 a.m. UTC | #14
On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
>Kernel stable team,
>
>here is a v2 respin of my XFS stable patches for v4.19.y. The only
>change in this series is adding the upstream commit to the commit log,
>and I've now also Cc'd stable@vger.kernel.org as well. No other issues
>were spotted or raised with this series.
>
>Reviews, questions, or rants are greatly appreciated.
>
>  Luis
>
>Brian Foster (1):
>  xfs: fix shared extent data corruption due to missing cow reservation
>
>Carlos Maiolino (1):
>  xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
>
>Christoph Hellwig (1):
>  xfs: cancel COW blocks before swapext
>
>Christophe JAILLET (1):
>  xfs: Fix error code in 'xfs_ioc_getbmap()'
>
>Darrick J. Wong (1):
>  xfs: fix PAGE_MASK usage in xfs_free_file_space
>
>Dave Chinner (3):
>  xfs: fix overflow in xfs_attr3_leaf_verify
>  xfs: fix transient reference count error in
>    xfs_buf_resubmit_failed_buffers
>  xfs: delalloc -> unwritten COW fork allocation can go wrong
>
>Eric Sandeen (1):
>  xfs: fix inverted return from xfs_btree_sblock_verify_crc
>
>Ye Yin (1):
>  fs/xfs: fix f_ffree value for statfs when project quota is set
>
> fs/xfs/libxfs/xfs_attr_leaf.c | 11 +++++++++--
> fs/xfs/libxfs/xfs_bmap.c      |  5 ++++-
> fs/xfs/libxfs/xfs_btree.c     |  2 +-
> fs/xfs/xfs_bmap_util.c        | 10 ++++++++--
> fs/xfs/xfs_buf_item.c         | 28 +++++++++++++++++++++-------
> fs/xfs/xfs_ioctl.c            |  2 +-
> fs/xfs/xfs_qm_bhv.c           |  2 +-
> fs/xfs/xfs_reflink.c          |  1 +
> fs/xfs/xfs_stats.c            |  2 +-
> 9 files changed, 47 insertions(+), 16 deletions(-)

Queued for 4.19, thank you.

--
Thanks,
Sasha
Dave Chinner Feb. 10, 2019, 10:12 p.m. UTC | #15
On Fri, Feb 08, 2019 at 01:50:57PM -0800, Luis Chamberlain wrote:
> On Sat, Feb 09, 2019 at 08:32:01AM +1100, Dave Chinner wrote:
> > On Fri, Feb 08, 2019 at 11:48:29AM -0800, Luis Chamberlain wrote:
> > > On Wed, Feb 06, 2019 at 09:06:55AM +1100, Dave Chinner wrote:
> > > > On Mon, Feb 04, 2019 at 08:54:17AM -0800, Luis Chamberlain wrote:
> > > > > Kernel stable team,
> > > > > 
> > > > > here is a v2 respin of my XFS stable patches for v4.19.y. The only
> > > > > change in this series is adding the upstream commit to the commit log,
> > > > > and I've now also Cc'd stable@vger.kernel.org as well. No other issues
> > > > > were spotted or raised with this series.
> > > > > 
> > > > > Reviews, questions, or rants are greatly appreciated.
> > > > 
> > > > Test results?
> > > > 
> > > > The set of changes look fine themselves, but as always, the proof is
> > > > in the testing...
> > > 
> > > I've first established a baseline for v4.19.18 with fstests using
> > > a series of different sections to test against. I annotated the
> > > failures on an expunge list and then use that expunge list to confirm
> > > no regressions -- no failures if we skip the failures already known for
> > > v4.19.18.
> > > 
> > > Each different configuration I test against I use a section for. I only
> > > test x86_64 for now but am starting to create a baseline for ppc64le.
> > > 
> > > The sections I use:
> > > 
> > >   * xfs
> > >   * xfs_nocrc
> > >   * xfs_nocrc_512
> > >   * xfs_reflink
> > >   * xfs_reflink_1024
> > >   * xfs_logdev
> > >   * xfs_realtimedev
> > 
> > Yup, that seems to cover most common things :)
> 
> To be clear in the future I hope to also have a baseline for:
> 
>   * xfs_bigblock
> 
> But that is *currently* [0] only possible on the following architectures
> with the respective kernel config:
> 
> aarch64:
> CONFIG_ARM64_64K_PAGES=y
> 
> ppc64le:
> CONFIG_PPC_64K_PAGES=y
> 
> [0] Someone is working on 64k pages on x86 I think?

Yup, I am, but that got derailed by wanting fsx coverage w/
dedup/clone/copy_file_range before going any further with it. That
was one of the triggers that led to finding all those data
corruption and API problems late last year...

Cheers,

Dave.
Luis Chamberlain Feb. 11, 2019, 7:46 p.m. UTC | #16
On Sat, Feb 09, 2019 at 04:56:27PM -0500, Sasha Levin wrote:
> On Fri, Feb 08, 2019 at 02:17:26PM -0800, Luis Chamberlain wrote:
> > On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> > Have you found pmem
> > issues not present on other sections?
> 
> Originally I've added this because the xfs folks suggested that pmem vs
> block exercises very different code paths and we should be testing both
> of them.
> 
> Looking at the baseline I have, it seems that there are differences
> between the failing tests. For example, with "MKFS_OPTIONS='-f -m
> crc=1,reflink=0,rmapbt=0, -i sparse=0'",

That's my "xfs" section.

> generic/524 seems to fail on pmem but not on block.

This is useful, thanks! Can you get the failure rate? How often does it
fail when you run the test? Always? Does it *never* fail on block? How
many consecutive runs have you done on block?

To help with this, oscheck has naggy-check.sh; you could run it until
a failure is hit:

./naggy-check.sh -f -s xfs generic/524

And on another host:

./naggy-check.sh -f -s xfs_pmem generic/524
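
The core nag-until-failure idea can be sketched as a tiny shell loop
(this is my sketch of the concept, not the actual naggy-check.sh code,
and the ./check invocation shown in the comment is an assumption):

```shell
# Run a command repeatedly until it fails, then print how many runs it
# took. Against fstests this would wrap something like
# "./check -s xfs generic/524". The command's own output is discarded
# so that only the run count is printed.
run_until_failure() {
    cmd="$1"
    count=0
    while true; do
        count=$((count + 1))
        if ! $cmd >/dev/null 2>&1; then
            echo "$count"
            return 0
        fi
    done
}
```

With a failure rate annotated as "1 in N", this loop is expected to
print a number on the order of N.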

> > Any reason you don't name the sections with finer granularity?
> > It would help me in ensuring that when we revise both of our tests we
> > can more easily tell whether we're talking about apples, pears, or bananas.
> 
> Nope, I'll happily rename them if there are "official" names for it :)

Well, since I am pushing out the stable fixes and am using oscheck to
be transparent about how I test and what I track, and since I'm using
section names, yes, it would be useful to me. Simply adding a _pmem
suffix to the pmem ones would suffice.

> > FWIW, I run two different bare metal hosts now, and each has a VM guest
> > per section above. One host I use for tracking stable, the other host for
> > my changes. This ensures I don't mess things up easier and I can re-test
> > any time fast.
> > 
> > I dedicate a VM guest to test *one* section. I do this with oscheck
> > easily:
> > 
> > ./oscheck.sh --test-section xfs_nocrc | tee log-xfs-4.19.18+
> > 
> > For instance will just test xfs_nocrc section. On average each section
> > takes about 1 hour to run.
> 
> We have a similar setup then. I just spawn the VM on azure for each
> section and run them all in parallel that way.

Indeed.

> I thought oscheck runs everything on a single VM,

By default it does.

> is it a built in
> mechanism to spawn a VM for each config?

Yes:

./oscheck.sh --test-section xfs_nocrc_512

That, for instance, will test the xfs_nocrc_512 section *only* on that host.

> If so, I can add some code in
> to support azure and we can use the same codebase.

Groovy. I believe the next step will be for you to send me your delta
of expunges; then I can run naggy-check.sh on them to see if I can
reach similar results. I believe you have a larger expunge list. I
suspect some of that may be because you don't have certain quirks
handled. We will see. But getting this right and syncing up our testing
should yield good confirmation of failures.

> > I could run the tests on raw nvme and do away with the guests, but
> > that loses some of my ability to debug on crashes easily and out to
> > baremetal.. but curious, how long do your tests takes? How about per
> > section? Say just the default "xfs" section?
> 
> I think that the longest config takes about 5 hours, otherwise
> everything tends to take about 2 hours.

Oh wow, mine are only 1 hour each. Guess I got a decent rig now :)

> I basically run these on "repeat" until I issue a stop order, so in a
> timespan of 48 hours some configs run ~20 times and some only ~10.

I see... so you iterate over all tests many times a day, and this is
how you've built your expunge list. Correct?

That could explain how you may end up with a larger set. It can mean
some tests only fail at a non-100% rate; for these I'm annotating the
failure rate as a comment on each expunge line. Having a consistent
format for this, and a properly agreed-upon term, would be good. Right
now I just mention how often I have to run a test before reaching a
failure. This provides a rough estimate of how many times one should
iterate running the test in a loop before detecting a failure. Of
course this may not always be accurate, given that systems vary and
this could have an impact on the failure rate... but at least it
provides some guidance. It would be interesting to see if we end up
with similar failure rates for tests that don't always fail. And if
there is a divergence, how big it could be.
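
For instance, one hypothetical annotation format (illustrative only,
not the actual oscheck convention) could be:

```
# expunges/4.19.18/xfs_nocrc_512.txt (hypothetical path)
generic/464	# ~1 failure per 4000 runs
generic/524	# ~1 failure per 10 runs, pmem only
```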

  Luis
Luis Chamberlain Feb. 11, 2019, 8:09 p.m. UTC | #17
On Fri, Feb 8, 2019 at 1:48 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> Perhaps worth noting which was curious is that I could not get to
> trigger generic/464 on sections xfs_nocrc_512 and xfs_reflink_1024.

Well, I just hit a failure for generic/464 on 4.19.17 after ~3996 runs
for the xfs_nocrc_512 section, and after 7382 runs for
xfs_reflink_1024.
I've updated the expunge list to reflect the difficult-to-hit failure
of generic/464 and its failure rate on xfs_nocrc_512 in oscheck.
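
As a rough rule of thumb for such rates (my arithmetic, assuming each
run fails independently with constant probability): if a test fails
about once every N runs, the geometric distribution gives the number of
runs needed to catch it with a chosen confidence.

```shell
# Runs needed to observe at least one failure with the given confidence
# (default 95%), when a test fails ~once per N runs. Uses awk for the
# floating-point math; "int(r) + (r > int(r))" is a ceiling.
runs_for_confidence() {
    awk -v n="$1" -v conf="${2:-0.95}" \
        'BEGIN { p = 1/n; r = log(1 - conf)/log(1 - p); print int(r) + (r > int(r)) }'
}
runs_for_confidence 3996   # roughly 12000 runs for 95% confidence
```

So a ~1-in-4000 failure like the generic/464 case above needs on the
order of three times that many runs before "no failure seen" means much.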

 Luis