[v3] generic: concurrent IO test with mixed IO types
diff mbox

Message ID 1434014273-1059-1-git-send-email-eguan@redhat.com
State New
Headers show

Commit Message

Eryu Guan June 11, 2015, 9:17 a.m. UTC
Test concurrent buffered I/O, DIO, AIO, mmap I/O and splice I/O on the
same files.

Signed-off-by: Eryu Guan <eguan@redhat.com>
---

This fio job file has been proven to be potent, it triggers WARNINGs on ext4
and xfs with 4.1-rc6 kernel.

ext4: WARNING: at fs/ext4/inode.c:1328
xfs: WARNING: CPU: 7 PID: 3090 at fs/xfs/xfs_file.c:726 xfs_file_dio_aio_write+0x176/0x2a8 [xfs]()

The ext4 issue should be fixed by Lukas's patch
ext4: fix reservation release on invalidatepage for delalloc fs

And it ever paniced kernel in mm code and hung xfs.

I reduced the numjobs and iodepth to reduce the test time(~25s on my test host)
and scale them by $LOAD_FACTOR. And it still could trigger the warning on ext4
and xfs with reduced workload.

v2:
- use mktemp to create tmp fio job file
v3:
- initialize scratch dev and mount it correctly before test
- roll back to use /tmp/$$ as tmp

 tests/generic/090     | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/090.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 125 insertions(+)
 create mode 100755 tests/generic/090
 create mode 100644 tests/generic/090.out

Comments

Dave Chinner June 17, 2015, 10:15 p.m. UTC | #1
On Thu, Jun 11, 2015 at 05:17:53PM +0800, Eryu Guan wrote:
> Test concurrent buffered I/O, DIO, AIO, mmap I/O and splice I/O on the
> same files.
> 
> Signed-off-by: Eryu Guan <eguan@redhat.com>
> ---
> 
> This fio job file has been proven to be potent, it triggers WARNINGs on ext4
> and xfs with 4.1-rc6 kernel.
> 
> ext4: WARNING: at fs/ext4/inode.c:1328
> xfs: WARNING: CPU: 7 PID: 3090 at fs/xfs/xfs_file.c:726 xfs_file_dio_aio_write+0x176/0x2a8 [xfs]()

Ok, so that warning is expected on XFS - that's intentional,
WARN_ONCE() output indicating a data coherency problem has occurred
because of the because the application is mixing buffered/mmap IO
with direct IO on the same file and direct Io has been unable to
cleanly invalidate the cache. i.e.  it's information to us
developers explaining why the user is complaining about data
corruption....

So this test is never going to pass on XFS unless you tell the test
harness to ignore the dmesg output...

> And it ever paniced kernel in mm code and hung xfs.

The "hung XFS" case will probably be the pipe mutex inversion
problem in the generic splice code. i.e.

.splice_read -> xfs_file_splice_read -> IOLOCK_SHARED ->
generic_file_splice_read -> splice_to_pipe -> pipe_lock()

vs:

iter_file_splice_write -> pipe_lock() -> vfs_iter_write ->
xfs_file_write_iter -> xfs_file_buffered_aio_write -> IOLOCK_EXCL

Can you confirm this? If so, there's not much we can actually do
about this - the recent big splice rewrite replaced the
pipe_lock/i_mutex inversion deadlock with a different pipe_lock
inversion deadlock....

> diff --git a/tests/generic/group b/tests/generic/group
> index 0c8964c..2e534a5 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -92,6 +92,7 @@
>  087 perms auto quick
>  088 perms auto quick
>  089 metadata auto
> +090 auto rw stress

Hence I'm not sure "auto" is the correct group here. "dangerous" is
more likely because it is exercising a problem we can't fix and will
prevent the auto test group from making progress past this test.

Cheers,

Dave.
Eryu Guan June 18, 2015, 3:04 a.m. UTC | #2
On Thu, Jun 18, 2015 at 08:15:25AM +1000, Dave Chinner wrote:
> On Thu, Jun 11, 2015 at 05:17:53PM +0800, Eryu Guan wrote:
> > Test concurrent buffered I/O, DIO, AIO, mmap I/O and splice I/O on the
> > same files.
> > 
> > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > ---
> > 
> > This fio job file has been proven to be potent, it triggers WARNINGs on ext4
> > and xfs with 4.1-rc6 kernel.
> > 
> > ext4: WARNING: at fs/ext4/inode.c:1328
> > xfs: WARNING: CPU: 7 PID: 3090 at fs/xfs/xfs_file.c:726 xfs_file_dio_aio_write+0x176/0x2a8 [xfs]()
> 
> Ok, so that warning is expected on XFS - that's intentional,
> WARN_ONCE() output indicating a data coherency problem has occurred
> because of the because the application is mixing buffered/mmap IO
> with direct IO on the same file and direct Io has been unable to
> cleanly invalidate the cache. i.e.  it's information to us
> developers explaining why the user is complaining about data
> corruption....
> 
> So this test is never going to pass on XFS unless you tell the test
> harness to ignore the dmesg output...

I can send a v4 to disable dmesg check if FSTYP is xfs, but that will
ignore any other WARNINGs/Oops too, which seems not ideal to me either.

I'm fine to go either way here(disable the dmesg check or not). But I
personally prefer changing the WARN_ON_ONCE to something like xfs_warn()
or xfs_warn_ratelimited() to give out the warning.

> 
> > And it ever paniced kernel in mm code and hung xfs.
> 
> The "hung XFS" case will probably be the pipe mutex inversion
> problem in the generic splice code. i.e.
> 
> .splice_read -> xfs_file_splice_read -> IOLOCK_SHARED ->
> generic_file_splice_read -> splice_to_pipe -> pipe_lock()
> 
> vs:
> 
> iter_file_splice_write -> pipe_lock() -> vfs_iter_write ->
> xfs_file_write_iter -> xfs_file_buffered_aio_write -> IOLOCK_EXCL
> 
> Can you confirm this? If so, there's not much we can actually do
> about this - the recent big splice rewrite replaced the
> pipe_lock/i_mutex inversion deadlock with a different pipe_lock
> inversion deadlock....

Yes, XFS deadlocks in the splice code with RHEL7.1 kernel but doesn't
deadlock with 4.1-rc[567] kernels (I only confirmed on these kernels
just now), so ...

> 
> > diff --git a/tests/generic/group b/tests/generic/group
> > index 0c8964c..2e534a5 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -92,6 +92,7 @@
> >  087 perms auto quick
> >  088 perms auto quick
> >  089 metadata auto
> > +090 auto rw stress
> 
> Hence I'm not sure "auto" is the correct group here. "dangerous" is
> more likely because it is exercising a problem we can't fix and will
> prevent the auto test group from making progress past this test.

I think the auto group should be fine here.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner June 18, 2015, 11:31 p.m. UTC | #3
On Thu, Jun 18, 2015 at 11:04:31AM +0800, Eryu Guan wrote:
> On Thu, Jun 18, 2015 at 08:15:25AM +1000, Dave Chinner wrote:
> > On Thu, Jun 11, 2015 at 05:17:53PM +0800, Eryu Guan wrote:
> > > Test concurrent buffered I/O, DIO, AIO, mmap I/O and splice I/O on the
> > > same files.
> > > 
> > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > ---
> > > 
> > > This fio job file has been proven to be potent, it triggers WARNINGs on ext4
> > > and xfs with 4.1-rc6 kernel.
> > > 
> > > ext4: WARNING: at fs/ext4/inode.c:1328
> > > xfs: WARNING: CPU: 7 PID: 3090 at fs/xfs/xfs_file.c:726 xfs_file_dio_aio_write+0x176/0x2a8 [xfs]()
> > 
> > Ok, so that warning is expected on XFS - that's intentional,
> > WARN_ONCE() output indicating a data coherency problem has occurred
> > because of the because the application is mixing buffered/mmap IO
> > with direct IO on the same file and direct Io has been unable to
> > cleanly invalidate the cache. i.e.  it's information to us
> > developers explaining why the user is complaining about data
> > corruption....
> > 
> > So this test is never going to pass on XFS unless you tell the test
> > harness to ignore the dmesg output...
> 
> I can send a v4 to disable dmesg check if FSTYP is xfs, but that will
> ignore any other WARNINGs/Oops too, which seems not ideal to me either.

Such conditional output issues are dealt with by adding filters to
the output....

> I'm fine to go either way here(disable the dmesg check or not). But I
> personally prefer changing the WARN_ON_ONCE to something like xfs_warn()
> or xfs_warn_ratelimited() to give out the warning.

History tells us that such warnings get ignored and not reported,
and we lose lots of hair before we find out that the bug reporter
thought it "wasn't important" and so "didn't include it" in any of
the bug reports.

Data coherency problems are important enough that we WARN_ON_ONCE is
justified - it's something we need to know about sooner rather than
later, and it's something that application developers also need to
be aware of. They won't notice an xfs warning in the logs, but they
will notice abort() or some other syslog monitor telling them
there's been a kernel warning....

> > > And it ever paniced kernel in mm code and hung xfs.
> > 
> > The "hung XFS" case will probably be the pipe mutex inversion
> > problem in the generic splice code. i.e.
> > 
> > .splice_read -> xfs_file_splice_read -> IOLOCK_SHARED ->
> > generic_file_splice_read -> splice_to_pipe -> pipe_lock()
> > 
> > vs:
> > 
> > iter_file_splice_write -> pipe_lock() -> vfs_iter_write ->
> > xfs_file_write_iter -> xfs_file_buffered_aio_write -> IOLOCK_EXCL
> > 
> > Can you confirm this? If so, there's not much we can actually do
> > about this - the recent big splice rewrite replaced the
> > pipe_lock/i_mutex inversion deadlock with a different pipe_lock
> > inversion deadlock....
> 
> Yes, XFS deadlocks in the splice code with RHEL7.1 kernel but doesn't
> deadlock with 4.1-rc[567] kernels (I only confirmed on these kernels
> just now), so ...

Oh, ok, so the current upstream is fine; RHEL7 has the
pre-write_iter rewrite of the splice code, so the deadlock must be
of the older variety. We can ignore that, then.

> > > diff --git a/tests/generic/group b/tests/generic/group
> > > index 0c8964c..2e534a5 100644
> > > --- a/tests/generic/group
> > > +++ b/tests/generic/group
> > > @@ -92,6 +92,7 @@
> > >  087 perms auto quick
> > >  088 perms auto quick
> > >  089 metadata auto
> > > +090 auto rw stress
> > 
> > Hence I'm not sure "auto" is the correct group here. "dangerous" is
> > more likely because it is exercising a problem we can't fix and will
> > prevent the auto test group from making progress past this test.
> 
> I think the auto group should be fine here.

If it doesn't fail on current upstream kernels, that will be fine.
If it fails, and there is no likely resolution of the failure in the
forseeable future, then it does not belong in the auto group.

Cheers,

Dave.

Patch
diff mbox

diff --git a/tests/generic/090 b/tests/generic/090
new file mode 100755
index 0000000..e7cca52
--- /dev/null
+++ b/tests/generic/090
@@ -0,0 +1,122 @@ 
+#! /bin/bash
+# FS QA Test generic/090
+#
+# Concurrent mixed I/O (buffer I/O, aiodio, mmap, splice) on the same files
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Red Hat Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+iodepth=$((16 * LOAD_FACTOR))
+iodepth_batch=$((8 * LOAD_FACTOR))
+numjobs=$((5 * LOAD_FACTOR))
+fio_config=$tmp.fio
+cat >$fio_config <<EOF
+[global]
+bs=8k
+iodepth=$iodepth
+iodepth_batch=$iodepth_batch
+randrepeat=1
+size=1m
+directory=$SCRATCH_MNT
+numjobs=$numjobs
+[job1]
+ioengine=sync
+bs=1k
+direct=1
+rw=randread
+filename=file1:file2
+[job2]
+ioengine=libaio
+rw=randwrite
+direct=1
+filename=file1:file2
+[job3]
+bs=1k
+ioengine=posixaio
+rw=randwrite
+direct=1
+filename=file1:file2
+[job4]
+ioengine=splice
+direct=1
+rw=randwrite
+filename=file1:file2
+[job5]
+bs=1k
+ioengine=sync
+rw=randread
+filename=file1:file2
+[job6]
+ioengine=posixaio
+rw=randwrite
+filename=file1:file2
+[job7]
+ioengine=splice
+rw=randwrite
+filename=file1:file2
+[job8]
+ioengine=mmap
+rw=randwrite
+bs=1k
+filename=file1:file2
+[job9]
+ioengine=mmap
+rw=randwrite
+direct=1
+filename=file1:file2
+EOF
+# with ioengine=mmap and direct=1, fio requires bs to be at least pagesize,
+# which is a fio built-in var.
+echo 'bs=$pagesize' >> $fio_config
+
+rm -f $seqres.full
+_require_fio $fio_config
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+echo "Silence is golden"
+$FIO_PROG $fio_config >>$seqres.full 2>&1
+
+# all done, expect no hang no oops no fs corruption,
+# _check_dmesg and _check_filesystems will do the check work for us
+status=0
+exit
diff --git a/tests/generic/090.out b/tests/generic/090.out
new file mode 100644
index 0000000..2b5100d
--- /dev/null
+++ b/tests/generic/090.out
@@ -0,0 +1,2 @@ 
+QA output created by 090
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 0c8964c..2e534a5 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -92,6 +92,7 @@ 
 087 perms auto quick
 088 perms auto quick
 089 metadata auto
+090 auto rw stress
 091 rw auto quick
 092 auto quick prealloc
 093 attr cap udf auto