diff mbox

overlay: stress test changes to top and bottom layers simultaneously

Message ID 20161208035844.18544-1-tytso@mit.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Theodore Ts'o Dec. 8, 2016, 3:58 a.m. UTC
Introduce a test which runs fsstress on the top and bottom overlayfs
directories simultaneously to find potential races that plagued wrapfs
derived file systems.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 tests/overlay/017     | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/overlay/017.out |  2 ++
 tests/overlay/group   |  1 +
 3 files changed, 84 insertions(+)
 create mode 100755 tests/overlay/017
 create mode 100644 tests/overlay/017.out

Comments

Eryu Guan Dec. 8, 2016, 4:37 a.m. UTC | #1
On Wed, Dec 07, 2016 at 10:58:44PM -0500, Theodore Ts'o wrote:
> Introduce a test which runs fsstress on the top and bottom overlayfs
> directories simultaneously to find potential races that plagued wrapfs
> derived file systems.

I'm not sure if overlayfs supports doing changes to lowerdir when
overlay is mounted. cc'ed linux-unionfs@vger.kernel.org for some inputs.

But from kernel Documentation:

"Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock."

So at least kernel should not crash or deadlock.

> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  tests/overlay/017     | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/overlay/017.out |  2 ++
>  tests/overlay/group   |  1 +
>  3 files changed, 84 insertions(+)
>  create mode 100755 tests/overlay/017
>  create mode 100644 tests/overlay/017.out
> 
> diff --git a/tests/overlay/017 b/tests/overlay/017
> new file mode 100755
> index 00000000..1dbde766
> --- /dev/null
> +++ b/tests/overlay/017
> @@ -0,0 +1,81 @@
> +#! /bin/bash
> +# FS QA Test 017
> +#
> +# Run fsstress on lower dir and top dir at the same time
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +_supported_fs overlay
> +_supported_os Linux
> +_require_scratch
> +
> +# Remove all files from previous tests
> +_scratch_mkfs
> +
> +lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
> +mkdir -p $lowerdir
> +
> +_scratch_mount
> +
> +echo "Silence is golden"
> +
> +d_low=$lowerdir/fsstress
> +d_top=$SCRATCH_MNT/fsstress
> +mkdir -p $d_low $d_top

I think you need to create $d_low before mounting overlayfs. As online
changes to lower dir is not supported. e.g.

lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
d_low=$lowerdir/fsstress
d_top=$SCRATCH_MNT/fsstress
mkdir -p $d_low

_scratch_mount

> +
> +echo $FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v > $seqres.full.1
> +$FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v >> $seqres.full.1 2>&1 &
> +
> +echo $FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v > $seqres.full.2
> +$FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v >> $seqres.full.2 2>&1 &
> +
> +ret=0
> +if ! wait %1
> +then
> +        echo "--------------------------------------"       >>$seqres.full.1
> +	echo "fsstress on lower directory returned $? - see $seqres.full.1"
> +        echo "--------------------------------------"       >>$seqres.full.1
> +	ret=1
> +fi
> +
> +if ! wait %2
> +then
> +        echo "--------------------------------------"       >>$seqres.full.2
> +	echo "fsstress on overlay directory returned $? - see $seqres.full.2"
> +        echo "--------------------------------------"       >>$seqres.full.2
> +	ret=1
> +fi

I'm not sure if we can depend on the return status from fsstress, as the
documentation says this results in undefined behavior. I think we can
skip all the return value check and set status to 0 and exit as long as
fsstress processes exit and don't crash or deadlock the kernel.

> +
> +cat $seqres.full.1 $seqres.full.2 > $seqres.full
> +rm $seqres.full.1 $seqres.full.2
> +
> +if [ "$ret" -eq 1 ]
> +then
> +	status=1
> +else
> +	status=0
> +fi
> +
> +exit $status
> diff --git a/tests/overlay/017.out b/tests/overlay/017.out
> new file mode 100644
> index 00000000..82228448
> --- /dev/null
> +++ b/tests/overlay/017.out
> @@ -0,0 +1,2 @@
> +QA output created by 017
> +Silence is golden
> diff --git a/tests/overlay/group b/tests/overlay/group
> index 5740d2a2..b310aebe 100644
> --- a/tests/overlay/group
> +++ b/tests/overlay/group
> @@ -19,3 +19,4 @@
>  014 auto quick copyup
>  015 auto quick whiteout
>  016 auto quick
> +017 auto quick

It runs for 10 minutes, not a "quick" test :)

Thanks,
Eryu
> -- 
> 2.11.0.rc0.7.gbe5a750
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein Dec. 8, 2016, 6:37 a.m. UTC | #2
On Thu, Dec 8, 2016 at 6:37 AM, Eryu Guan <eguan@redhat.com> wrote:
> On Wed, Dec 07, 2016 at 10:58:44PM -0500, Theodore Ts'o wrote:
>> Introduce a test which runs fsstress on the top and bottom overlayfs
>> directories simultaneously to find potential races that plagued wrapfs
>> derived file systems.
>

Good idea! Ted,
do you have a reference for said races in past fs?
I am asking because it could be useful to focus on more specific
concurrent access patterns.

I think it is also very relevant to check for race with upper vs. top.
lower vs. top gives you only write/read races.
upper vs. top gives you also write/write races.

I am not that familiar with fsstress modes, but I suppose it is possible
to setup a work set in lower before starting the concurrent access test.
I have a feeling that would be beneficial to catching yet another class of bugs.

> I'm not sure if overlayfs supports doing changes to lowerdir when
> overlay is mounted. cc'ed linux-unionfs@vger.kernel.org for some inputs.
>
> But from kernel Documentation:
>
> "Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
>
> So at least kernel should not crash or deadlock.
>

Yes, that is the only thing that this test should be verifying.

>>
>> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>> ---
>>  tests/overlay/017     | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/overlay/017.out |  2 ++
>>  tests/overlay/group   |  1 +
>>  3 files changed, 84 insertions(+)
>>  create mode 100755 tests/overlay/017
>>  create mode 100644 tests/overlay/017.out
>>
>> diff --git a/tests/overlay/017 b/tests/overlay/017
>> new file mode 100755
>> index 00000000..1dbde766
>> --- /dev/null
>> +++ b/tests/overlay/017
>> @@ -0,0 +1,81 @@
>> +#! /bin/bash
>> +# FS QA Test 017
>> +#
>> +# Run fsstress on lower dir and top dir at the same time
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1     # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +     cd /
>> +     rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +_supported_fs overlay
>> +_supported_os Linux
>> +_require_scratch
>> +
>> +# Remove all files from previous tests
>> +_scratch_mkfs
>> +
>> +lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
>> +mkdir -p $lowerdir
>> +
>> +_scratch_mount
>> +
>> +echo "Silence is golden"
>> +
>> +d_low=$lowerdir/fsstress
>> +d_top=$SCRATCH_MNT/fsstress
>> +mkdir -p $d_low $d_top
>
> I think you need to create $d_low before mounting overlayfs. As online
> changes to lower dir is not supported. e.g.
>
> lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
> d_low=$lowerdir/fsstress
> d_top=$SCRATCH_MNT/fsstress
> mkdir -p $d_low
>
> _scratch_mount
>

Correct. Although technically, the existence of $d_low is only checked
on the first access to $d_top.

>> +
>> +echo $FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v > $seqres.full.1
>> +$FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v >> $seqres.full.1 2>&1 &
>> +
>> +echo $FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v > $seqres.full.2
>> +$FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v >> $seqres.full.2 2>&1 &
>> +
>> +ret=0
>> +if ! wait %1
>> +then
>> +        echo "--------------------------------------"       >>$seqres.full.1
>> +     echo "fsstress on lower directory returned $? - see $seqres.full.1"
>> +        echo "--------------------------------------"       >>$seqres.full.1
>> +     ret=1
>> +fi
>> +
>> +if ! wait %2
>> +then
>> +        echo "--------------------------------------"       >>$seqres.full.2
>> +     echo "fsstress on overlay directory returned $? - see $seqres.full.2"
>> +        echo "--------------------------------------"       >>$seqres.full.2
>> +     ret=1
>> +fi
>
> I'm not sure if we can depend on the return status from fsstress, as the
> documentation says this results in undefined behavior. I think we can
> skip all the return value check and set status to 0 and exit as long as
> fsstress processes exit and don't crash or deadlock the kernel.
>
>> +
>> +cat $seqres.full.1 $seqres.full.2 > $seqres.full
>> +rm $seqres.full.1 $seqres.full.2
>> +
>> +if [ "$ret" -eq 1 ]
>> +then
>> +     status=1
>> +else
>> +     status=0
>> +fi
>> +
>> +exit $status
>> diff --git a/tests/overlay/017.out b/tests/overlay/017.out
>> new file mode 100644
>> index 00000000..82228448
>> --- /dev/null
>> +++ b/tests/overlay/017.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 017
>> +Silence is golden
>> diff --git a/tests/overlay/group b/tests/overlay/group
>> index 5740d2a2..b310aebe 100644
>> --- a/tests/overlay/group
>> +++ b/tests/overlay/group
>> @@ -19,3 +19,4 @@
>>  014 auto quick copyup
>>  015 auto quick whiteout
>>  016 auto quick
>> +017 auto quick
>
> It runs for 10 minutes, not a "quick" test :)
>
> Thanks,
> Eryu
>> --
>> 2.11.0.rc0.7.gbe5a750
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Dec. 8, 2016, 2:16 p.m. UTC | #3
On Thu, Dec 08, 2016 at 08:37:58AM +0200, Amir Goldstein wrote:
> do you have a reference for said races in past fs?
> I am asking because it could be useful to focus on more specific
> concurrent access patterns.

The thing which brought this up was the the equivalent test scenario
was causing crashes in a kernel implementation of sdcardfs.  Sdcardfs
is this horrible hack, originally implemented in FUSE, that (a)
provides case insensitive lookups, and (b) implements Android's
application-centric permissions system.  Someone (I think at Samsung)
tried to implement this in the kernel using wrapfs[1] as a base.

[1] http://wrapfs.filesystems.org/

It turns out that if you run fsstress on the top and bottom
directories, it will crash very quickly.  This was reproduced not just
on sdcardfs, but on the base wrapfs.  For more details, please see [2].
(BTW, this is not terribly surprising because Al Viro has made fairly
disparaging comments about wrapfs-based file systems in the past.)

[2] http://www.fsl.cs.sunysb.edu/pipermail/wrapfs/2016-October/000140.html

Since we also have some users of overlayfs in a product capacity, I
whipped up a test to make sure that overlayfs didn't suffer from the
same problem, and I was glad to see that it did _not_ crash or hang,
as advertised.

Since the use of unionfs is deprecated, I didn't bother to test it,
although I could if someone was really curious whether it would BUG or
not.  (Given that both wrapfs and sdcardfs did BUG, I'm pretty that
unionfs, as a wrapfs derivitive, also would go down in flames.  It
might also be fun, since there are some Ubuntu Docker users using
AUFS, to see if AUFS --- as another wrapfs derivitive --- might also
be succeptible.)

> I am not that familiar with fsstress modes, but I suppose it is possible
> to setup a work set in lower before starting the concurrent access test.
> I have a feeling that would be beneficial to catching yet another class of bugs.

The trick here is that we are starting fsstress with the same seed in
the top and bottom directories, at the same time.  That means the same
pathnames are likely to be modified at the top and bottom, leading to
races both in the VFS and in the overlay file system.

> > But from kernel Documentation:
> >
> > "Changes to the underlying filesystems while part of a mounted overlay
> > filesystem are not allowed.  If the underlying filesystem is changed,
> > the behavior of the overlay is undefined, though it will not result in
> > a crash or deadlock."
> >
> > So at least kernel should not crash or deadlock.
> 
> Yes, that is the only thing that this test should be verifying.

That's why I wrote this test, in fact.  :-)

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein Dec. 8, 2016, 2:27 p.m. UTC | #4
On Thu, Dec 8, 2016 at 6:37 AM, Eryu Guan <eguan@redhat.com> wrote:
> On Wed, Dec 07, 2016 at 10:58:44PM -0500, Theodore Ts'o wrote:
...
>> diff --git a/tests/overlay/group b/tests/overlay/group
>> index 5740d2a2..b310aebe 100644
>> --- a/tests/overlay/group
>> +++ b/tests/overlay/group
>> @@ -19,3 +19,4 @@
>>  014 auto quick copyup
>>  015 auto quick whiteout
>>  016 auto quick
>> +017 auto quick
>
> It runs for 10 minutes, not a "quick" test :)
>

Yes better make it "stress".

and FYI, I was so overwhelmed with how much this test grinds my laptop's
CPU that I rushed to fix the command:

./check -overlay -x stress overlay/???

Which seems to have been broken forever.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. R. Okajima Dec. 9, 2016, 5:30 p.m. UTC | #5
"Theodore Ts'o":
> Since the use of unionfs is deprecated, I didn't bother to test it,
> although I could if someone was really curious whether it would BUG or
> not.  (Given that both wrapfs and sdcardfs did BUG, I'm pretty that
> unionfs, as a wrapfs derivitive, also would go down in flames.  It
> might also be fun, since there are some Ubuntu Docker users using
> AUFS, to see if AUFS --- as another wrapfs derivitive --- might also
> be succeptible.)

If you mean that aufs is based upon wrapfs or unionfs, then it is not
true. aufs1 was based upon unionfs acutally, but aufs2 and later are
totally re-written and added many new approaches and ideas.

Some of those new approaches are opposite of overlayfs' ones.
For example,
- overlayfs added d_real_inode(), lock_inode() and others into VFS. They
  aim to get the layers' inode via overlayfs' dentry. Honestly speaking
  I am afraid they confuse people who were calling d_inode().
- aufs aims to be an ordinary filesystem against VFS, and VFS uses its
  dentry as usual. Acessing the layers' inode is totally hidden within
  aufs and VFS doesn't care about it.
  One exception aufs modifies VFS (actually MM module) is vma_struct
  which contains a file pointer to point the file mmapped. Aufs adds
  another file pointer to show the path under procfs.

Also aufs allows users to modify the files on the layers direcly,
ie. bypassing aufs. Next time when user accesses those files, aufs
re-validates it and detect the changes. Additioanlly aufs has
another option to discard the caches when the files on the layers
changed directly.


J. R. Okajima
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/tests/overlay/017 b/tests/overlay/017
new file mode 100755
index 00000000..1dbde766
--- /dev/null
+++ b/tests/overlay/017
@@ -0,0 +1,81 @@ 
+#! /bin/bash
+# FS QA Test 017
+#
+# Run fsstress on lower dir and top dir at the same time
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs overlay
+_supported_os Linux
+_require_scratch
+
+# Remove all files from previous tests
+_scratch_mkfs
+
+lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
+mkdir -p $lowerdir
+
+_scratch_mount
+
+echo "Silence is golden"
+
+d_low=$lowerdir/fsstress
+d_top=$SCRATCH_MNT/fsstress
+mkdir -p $d_low $d_top
+
+echo $FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v > $seqres.full.1
+$FSSTRESS_PROG -s 42 -d $d_low -p 4 -n 1000 -l100 -v >> $seqres.full.1 2>&1 &
+
+echo $FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v > $seqres.full.2
+$FSSTRESS_PROG -s 42 -d $d_top -p 4 -n 1000 -l100 -v >> $seqres.full.2 2>&1 &
+
+ret=0
+if ! wait %1
+then
+        echo "--------------------------------------"       >>$seqres.full.1
+	echo "fsstress on lower directory returned $? - see $seqres.full.1"
+        echo "--------------------------------------"       >>$seqres.full.1
+	ret=1
+fi
+
+if ! wait %2
+then
+        echo "--------------------------------------"       >>$seqres.full.2
+	echo "fsstress on overlay directory returned $? - see $seqres.full.2"
+        echo "--------------------------------------"       >>$seqres.full.2
+	ret=1
+fi
+
+cat $seqres.full.1 $seqres.full.2 > $seqres.full
+rm $seqres.full.1 $seqres.full.2
+
+if [ "$ret" -eq 1 ]
+then
+	status=1
+else
+	status=0
+fi
+
+exit $status
diff --git a/tests/overlay/017.out b/tests/overlay/017.out
new file mode 100644
index 00000000..82228448
--- /dev/null
+++ b/tests/overlay/017.out
@@ -0,0 +1,2 @@ 
+QA output created by 017
+Silence is golden
diff --git a/tests/overlay/group b/tests/overlay/group
index 5740d2a2..b310aebe 100644
--- a/tests/overlay/group
+++ b/tests/overlay/group
@@ -19,3 +19,4 @@ 
 014 auto quick copyup
 015 auto quick whiteout
 016 auto quick
+017 auto quick