diff mbox

fstests: add generic test to verify xattr replace operations are atomic

Message ID 1415392826-11606-1-git-send-email-fdmanana@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Filipe Manana Nov. 7, 2014, 8:40 p.m. UTC
This test verifies that replacing a xattr's value is an atomic
operation. This is motivated by an issue in btrfs where replacing
a xattr's value wasn't an atomic operation, it consisted of
removing the old value and then inserting the new value in a
btree. This made readers (getxattr and listxattrs) not getting
neither the old nor the new value during a short time window.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 tests/generic/326     | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/326.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 110 insertions(+)
 create mode 100755 tests/generic/326
 create mode 100644 tests/generic/326.out

Comments

Dave Chinner Nov. 9, 2014, 11:45 p.m. UTC | #1
On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
> This test verifies that replacing a xattr's value is an atomic
> operation. This is motivated by an issue in btrfs where replacing
> a xattr's value wasn't an atomic operation, it consisted of
> removing the old value and then inserting the new value in a
> btree. This made readers (getxattr and listxattrs) not getting
> neither the old nor the new value during a short time window.

OK, seems like a good thing to test that the application can only
see the old or the new value.

However, I can't help but wonder about whether the btrfs behaviour
is crash safe as it wasn't designed to be atomic from the ground up.
i.e. if the system crashes half way through a attribute overwrite,
what does btrfs end up with as a result? XFS is guaranteed at a
transactional level to return either the old or the new value,
depending on where in the operaiton the crash occurred, but I'd just
assumed that everyone did attribute replace atomically so it never
occurred to me that it might be an issue...

Perhaps another test involving dm-flakey to trigger errors and
shutdowns whilst in the middle of replacing attributes might be a
good thing?

[....]

> +_cleanup()
> +{
> +	if [ ! -z $setter_pid ]; then
> +		kill $setter_pid &> /dev/null
> +	fi

No need to do this if[] then. You're already redirecting all errors
to /dev/null, so if $setter_pid is empty it will just output the
help string.

> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/attr
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_attrs
> +
> +rm -f $seqres.full
> +
> +xattr_name="user.something"
> +xattr_value1="foobar"
> +xattr_value2="rabbit_hole"
> +
> +set_xattr_loop()
> +{
> +	local name=$1
> +
> +	local cur_val=$xattr_value1
> +	while true; do
> +		$SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
> +		if [ "$cur_val" == "$xattr_value1" ]; then
> +			cur_val=$xattr_value2
> +		else
> +			cur_val=$xattr_value1
> +		fi
> +	done
> +}
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +test_file="test_xattr_replace"
> +touch $SCRATCH_MNT/$test_file
> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
> +
> +set_xattr_loop $test_file &
> +setter_pid=$!
> +
> +for ((i = 0; i < 1000; i++)); do
> +	xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \
> +		$SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2)
> +	if [ "$xattr_val" != "\"$xattr_value1\"" -a \
> +		"$xattr_val" != "\"$xattr_value2\"" ]; then
> +		_fail "Missing or unexpected xattr value: $xattr_val"
> +	fi

I'd suggest that a filter is a much better way to do this.

name_filter() {
	grep "$xattr_name=" | cut -d '=' -f 2 | \
		sed -e "s/$xattr_value1/GOOD/" \
		    -e "s/$xattr_value2/GOOD/"
}

for (); do
	$GETFATTR_PROG --absolute-names -n $xattr_name \
		$SCRATCH_MNT/$test_file | name_filter
done

And so the output file should just be:

GOOD
GOOD
....

And if it is bad, then it outputs the bad attribute value instead
of "GOOD", and the test fails on the golden output match.

> +kill $setter_pid &> /dev/null
> +unset setter_pid

See above, no need to unset the var.
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/generic/326.out b/tests/generic/326.out
> new file mode 100644
> index 0000000..4ac0db5
> --- /dev/null
> +++ b/tests/generic/326.out
> @@ -0,0 +1,2 @@
> +QA output created by 326
> +Silence is golden

Except when you are using comparisons and "_fail" instead of
filters. ;)

BTW, new tests should be numbere as the first unused number, not at
the tail (i.e generic/036). If there are collisions with other new
tests when I merge them, I'll renumber them at merge time.

>  323 auto aio stress
>  324 auto fsr quick
>  325 auto quick data log
> +326 auto quick xattr

There is no existing xattr group - normally xattr tests are
considered part of the "metadata" group. If you want to introduce an
xattr test group, do it as a separate patch and include all the
tests (across all the test directories) that manipulate xattrs in
some way.

Cheers,

Dave.
Filipe Manana Nov. 10, 2014, 12:49 a.m. UTC | #2
On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
>> This test verifies that replacing a xattr's value is an atomic
>> operation. This is motivated by an issue in btrfs where replacing
>> a xattr's value wasn't an atomic operation, it consisted of
>> removing the old value and then inserting the new value in a
>> btree. This made readers (getxattr and listxattrs) not getting
>> neither the old nor the new value during a short time window.
>
> OK, seems like a good thing to test that the application can only
> see the old or the new value.
>
> However, I can't help but wonder about whether the btrfs behaviour
> is crash safe as it wasn't designed to be atomic from the ground up.
> i.e. if the system crashes half way through a attribute overwrite,
> what does btrfs end up with as a result? XFS is guaranteed at a
> transactional level to return either the old or the new value,
> depending on where in the operaiton the crash occurred, but I'd just
> assumed that everyone did attribute replace atomically so it never
> occurred to me that it might be an issue...

It's crash safe. Both the delete and insert were done in the same
transaction, so a crash in between both operations (or after both and
before the transaction commit) would result in always seeing the old
value (or better saying, the last persisted value by a transaction
commit or fsync).

>
> Perhaps another test involving dm-flakey to trigger errors and
> shutdowns whilst in the middle of replacing attributes might be a
> good thing?
>
> [....]
>
>> +_cleanup()
>> +{
>> +     if [ ! -z $setter_pid ]; then
>> +             kill $setter_pid &> /dev/null
>> +     fi
>
> No need to do this if[] then. You're already redirecting all errors
> to /dev/null, so if $setter_pid is empty it will just output the
> help string.
>
>> +     rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/attr
>> +
>> +# real QA test starts here
>> +_need_to_be_root
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_scratch
>> +_require_attrs
>> +
>> +rm -f $seqres.full
>> +
>> +xattr_name="user.something"
>> +xattr_value1="foobar"
>> +xattr_value2="rabbit_hole"
>> +
>> +set_xattr_loop()
>> +{
>> +     local name=$1
>> +
>> +     local cur_val=$xattr_value1
>> +     while true; do
>> +             $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
>> +             if [ "$cur_val" == "$xattr_value1" ]; then
>> +                     cur_val=$xattr_value2
>> +             else
>> +                     cur_val=$xattr_value1
>> +             fi
>> +     done
>> +}
>> +
>> +_scratch_mkfs >>$seqres.full 2>&1
>> +_scratch_mount
>> +
>> +test_file="test_xattr_replace"
>> +touch $SCRATCH_MNT/$test_file
>> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
>> +
>> +set_xattr_loop $test_file &
>> +setter_pid=$!
>> +
>> +for ((i = 0; i < 1000; i++)); do
>> +     xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \
>> +             $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2)
>> +     if [ "$xattr_val" != "\"$xattr_value1\"" -a \
>> +             "$xattr_val" != "\"$xattr_value2\"" ]; then
>> +             _fail "Missing or unexpected xattr value: $xattr_val"
>> +     fi
>
> I'd suggest that a filter is a much better way to do this.
>
> name_filter() {
>         grep "$xattr_name=" | cut -d '=' -f 2 | \
>                 sed -e "s/$xattr_value1/GOOD/" \
>                     -e "s/$xattr_value2/GOOD/"
> }
>
> for (); do
>         $GETFATTR_PROG --absolute-names -n $xattr_name \
>                 $SCRATCH_MNT/$test_file | name_filter
> done
>
> And so the output file should just be:
>
> GOOD
> GOOD
> ....
>
> And if it is bad, then it outputs the bad attribute value instead
> of "GOOD", and the test fails on the golden output match.
>
>> +kill $setter_pid &> /dev/null
>> +unset setter_pid
>
> See above, no need to unset the var.
>> +
>> +echo "Silence is golden"
>> +status=0
>> +exit
>> diff --git a/tests/generic/326.out b/tests/generic/326.out
>> new file mode 100644
>> index 0000000..4ac0db5
>> --- /dev/null
>> +++ b/tests/generic/326.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 326
>> +Silence is golden
>
> Except when you are using comparisons and "_fail" instead of
> filters. ;)
>
> BTW, new tests should be numbere as the first unused number, not at
> the tail (i.e generic/036). If there are collisions with other new
> tests when I merge them, I'll renumber them at merge time.
>
>>  323 auto aio stress
>>  324 auto fsr quick
>>  325 auto quick data log
>> +326 auto quick xattr
>
> There is no existing xattr group - normally xattr tests are
> considered part of the "metadata" group. If you want to introduce an
> xattr test group, do it as a separate patch and include all the
> tests (across all the test directories) that manipulate xattrs in
> some way.

Agreed.
V2 following.

Thanks

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Nov. 10, 2014, 1:31 a.m. UTC | #3
On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote:
> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
> >> This test verifies that replacing a xattr's value is an atomic
> >> operation. This is motivated by an issue in btrfs where replacing
> >> a xattr's value wasn't an atomic operation, it consisted of
> >> removing the old value and then inserting the new value in a
> >> btree. This made readers (getxattr and listxattrs) not getting
> >> neither the old nor the new value during a short time window.
> >
> > OK, seems like a good thing to test that the application can only
> > see the old or the new value.
> >
> > However, I can't help but wonder about whether the btrfs behaviour
> > is crash safe as it wasn't designed to be atomic from the ground up.
> > i.e. if the system crashes half way through a attribute overwrite,
> > what does btrfs end up with as a result? XFS is guaranteed at a
> > transactional level to return either the old or the new value,
> > depending on where in the operaiton the crash occurred, but I'd just
> > assumed that everyone did attribute replace atomically so it never
> > occurred to me that it might be an issue...
> 
> It's crash safe. Both the delete and insert were done in the same
> transaction, so a crash in between both operations (or after both and
> before the transaction commit) would result in always seeing the old
> value (or better saying, the last persisted value by a transaction
> commit or fsync).

Alright, so no crash issues because all the modifications are in a
single transaction. However, if both modifications are made in the
same transaction, then this bug implies that a user can read a
metadata object in the btree whilst somethign else is concurrently
modifying it, right? i.e. that there is no serialisation between
inode metadata reads and transactional modification operations?

Cheers,

Dave.
Filipe Manana Nov. 10, 2014, 11:53 a.m. UTC | #4
On Mon, Nov 10, 2014 at 1:31 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote:
>> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
>> >> This test verifies that replacing a xattr's value is an atomic
>> >> operation. This is motivated by an issue in btrfs where replacing
>> >> a xattr's value wasn't an atomic operation, it consisted of
>> >> removing the old value and then inserting the new value in a
>> >> btree. This made readers (getxattr and listxattrs) not getting
>> >> neither the old nor the new value during a short time window.
>> >
>> > OK, seems like a good thing to test that the application can only
>> > see the old or the new value.
>> >
>> > However, I can't help but wonder about whether the btrfs behaviour
>> > is crash safe as it wasn't designed to be atomic from the ground up.
>> > i.e. if the system crashes half way through a attribute overwrite,
>> > what does btrfs end up with as a result? XFS is guaranteed at a
>> > transactional level to return either the old or the new value,
>> > depending on where in the operaiton the crash occurred, but I'd just
>> > assumed that everyone did attribute replace atomically so it never
>> > occurred to me that it might be an issue...
>>
>> It's crash safe. Both the delete and insert were done in the same
>> transaction, so a crash in between both operations (or after both and
>> before the transaction commit) would result in always seeing the old
>> value (or better saying, the last persisted value by a transaction
>> commit or fsync).
>
> Alright, so no crash issues because all the modifications are in a
> single transaction. However, if both modifications are made in the
> same transaction, then this bug implies that a user can read a
> metadata object in the btree whilst somethign else is concurrently
> modifying it, right? i.e. that there is no serialisation between
> inode metadata reads and transactional modification operations?

Nop, not exactly the possibility to read while being modified. The
replace steps were:

lock btree node(s), delete current value from leaf, unlock btree
nodes, lock btree nodes, write new value to leaf, unlock btree nodes

Readers (a getxattr or listxattrs syscall for e.g.), could just after
the first btree node unlocking done by the writer: lock btree nodes,
get current value (none found), unlock btree nodes, and then return
-ENODATA.



>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Dave Chinner Nov. 10, 2014, 7:47 p.m. UTC | #5
On Mon, Nov 10, 2014 at 11:53:00AM +0000, Filipe David Manana wrote:
> On Mon, Nov 10, 2014 at 1:31 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote:
> >> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
> >> >> This test verifies that replacing a xattr's value is an atomic
> >> >> operation. This is motivated by an issue in btrfs where replacing
> >> >> a xattr's value wasn't an atomic operation, it consisted of
> >> >> removing the old value and then inserting the new value in a
> >> >> btree. This made readers (getxattr and listxattrs) not getting
> >> >> neither the old nor the new value during a short time window.
> >> >
> >> > OK, seems like a good thing to test that the application can only
> >> > see the old or the new value.
> >> >
> >> > However, I can't help but wonder about whether the btrfs behaviour
> >> > is crash safe as it wasn't designed to be atomic from the ground up.
> >> > i.e. if the system crashes half way through a attribute overwrite,
> >> > what does btrfs end up with as a result? XFS is guaranteed at a
> >> > transactional level to return either the old or the new value,
> >> > depending on where in the operaiton the crash occurred, but I'd just
> >> > assumed that everyone did attribute replace atomically so it never
> >> > occurred to me that it might be an issue...
> >>
> >> It's crash safe. Both the delete and insert were done in the same
> >> transaction, so a crash in between both operations (or after both and
> >> before the transaction commit) would result in always seeing the old
> >> value (or better saying, the last persisted value by a transaction
> >> commit or fsync).
> >
> > Alright, so no crash issues because all the modifications are in a
> > single transaction. However, if both modifications are made in the
> > same transaction, then this bug implies that a user can read a
> > metadata object in the btree whilst somethign else is concurrently
> > modifying it, right? i.e. that there is no serialisation between
> > inode metadata reads and transactional modification operations?
> 
> Nop, not exactly the possibility to read while being modified. The
> replace steps were:
> 
> lock btree node(s), delete current value from leaf, unlock btree
> nodes, lock btree nodes, write new value to leaf, unlock btree nodes

You missed "transaction start" and "transaction commit", which
define the boundaries of a modification in progress.

> Readers (a getxattr or listxattrs syscall for e.g.), could just after
> the first btree node unlocking done by the writer: lock btree nodes,
> get current value (none found), unlock btree nodes, and then return
> -ENODATA.

Right, and that's a exact demonstration "read while modifying" when
"modifying" means an object has been modified in an uncommitted
transaction. This indicates the transaction failed both the
Atomicity and Isolation properties of an ACID transaction model.

I hope this isn't a common pattern in btrfs -  it worries me just to
know that the btrfs transaction subsytem does not, at minimum, throw
warnings when ACID attributes are violated...

Cheers,

Dave.
diff mbox

Patch

diff --git a/tests/generic/326 b/tests/generic/326
new file mode 100755
index 0000000..c110fc0
--- /dev/null
+++ b/tests/generic/326
@@ -0,0 +1,107 @@ 
+#! /bin/bash
+# FSQA Test No. 326
+#
+# Verify that replacing a xattr's value is an atomic operation.
+# This is motivated by an issue in btrfs where replacing a xattr's value
+# wasn't an atomic operation, it consisted of removing the old value and
+# then inserting the new value in a btree. This made readers (getxattr
+# and listxattrs) not getting neither the old nor the new value during
+# a short time window.
+#
+# The btrfs issue was fixed by the following linux kernel patch:
+#
+#    Btrfs: make xattr replace operations atomic
+#
+#-----------------------------------------------------------------------
+#
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdmanana@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	if [ ! -z $setter_pid ]; then
+		kill $setter_pid &> /dev/null
+	fi
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_attrs
+
+rm -f $seqres.full
+
+xattr_name="user.something"
+xattr_value1="foobar"
+xattr_value2="rabbit_hole"
+
+set_xattr_loop()
+{
+	local name=$1
+
+	local cur_val=$xattr_value1
+	while true; do
+		$SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
+		if [ "$cur_val" == "$xattr_value1" ]; then
+			cur_val=$xattr_value2
+		else
+			cur_val=$xattr_value1
+		fi
+	done
+}
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+test_file="test_xattr_replace"
+touch $SCRATCH_MNT/$test_file
+$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
+
+set_xattr_loop $test_file &
+setter_pid=$!
+
+for ((i = 0; i < 1000; i++)); do
+	xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \
+		$SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2)
+	if [ "$xattr_val" != "\"$xattr_value1\"" -a \
+		"$xattr_val" != "\"$xattr_value2\"" ]; then
+		_fail "Missing or unexpected xattr value: $xattr_val"
+	fi
+done
+
+kill $setter_pid &> /dev/null
+unset setter_pid
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/326.out b/tests/generic/326.out
new file mode 100644
index 0000000..4ac0db5
--- /dev/null
+++ b/tests/generic/326.out
@@ -0,0 +1,2 @@ 
+QA output created by 326
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 9c82a6f..01f442d 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -152,3 +152,4 @@ 
 323 auto aio stress
 324 auto fsr quick
 325 auto quick data log
+326 auto quick xattr