Message ID | 1415392826-11606-1-git-send-email-fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote: > This test verifies that replacing a xattr's value is an atomic > operation. This is motivated by an issue in btrfs where replacing > a xattr's value wasn't an atomic operation, it consisted of > removing the old value and then inserting the new value in a > btree. This made readers (getxattr and listxattrs) not getting > neither the old nor the new value during a short time window. OK, seems like a good thing to test that the application can only see the old or the new value. However, I can't help but wonder about whether the btrfs behaviour is crash safe as it wasn't designed to be atomic from the ground up. i.e. if the system crashes half way through a attribute overwrite, what does btrfs end up with as a result? XFS is guaranteed at a transactional level to return either the old or the new value, depending on where in the operaiton the crash occurred, but I'd just assumed that everyone did attribute replace atomically so it never occurred to me that it might be an issue... Perhaps another test involving dm-flakey to trigger errors and shutdowns whilst in the middle of replacing attributes might be a good thing? [....] > +_cleanup() > +{ > + if [ ! -z $setter_pid ]; then > + kill $setter_pid &> /dev/null > + fi No need to do this if[] then. You're already redirecting all errors to /dev/null, so if $setter_pid is empty it will just output the help string. > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. ./common/attr > + > +# real QA test starts here > +_need_to_be_root > +_supported_fs generic > +_supported_os Linux > +_require_scratch > +_require_attrs > + > +rm -f $seqres.full > + > +xattr_name="user.something" > +xattr_value1="foobar" > +xattr_value2="rabbit_hole" > + > +set_xattr_loop() > +{ > + local name=$1 > + > + local cur_val=$xattr_value1 > + while true; do > + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name > + if [ "$cur_val" == "$xattr_value1" ]; then > + cur_val=$xattr_value2 > + else > + cur_val=$xattr_value1 > + fi > + done > +} > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > + > +test_file="test_xattr_replace" > +touch $SCRATCH_MNT/$test_file > +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file > + > +set_xattr_loop $test_file & > +setter_pid=$! > + > +for ((i = 0; i < 1000; i++)); do > + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \ > + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2) > + if [ "$xattr_val" != "\"$xattr_value1\"" -a \ > + "$xattr_val" != "\"$xattr_value2\"" ]; then > + _fail "Missing or unexpected xattr value: $xattr_val" > + fi I'd suggest that a filter is a much better way to do this. name_filter() { grep "$xattr_name=" | cut -d '=' -f 2 | \ sed -e "s/$xattr_value1/GOOD/" \ -e "s/$xattr_value2/GOOD/" } for (); do $GETFATTR_PROG --absolute-names -n $xattr_name \ $SCRATCH_MNT/$test_file | name_filter done And so the output file should just be: GOOD GOOD .... And if it is bad, then it outputs the bad attribute value instead of "GOOD", and the test fails on the golden output match. > +kill $setter_pid &> /dev/null > +unset setter_pid See above, no need to unset the var. > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/generic/326.out b/tests/generic/326.out > new file mode 100644 > index 0000000..4ac0db5 > --- /dev/null > +++ b/tests/generic/326.out > @@ -0,0 +1,2 @@ > +QA output created by 326 > +Silence is golden Except when you are using comparisons and "_fail" instead of filters. ;) BTW, new tests should be numbere as the first unused number, not at the tail (i.e generic/036). If there are collisions with other new tests when I merge them, I'll renumber them at merge time. > 323 auto aio stress > 324 auto fsr quick > 325 auto quick data log > +326 auto quick xattr There is no existing xattr group - normally xattr tests are considered part of the "metadata" group. If you want to introduce an xattr test group, do it as a separate patch and include all the tests (across all the test directories) that manipulate xattrs in some way. Cheers, Dave.
On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote: > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote: >> This test verifies that replacing a xattr's value is an atomic >> operation. This is motivated by an issue in btrfs where replacing >> a xattr's value wasn't an atomic operation, it consisted of >> removing the old value and then inserting the new value in a >> btree. This made readers (getxattr and listxattrs) not getting >> neither the old nor the new value during a short time window. > > OK, seems like a good thing to test that the application can only > see the old or the new value. > > However, I can't help but wonder about whether the btrfs behaviour > is crash safe as it wasn't designed to be atomic from the ground up. > i.e. if the system crashes half way through a attribute overwrite, > what does btrfs end up with as a result? XFS is guaranteed at a > transactional level to return either the old or the new value, > depending on where in the operaiton the crash occurred, but I'd just > assumed that everyone did attribute replace atomically so it never > occurred to me that it might be an issue... It's crash safe. Both the delete and insert were done in the same transaction, so a crash in between both operations (or after both and before the transaction commit) would result in always seeing the old value (or better saying, the last persisted value by a transaction commit or fsync). > > Perhaps another test involving dm-flakey to trigger errors and > shutdowns whilst in the middle of replacing attributes might be a > good thing? > > [....] > >> +_cleanup() >> +{ >> + if [ ! -z $setter_pid ]; then >> + kill $setter_pid &> /dev/null >> + fi > > No need to do this if[] then. You're already redirecting all errors > to /dev/null, so if $setter_pid is empty it will just output the > help string. > >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/attr >> + >> +# real QA test starts here >> +_need_to_be_root >> +_supported_fs generic >> +_supported_os Linux >> +_require_scratch >> +_require_attrs >> + >> +rm -f $seqres.full >> + >> +xattr_name="user.something" >> +xattr_value1="foobar" >> +xattr_value2="rabbit_hole" >> + >> +set_xattr_loop() >> +{ >> + local name=$1 >> + >> + local cur_val=$xattr_value1 >> + while true; do >> + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name >> + if [ "$cur_val" == "$xattr_value1" ]; then >> + cur_val=$xattr_value2 >> + else >> + cur_val=$xattr_value1 >> + fi >> + done >> +} >> + >> +_scratch_mkfs >>$seqres.full 2>&1 >> +_scratch_mount >> + >> +test_file="test_xattr_replace" >> +touch $SCRATCH_MNT/$test_file >> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file >> + >> +set_xattr_loop $test_file & >> +setter_pid=$! >> + >> +for ((i = 0; i < 1000; i++)); do >> + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \ >> + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2) >> + if [ "$xattr_val" != "\"$xattr_value1\"" -a \ >> + "$xattr_val" != "\"$xattr_value2\"" ]; then >> + _fail "Missing or unexpected xattr value: $xattr_val" >> + fi > > I'd suggest that a filter is a much better way to do this. > > name_filter() { > grep "$xattr_name=" | cut -d '=' -f 2 | \ > sed -e "s/$xattr_value1/GOOD/" \ > -e "s/$xattr_value2/GOOD/" > } > > for (); do > $GETFATTR_PROG --absolute-names -n $xattr_name \ > $SCRATCH_MNT/$test_file | name_filter > done > > And so the output file should just be: > > GOOD > GOOD > .... > > And if it is bad, then it outputs the bad attribute value instead > of "GOOD", and the test fails on the golden output match. > >> +kill $setter_pid &> /dev/null >> +unset setter_pid > > See above, no need to unset the var. >> + >> +echo "Silence is golden" >> +status=0 >> +exit >> diff --git a/tests/generic/326.out b/tests/generic/326.out >> new file mode 100644 >> index 0000000..4ac0db5 >> --- /dev/null >> +++ b/tests/generic/326.out >> @@ -0,0 +1,2 @@ >> +QA output created by 326 >> +Silence is golden > > Except when you are using comparisons and "_fail" instead of > filters. ;) > > BTW, new tests should be numbere as the first unused number, not at > the tail (i.e generic/036). If there are collisions with other new > tests when I merge them, I'll renumber them at merge time. > >> 323 auto aio stress >> 324 auto fsr quick >> 325 auto quick data log >> +326 auto quick xattr > > There is no existing xattr group - normally xattr tests are > considered part of the "metadata" group. If you want to introduce an > xattr test group, do it as a separate patch and include all the > tests (across all the test directories) that manipulate xattrs in > some way. Agreed. V2 following. Thanks > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote: > On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote: > >> This test verifies that replacing a xattr's value is an atomic > >> operation. This is motivated by an issue in btrfs where replacing > >> a xattr's value wasn't an atomic operation, it consisted of > >> removing the old value and then inserting the new value in a > >> btree. This made readers (getxattr and listxattrs) not getting > >> neither the old nor the new value during a short time window. > > > > OK, seems like a good thing to test that the application can only > > see the old or the new value. > > > > However, I can't help but wonder about whether the btrfs behaviour > > is crash safe as it wasn't designed to be atomic from the ground up. > > i.e. if the system crashes half way through a attribute overwrite, > > what does btrfs end up with as a result? XFS is guaranteed at a > > transactional level to return either the old or the new value, > > depending on where in the operaiton the crash occurred, but I'd just > > assumed that everyone did attribute replace atomically so it never > > occurred to me that it might be an issue... > > It's crash safe. Both the delete and insert were done in the same > transaction, so a crash in between both operations (or after both and > before the transaction commit) would result in always seeing the old > value (or better saying, the last persisted value by a transaction > commit or fsync). Alright, so no crash issues because all the modifications are in a single transaction. However, if both modifications are made in the same transaction, then this bug implies that a user can read a metadata object in the btree whilst somethign else is concurrently modifying it, right? i.e. that there is no serialisation between inode metadata reads and transactional modification operations? Cheers, Dave.
On Mon, Nov 10, 2014 at 1:31 AM, Dave Chinner <david@fromorbit.com> wrote: > On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote: >> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote: >> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote: >> >> This test verifies that replacing a xattr's value is an atomic >> >> operation. This is motivated by an issue in btrfs where replacing >> >> a xattr's value wasn't an atomic operation, it consisted of >> >> removing the old value and then inserting the new value in a >> >> btree. This made readers (getxattr and listxattrs) not getting >> >> neither the old nor the new value during a short time window. >> > >> > OK, seems like a good thing to test that the application can only >> > see the old or the new value. >> > >> > However, I can't help but wonder about whether the btrfs behaviour >> > is crash safe as it wasn't designed to be atomic from the ground up. >> > i.e. if the system crashes half way through a attribute overwrite, >> > what does btrfs end up with as a result? XFS is guaranteed at a >> > transactional level to return either the old or the new value, >> > depending on where in the operaiton the crash occurred, but I'd just >> > assumed that everyone did attribute replace atomically so it never >> > occurred to me that it might be an issue... >> >> It's crash safe. Both the delete and insert were done in the same >> transaction, so a crash in between both operations (or after both and >> before the transaction commit) would result in always seeing the old >> value (or better saying, the last persisted value by a transaction >> commit or fsync). > > Alright, so no crash issues because all the modifications are in a > single transaction. However, if both modifications are made in the > same transaction, then this bug implies that a user can read a > metadata object in the btree whilst somethign else is concurrently > modifying it, right? i.e. that there is no serialisation between > inode metadata reads and transactional modification operations? Nop, not exactly the possibility to read while being modified. The replace steps were: lock btree node(s), delete current value from leaf, unlock btree nodes, lock btree nodes, write new value to leaf, unlock btree nodes Readers (a getxattr or listxattrs syscall for e.g.), could just after the first btree node unlocking done by the writer: lock btree nodes, get current value (none found), unlock btree nodes, and then return -ENODATA. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Mon, Nov 10, 2014 at 11:53:00AM +0000, Filipe David Manana wrote: > On Mon, Nov 10, 2014 at 1:31 AM, Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote: > >> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote: > >> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote: > >> >> This test verifies that replacing a xattr's value is an atomic > >> >> operation. This is motivated by an issue in btrfs where replacing > >> >> a xattr's value wasn't an atomic operation, it consisted of > >> >> removing the old value and then inserting the new value in a > >> >> btree. This made readers (getxattr and listxattrs) not getting > >> >> neither the old nor the new value during a short time window. > >> > > >> > OK, seems like a good thing to test that the application can only > >> > see the old or the new value. > >> > > >> > However, I can't help but wonder about whether the btrfs behaviour > >> > is crash safe as it wasn't designed to be atomic from the ground up. > >> > i.e. if the system crashes half way through a attribute overwrite, > >> > what does btrfs end up with as a result? XFS is guaranteed at a > >> > transactional level to return either the old or the new value, > >> > depending on where in the operaiton the crash occurred, but I'd just > >> > assumed that everyone did attribute replace atomically so it never > >> > occurred to me that it might be an issue... > >> > >> It's crash safe. Both the delete and insert were done in the same > >> transaction, so a crash in between both operations (or after both and > >> before the transaction commit) would result in always seeing the old > >> value (or better saying, the last persisted value by a transaction > >> commit or fsync). > > > > Alright, so no crash issues because all the modifications are in a > > single transaction. However, if both modifications are made in the > > same transaction, then this bug implies that a user can read a > > metadata object in the btree whilst somethign else is concurrently > > modifying it, right? i.e. that there is no serialisation between > > inode metadata reads and transactional modification operations? > > Nop, not exactly the possibility to read while being modified. The > replace steps were: > > lock btree node(s), delete current value from leaf, unlock btree > nodes, lock btree nodes, write new value to leaf, unlock btree nodes You missed "transaction start" and "transaction commit", which define the boundaries of a modification in progress. > Readers (a getxattr or listxattrs syscall for e.g.), could just after > the first btree node unlocking done by the writer: lock btree nodes, > get current value (none found), unlock btree nodes, and then return > -ENODATA. Right, and that's a exact demonstration "read while modifying" when "modifying" means an object has been modified in an uncommitted transaction. This indicates the transaction failed both the Atomicity and Isolation properties of an ACID transaction model. I hope this isn't a common pattern in btrfs - it worries me just to know that the btrfs transaction subsytem does not, at minimum, throw warnings when ACID attributes are violated... Cheers, Dave.
diff --git a/tests/generic/326 b/tests/generic/326 new file mode 100755 index 0000000..c110fc0 --- /dev/null +++ b/tests/generic/326 @@ -0,0 +1,107 @@ +#! /bin/bash +# FSQA Test No. 326 +# +# Verify that replacing a xattr's value is an atomic operation. +# This is motivated by an issue in btrfs where replacing a xattr's value +# wasn't an atomic operation, it consisted of removing the old value and +# then inserting the new value in a btree. This made readers (getxattr +# and listxattrs) not getting neither the old nor the new value during +# a short time window. +# +# The btrfs issue was fixed by the following linux kernel patch: +# +# Btrfs: make xattr replace operations atomic +# +#----------------------------------------------------------------------- +# +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana <fdmanana@suse.com> +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#----------------------------------------------------------------------- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + if [ ! -z $setter_pid ]; then + kill $setter_pid &> /dev/null + fi + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/attr + +# real QA test starts here +_need_to_be_root +_supported_fs generic +_supported_os Linux +_require_scratch +_require_attrs + +rm -f $seqres.full + +xattr_name="user.something" +xattr_value1="foobar" +xattr_value2="rabbit_hole" + +set_xattr_loop() +{ + local name=$1 + + local cur_val=$xattr_value1 + while true; do + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name + if [ "$cur_val" == "$xattr_value1" ]; then + cur_val=$xattr_value2 + else + cur_val=$xattr_value1 + fi + done +} + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +test_file="test_xattr_replace" +touch $SCRATCH_MNT/$test_file +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file + +set_xattr_loop $test_file & +setter_pid=$! + +for ((i = 0; i < 1000; i++)); do + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \ + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2) + if [ "$xattr_val" != "\"$xattr_value1\"" -a \ + "$xattr_val" != "\"$xattr_value2\"" ]; then + _fail "Missing or unexpected xattr value: $xattr_val" + fi +done + +kill $setter_pid &> /dev/null +unset setter_pid + +echo "Silence is golden" +status=0 +exit diff --git a/tests/generic/326.out b/tests/generic/326.out new file mode 100644 index 0000000..4ac0db5 --- /dev/null +++ b/tests/generic/326.out @@ -0,0 +1,2 @@ +QA output created by 326 +Silence is golden diff --git a/tests/generic/group b/tests/generic/group index 9c82a6f..01f442d 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -152,3 +152,4 @@ 323 auto aio stress 324 auto fsr quick 325 auto quick data log +326 auto quick xattr
This test verifies that replacing a xattr's value is an atomic operation. This is motivated by an issue in btrfs where replacing a xattr's value wasn't an atomic operation, it consisted of removing the old value and then inserting the new value in a btree. This made readers (getxattr and listxattrs) not getting neither the old nor the new value during a short time window. Signed-off-by: Filipe Manana <fdmanana@suse.com> --- tests/generic/326 | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++ tests/generic/326.out | 2 + tests/generic/group | 1 + 3 files changed, 110 insertions(+) create mode 100755 tests/generic/326 create mode 100644 tests/generic/326.out