
btrfs-progs: RAID5: Inject data stripe corruption and verify scrub fixes it.

Message ID 20170215090338.GA11666@giis.co.in (mailing list archive)
State New, archived

Commit Message

Lakshmipathi.G Feb. 15, 2017, 9:03 a.m. UTC
Signed-off-by: Lakshmipathi.G <Lakshmipathi.G@giis.co.in>
---
 .../020-raid5-datastripe-corruption/test.sh        | 224 +++++++++++++++++++++
 1 file changed, 224 insertions(+)
 create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh

Comments

Lakshmipathi.G Feb. 15, 2017, 8:56 p.m. UTC | #1
On Wed, Feb 15, 2017 at 05:29:33PM +0800, Qu Wenruo wrote:
> 
> 
> At 02/15/2017 05:03 PM, Lakshmipathi.G wrote:
> >Signed-off-by: Lakshmipathi.G <Lakshmipathi.G@giis.co.in>
> >---
> > .../020-raid5-datastripe-corruption/test.sh        | 224 +++++++++++++++++++++
> > 1 file changed, 224 insertions(+)
> > create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >
> >diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >new file mode 100755
> >index 0000000..d04c430
> >--- /dev/null
> >+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >@@ -0,0 +1,224 @@
> >+#!/bin/bash
> >+#
> >+# Raid5: Inject data stripe corruption and fix them using scrub.
> >+#
> >+# Script will perform the following:
> >+# 1) Create Raid5 using 3 loopback devices.
> >+# 2) Ensure file layout is created in a predictable manner.
> >+#    Each data stripe(64KB) should uniquely start with 'DNxxxx',
> >+#    where N represents the data stripe number.(ex:D0xxxx,D1xxxx etc)
> 
> If you want a really predictable layout, you could just upload compressed
> images for this purpose.
>
> That makes things super easy, and unlike fstests, the btrfs-progs self-tests
> accept such images.
> 
> >+# 3) Once file is created with specific layout, gather data stripe details
> >+#    like devicename, position and actual on-disk data.
> >+# 4) Now use 'dd' to verify the data-stripe against its expected value
> >+#    and inject corruption by zero'ing out contents.
> >+# 5) After injecting corruption, running online-scrub is expected to fix
> >+#    the corrupted data stripe with the help of parity block and
> >+#    corresponding data stripe.
> 
> You should also verify that the parity stripe is not corrupted.
> It's already known that RAID5/6 will corrupt parity while recovering a data
> stripe.
> 
> Kernel patch for this, with detailed bug info.
> https://patchwork.kernel.org/patch/9553581/
> 
> >+# 6) Finally, validate the data stripe has original un-corrupted value.
> >+#
> >+#  Note: This script doesn't handle parity block corruption.
> 
> Normally such a test case would belong in xfstests (recently renamed to
> fstests), as we're verifying kernel behavior, not btrfs-progs behavior.
>
> But since fstests test cases should be as generic as possible, and we don't
> have a good enough tool to corrupt a given data/parity stripe, my previously
> submitted test case was rejected.
> 
> Personally speaking, this seems to be a dilemma for me.
> 
> We really need a test case for this; bugs have been spotted where RAID5/6
> scrub will corrupt P/Q while recovering a data stripe.
> But we need to get btrfs-corrupt-block into better shape for fstests to
> accept it, and that won't happen in a short time.
>
> So I really have no idea what we should do for such a test.
> 
> Thanks,
> Qu

Will check compressed images for parity stripe testing. I assume that at the
moment we only support a single static compressed image. Is adding more than
one static compressed image (like disk1.img, disk2.img, disk3.img) for RAID
supported by the existing test framework?

Using compressed images for checking parity seems a little easier than
computing it via scripting.

Looked into patch description:

After scrubbing dev3 only:
0xcdcd (Good)  |      0xcdcd      | 0xcdcd (Bad) 
    (D1)              (D2)            (P) 

So does the parity stripe (P) always get replaced by the exact content of D1/D2
(a data stripe), or by random data? If it always gets replaced by the exact
value of either D1 or D2, I think the current script can be modified to detect
that bug. If the parity gets replaced by a random value, then it will make the
task more difficult.

Yes, without better RAID support in tools like btrfs-corrupt-block, it will be
hard to play around with RAID to create test scripts.

Cheers.
Lakshmipathi.G
Lakshmipathi.G Feb. 16, 2017, 3:51 p.m. UTC | #2
On Thu, Feb 16, 2017 at 09:12:31AM +0800, Qu Wenruo wrote:
> 
> 
> At 02/16/2017 04:56 AM, Lakshmipathi.G wrote:
> >On Wed, Feb 15, 2017 at 05:29:33PM +0800, Qu Wenruo wrote:
> >>
> >>
> >>At 02/15/2017 05:03 PM, Lakshmipathi.G wrote:
> >>>Signed-off-by: Lakshmipathi.G <Lakshmipathi.G@giis.co.in>
> >>>---
> >>>.../020-raid5-datastripe-corruption/test.sh        | 224 +++++++++++++++++++++
> >>>1 file changed, 224 insertions(+)
> >>>create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>
> >>>diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>new file mode 100755
> >>>index 0000000..d04c430
> >>>--- /dev/null
> >>>+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>@@ -0,0 +1,224 @@
> >>>+#!/bin/bash
> >>>+#
> >>>+# Raid5: Inject data stripe corruption and fix them using scrub.
> >>>+#
> >>>+# Script will perform the following:
> >>>+# 1) Create Raid5 using 3 loopback devices.
> >>>+# 2) Ensure file layout is created in a predictable manner.
> >>>+#    Each data stripe(64KB) should uniquely start with 'DNxxxx',
> >>>+#    where N represents the data stripe number.(ex:D0xxxx,D1xxxx etc)
> >>
> >>If you want a really predictable layout, you could just upload compressed
> >>images for this purpose.
> >>
> >>That makes things super easy, and unlike fstests, the btrfs-progs self-tests
> >>accept such images.
> >>
> >>>+# 3) Once file is created with specific layout, gather data stripe details
> >>>+#    like devicename, position and actual on-disk data.
> >>>+# 4) Now use 'dd' to verify the data-stripe against its expected value
> >>>+#    and inject corruption by zero'ing out contents.
> >>>+# 5) After injecting corruption, running online-scrub is expected to fix
> >>>+#    the corrupted data stripe with the help of parity block and
> >>>+#    corresponding data stripe.
> >>
> >>You should also verify that the parity stripe is not corrupted.
> >>It's already known that RAID5/6 will corrupt parity while recovering a data
> >>stripe.
> >>
> >>Kernel patch for this, with detailed bug info.
> >>https://patchwork.kernel.org/patch/9553581/
> >>
> >>>+# 6) Finally, validate the data stripe has original un-corrupted value.
> >>>+#
> >>>+#  Note: This script doesn't handle parity block corruption.
> >>
> >>Normally such a test case would belong in xfstests (recently renamed to
> >>fstests), as we're verifying kernel behavior, not btrfs-progs behavior.
> >>
> >>But since fstests test cases should be as generic as possible, and we don't
> >>have a good enough tool to corrupt a given data/parity stripe, my previously
> >>submitted test case was rejected.
> >>
> >>Personally speaking, this seems to be a dilemma for me.
> >>
> >>We really need a test case for this; bugs have been spotted where RAID5/6
> >>scrub will corrupt P/Q while recovering a data stripe.
> >>But we need to get btrfs-corrupt-block into better shape for fstests to
> >>accept it, and that won't happen in a short time.
> >>
> >>So I really have no idea what we should do for such a test.
> >>
> >>Thanks,
> >>Qu
> >
> >Will check compressed images for parity stripe testing. I assume that at the
> >moment we only support a single static compressed image. Is adding more than
> >one static compressed image (like disk1.img, disk2.img, disk3.img) for RAID
> >supported by the existing test framework?
> 
> Not yet, but since you can use test.sh instead of running check_image() from
> the test framework, it's not a big problem.
> 
ok, will check it out.
> >
> >Using compressed images for checking parity seems a little easier than
> >computing it via scripting.
> >
> >Looked into patch description:
> >
> >After scrubbing dev3 only:
> >0xcdcd (Good)  |      0xcdcd      | 0xcdcd (Bad)
> >    (D1)              (D2)            (P)
> >
> >So does the parity stripe (P) always get replaced by the exact content of
> >D1/D2 (a data stripe), or by random data?
> 
> Neither. It's just the XOR result of D2 (never changed, 0xcdcd) and the old
> D1 (wrong, 0x0000):
> 0xcdcd XOR 0x0000 = 0xcdcd
>
> So you get 0xcdcd, a bad result.
>
> If you corrupt D1 with random data, then the parity will be random too.
> 
> >If it always gets replaced by the exact value of either D1 or D2, I think the
> >current script can be modified to detect that bug. If the parity gets replaced
> >by a random value, then it will make the task more difficult.
> 
> Not hard to detect.
> As the content is completely under your control, you know the correct parity
> value, and you can verify it very easily.
> 
The script corrupts a data stripe (D1 or D2) in a random manner, so let's assume the
wrong parity will be random data.

I tried to find a one-liner for computing the XOR of two strings:
str1 = "D0xxxxx"
str2 = "D1xxxxx"

but failed to figure it out. I think the parity will be "00010000000000000000000000000000"
for the above case. For a higher-numbered data stripe (D15xxxx), the parity will be slightly
different, like "00001000000000000000000000000000".
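
For what it's worth, the byte-wise XOR of two equal-length ASCII strings can be computed
with a small bash helper (a rough, untested sketch, not part of the patch):

xor_strings() {
	# print the byte-wise XOR of two equal-length ASCII strings as hex
	local a=$1 b=$2 i ca cb
	for (( i = 0; i < ${#a}; i++ )); do
		ca=$(printf '%d' "'${a:$i:1}")
		cb=$(printf '%d' "'${b:$i:1}")
		printf '%02X' $(( ca ^ cb ))
	done
	echo
}

xor_strings "D0xxxxxxxxxxxxxx" "D1xxxxxxxxxxxxxx"
# prints 00010000000000000000000000000000 ('0' XOR '1' = 0x01, equal bytes cancel out)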

The parity value can be hard-coded as above instead of being computed from the data
stripes inside the script. But finding the location is another issue: at the moment
the script uses an expensive/dumb approach like "cat /dev | hexdump | grep" to
find the location. If I use the same method, it may give more than one parity location.

For example, the D0xxxx/D1xxxx parity will be the same as the D2xxxx/D3xxxx parity.
With device rotation, it is possible that the same device holds multiple parity stripes
(and in this case the parity values will be the same too).

I'm thinking of a solution like this:

1) Find all possible parity values on a device:

$ cat /dev/loop2 | hexdump -e '"%010_ad  |" 16/1 "%02X" "\n" ' | grep "00010000000000000000000000000000"
0009502496  |00010000000000000000000000000000
0009633568  |00010000000000000000000000000000
0009649952  |00010000000000000000000000000000
0063176704  |00010000000000000000000000000000

A few of these may not be actual parity blocks; just count them as parity anyway (count=4).

2) After scrubbing, check whether the count is still 4, otherwise error out, since there is
a big chance the parity value is wrong. (I tried this manually and found the parity at
63176704 missing a few times after scrub, with wrong data at that offset.)
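
That before/after check could be wrapped in a small helper, roughly like this (untested
sketch; $SUDO_HELPER, run_check, _fail and $TEST_MNT come from the test framework the
patch already uses, and the pattern/device are the ones from the example above):

count_parity() {
	# count the 16-byte lines on device $1 that match the expected parity pattern $2
	$SUDO_HELPER cat "$1" | hexdump -e '"%010_ad  |" 16/1 "%02X" "\n"' |
		grep -c -F "$2"
}

pattern=00010000000000000000000000000000
before=$(count_parity /dev/loop2 "$pattern")
run_check $SUDO_HELPER btrfs scrub start -B "$TEST_MNT"	# -B waits for the scrub to finish
after=$(count_parity /dev/loop2 "$pattern")
[ "$after" -eq "$before" ] || _fail "parity pattern count changed after scrub ($before -> $after)"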

> 
> So I'm working to improve btrfs-corrupt-block after finishing RAID5/6 bug
> fixes.
> 
> Thanks,
> Qu

OK. If possible, please include an 'example usage' section in the man page of the new
btrfs-corrupt-block. Thanks.

Cheers.
Lakshmipathi.G

Lakshmipathi.G Feb. 21, 2017, 9:41 a.m. UTC | #3
> >
> >Looked into patch description:
> >
> >After scrubbing dev3 only:
> >0xcdcd (Good)  |      0xcdcd      | 0xcdcd (Bad)
> >    (D1)              (D2)            (P)
> >
> >So does the parity stripe (P) always get replaced by the exact content of
> >D1/D2 (a data stripe), or by random data?
> 
> Neither. It's just the XOR result of D2 (never changed, 0xcdcd) and the old
> D1 (wrong, 0x0000):
> 0xcdcd XOR 0x0000 = 0xcdcd
>
> So you get 0xcdcd, a bad result.
>
> If you corrupt D1 with random data, then the parity will be random too.
> 
> >If it always gets replaced by the exact value of either D1 or D2, I think the
> >current script can be modified to detect that bug. If the parity gets replaced
> >by a random value, then it will make the task more difficult.
> 
> Not hard to detect.
> As the content is completely under your control, you know the correct parity
> value, and you can verify it very easily.
> 

Version 3 of this script calculates the exact data/parity locations instead of dumping
data and searching for locations. Tested with up to an 8MB file; from the output, all 128
data stripe and 64 parity stripe locations look fine. The script consistently hits the
parity bug.


If the script gets accepted, I will add a few other corruption variants (for example,
along the lines of the loop sketched below):
- corrupt all even data stripes (D2, D4, ...)
- corrupt all odd data stripes (D1, D3, ...)
- corrupt all parity stripes
- corrupt both data stripes of a full stripe (D0 & D1) and expect an error message
(and cover the above cases for RAID6)
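
The even/odd variants could reuse the existing helpers roughly like this (sketch only;
corrupt_data_stripe and verify_data_stripe currently keep their state in globals, so they
would need small changes to remember each corrupted stripe):

# corrupt every even-numbered data stripe (D2, D4, ...) of a file with $stripe_num stripes
for (( i = 2; i < stripe_num; i += 2 )); do
	corrupt_data_stripe $i
done
# then mount, run scrub once, and verify each corrupted stripe as the current test does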

thanks.

Cheers.
Lakshmipathi.G


Patch

diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
new file mode 100755
index 0000000..d04c430
--- /dev/null
+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
@@ -0,0 +1,224 @@ 
+#!/bin/bash
+#
+# Raid5: Inject data stripe corruption and fix them using scrub.
+# 
+# Script will perform the following:
+# 1) Create Raid5 using 3 loopback devices.
+# 2) Ensure file layout is created in a predictable manner. 
+#    Each data stripe(64KB) should uniquely start with 'DNxxxx',   
+#    where N represents the data stripe number.(ex:D0xxxx,D1xxxx etc)
+# 3) Once file is created with specific layout, gather data stripe details 
+#    like devicename, position and actual on-disk data.
+# 4) Now use 'dd' to verify the data-stripe against its expected value
+#    and inject corruption by zero'ing out contents.
+# 5) After injecting corruption, running online-scrub is expected to fix 
+#    the corrupted data stripe with the help of parity block and 
+#    corresponding data stripe.
+# 6) Finally, validate the data stripe has original un-corrupted value.
+#
+#  Note: This script doesn't handle parity block corruption.
+
+source $TOP/tests/common
+
+check_prereq btrfs
+check_prereq mkfs.btrfs
+
+setup_root_helper
+prepare_test_dev 512M
+
+ndevs=3
+declare -a devs
+device_name=""
+stripe_offset=""
+stripe_content=""
+
+LAYOUT_TMP=$(mktemp --tmpdir btrfs-progs-raid5-file.layoutXXXXXX)
+STRIPEINFO_TMP=$(mktemp --tmpdir btrfs-progs-raid5-file.infoXXXXXX)
+
+prepare_devices()
+{
+	for i in `seq $ndevs`; do
+		touch img$i
+		chmod a+rw img$i
+		truncate -s0 img$i
+		truncate -s512M img$i
+		devs[$i]=`run_check_stdout $SUDO_HELPER losetup --find --show img$i`
+	done
+}
+
+cleanup_devices()
+{
+	for dev in ${devs[@]}; do
+		run_check $SUDO_HELPER losetup -d $dev
+	done
+	for i in `seq $ndevs`; do
+		truncate -s0 img$i
+	done
+	run_check $SUDO_HELPER losetup --all
+}
+
+test_do_mkfs()
+{
+	run_check $SUDO_HELPER $TOP/mkfs.btrfs -f	\
+		$@
+}
+
+test_mkfs_multi()
+{
+	test_do_mkfs $@ ${devs[@]}
+}
+
+#$1 Filename
+#$2 Expected no.of data stripes for the file.
+create_layout(){
+	fname=$1
+	size=$(( $2 * 65536 ))
+	n=0
+	bs_value=1
+	stripe=0
+	while (( $n < $size ))
+	do
+		if [ $(( $n % 65536 )) -eq 0 ]; then
+			val='D'$stripe
+			echo -n $val
+			stripe=$(( $stripe+1 ))
+			# block size matches the marker length ("D<N>")
+			bs_value=${#val}
+		else
+			echo -n 'x'
+			bs_value=1
+		fi
+		n=$(( $n+$bs_value ))
+	done | dd of="$TEST_MNT"/$fname bs=$bs_value conv=notrunc &> /dev/null
+}
+
+find_data_stripe_details(){
+	for dev in ${devs[@]}; do
+		echo $dev >> $LAYOUT_TMP
+		$SUDO_HELPER cat $dev | hexdump -e '"%010_ad|" 16/1 "%_p" "|\n"' |
+		grep -P 'D[0-9]+xx'  >> $LAYOUT_TMP
+	done
+}
+
+#Collect data stripe information in a readable manner.
+save_data_stripe_details(){
+	devname=""
+	for entry in `cat $LAYOUT_TMP`; do  
+		echo $entry | grep -q '^\/dev\/loop' > /dev/null
+
+		if [ $? -eq 0 ]; then
+			devname=$entry	
+		else
+			echo $devname"|"$entry >> $STRIPEINFO_TMP
+		fi
+	done
+	#Order by data stripe. D0 comes before D1.
+	sort -t'|'  -k3 $STRIPEINFO_TMP -o $STRIPEINFO_TMP
+}
+
+#Corrupt given data stripe
+corrupt_data_stripe(){
+
+	data_stripe_num=$1
+	data_stripe_entry="D"${data_stripe_num}"xxxx"
+	stripe_entry=`grep "${data_stripe_entry}" $STRIPEINFO_TMP`
+
+	#Each entry will have format like "device|position|16-byte content"
+	#Example: /dev/loop1|0063176704|D0xxxxxxxxxxxxxx|
+	device_name=$(echo $stripe_entry | awk -F"|" '{print $1}')
+	stripe_offset=$(echo $stripe_entry | awk -F"|" '{print $2}')
+	#Remove leading zeros
+	stripe_offset=$(echo $stripe_offset | sed 's/^0*//')
+	stripe_content=$(echo $stripe_entry | awk -F"|" '{print $3}')
+
+	echo "Corrupting $device_name at position $stripe_offset \
+	which has $stripe_content" >> "$RESULTS"
+
+	#verify the value at this position 
+	original_value=$($SUDO_HELPER dd 2>/dev/null if=$device_name bs=1 \
+	count=16 skip=$stripe_offset)
+
+	if [ $original_value != $stripe_content ];then
+		 echo "$original_value != $stripe_content"
+		_fail "Data stripe mismatch. Possible use of incorrect block."
+	else
+		echo "Found on-disk value: $original_value " >> "$RESULTS"
+	fi
+
+	#Corrupt the given data stripe
+	$SUDO_HELPER dd if=/dev/zero of=$device_name bs=1 count=4 \
+	seek=$stripe_offset conv=notrunc &> /dev/null
+
+	#Fetch value again.
+	corrupted_value=$($SUDO_HELPER dd 2>/dev/null if=$device_name \
+	bs=1 count=16 skip=$stripe_offset)
+
+	if [ $corrupted_value == $original_value ];then
+		 echo "original:$original_value corrupted:$corrupted_value"
+		_fail "Corruption failed. Possible use of incorrect block."
+	else
+		echo "Corruption completed at $stripe_offset" >> "$RESULTS"
+	fi
+
+# Corruption done.
+}
+
+#Verify data stripe after scrub
+verify_data_stripe(){
+
+	value_after_scrub=$($SUDO_HELPER dd 2>/dev/null if=$device_name bs=1 \
+	count=16 skip=$stripe_offset)
+	if [ $value_after_scrub != $stripe_content ];then
+		_fail "Scrub failed to fix data stripe corruption."
+	else
+		echo "Scrub corrected value: $value_after_scrub" >> "$RESULTS"
+	fi
+}
+
+#$1 Filename
+#$2 File with 'n' no.of data stripes
+#$3 Data stripe to corrupt
+test_raid5_datastripe_corruption(){
+	filename=$1
+	stripe_num=$2
+	test_stripe=$3
+
+	prepare_devices
+	dev1=${devs[1]}
+	dev2=${devs[2]}
+	dev3=${devs[3]}
+
+	test_mkfs_multi -d raid5 -m raid5
+	run_check $SUDO_HELPER mount $dev1 $TEST_MNT
+	create_layout $filename $stripe_num
+	run_check $SUDO_HELPER umount "$TEST_MNT"
+
+	#Gather data stripe information: device name and offset
+	find_data_stripe_details
+	save_data_stripe_details
+	corrupt_data_stripe $test_stripe
+
+	#Mount the device and start scrub
+	run_check $SUDO_HELPER mount $dev1 $TEST_MNT
+	run_check $SUDO_HELPER btrfs scrub start $TEST_MNT
+	#Introduce delay, hopefully scrubbing will be finished.
+	sleep 10 
+
+	#Validate 
+	verify_data_stripe
+
+	#cleanup
+	run_check $SUDO_HELPER umount "$TEST_MNT"
+	cleanup_devices
+	rm -f $LAYOUT_TMP
+	rm -f $STRIPEINFO_TMP
+}
+
+
+test_raid5_datastripe_corruption file128k.txt 2 1 #file with 2 stripes, corrupt 1st
+test_raid5_datastripe_corruption file192k.txt 3 2 #file with 3 stripes, corrupt 2nd
+test_raid5_datastripe_corruption file256k.txt 4 3
+test_raid5_datastripe_corruption file512k.txt 8 6
+test_raid5_datastripe_corruption file768k.txt 12 10
+test_raid5_datastripe_corruption file1m.txt 16 14 #1MB file, corrupt 14th stripe
+test_raid5_datastripe_corruption file2m.txt 32 23 #2MB file, corrupt 23rd stripe