diff mbox series

[blktests] tests/rnbd: Implement RNBD regression test

Message ID 20241223054535.295371-1-lizhijian@fujitsu.com (mailing list archive)
State New
Headers show
Series [blktests] tests/rnbd: Implement RNBD regression test | expand

Commit Message

Li Zhijian Dec. 23, 2024, 5:45 a.m. UTC
This test case has been in my possession for quite some time.
I am upstreaming it now because it has once again detected a regression in
a recent kernel release [1].

It's just stupid to connect and disconnect RNBD on localhost and expect
no dmesg exceptions, with some attempts actually succeeding.

Please note that currently, only RTRS over RXE is supported.

[1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
Copy to the RDMA/rtrs guys:

Cc: Jack Wang <jinpu.wang@ionos.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
---
 tests/rnbd/001     | 37 ++++++++++++++++++++++++++++
 tests/rnbd/001.out |  2 ++
 tests/rnbd/rc      | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)
 create mode 100755 tests/rnbd/001
 create mode 100644 tests/rnbd/001.out
 create mode 100644 tests/rnbd/rc

Comments

Shinichiro Kawasaki Dec. 24, 2024, 2:47 a.m. UTC | #1
On Dec 23, 2024 / 13:45, Li Zhijian wrote:
> This test case has been in my possession for quite some time.
> I am upstreaming it now because it has once again detected a regression in
> a recent kernel release [1].
> 
> It's just stupid to connect and disconnect RNBD on localhost and expect
> no dmesg exceptions, with some attempts actually succeeding.
> 
> Please note that currently, only RTRS over RXE is supported.
> 
> [1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/

Hello Li, thank you for the patch. I ran the test case that this patch adds with
the kernel v6.13-rc4 KASAN enabled. I observed "BUG: KASAN: slab-use-after-free
in __list_add_valid_or_report". I confirmed that your kernel patch avoids
the message. So, the new blktests test case catches a kernel BUG. Good.

Having said that, even with the kernel fix patch, still I observe the test
case failure in my QEMU test environment. The count j is almost always zero.
I once observed the count became 3, but it is far smaller than the criteria 10.
I guess test environment performance factors might affect the pass/fail results.
Do we really need to check the start/stop success ratio? I think the BUG can be
caught regardless of the check.

Also, please find my other reivew comments in line.

> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> Copy to the RDMA/rtrs guys:
> 
> Cc: Jack Wang <jinpu.wang@ionos.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
> ---
>  tests/rnbd/001     | 37 ++++++++++++++++++++++++++++
>  tests/rnbd/001.out |  2 ++
>  tests/rnbd/rc      | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 99 insertions(+)
>  create mode 100755 tests/rnbd/001
>  create mode 100644 tests/rnbd/001.out
>  create mode 100644 tests/rnbd/rc
> 
> diff --git a/tests/rnbd/001 b/tests/rnbd/001
> new file mode 100755
> index 000000000000..220468f0f5b4
> --- /dev/null
> +++ b/tests/rnbd/001
> @@ -0,0 +1,37 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright 2024, Fujitsu
> +#

I suggest to describe shortly here that this test can catch the kernel
regression, in same manner as the commit message.

> +. tests/rnbd/rc
> +
> +DESCRIPTION="Start Stop RNBD"
> +CHECK_DMESG=1

Do you expect this test completes quickly (around 30 seconds)? If so,
I suggest to add QUICK=1 here.

> +
> +requires() {
> +	_have_rnbd

I suggest to add "_have_loop" here.

> +}
> +
> +test_start_stop()
> +{
> +	_setup_rnbd || return
> +
> +	local loop_dev i j
> +	loop_dev="$(losetup -f)"
> +
> +	for ((i=0;i<100;i++))
> +	do
> +		_start_rnbd_client "${loop_dev}" &>/dev/null &&
> +		_stop_rnbd_client &>/dev/null && ((j++))
> +	done
> +
> +	# We expect at least 10% start/stop successfully
> +	if [[ $j -lt 10 ]]; then
> +		echo "Failed: $j/$i"
> +	fi
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +	test_start_stop
> +	echo "Test complete"
> +}
> diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out
> new file mode 100644
> index 000000000000..c1f9980d0f7b
> --- /dev/null
> +++ b/tests/rnbd/001.out
> @@ -0,0 +1,2 @@
> +Running rnbd/001
> +Test complete
> diff --git a/tests/rnbd/rc b/tests/rnbd/rc
> new file mode 100644
> index 000000000000..143ba0b59f38
> --- /dev/null
> +++ b/tests/rnbd/rc
> @@ -0,0 +1,60 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright 2024, Fujitsu
> +#
> +# RNBD tests.
> +
> +. common/rc
> +. common/multipath-over-rdma
> +
> +_have_rnbd() {
> +	if [[ "$USE_RXE" != 1 ]]; then
> +		SKIP_REASONS+=("Only USE_RXE=1 is supported")
> +		return 1
> +	fi
> +	_have_driver rdma_rxe || return
> +	_have_driver rnbd_server || return
> +	_have_driver rnbd_client
> +}
> +
> +_setup_rnbd() {
> +	modprobe -q rnbd_server
> +	modprobe -q rnbd_client
> +	start_soft_rdma

I was not able to find stop_soft_rdma() in this patch. It might be the better to
add _cleanup_rnbd() to call stop_soft_rdma().

> +	for i in $(rdma_network_interfaces)
> +	do
> +		ipv4_addr=$(get_ipv4_addr "$i")
> +		if [[ -n "${ipv4_addr}" ]]; then
> +			def_traddr=${ipv4_addr}
> +		fi
> +	done
> +}
> +
> +_start_rnbd_client() {
> +	local a b
> +	local blkdev=$1
> +	local before after
> +
> +	before=$(ls -d /sys/block/rnbd* 2>/dev/null)
> +	if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then
> +		return 1
> +	fi
> +
> +	# Retrieve the newly added rnbd entry
> +	after=$(ls -d /sys/block/rnbd* 2>/dev/null)
> +	for a in $after
> +	do
> +		[[ -n "$before" ]] || break
> +
> +		for b in $before
> +		do
> +			[[ "$a" = "$b" ]] || break
> +		done
> +	done
> +
> +	rnbd_node=$a

Nit: this rnbd_node is a global variable. To clarify it, I suggest to use
capital letters for its name and declare it at the beginning of this script
file, like,

declare RNBD_NODE

> +}
> +
> +_stop_rnbd_client() {
> +	echo "normal" > "$rnbd_node"/rnbd/unmap_device
> +}
> -- 
> 2.47.0
>
Li Zhijian Dec. 24, 2024, 5:49 a.m. UTC | #2
Shinichiro,


On 24/12/2024 10:47, Shinichiro Kawasaki wrote:
> On Dec 23, 2024 / 13:45, Li Zhijian wrote:
>> This test case has been in my possession for quite some time.
>> I am upstreaming it now because it has once again detected a regression in
>> a recent kernel release [1].
>>
>> It's just stupid to connect and disconnect RNBD on localhost and expect
>> no dmesg exceptions, with some attempts actually succeeding.
>>
>> Please note that currently, only RTRS over RXE is supported.
>>
>> [1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/
> 
> Hello Li, thank you for the patch. I ran the test case that this patch adds with
> the kernel v6.13-rc4 KASAN enabled. I observed "BUG: KASAN: slab-use-after-free
> in __list_add_valid_or_report". I confirmed that your kernel patch avoids
> the message. So, the new blktests test case catches a kernel BUG. Good.

Thanks for the testing

> 
> Having said that, even with the kernel fix patch, still I observe the test
> case failure in my QEMU test environment. The count j is almost always zero.
> I once observed the count became 3, but it is far smaller than the criteria 10.
> I guess test environment performance factors might affect the pass/fail results.
> Do we really need to check the start/stop success ratio? 

Good question
I often observed ~50% success in my QEMU environment, zero success is not expected IMO.


I think the BUG can be
> caught regardless of the check.
> 

Yes, this is true.


> Also, please find my other reivew comments in line.
> 
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>> Copy to the RDMA/rtrs guys:
>>
>> Cc: Jack Wang <jinpu.wang@ionos.com>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Leon Romanovsky <leon@kernel.org>
>> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
>> ---
>>   tests/rnbd/001     | 37 ++++++++++++++++++++++++++++
>>   tests/rnbd/001.out |  2 ++
>>   tests/rnbd/rc      | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 99 insertions(+)
>>   create mode 100755 tests/rnbd/001
>>   create mode 100644 tests/rnbd/001.out
>>   create mode 100644 tests/rnbd/rc
>>
>> diff --git a/tests/rnbd/001 b/tests/rnbd/001
>> new file mode 100755
>> index 000000000000..220468f0f5b4
>> --- /dev/null
>> +++ b/tests/rnbd/001
>> @@ -0,0 +1,37 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-3.0+
>> +# Copyright 2024, Fujitsu
>> +#
> 
> I suggest to describe shortly here that this test can catch the kernel
> regression, in same manner as the commit message.

Ok, I will update it.


> 
>> +. tests/rnbd/rc
>> +
>> +DESCRIPTION="Start Stop RNBD"
>> +CHECK_DMESG=1
> 
> Do you expect this test completes quickly (around 30 seconds)? If so,
> I suggest to add QUICK=1 here.

Yes, QUICK=1 is fine.(It finishes in 15 seconds)


> 
>> +
>> +requires() {
>> +	_have_rnbd
> 
> I suggest to add "_have_loop" here.

Ok.


> 
>> +}
>> +
>> +test_start_stop()
>> +{
>> +	_setup_rnbd || return
>> +
>> +	local loop_dev i j
>> +	loop_dev="$(losetup -f)"
>> +
>> +	for ((i=0;i<100;i++))
>> +	do
>> +		_start_rnbd_client "${loop_dev}" &>/dev/null &&
>> +		_stop_rnbd_client &>/dev/null && ((j++))>> +	done
>> +
>> +	# We expect at least 10% start/stop successfully

I will consider the ratio again.



>> +	if [[ $j -lt 10 ]]; then
>> +		echo "Failed: $j/$i"
>> +	fi
>> +}
>> +
>> +test() {
>> +	echo "Running ${TEST_NAME}"
>> +	test_start_stop
>> +	echo "Test complete"
>> +}
>> diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out
>> new file mode 100644
>> index 000000000000..c1f9980d0f7b
>> --- /dev/null
>> +++ b/tests/rnbd/001.out
>> @@ -0,0 +1,2 @@
>> +Running rnbd/001
>> +Test complete
>> diff --git a/tests/rnbd/rc b/tests/rnbd/rc
>> new file mode 100644
>> index 000000000000..143ba0b59f38
>> --- /dev/null
>> +++ b/tests/rnbd/rc
>> @@ -0,0 +1,60 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-3.0+
>> +# Copyright 2024, Fujitsu
>> +#
>> +# RNBD tests.
>> +
>> +. common/rc
>> +. common/multipath-over-rdma
>> +
>> +_have_rnbd() {
>> +	if [[ "$USE_RXE" != 1 ]]; then
>> +		SKIP_REASONS+=("Only USE_RXE=1 is supported")
>> +		return 1
>> +	fi
>> +	_have_driver rdma_rxe || return
>> +	_have_driver rnbd_server || return
>> +	_have_driver rnbd_client
>> +}
>> +
>> +_setup_rnbd() {
>> +	modprobe -q rnbd_server
>> +	modprobe -q rnbd_client
>> +	start_soft_rdma
> 
> I was not able to find stop_soft_rdma() in this patch. It might be the better to
> add _cleanup_rnbd() to call stop_soft_rdma().

Good catch, it sounds good to me.


> 
>> +	for i in $(rdma_network_interfaces)
>> +	do
>> +		ipv4_addr=$(get_ipv4_addr "$i")
>> +		if [[ -n "${ipv4_addr}" ]]; then
>> +			def_traddr=${ipv4_addr}
>> +		fi
>> +	done
>> +}
>> +
>> +_start_rnbd_client() {
>> +	local a b
>> +	local blkdev=$1
>> +	local before after
>> +
>> +	before=$(ls -d /sys/block/rnbd* 2>/dev/null)
>> +	if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then
>> +		return 1
>> +	fi
>> +
>> +	# Retrieve the newly added rnbd entry
>> +	after=$(ls -d /sys/block/rnbd* 2>/dev/null)
>> +	for a in $after
>> +	do
>> +		[[ -n "$before" ]] || break
>> +
>> +		for b in $before
>> +		do
>> +			[[ "$a" = "$b" ]] || break
>> +		done
>> +	done
>> +
>> +	rnbd_node=$a
> 
> Nit: this rnbd_node is a global variable. To clarify it, I suggest to use
> capital letters for its name and declare it at the beginning of this script
> file, like,
> 
> declare RNBD_NODE

Sounds good.


Thanks
Zhijian

> 
>> +}
>> +
>> +_stop_rnbd_client() {
>> +	echo "normal" > "$rnbd_node"/rnbd/unmap_device
>> +}
>> -- 
>> 2.47.0
diff mbox series

Patch

diff --git a/tests/rnbd/001 b/tests/rnbd/001
new file mode 100755
index 000000000000..220468f0f5b4
--- /dev/null
+++ b/tests/rnbd/001
@@ -0,0 +1,37 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright 2024, Fujitsu
+#
+. tests/rnbd/rc
+
+DESCRIPTION="Start Stop RNBD"
+CHECK_DMESG=1
+
+requires() {
+	_have_rnbd
+}
+
+test_start_stop()
+{
+	_setup_rnbd || return
+
+	local loop_dev i j
+	loop_dev="$(losetup -f)"
+
+	for ((i=0;i<100;i++))
+	do
+		_start_rnbd_client "${loop_dev}" &>/dev/null &&
+		_stop_rnbd_client &>/dev/null && ((j++))
+	done
+
+	# We expect at least 10% start/stop successfully
+	if [[ $j -lt 10 ]]; then
+		echo "Failed: $j/$i"
+	fi
+}
+
+test() {
+	echo "Running ${TEST_NAME}"
+	test_start_stop
+	echo "Test complete"
+}
diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out
new file mode 100644
index 000000000000..c1f9980d0f7b
--- /dev/null
+++ b/tests/rnbd/001.out
@@ -0,0 +1,2 @@ 
+Running rnbd/001
+Test complete
diff --git a/tests/rnbd/rc b/tests/rnbd/rc
new file mode 100644
index 000000000000..143ba0b59f38
--- /dev/null
+++ b/tests/rnbd/rc
@@ -0,0 +1,60 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright 2024, Fujitsu
+#
+# RNBD tests.
+
+. common/rc
+. common/multipath-over-rdma
+
+_have_rnbd() {
+	if [[ "$USE_RXE" != 1 ]]; then
+		SKIP_REASONS+=("Only USE_RXE=1 is supported")
+		return 1
+	fi
+	_have_driver rdma_rxe || return
+	_have_driver rnbd_server || return
+	_have_driver rnbd_client
+}
+
+_setup_rnbd() {
+	modprobe -q rnbd_server
+	modprobe -q rnbd_client
+	start_soft_rdma
+	for i in $(rdma_network_interfaces)
+	do
+		ipv4_addr=$(get_ipv4_addr "$i")
+		if [[ -n "${ipv4_addr}" ]]; then
+			def_traddr=${ipv4_addr}
+		fi
+	done
+}
+
+_start_rnbd_client() {
+	local a b
+	local blkdev=$1
+	local before after
+
+	before=$(ls -d /sys/block/rnbd* 2>/dev/null)
+	if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then
+		return 1
+	fi
+
+	# Retrieve the newly added rnbd entry
+	after=$(ls -d /sys/block/rnbd* 2>/dev/null)
+	for a in $after
+	do
+		[[ -n "$before" ]] || break
+
+		for b in $before
+		do
+			[[ "$a" = "$b" ]] || break
+		done
+	done
+
+	rnbd_node=$a
+}
+
+_stop_rnbd_client() {
+	echo "normal" > "$rnbd_node"/rnbd/unmap_device
+}