diff mbox series

[blktests] block/008: avoid _offline_cpu() call in sub-shell

Message ID 20220704111638.1109883-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Headers show
Series [blktests] block/008: avoid _offline_cpu() call in sub-shell | expand

Commit Message

Shin'ichiro Kawasaki July 4, 2022, 11:16 a.m. UTC
The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE.
However, the commit bd6b882b2650 ("block/008: check CPU offline failure
due to many IRQs") put _offline_cpu() call in sub-shell, then the set
value to RESTORE_CPUS_ONLINE no longer affects function caller's
environment. This resulted in offlined CPUs not restored by _cleanup()
when the test case block/008 calls only _offline_cpu() and does not call
_online_cpu().

To fix the issue, avoid _offline_cpu() call in sub-shell. Use file
redirect to get output of _offline_cpu() instead of sub-shell execution.

Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/20220703180956.2922025-1-yi.zhang@redhat.com/
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 tests/block/008 | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Bart Van Assche July 6, 2022, 1:55 p.m. UTC | #1
On 7/4/22 04:16, Shin'ichiro Kawasaki wrote:
> The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE.
> However, the commit bd6b882b2650 ("block/008: check CPU offline failure
> due to many IRQs") put _offline_cpu() call in sub-shell, then the set
> value to RESTORE_CPUS_ONLINE no longer affects function caller's
> environment. This resulted in offlined CPUs not restored by _cleanup()
> when the test case block/008 calls only _offline_cpu() and does not call
> _online_cpu().
> 
> To fix the issue, avoid _offline_cpu() call in sub-shell. Use file
> redirect to get output of _offline_cpu() instead of sub-shell execution.
> 
> Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs")
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Tested-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/20220703180956.2922025-1-yi.zhang@redhat.com/
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>   tests/block/008 | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/block/008 b/tests/block/008
> index 75aae65..cd09352 100755
> --- a/tests/block/008
> +++ b/tests/block/008
> @@ -34,6 +34,7 @@ test_device() {
>   	local offline_cpus=()
>   	local offlining=1
>   	local max_offline=${#HOTPLUGGABLE_CPUS[@]}
> +	local o=$TMPDIR/offline_cpu_out
>   	if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then
>   		(( max_offline-- ))
>   	fi
> @@ -60,18 +61,18 @@ test_device() {
>   
>   		if (( offlining )); then
>   			idx=$((RANDOM % ${#online_cpus[@]}))
> -			if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
> +			if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then
>   				offline_cpus+=("${online_cpus[$idx]}")
>   				unset "online_cpus[$idx]"
>   				online_cpus=("${online_cpus[@]}")
> -			elif [[ $err =~ "No space left on device" ]]; then
> +			elif [[ $(<"$o") =~ "No space left on device" ]]; then
>   				# ENOSPC means CPU offline failure due to IRQ
>   				# vector shortage. Keep current number of
>   				# offline CPUs as maximum CPUs to offline.
>   				max_offline=${#offline_cpus[@]}
>   				offlining=0
>   			else
> -				echo "Failed to offline CPU: $err"
> +				echo "Failed to offline CPU: $(<"$o")"
>   				break
>   			fi
>   		fi

Has it been considered to move RESTORE_CPUS_ONLINE=1 from _online_cpu() 
/ _offline_cpu() into the callers of these functions instead of making 
the above change?

Thanks,

Bart.
Shin'ichiro Kawasaki July 7, 2022, 3:13 a.m. UTC | #2
On Jul 06, 2022 / 06:55, Bart Van Assche wrote:
> On 7/4/22 04:16, Shin'ichiro Kawasaki wrote:
> > The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE.
> > However, the commit bd6b882b2650 ("block/008: check CPU offline failure
> > due to many IRQs") put _offline_cpu() call in sub-shell, then the set
> > value to RESTORE_CPUS_ONLINE no longer affects function caller's
> > environment. This resulted in offlined CPUs not restored by _cleanup()
> > when the test case block/008 calls only _offline_cpu() and does not call
> > _online_cpu().
> > 
> > To fix the issue, avoid _offline_cpu() call in sub-shell. Use file
> > redirect to get output of _offline_cpu() instead of sub-shell execution.
> > 
> > Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs")
> > Reported-by: Yi Zhang <yi.zhang@redhat.com>
> > Tested-by: Yi Zhang <yi.zhang@redhat.com>
> > Link: https://lore.kernel.org/linux-block/20220703180956.2922025-1-yi.zhang@redhat.com/
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >   tests/block/008 | 7 ++++---
> >   1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tests/block/008 b/tests/block/008
> > index 75aae65..cd09352 100755
> > --- a/tests/block/008
> > +++ b/tests/block/008
> > @@ -34,6 +34,7 @@ test_device() {
> >   	local offline_cpus=()
> >   	local offlining=1
> >   	local max_offline=${#HOTPLUGGABLE_CPUS[@]}
> > +	local o=$TMPDIR/offline_cpu_out
> >   	if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then
> >   		(( max_offline-- ))
> >   	fi
> > @@ -60,18 +61,18 @@ test_device() {
> >   		if (( offlining )); then
> >   			idx=$((RANDOM % ${#online_cpus[@]}))
> > -			if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
> > +			if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then
> >   				offline_cpus+=("${online_cpus[$idx]}")
> >   				unset "online_cpus[$idx]"
> >   				online_cpus=("${online_cpus[@]}")
> > -			elif [[ $err =~ "No space left on device" ]]; then
> > +			elif [[ $(<"$o") =~ "No space left on device" ]]; then
> >   				# ENOSPC means CPU offline failure due to IRQ
> >   				# vector shortage. Keep current number of
> >   				# offline CPUs as maximum CPUs to offline.
> >   				max_offline=${#offline_cpus[@]}
> >   				offlining=0
> >   			else
> > -				echo "Failed to offline CPU: $err"
> > +				echo "Failed to offline CPU: $(<"$o")"
> >   				break
> >   			fi
> >   		fi
> 
> Has it been considered to move RESTORE_CPUS_ONLINE=1 from _online_cpu() /
> _offline_cpu() into the callers of these functions instead of making the
> above change?

Yes, I thought about it. Looking in the codes in "common/cpuhotplug" and
_cleanup in "check", I guess it was intended to hide RESTORE_CPUS_ONLINE from
test cases. So, I chose the fix above, but I agree the file redirection does not
look the best. And it still leaves the limitation that the _offline_cpu() can
not be called in sub-shell.

As another fix candidate, how about the fix below? It moves the
RESTORE_CPUS_ONLINE=1 from _offline_cpu() to _have_cpu_hotplug(), which is more
unlikely to be called in sub-shell. It still hides RESTORE_CPUS_ONLINE and
should fix the issue.

diff --git a/common/cpuhotplug b/common/cpuhotplug
index d3aabfa..f57a5aa 100644
--- a/common/cpuhotplug
+++ b/common/cpuhotplug
@@ -27,17 +27,21 @@ _have_cpu_hotplug() {
 		SKIP_REASON="CPU hotplugging is not supported"
 		return 1
 	fi
+
+	# shellcheck disable=SC2034
+	RESTORE_CPUS_ONLINE=1
 	return 0
 }
 
 _online_cpu() {
-	# shellcheck disable=SC2034
-	RESTORE_CPUS_ONLINE=1
 	echo 1 > "/sys/devices/system/cpu/cpu$1/online"
 }
 
 _offline_cpu() {
-	# shellcheck disable=SC2034
-	RESTORE_CPUS_ONLINE=1
+	if [[ -z ${RESTORE_CPUS_ONLINE} ]]; then
+		echo "offlining cpu but RESTORE_CPUS_ONLINE is not set"
+		return 1
+	fi
+
 	echo 0 > "/sys/devices/system/cpu/cpu$1/online"
 }
diff mbox series

Patch

diff --git a/tests/block/008 b/tests/block/008
index 75aae65..cd09352 100755
--- a/tests/block/008
+++ b/tests/block/008
@@ -34,6 +34,7 @@  test_device() {
 	local offline_cpus=()
 	local offlining=1
 	local max_offline=${#HOTPLUGGABLE_CPUS[@]}
+	local o=$TMPDIR/offline_cpu_out
 	if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then
 		(( max_offline-- ))
 	fi
@@ -60,18 +61,18 @@  test_device() {
 
 		if (( offlining )); then
 			idx=$((RANDOM % ${#online_cpus[@]}))
-			if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
+			if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then
 				offline_cpus+=("${online_cpus[$idx]}")
 				unset "online_cpus[$idx]"
 				online_cpus=("${online_cpus[@]}")
-			elif [[ $err =~ "No space left on device" ]]; then
+			elif [[ $(<"$o") =~ "No space left on device" ]]; then
 				# ENOSPC means CPU offline failure due to IRQ
 				# vector shortage. Keep current number of
 				# offline CPUs as maximum CPUs to offline.
 				max_offline=${#offline_cpus[@]}
 				offlining=0
 			else
-				echo "Failed to offline CPU: $err"
+				echo "Failed to offline CPU: $(<"$o")"
 				break
 			fi
 		fi