diff mbox series

[blktests] block/008: fix cpu online restore

Message ID 20220703180956.2922025-1-yi.zhang@redhat.com (mailing list archive)
State New, archived
Headers show
Series [blktests] block/008: fix cpu online restore | expand

Commit Message

Yi Zhang July 3, 2022, 6:09 p.m. UTC
The offline cpus cannot be restored during _cleanup when only _offline_cpu
executed, fix it by reset RESTORE_CPUS_ONLINE=1 during test.

Fixes: bd6b882 ("block/008: check CPU offline failure due to many IRQs")
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
---
 tests/block/008 | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Shinichiro Kawasaki July 4, 2022, 5:20 a.m. UTC | #1
On Jul 04, 2022 / 02:09, Yi Zhang wrote:
> The offline cpus cannot be restored during _cleanup when only _offline_cpu
> executed, fix it by reset RESTORE_CPUS_ONLINE=1 during test.
> 
> Fixes: bd6b882 ("block/008: check CPU offline failure due to many IRQs")
> Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

Hello Yi, thank you for catching this. The commit bd6b882 put the _offline_cpu
call into a sub-shell, then RESTORE_CPUS_ONLINE reset in the function no longer
affects _cleanup function. When I made the commit, I overlooked that point.

Your change should fix the issue but it makes the RESTORE_CPUS_ONLINE=1 in
_offline_cpu meaningless. Instead, I suggest following patch. Could you confirm
it fixes the issue in your environment?

diff --git a/tests/block/008 b/tests/block/008
index 75aae65..cd09352 100755
--- a/tests/block/008
+++ b/tests/block/008
@@ -34,6 +34,7 @@ test_device() {
 	local offline_cpus=()
 	local offlining=1
 	local max_offline=${#HOTPLUGGABLE_CPUS[@]}
+	local o=$TMPDIR/offline_cpu_out
 	if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then
 		(( max_offline-- ))
 	fi
@@ -60,18 +61,18 @@ test_device() {
 
 		if (( offlining )); then
 			idx=$((RANDOM % ${#online_cpus[@]}))
-			if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
+			if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then
 				offline_cpus+=("${online_cpus[$idx]}")
 				unset "online_cpus[$idx]"
 				online_cpus=("${online_cpus[@]}")
-			elif [[ $err =~ "No space left on device" ]]; then
+			elif [[ $(<"$o") =~ "No space left on device" ]]; then
 				# ENOSPC means CPU offline failure due to IRQ
 				# vector shortage. Keep current number of
 				# offline CPUs as maximum CPUs to offline.
 				max_offline=${#offline_cpus[@]}
 				offlining=0
 			else
-				echo "Failed to offline CPU: $err"
+				echo "Failed to offline CPU: $(<"$o")"
 				break
 			fi
 		fi
Yi Zhang July 4, 2022, 6:47 a.m. UTC | #2
On Mon, Jul 4, 2022 at 1:20 PM Shinichiro Kawasaki
<shinichiro.kawasaki@wdc.com> wrote:
>
> On Jul 04, 2022 / 02:09, Yi Zhang wrote:
> > The offline cpus cannot be restored during _cleanup when only _offline_cpu
> > executed, fix it by reset RESTORE_CPUS_ONLINE=1 during test.
> >
> > Fixes: bd6b882 ("block/008: check CPU offline failure due to many IRQs")
> > Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
>
> Hello Yi, thank you for catching this. The commit bd6b882 put the _offline_cpu
> call into a sub-shell, then RESTORE_CPUS_ONLINE reset in the function no longer
> affects _cleanup function. When I made the commit, I overlooked that point.
>
> Your change should fix the issue but it makes the RESTORE_CPUS_ONLINE=1 in
> _offline_cpu meaningless. Instead, I suggest following patch. Could you confirm
> it fixes the issue in your environment?

Yeah, your change works well, feel free to add
Tested-by: Yi Zhang <yi.zhang@redhat.com>


>
> diff --git a/tests/block/008 b/tests/block/008
> index 75aae65..cd09352 100755
> --- a/tests/block/008
> +++ b/tests/block/008
> @@ -34,6 +34,7 @@ test_device() {
>         local offline_cpus=()
>         local offlining=1
>         local max_offline=${#HOTPLUGGABLE_CPUS[@]}
> +       local o=$TMPDIR/offline_cpu_out
>         if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then
>                 (( max_offline-- ))
>         fi
> @@ -60,18 +61,18 @@ test_device() {
>
>                 if (( offlining )); then
>                         idx=$((RANDOM % ${#online_cpus[@]}))
> -                       if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
> +                       if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then
>                                 offline_cpus+=("${online_cpus[$idx]}")
>                                 unset "online_cpus[$idx]"
>                                 online_cpus=("${online_cpus[@]}")
> -                       elif [[ $err =~ "No space left on device" ]]; then
> +                       elif [[ $(<"$o") =~ "No space left on device" ]]; then
>                                 # ENOSPC means CPU offline failure due to IRQ
>                                 # vector shortage. Keep current number of
>                                 # offline CPUs as maximum CPUs to offline.
>                                 max_offline=${#offline_cpus[@]}
>                                 offlining=0
>                         else
> -                               echo "Failed to offline CPU: $err"
> +                               echo "Failed to offline CPU: $(<"$o")"
>                                 break
>                         fi
>                 fi
>
>
> --
> Shin'ichiro Kawasaki
>


--
Best Regards,
  Yi Zhang
Shinichiro Kawasaki July 4, 2022, 11:46 a.m. UTC | #3
On Jul 04, 2022 / 14:47, Yi Zhang wrote:
> On Mon, Jul 4, 2022 at 1:20 PM Shinichiro Kawasaki
> <shinichiro.kawasaki@wdc.com> wrote:
> >
> > On Jul 04, 2022 / 02:09, Yi Zhang wrote:
> > > The offline cpus cannot be restored during _cleanup when only _offline_cpu
> > > executed, fix it by reset RESTORE_CPUS_ONLINE=1 during test.
> > >
> > > Fixes: bd6b882 ("block/008: check CPU offline failure due to many IRQs")
> > > Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
> >
> > Hello Yi, thank you for catching this. The commit bd6b882 put the _offline_cpu
> > call into a sub-shell, then RESTORE_CPUS_ONLINE reset in the function no longer
> > affects _cleanup function. When I made the commit, I overlooked that point.
> >
> > Your change should fix the issue but it makes the RESTORE_CPUS_ONLINE=1 in
> > _offline_cpu meaningless. Instead, I suggest following patch. Could you confirm
> > it fixes the issue in your environment?
> 
> Yeah, your change works well, feel free to add
> Tested-by: Yi Zhang <yi.zhang@redhat.com>

Thanks! I've posted the change with your tag as a formal patch.
diff mbox series

Patch

diff --git a/tests/block/008 b/tests/block/008
index 75aae65..f61f33a 100755
--- a/tests/block/008
+++ b/tests/block/008
@@ -91,6 +91,11 @@  test_device() {
 		fi
 	done
 
+	# The RESTORE_CPUS_ONLINE setting was ignored with previous
+	# err=$(_offline_cpu "${online_cpus[$idx]}" when only _offline_cpu
+	# executed, reset it here then the offline cpus can be retored
+	# shellcheck disable=SC2034
+	RESTORE_CPUS_ONLINE=1
 	FIO_PERF_FIELDS=("read iops")
 	_fio_perf_report