Message ID | 20220704111638.1109883-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests] block/008: avoid _offline_cpu() call in sub-shell | expand |
On 7/4/22 04:16, Shin'ichiro Kawasaki wrote: > The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE. > However, the commit bd6b882b2650 ("block/008: check CPU offline failure > due to many IRQs") put _offline_cpu() call in sub-shell, then the set > value to RESTORE_CPUS_ONLINE no longer affects function caller's > environment. This resulted in offlined CPUs not restored by _cleanup() > when the test case block/008 calls only _offline_cpu() and does not call > _online_cpu(). > > To fix the issue, avoid _offline_cpu() call in sub-shell. Use file > redirect to get output of _offline_cpu() instead of sub-shell execution. > > Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs") > Reported-by: Yi Zhang <yi.zhang@redhat.com> > Tested-by: Yi Zhang <yi.zhang@redhat.com> > Link: https://lore.kernel.org/linux-block/20220703180956.2922025-1-yi.zhang@redhat.com/ > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > tests/block/008 | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/tests/block/008 b/tests/block/008 > index 75aae65..cd09352 100755 > --- a/tests/block/008 > +++ b/tests/block/008 > @@ -34,6 +34,7 @@ test_device() { > local offline_cpus=() > local offlining=1 > local max_offline=${#HOTPLUGGABLE_CPUS[@]} > + local o=$TMPDIR/offline_cpu_out > if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then > (( max_offline-- )) > fi > @@ -60,18 +61,18 @@ test_device() { > > if (( offlining )); then > idx=$((RANDOM % ${#online_cpus[@]})) > - if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then > + if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then > offline_cpus+=("${online_cpus[$idx]}") > unset "online_cpus[$idx]" > online_cpus=("${online_cpus[@]}") > - elif [[ $err =~ "No space left on device" ]]; then > + elif [[ $(<"$o") =~ "No space left on device" ]]; then > # ENOSPC means CPU offline failure due to IRQ > # vector shortage. Keep current number of > # offline CPUs as maximum CPUs to offline. > max_offline=${#offline_cpus[@]} > offlining=0 > else > - echo "Failed to offline CPU: $err" > + echo "Failed to offline CPU: $(<"$o")" > break > fi > fi Has it been considered to move RESTORE_CPUS_ONLINE=1 from _online_cpu() / _offline_cpu() into the callers of these functions instead of making the above change? Thanks, Bart.
On Jul 06, 2022 / 06:55, Bart Van Assche wrote: > On 7/4/22 04:16, Shin'ichiro Kawasaki wrote: > > The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE. > > However, the commit bd6b882b2650 ("block/008: check CPU offline failure > > due to many IRQs") put _offline_cpu() call in sub-shell, then the set > > value to RESTORE_CPUS_ONLINE no longer affects function caller's > > environment. This resulted in offlined CPUs not restored by _cleanup() > > when the test case block/008 calls only _offline_cpu() and does not call > > _online_cpu(). > > > > To fix the issue, avoid _offline_cpu() call in sub-shell. Use file > > redirect to get output of _offline_cpu() instead of sub-shell execution. > > > > Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs") > > Reported-by: Yi Zhang <yi.zhang@redhat.com> > > Tested-by: Yi Zhang <yi.zhang@redhat.com> > > Link: https://lore.kernel.org/linux-block/20220703180956.2922025-1-yi.zhang@redhat.com/ > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > --- > > tests/block/008 | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/tests/block/008 b/tests/block/008 > > index 75aae65..cd09352 100755 > > --- a/tests/block/008 > > +++ b/tests/block/008 > > @@ -34,6 +34,7 @@ test_device() { > > local offline_cpus=() > > local offlining=1 > > local max_offline=${#HOTPLUGGABLE_CPUS[@]} > > + local o=$TMPDIR/offline_cpu_out > > if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then > > (( max_offline-- )) > > fi > > @@ -60,18 +61,18 @@ test_device() { > > if (( offlining )); then > > idx=$((RANDOM % ${#online_cpus[@]})) > > - if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then > > + if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then > > offline_cpus+=("${online_cpus[$idx]}") > > unset "online_cpus[$idx]" > > online_cpus=("${online_cpus[@]}") > > - elif [[ $err =~ "No space left on device" ]]; then > > + elif [[ $(<"$o") =~ "No space left on device" ]]; then > > # ENOSPC means CPU offline failure due to IRQ > > # vector shortage. Keep current number of > > # offline CPUs as maximum CPUs to offline. > > max_offline=${#offline_cpus[@]} > > offlining=0 > > else > > - echo "Failed to offline CPU: $err" > > + echo "Failed to offline CPU: $(<"$o")" > > break > > fi > > fi > > Has it been considered to move RESTORE_CPUS_ONLINE=1 from _online_cpu() / > _offline_cpu() into the callers of these functions instead of making the > above change? Yes, I thought about it. Looking in the codes in "common/cpuhotplug" and _cleanup in "check", I guess it was intended to hide RESTORE_CPUS_ONLINE from test cases. So, I chose the fix above, but I agree the file redirection does not look the best. And it still leaves the limitation that the _offline_cpu() can not be called in sub-shell. As another fix candidate, how about the fix below? It moves the RESTORE_CPUS_ONLINE=1 from _offline_cpu() to _have_cpu_hotplug(), which is more unlikely to be called in sub-shell. It still hides RESTORE_CPUS_ONLINE and should fix the issue. diff --git a/common/cpuhotplug b/common/cpuhotplug index d3aabfa..f57a5aa 100644 --- a/common/cpuhotplug +++ b/common/cpuhotplug @@ -27,17 +27,21 @@ _have_cpu_hotplug() { SKIP_REASON="CPU hotplugging is not supported" return 1 fi + + # shellcheck disable=SC2034 + RESTORE_CPUS_ONLINE=1 return 0 } _online_cpu() { - # shellcheck disable=SC2034 - RESTORE_CPUS_ONLINE=1 echo 1 > "/sys/devices/system/cpu/cpu$1/online" } _offline_cpu() { - # shellcheck disable=SC2034 - RESTORE_CPUS_ONLINE=1 + if [[ -z ${RESTORE_CPUS_ONLINE} ]]; then + echo "offlining cpu but RESTORE_CPUS_ONLINE is not set" + return 1 + fi + echo 0 > "/sys/devices/system/cpu/cpu$1/online" }
diff --git a/tests/block/008 b/tests/block/008 index 75aae65..cd09352 100755 --- a/tests/block/008 +++ b/tests/block/008 @@ -34,6 +34,7 @@ test_device() { local offline_cpus=() local offlining=1 local max_offline=${#HOTPLUGGABLE_CPUS[@]} + local o=$TMPDIR/offline_cpu_out if [[ ${#HOTPLUGGABLE_CPUS[@]} -eq ${#ALL_CPUS[@]} ]]; then (( max_offline-- )) fi @@ -60,18 +61,18 @@ test_device() { if (( offlining )); then idx=$((RANDOM % ${#online_cpus[@]})) - if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then + if _offline_cpu "${online_cpus[$idx]}" > "$o" 2>&1; then offline_cpus+=("${online_cpus[$idx]}") unset "online_cpus[$idx]" online_cpus=("${online_cpus[@]}") - elif [[ $err =~ "No space left on device" ]]; then + elif [[ $(<"$o") =~ "No space left on device" ]]; then # ENOSPC means CPU offline failure due to IRQ # vector shortage. Keep current number of # offline CPUs as maximum CPUs to offline. max_offline=${#offline_cpus[@]} offlining=0 else - echo "Failed to offline CPU: $err" + echo "Failed to offline CPU: $(<"$o")" break fi fi