[blktests] block/008: check CPU offline failure due to many IRQs

Message ID 20220128094512.24508-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived

Commit Message

Shinichiro Kawasaki Jan. 28, 2022, 9:45 a.m. UTC
When a system has more IRQs than a single CPU can handle, the test case
block/008 fails with a kernel message such as:

   "CPU 31 has 111 vectors, 90 available. Cannot disable CPU"

The cause is that the test case offlines too many CPUs, so the CPUs left
online cannot hold all of the required IRQ vectors. To avoid this failure,
check the error message from the CPU offline attempt. If the offline failed
because of an IRQ vector shortage, do not treat it as a test failure.
Also record the number of CPUs that could actually be offlined without
the failure and use that number as the limit for the rest of the test.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 tests/block/008 | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)
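
The error-handling pattern described in the commit message can be tried without touching real CPUs. The following is only a minimal sketch, not the blktests code: `fake_offline_cpu` and `FAKE_VECTOR_LIMIT` are hypothetical stand-ins for `_offline_cpu` and the kernel's vector limit, and the CPU is picked deterministically rather than at random. It shows how the captured error text decides whether an offline failure lowers `max_offline` or aborts.

```shell
#!/bin/bash
# Hypothetical stand-in for blktests' _offline_cpu. It pretends that the
# kernel refuses further offlining with ENOSPC once FAKE_VECTOR_LIMIT CPUs
# are already offline. Both names are assumptions for this sketch.
FAKE_VECTOR_LIMIT=2

fake_offline_cpu() {
	if (( ${#offline_cpus[@]} >= FAKE_VECTOR_LIMIT )); then
		echo "write error: No space left on device" >&2
		return 1
	fi
	return 0
}

online_cpus=(1 2 3 4 5)
offline_cpus=()
max_offline=${#online_cpus[@]}

offline_until_limit() {
	local err idx
	while (( ${#offline_cpus[@]} < max_offline )); do
		idx=0	# deterministic here; the real test picks a random index
		# Capture stderr so the failure reason can be inspected.
		if err=$(fake_offline_cpu "${online_cpus[$idx]}" 2>&1); then
			offline_cpus+=("${online_cpus[$idx]}")
			unset "online_cpus[$idx]"
			online_cpus=("${online_cpus[@]}")
		elif [[ $err =~ "No space left on device" ]]; then
			# ENOSPC: treat as IRQ vector shortage and lower the limit.
			max_offline=${#offline_cpus[@]}
		else
			echo "Failed to offline CPU: $err"
			return 1
		fi
	done
}

offline_until_limit
echo "offlined: ${offline_cpus[*]} (max_offline=$max_offline)"
```

Declaring `err` with `local` on its own line matters: in `local err=$(...)` the exit status of `local` would mask the status of the command substitution, and the `if` would always take the success branch.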

Comments

Shinichiro Kawasaki Feb. 14, 2022, 2:56 a.m. UTC | #1
On Jan 28, 2022 / 18:45, Shin'ichiro Kawasaki wrote:
> When systems have more IRQs than a single CPU can handle, the test case
> block/008 fails with kernel message such as,
> 
>    "CPU 31 has 111 vectors, 90 available. Cannot disable CPU"
> 
> The failure cause is that the test case offlined too many CPUs and the
> left online CPU can not hold all of the required IRQ vectors. To avoid
> this failure, check error message of CPU offline. If CPU offline failure
> cause is IRQ vector resource shortage, do not handle it as a failure.
> Also keep the actual number of CPUs which can be offlined without the
> failure and use this number for the test.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

This is a gentle reminder. Reviews by blktests experts will be appreciated.
Thanks in advance.

Johannes Thumshirn Feb. 15, 2022, 5:04 p.m. UTC | #2
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Omar Sandoval Feb. 17, 2022, 11:48 p.m. UTC | #3
On Fri, Jan 28, 2022 at 06:45:12PM +0900, Shin'ichiro Kawasaki wrote:
> When systems have more IRQs than a single CPU can handle, the test case
> block/008 fails with kernel message such as,
> 
>    "CPU 31 has 111 vectors, 90 available. Cannot disable CPU"
> 
> The failure cause is that the test case offlined too many CPUs and the
> left online CPU can not hold all of the required IRQ vectors. To avoid
> this failure, check error message of CPU offline. If CPU offline failure
> cause is IRQ vector resource shortage, do not handle it as a failure.
> Also keep the actual number of CPUs which can be offlined without the
> failure and use this number for the test.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Thanks, applied.

Patch

diff --git a/tests/block/008 b/tests/block/008
index 7445f8f..75aae65 100755
--- a/tests/block/008
+++ b/tests/block/008
@@ -60,17 +60,30 @@  test_device() {
 
 		if (( offlining )); then
 			idx=$((RANDOM % ${#online_cpus[@]}))
-			_offline_cpu "${online_cpus[$idx]}"
-			offline_cpus+=("${online_cpus[$idx]}")
-			unset "online_cpus[$idx]"
-			online_cpus=("${online_cpus[@]}")
-		else
+			if err=$(_offline_cpu "${online_cpus[$idx]}" 2>&1); then
+				offline_cpus+=("${online_cpus[$idx]}")
+				unset "online_cpus[$idx]"
+				online_cpus=("${online_cpus[@]}")
+			elif [[ $err =~ "No space left on device" ]]; then
+				# ENOSPC means CPU offline failure due to IRQ
+				# vector shortage. Keep current number of
+				# offline CPUs as maximum CPUs to offline.
+				max_offline=${#offline_cpus[@]}
+				offlining=0
+			else
+				echo "Failed to offline CPU: $err"
+				break
+			fi
+		fi
+
+		if (( !offlining )); then
 			idx=$((RANDOM % ${#offline_cpus[@]}))
 			_online_cpu "${offline_cpus[$idx]}"
 			online_cpus+=("${offline_cpus[$idx]}")
 			unset "offline_cpus[$idx]"
 			offline_cpus=("${offline_cpus[@]}")
 		fi
+
 		end_time=$(date +%s)
 		if (( end_time - start_time > timeout + 15 )); then
 			echo "fio did not finish after $timeout seconds!"