Message ID | 20220719025216.1395353-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests,v2] common/cpuhotplug: allow _offline_cpu() call in sub-shell | expand |
Hi Shin'ichiro This change introduced one new issue that the offline operation will be skipped when there are more than one test devs. # cat config TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1) # ./check block/008 block/008 => nvme0n1 (do IO while hotplugging CPUs) [passed] read iops 336729 ... 343317 runtime 12.791s ... 12.498s block/008 => nvme1n1 (do IO while hotplugging CPUs) [failed] read iops ... runtime 0.225s ... 0.225s --- tests/block/008.out 2022-07-26 01:26:32.733912082 -0400 +++ /mnt/tests/kernel/storage/SSD/nvme_blktest/blktests/results/nvme1n1/block/008.out.bad 2022-07-26 04:29:01.940374953 -0400 @@ -1,2 +1,3 @@ Running block/008 +Failed to offline CPU: offlining cpu but RESTORE_CPUS_ONLINE is not set Test complete block/008 => nvme2n1 (do IO while hotplugging CPUs) [failed] read iops ... runtime 0.231s ... 0.231s --- tests/block/008.out 2022-07-26 01:26:32.733912082 -0400 +++ /mnt/tests/kernel/storage/SSD/nvme_blktest/blktests/results/nvme2n1/block/008.out.bad 2022-07-26 04:29:02.210381850 -0400 @@ -1,2 +1,3 @@ Running block/008 +Failed to offline CPU: offlining cpu but RESTORE_CPUS_ONLINE is not set Test complete On Tue, Jul 19, 2022 at 11:04 AM Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote: > > The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE. > However, the commit bd6b882b2650 ("block/008: check CPU offline failure > due to many IRQs") put _offline_cpu() call in sub-shell, then the set > value to RESTORE_CPUS_ONLINE no longer affects function caller's > environment. This resulted in off-lined CPUs not restored by _cleanup() > when the test case block/008 calls only _offline_cpu() and does not call > _online_cpu(). > > To fix the issue, set RESTORE_CPUS_ONLINE in _have_cpu_hotplug() in > place of _offline_cpu(). _have_cpu_hotplug() is less likely to be called > in sub-shell. In same manner, do not set RESTORE_CPUS_ONLINE in > _online_cpu() either. Check that RESTORE_CPUS_ONLINE is set in > _offline_cpu() to avoid unexpected CPUs left off-lined. This check also > avoids a shellcheck warning. > > Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs") > Reported-by: Yi Zhang <yi.zhang@redhat.com> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > Changes from v1: > * Change fix approach: fix _offline_cpu() instaed of block/008 test script. > > common/cpuhotplug | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/common/cpuhotplug b/common/cpuhotplug > index 7facd0d..e91c3bc 100644 > --- a/common/cpuhotplug > +++ b/common/cpuhotplug > @@ -27,17 +27,20 @@ _have_cpu_hotplug() { > SKIP_REASONS+=("CPU hotplugging is not supported") > return 1 > fi > + > + RESTORE_CPUS_ONLINE=1 > return 0 > } > > _online_cpu() { > - # shellcheck disable=SC2034 > - RESTORE_CPUS_ONLINE=1 > echo 1 > "/sys/devices/system/cpu/cpu$1/online" > } > > _offline_cpu() { > - # shellcheck disable=SC2034 > - RESTORE_CPUS_ONLINE=1 > + if [[ -z ${RESTORE_CPUS_ONLINE} ]]; then > + echo "offlining cpu but RESTORE_CPUS_ONLINE is not set" > + return 1 > + fi > + > echo 0 > "/sys/devices/system/cpu/cpu$1/online" > } > -- > 2.36.1 >
On Jul 26, 2022 / 16:33, Yi Zhang wrote: > Hi Shin'ichiro > > This change introduced one new issue that the offline operation will > be skipped when there are more than one test devs. Yi, thanks for finding this. Then this v2 fix is not good. Tomorrow, I'll do some more confirmation and will revert v2 and apply v1 instead. v1 fix is a bit dirty, but should work well.
On Jul 26, 2022 / 10:45, Shinichiro Kawasaki wrote: > On Jul 26, 2022 / 16:33, Yi Zhang wrote: > > Hi Shin'ichiro > > > > This change introduced one new issue that the offline operation will > > be skipped when there are more than one test devs. > > Yi, thanks for finding this. Then this v2 fix is not good. Tomorrow, I'll do > some more confirmation and will revert v2 and apply v1 instead. v1 fix is a bit > dirty, but should work well. Yi, I have reverted v2 and merged v1 to the blktests upstream. In case you still see issues, please let me know.
diff --git a/common/cpuhotplug b/common/cpuhotplug index 7facd0d..e91c3bc 100644 --- a/common/cpuhotplug +++ b/common/cpuhotplug @@ -27,17 +27,20 @@ _have_cpu_hotplug() { SKIP_REASONS+=("CPU hotplugging is not supported") return 1 fi + + RESTORE_CPUS_ONLINE=1 return 0 } _online_cpu() { - # shellcheck disable=SC2034 - RESTORE_CPUS_ONLINE=1 echo 1 > "/sys/devices/system/cpu/cpu$1/online" } _offline_cpu() { - # shellcheck disable=SC2034 - RESTORE_CPUS_ONLINE=1 + if [[ -z ${RESTORE_CPUS_ONLINE} ]]; then + echo "offlining cpu but RESTORE_CPUS_ONLINE is not set" + return 1 + fi + echo 0 > "/sys/devices/system/cpu/cpu$1/online" }
The helper function _offline_cpu() sets a value to RESTORE_CPUS_ONLINE. However, the commit bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs") put _offline_cpu() call in sub-shell, then the set value to RESTORE_CPUS_ONLINE no longer affects function caller's environment. This resulted in off-lined CPUs not restored by _cleanup() when the test case block/008 calls only _offline_cpu() and does not call _online_cpu(). To fix the issue, set RESTORE_CPUS_ONLINE in _have_cpu_hotplug() in place of _offline_cpu(). _have_cpu_hotplug() is less likely to be called in sub-shell. In same manner, do not set RESTORE_CPUS_ONLINE in _online_cpu() either. Check that RESTORE_CPUS_ONLINE is set in _offline_cpu() to avoid unexpected CPUs left off-lined. This check also avoids a shellcheck warning. Fixes: bd6b882b2650 ("block/008: check CPU offline failure due to many IRQs") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- Changes from v1: * Change fix approach: fix _offline_cpu() instaed of block/008 test script. common/cpuhotplug | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)