Message ID | 20250211152812.54018-4-vignesh.raman@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/ci: enable lockdep detection |
Hi Vignesh,

thanks for your patch.

On Tue, Feb 11, 2025 at 12:29, Vignesh Raman
<vignesh.raman@collabora.com> wrote:
>
> We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
> This will output warnings when kernel locking errors are encountered
> and will continue executing tests. To detect if lockdep has been
> triggered, check the debug_locks value in /proc/lockdep_stats after
> the tests have run. When debug_locks is 0, it indicates that lockdep
> has detected issues and turned itself off. Check this value, and if
> lockdep is detected, exit with an error and configure it as a warning
> in GitLab CI.
>
> GitLab CI ignores exit codes other than 1 by default. Pass the correct
> exit code with variable FF_USE_NEW_BASH_EVAL_STRATEGY set to true or
> exit on failure.
>
> Also update the documentation.
>
> Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
> ---
>
> v2:
>   - Lockdep failures are reported as pipeline warnings,
>     and the documentation is updated.
>
> ---
>  Documentation/gpu/automated_testing.rst |  4 ++++
>  drivers/gpu/drm/ci/igt_runner.sh        | 11 +++++++++++
>  drivers/gpu/drm/ci/test.yml             | 19 ++++++++++++++++---
>  3 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
> index 6d7c6086034d..62aa3ede02a5 100644
> --- a/Documentation/gpu/automated_testing.rst
> +++ b/Documentation/gpu/automated_testing.rst
> @@ -115,6 +115,10 @@ created (eg. https://gitlab.freedesktop.org/janedoe/linux/-/pipelines)
>  5. The various jobs will be run and when the pipeline is finished, all jobs
>     should be green unless a regression has been found.
>
> +6. Warnings in the pipeline indicate that lockdep
> +(see Documentation/locking/lockdep-design.rst) issues have been detected
> +during the tests.
> +
>
>  How to update test expectations
>  ===============================
> diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
> index 68b042e43b7f..2a0599f12c58 100755
> --- a/drivers/gpu/drm/ci/igt_runner.sh
> +++ b/drivers/gpu/drm/ci/igt_runner.sh
> @@ -85,5 +85,16 @@ deqp-runner junit \
>          --limit 50 \
>          --template "See $ARTIFACTS_BASE_URL/results/{{testcase}}.xml"
>
> +# Check if /proc/lockdep_stats exists
> +if [ -f /proc/lockdep_stats ]; then
> +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
> +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
> +    if [ "$debug_locks" -eq 0 ] && [ "$ret" -eq 0 ]; then
> +        echo "Warning: LOCKDEP issue detected. Please check dmesg logs for more information."
> +        cat /proc/lockdep_stats
> +        ret=101
> +    fi
> +fi
> +
>  cd $oldpath
>  exit $ret
> diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
> index 0eab020a33b9..3af735dbf6bd 100644
> --- a/drivers/gpu/drm/ci/test.yml
> +++ b/drivers/gpu/drm/ci/test.yml
> @@ -1,6 +1,8 @@
>  .lava-test:
>    extends:
>      - .container+build-rules
> +  variables:
> +    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
>    timeout: "1h30m"
>    rules:
>      - !reference [.scheduled_pipeline-rules, rules]
> @@ -13,6 +15,9 @@
>      - mv -n install/* artifacts/.
>      # Override it with our lava-submit.sh script
>      - ./artifacts/lava-submit.sh
> +  allow_failure:
> +    exit_codes:
> +      - 101

Maybe we could have this rule more generically instead of just in lava,
so we can re-use it in other jobs as well and we don't need to repeat it.

Regards,
Helen

>
>  .lava-igt:arm32:
>    extends:
> @@ -88,9 +93,14 @@
>      - igt:arm64
>    tags:
>      - $RUNNER_TAG
> +  allow_failure:
> +    exit_codes:
> +      - 101
>
>  .software-driver:
>    stage: software-driver
> +  variables:
> +    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
>    timeout: "1h30m"
>    rules:
>      - !reference [.scheduled_pipeline-rules, rules]
> @@ -108,6 +118,9 @@
>      - debian/x86_64_test-gl
>      - testing:x86_64
>      - igt:x86_64
> +  allow_failure:
> +    exit_codes:
> +      - 101
>
>  .msm-sc7180:
>    extends:
> @@ -153,7 +166,7 @@ msm:apq8016:
>      BM_KERNEL_EXTRA_ARGS: clk_ignore_unused
>      RUNNER_TAG: google-freedreno-db410c
>    script:
> -    - ./install/bare-metal/fastboot.sh
> +    - ./install/bare-metal/fastboot.sh || exit $?
>
>  msm:apq8096:
>    extends:
> @@ -167,7 +180,7 @@ msm:apq8096:
>      GPU_VERSION: apq8096
>      RUNNER_TAG: google-freedreno-db820c
>    script:
> -    - ./install/bare-metal/fastboot.sh
> +    - ./install/bare-metal/fastboot.sh || exit $?
>
>  msm:sdm845:
>    extends:
> @@ -181,7 +194,7 @@ msm:sdm845:
>      GPU_VERSION: sdm845
>      RUNNER_TAG: google-freedreno-cheza
>    script:
> -    - ./install/bare-metal/cros-servo.sh
> +    - ./install/bare-metal/cros-servo.sh || exit $?
>
>  msm:sm8350-hdk:
>    extends:
> --
> 2.43.0
>
Hi Helen,

On 13/02/25 18:13, Helen Mae Koike Fornazier wrote:
> Hi Vignesh,
>
> thanks for your patch.
>
> On Tue, Feb 11, 2025 at 12:29, Vignesh Raman
> <vignesh.raman@collabora.com> wrote:

[...]

>> diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
>> index 0eab020a33b9..3af735dbf6bd 100644
>> --- a/drivers/gpu/drm/ci/test.yml
>> +++ b/drivers/gpu/drm/ci/test.yml

[...]

>> @@ -13,6 +15,9 @@
>>      - mv -n install/* artifacts/.
>>      # Override it with our lava-submit.sh script
>>      - ./artifacts/lava-submit.sh
>> +  allow_failure:
>> +    exit_codes:
>> +      - 101
>
> Maybe we could have this rule more generically instead of just in lava,
> so we can re-use it in other jobs as well and we don't need to repeat it.

Yes, agreed. I will post a patch with this update.

Regards,
Vignesh

>
> Regards,
> Helen

[...]
diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
index 6d7c6086034d..62aa3ede02a5 100644
--- a/Documentation/gpu/automated_testing.rst
+++ b/Documentation/gpu/automated_testing.rst
@@ -115,6 +115,10 @@ created (eg. https://gitlab.freedesktop.org/janedoe/linux/-/pipelines)
 5. The various jobs will be run and when the pipeline is finished, all jobs
    should be green unless a regression has been found.

+6. Warnings in the pipeline indicate that lockdep
+(see Documentation/locking/lockdep-design.rst) issues have been detected
+during the tests.
+

 How to update test expectations
 ===============================
diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index 68b042e43b7f..2a0599f12c58 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -85,5 +85,16 @@ deqp-runner junit \
         --limit 50 \
         --template "See $ARTIFACTS_BASE_URL/results/{{testcase}}.xml"

+# Check if /proc/lockdep_stats exists
+if [ -f /proc/lockdep_stats ]; then
+    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
+    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
+    if [ "$debug_locks" -eq 0 ] && [ "$ret" -eq 0 ]; then
+        echo "Warning: LOCKDEP issue detected. Please check dmesg logs for more information."
+        cat /proc/lockdep_stats
+        ret=101
+    fi
+fi
+
 cd $oldpath
 exit $ret
diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 0eab020a33b9..3af735dbf6bd 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -1,6 +1,8 @@
 .lava-test:
   extends:
     - .container+build-rules
+  variables:
+    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
   timeout: "1h30m"
   rules:
     - !reference [.scheduled_pipeline-rules, rules]
@@ -13,6 +15,9 @@
     - mv -n install/* artifacts/.
     # Override it with our lava-submit.sh script
     - ./artifacts/lava-submit.sh
+  allow_failure:
+    exit_codes:
+      - 101

 .lava-igt:arm32:
   extends:
@@ -88,9 +93,14 @@
     - igt:arm64
   tags:
     - $RUNNER_TAG
+  allow_failure:
+    exit_codes:
+      - 101

 .software-driver:
   stage: software-driver
+  variables:
+    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
   timeout: "1h30m"
   rules:
     - !reference [.scheduled_pipeline-rules, rules]
@@ -108,6 +118,9 @@
     - debian/x86_64_test-gl
     - testing:x86_64
     - igt:x86_64
+  allow_failure:
+    exit_codes:
+      - 101

 .msm-sc7180:
   extends:
@@ -153,7 +166,7 @@ msm:apq8016:
     BM_KERNEL_EXTRA_ARGS: clk_ignore_unused
     RUNNER_TAG: google-freedreno-db410c
   script:
-    - ./install/bare-metal/fastboot.sh
+    - ./install/bare-metal/fastboot.sh || exit $?

 msm:apq8096:
   extends:
@@ -167,7 +180,7 @@ msm:apq8096:
     GPU_VERSION: apq8096
     RUNNER_TAG: google-freedreno-db820c
   script:
-    - ./install/bare-metal/fastboot.sh
+    - ./install/bare-metal/fastboot.sh || exit $?

 msm:sdm845:
   extends:
@@ -181,7 +194,7 @@ msm:sdm845:
     GPU_VERSION: sdm845
     RUNNER_TAG: google-freedreno-cheza
   script:
-    - ./install/bare-metal/cros-servo.sh
+    - ./install/bare-metal/cros-servo.sh || exit $?

 msm:sm8350-hdk:
   extends:
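
The `|| exit $?` suffix on the three bare-metal script lines makes the job stop with the wrapped script's own status instead of carrying on to later steps. The sketch below only illustrates that shell behaviour and says nothing about how the GitLab runner evaluates job scripts; `run_tests` and `propagate` are hypothetical names standing in for `fastboot.sh` or `cros-servo.sh` and for the job script line.

```shell
#!/bin/sh
# Hypothetical stand-in for a test script that reports a lockdep hit
# with status 101, as igt_runner.sh does in the patch.
run_tests() {
    return 101
}

# Run a command the way the patched job lines do: if it fails, leave the
# (sub)shell at once with the command's own status; otherwise continue.
propagate() {
    ( "$@" || exit $?; echo "later steps still run" )
}

propagate run_tests
echo "status seen by the caller: $?"
```

Because the subshell exits with 101 before reaching the `echo`, the caller sees exactly the status that `allow_failure: exit_codes` is configured to match.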
We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
This will output warnings when kernel locking errors are encountered
and will continue executing tests. To detect if lockdep has been
triggered, check the debug_locks value in /proc/lockdep_stats after
the tests have run. When debug_locks is 0, it indicates that lockdep
has detected issues and turned itself off. Check this value, and if
lockdep is detected, exit with an error and configure it as a warning
in GitLab CI.

GitLab CI ignores exit codes other than 1 by default. Pass the correct
exit code with variable FF_USE_NEW_BASH_EVAL_STRATEGY set to true or
exit on failure.

Also update the documentation.

Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
---

v2:
  - Lockdep failures are reported as pipeline warnings,
    and the documentation is updated.

---
 Documentation/gpu/automated_testing.rst |  4 ++++
 drivers/gpu/drm/ci/igt_runner.sh        | 11 +++++++++++
 drivers/gpu/drm/ci/test.yml             | 19 ++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)
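
The debug_locks check described in the commit message can be sketched as a standalone function. This is an illustration under stated assumptions, not the code from the patch: the `lockdep_tripped` name and its file-path argument are hypothetical, and the sketch treats a missing or unparsable debug_locks line as "not tripped" instead of passing an empty string to a numeric `[ -eq ]` test. It runs against a sample file so it can be tried without a lockdep-enabled kernel.

```shell
#!/bin/sh
# lockdep_tripped [FILE]: succeed (status 0) if FILE reports "debug_locks: 0",
# meaning lockdep found a problem and turned itself off.
# FILE defaults to /proc/lockdep_stats.
lockdep_tripped() {
    stats_file=${1:-/proc/lockdep_stats}
    [ -r "$stats_file" ] || return 1
    # Take the second field of the line whose first field is "debug_locks:".
    value=$(awk '$1 == "debug_locks:" { print $2; exit }' "$stats_file")
    [ "$value" = "0" ]
}

# Demo on sample data rather than the live /proc file.
sample=$(mktemp)
printf ' debug_locks:                             0\n' > "$sample"

ret=0   # status of the test run itself; 0 means the tests passed
if [ "$ret" -eq 0 ] && lockdep_tripped "$sample"; then
    echo "Warning: LOCKDEP issue detected. Please check dmesg logs."
    ret=101
fi
rm -f "$sample"
echo "final status: $ret"
```

As in the patch, the lockdep status only replaces a passing result, so a genuine test failure keeps its original exit code.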