
[v2,3/3] drm/ci: enable lockdep detection

Message ID 20250211152812.54018-4-vignesh.raman@collabora.com
State New, archived
Series drm/ci: enable lockdep detection

Commit Message

Vignesh Raman Feb. 11, 2025, 3:28 p.m. UTC
We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
This outputs warnings when kernel locking errors are encountered,
and the tests continue to run. To detect whether lockdep has been
triggered, check the debug_locks value in /proc/lockdep_stats after
the tests have run. A debug_locks value of 0 indicates that lockdep
has detected an issue and turned itself off. In that case, exit with
a dedicated error code (101) and configure GitLab CI to report it as
a warning rather than a failure.
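The check described above can be sketched as a standalone POSIX shell function. This is a minimal sketch, not the patch's code: the function name lockdep_exit_code, and passing the stats file path as an argument so the logic can be exercised against a fake file, are assumptions for illustration. The exit code 101 is the one the patch uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print the exit code a job
# should use, given the test suite's exit status ($1) and a file in
# /proc/lockdep_stats format ($2).
lockdep_exit_code() {
    ret=$1
    stats=$2
    if [ -f "$stats" ]; then
        # debug_locks drops to 0 once lockdep has reported a problem
        # and turned itself off.
        debug_locks=$(awk '/debug_locks:/ {print $2}' "$stats")
        # Only replace a passing status, so a genuine test failure is
        # never downgraded to a warning. The string comparison avoids a
        # shell error if the field is missing.
        if [ "$debug_locks" = "0" ] && [ "$ret" -eq 0 ]; then
            ret=101
        fi
    fi
    echo "$ret"
}

# Example: simulate a stats file in which lockdep has turned itself off.
tmp=$(mktemp)
printf ' debug_locks:                             0\n' > "$tmp"
lockdep_exit_code 0 "$tmp"   # prints 101
lockdep_exit_code 1 "$tmp"   # prints 1
rm -f "$tmp"
```

Keeping a real test failure (non-zero status) as-is means only an otherwise green run is turned into the "warning" exit code.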

By default, GitLab CI does not pass through exit codes other than 1.
To propagate the correct exit code, set the variable
FF_USE_NEW_BASH_EVAL_STRATEGY to true, or explicitly exit with the
command's status on failure (|| exit $?).
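The "|| exit $?" form can be illustrated in plain POSIX shell. In this sketch (an assumption for illustration, not GitLab runner internals), the inner sh -c "exit 101" stands in for a script such as fastboot.sh reporting a lockdep warning, and the surrounding sh -c plays the role of a job step. How the runner itself reports the status still depends on its configuration.

```shell
#!/bin/sh
# Sketch: "cmd || exit $?" ends the step immediately with cmd's own status.
# The inner command is a hypothetical stand-in for a test script that
# signals a lockdep warning with exit code 101.
sh -c '
    sh -c "exit 101" || exit $?
    echo "not reached"
'
status=$?
echo "step exited with status $status"   # prints: step exited with status 101
```

When the command succeeds, the "|| exit $?" branch is skipped and the step continues normally.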

Also update the documentation.

Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
---

v2:
  - Lockdep failures are reported as pipeline warnings,
    and the documentation is updated.

---
 Documentation/gpu/automated_testing.rst |  4 ++++
 drivers/gpu/drm/ci/igt_runner.sh        | 11 +++++++++++
 drivers/gpu/drm/ci/test.yml             | 19 ++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)

Comments

Helen Mae Koike Fornazier Feb. 13, 2025, 12:43 p.m. UTC | #1
Hi Vignesh,

thanks for your patch.

On Tue, 11 Feb 2025 at 12:29, Vignesh Raman
<vignesh.raman@collabora.com> wrote:
> diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
> index 0eab020a33b9..3af735dbf6bd 100644
> --- a/drivers/gpu/drm/ci/test.yml
> +++ b/drivers/gpu/drm/ci/test.yml
> @@ -1,6 +1,8 @@
>  .lava-test:
>    extends:
>      - .container+build-rules
> +  variables:
> +    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
>    timeout: "1h30m"
>    rules:
>      - !reference [.scheduled_pipeline-rules, rules]
> @@ -13,6 +15,9 @@
>      - mv -n install/* artifacts/.
>      # Override it with our lava-submit.sh script
>      - ./artifacts/lava-submit.sh
> +  allow_failure:
> +    exit_codes:
> +      - 101

Maybe we could define this rule more generically instead of only in the
LAVA jobs, so other jobs can reuse it and we don't need to repeat it.


Regards,
Helen

Vignesh Raman Feb. 14, 2025, 7:52 a.m. UTC | #2
Hi Helen,

On 13/02/25 18:13, Helen Mae Koike Fornazier wrote:
> Hi Vignesh,
> 
> thanks for your patch.
> 
> On Tue, 11 Feb 2025 at 12:29, Vignesh Raman
> <vignesh.raman@collabora.com> wrote:
>> diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
>> index 0eab020a33b9..3af735dbf6bd 100644
>> --- a/drivers/gpu/drm/ci/test.yml
>> +++ b/drivers/gpu/drm/ci/test.yml
>> @@ -1,6 +1,8 @@
>>   .lava-test:
>>     extends:
>>       - .container+build-rules
>> +  variables:
>> +    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
>>     timeout: "1h30m"
>>     rules:
>>       - !reference [.scheduled_pipeline-rules, rules]
>> @@ -13,6 +15,9 @@
>>       - mv -n install/* artifacts/.
>>       # Override it with our lava-submit.sh script
>>       - ./artifacts/lava-submit.sh
>> +  allow_failure:
>> +    exit_codes:
>> +      - 101
> 
> Maybe we could define this rule more generically instead of only in the
> LAVA jobs, so other jobs can reuse it and we don't need to repeat it.

Yes, agreed. I will send a new version of the patch with this change.

Regards,
Vignesh


Patch

diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
index 6d7c6086034d..62aa3ede02a5 100644
--- a/Documentation/gpu/automated_testing.rst
+++ b/Documentation/gpu/automated_testing.rst
@@ -115,6 +115,10 @@  created (eg. https://gitlab.freedesktop.org/janedoe/linux/-/pipelines)
 5. The various jobs will be run and when the pipeline is finished, all jobs
 should be green unless a regression has been found.
 
+6. A job that finishes with a warning indicates that lockdep
+(see Documentation/locking/lockdep-design.rst) detected locking issues
+during the tests.
+
 
 How to update test expectations
 ===============================
diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index 68b042e43b7f..2a0599f12c58 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -85,5 +85,16 @@  deqp-runner junit \
    --limit 50 \
    --template "See $ARTIFACTS_BASE_URL/results/{{testcase}}.xml"
 
+# Check if /proc/lockdep_stats exists
+if [ -f /proc/lockdep_stats ]; then
+    # debug_locks drops to 0 once lockdep has detected an issue and turned itself off.
+    debug_locks=$(awk '/debug_locks:/ {print $2}' /proc/lockdep_stats)
+    if [ "$debug_locks" = "0" ] && [ "$ret" -eq 0 ]; then
+        echo "Warning: LOCKDEP issue detected. Please check dmesg logs for more information."
+        cat /proc/lockdep_stats
+        ret=101
+    fi
+fi
+
 cd $oldpath
 exit $ret
diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 0eab020a33b9..3af735dbf6bd 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -1,6 +1,8 @@ 
 .lava-test:
   extends:
     - .container+build-rules
+  variables:
+    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
   timeout: "1h30m"
   rules:
     - !reference [.scheduled_pipeline-rules, rules]
@@ -13,6 +15,9 @@ 
     - mv -n install/* artifacts/.
     # Override it with our lava-submit.sh script
     - ./artifacts/lava-submit.sh
+  allow_failure:
+    exit_codes:
+      - 101
 
 .lava-igt:arm32:
   extends:
@@ -88,9 +93,14 @@ 
     - igt:arm64
   tags:
     - $RUNNER_TAG
+  allow_failure:
+    exit_codes:
+      - 101
 
 .software-driver:
   stage: software-driver
+  variables:
+    FF_USE_NEW_BASH_EVAL_STRATEGY: 'true'
   timeout: "1h30m"
   rules:
     - !reference [.scheduled_pipeline-rules, rules]
@@ -108,6 +118,9 @@ 
     - debian/x86_64_test-gl
     - testing:x86_64
     - igt:x86_64
+  allow_failure:
+    exit_codes:
+      - 101
 
 .msm-sc7180:
   extends:
@@ -153,7 +166,7 @@  msm:apq8016:
     BM_KERNEL_EXTRA_ARGS: clk_ignore_unused
     RUNNER_TAG: google-freedreno-db410c
   script:
-    - ./install/bare-metal/fastboot.sh
+    - ./install/bare-metal/fastboot.sh || exit $?
 
 msm:apq8096:
   extends:
@@ -167,7 +180,7 @@  msm:apq8096:
     GPU_VERSION: apq8096
     RUNNER_TAG: google-freedreno-db820c
   script:
-    - ./install/bare-metal/fastboot.sh
+    - ./install/bare-metal/fastboot.sh || exit $?
 
 msm:sdm845:
   extends:
@@ -181,7 +194,7 @@  msm:sdm845:
     GPU_VERSION: sdm845
     RUNNER_TAG: google-freedreno-cheza
   script:
-    - ./install/bare-metal/cros-servo.sh
+    - ./install/bare-metal/cros-servo.sh || exit $?
 
 msm:sm8350-hdk:
   extends: