diff mbox series

[v1] drm/ci: enable lockdep detection

Message ID 20240812112030.81774-1-vignesh.raman@collabora.com (mailing list archive)
State New, archived
Headers show
Series [v1] drm/ci: enable lockdep detection | expand

Commit Message

Vignesh Raman Aug. 12, 2024, 11:20 a.m. UTC
We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
This will output warnings when kernel locking errors are encountered
and will continue executing tests. To detect if lockdep has been
triggered, check the debug_locks value in /proc/lockdep_stats after
the tests have run. When debug_locks is 0, it indicates that lockdep
has detected issues and turned itself off. So check this value and
exit with an error if lockdep is detected.

Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
---

v1:
  - Pipeline link to show lockdep_stats before and after tests,
    https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
  
---
 drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Helen Mae Koike Fornazier Aug. 12, 2024, 8:17 p.m. UTC | #1
Hi Vignesh,

Thanks for your patch.


---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote ---

 > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci. 
 > This will output warnings when kernel locking errors are encountered 
 > and will continue executing tests. To detect if lockdep has been 
 > triggered, check the debug_locks value in /proc/lockdep_stats after 
 > the tests have run. When debug_locks is 0, it indicates that lockdep 
 > has detected issues and turned itself off. So check this value and 
 > exit with an error if lockdep is detected. 

Should we exit with an error? Or with a warning? (GitLab-CI supports that).
Well, I guess it is serious enough.

Should we also track on the xfail folder? So we can annotate those errors as well?
Did you have an entire pipeline with this? To see if everything is still green?

Helen

 >  
 > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com> 
 > --- 
 >  
 > v1: 
 >  - Pipeline link to show lockdep_stats before and after tests, 
 >  https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721 
 >  
 > --- 
 >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++ 
 >  1 file changed, 11 insertions(+) 
 >  
 > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh 
 > index f38836ec837c..d2c043cd8c6a 100755 
 > --- a/drivers/gpu/drm/ci/igt_runner.sh 
 > +++ b/drivers/gpu/drm/ci/igt_runner.sh 
 > @@ -85,6 +85,17 @@ deqp-runner junit \ 
 >  --limit 50 \ 
 >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml" 
 >  
 > +# Check if /proc/lockdep_stats exists 
 > +if [ -f /proc/lockdep_stats ]; then 
 > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off. 
 > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}') 
 > +    if [ "$debug_locks" -eq 0 ]; then 
 > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information." 
 > +        cat /proc/lockdep_stats 
 > +        ret=1 
 > +    fi 
 > +fi 
 > + 
 >  # Store the results also in the simpler format used by the runner in ChromeOS CI 
 >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt 
 >  
 > -- 
 > 2.43.0 
 >  
 >
Vignesh Raman Aug. 13, 2024, 5:26 a.m. UTC | #2
Hi Helen,

On 13/08/24 01:47, Helen Mae Koike Fornazier wrote:
> 
> Hi Vignesh,
> 
> Thanks for your patch.
> 
> 
> ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote ---
> 
>   > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
>   > This will output warnings when kernel locking errors are encountered
>   > and will continue executing tests. To detect if lockdep has been
>   > triggered, check the debug_locks value in /proc/lockdep_stats after
>   > the tests have run. When debug_locks is 0, it indicates that lockdep
>   > has detected issues and turned itself off. So check this value and
>   > exit with an error if lockdep is detected.
> 
> Should we exit with an error? Or with a warning? (GitLab-CI supports that).
> Well, I guess it is serious enough.

I think we can exit with an error since we check the status at the end 
of the tests.

> 
> Should we also track on the xfail folder? So we can annotate those errors as well?

Do you mean reporting this error in expectation files?

> Did you have an entire pipeline with this? To see if everything is still green?

Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711

This is a test branch in which I reverted a fix for the lockdep issue.
We see 'WARNING: bad unlock balance detected!' in logs and pipeline is 
still green.

Regards,
Vignesh

> 
> Helen
> 
>   >
>   > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com>
>   > ---
>   >
>   > v1:
>   >  - Pipeline link to show lockdep_stats before and after tests,
>   >  https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>   >
>   > ---
>   >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
>   >  1 file changed, 11 insertions(+)
>   >
>   > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
>   > index f38836ec837c..d2c043cd8c6a 100755
>   > --- a/drivers/gpu/drm/ci/igt_runner.sh
>   > +++ b/drivers/gpu/drm/ci/igt_runner.sh
>   > @@ -85,6 +85,17 @@ deqp-runner junit \
>   >  --limit 50 \
>   >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
>   >
>   > +# Check if /proc/lockdep_stats exists
>   > +if [ -f /proc/lockdep_stats ]; then
>   > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
>   > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
>   > +    if [ "$debug_locks" -eq 0 ]; then
>   > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
>   > +        cat /proc/lockdep_stats
>   > +        ret=1
>   > +    fi
>   > +fi
>   > +
>   >  # Store the results also in the simpler format used by the runner in ChromeOS CI
>   >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt
>   >
>   > --
>   > 2.43.0
>   >
>   >
Helen Mae Koike Fornazier Aug. 13, 2024, 8:14 p.m. UTC | #3
---- On Tue, 13 Aug 2024 02:26:48 -0300 Vignesh Raman  wrote ---

 > Hi Helen, 
 >  
 > On 13/08/24 01:47, Helen Mae Koike Fornazier wrote: 
 > > 
 > > Hi Vignesh, 
 > > 
 > > Thanks for your patch. 
 > > 
 > > 
 > > ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote --- 
 > > 
 > >   > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci. 
 > >   > This will output warnings when kernel locking errors are encountered 
 > >   > and will continue executing tests. To detect if lockdep has been 
 > >   > triggered, check the debug_locks value in /proc/lockdep_stats after 
 > >   > the tests have run. When debug_locks is 0, it indicates that lockdep 
 > >   > has detected issues and turned itself off. So check this value and 
 > >   > exit with an error if lockdep is detected. 
 > > 
 > > Should we exit with an error? Or with a warning? (GitLab-CI supports that). 
 > > Well, I guess it is serious enough. 
 >  
 > I think we can exit with an error since we check the status at the end 
 > of the tests. 

I mean, we can exit with a specific error and configure this specific error in gitlab-ci to be a warning,
so the job will be yellow and not red.

But maybe the lockdep issue should be a strong error.

 >  
 > > 
 > > Should we also track on the xfail folder? So we can annotate those errors as well? 
 >  
 > Do you mean reporting this error in expectation files? 

I wonder if there will be cases were we are getting this error and we should ignore it, so in the code
we should check the xfail files to see if we should exit with an error or ignore it.

For instance, if we have a case where we are having this error, and it is flaky, we might want to add it
to the flakes file list.

Maybe this is not the case, I'm just wondering.


 >  
 > > Did you have an entire pipeline with this? To see if everything is still green? 
 >  
 > Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711 
 >  
 > This is a test branch in which I reverted a fix for the lockdep issue. 
 > We see 'WARNING: bad unlock balance detected!' in logs and pipeline is 
 > still green. 

But with your patch, it would red right?
With the current patch, is the pipeline still all green?

Regards,
Helen

 >  
 > Regards, 
 > Vignesh 
 >  
 > > 
 > > Helen 
 > > 
 > >   > 
 > >   > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com> 
 > >   > --- 
 > >   > 
 > >   > v1: 
 > >   >  - Pipeline link to show lockdep_stats before and after tests, 
 > >   > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721 
 > >   > 
 > >   > --- 
 > >   >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++ 
 > >   >  1 file changed, 11 insertions(+) 
 > >   > 
 > >   > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh 
 > >   > index f38836ec837c..d2c043cd8c6a 100755 
 > >   > --- a/drivers/gpu/drm/ci/igt_runner.sh 
 > >   > +++ b/drivers/gpu/drm/ci/igt_runner.sh 
 > >   > @@ -85,6 +85,17 @@ deqp-runner junit \ 
 > >   >  --limit 50 \ 
 > >   >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml" 
 > >   > 
 > >   > +# Check if /proc/lockdep_stats exists 
 > >   > +if [ -f /proc/lockdep_stats ]; then 
 > >   > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off. 
 > >   > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}') 
 > >   > +    if [ "$debug_locks" -eq 0 ]; then 
 > >   > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information." 
 > >   > +        cat /proc/lockdep_stats 
 > >   > +        ret=1 
 > >   > +    fi 
 > >   > +fi 
 > >   > + 
 > >   >  # Store the results also in the simpler format used by the runner in ChromeOS CI 
 > >   >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt 
 > >   > 
 > >   > -- 
 > >   > 2.43.0 
 > >   > 
 > >   > 
 >
Vignesh Raman Aug. 14, 2024, 9:42 a.m. UTC | #4
Hi Helen,

On 14/08/24 01:44, Helen Mae Koike Fornazier wrote:
> 
> 
> 
> 
> ---- On Tue, 13 Aug 2024 02:26:48 -0300 Vignesh Raman  wrote ---
> 
>   > Hi Helen,
>   >
>   > On 13/08/24 01:47, Helen Mae Koike Fornazier wrote:
>   > >
>   > > Hi Vignesh,
>   > >
>   > > Thanks for your patch.
>   > >
>   > >
>   > > ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote ---
>   > >
>   > >   > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
>   > >   > This will output warnings when kernel locking errors are encountered
>   > >   > and will continue executing tests. To detect if lockdep has been
>   > >   > triggered, check the debug_locks value in /proc/lockdep_stats after
>   > >   > the tests have run. When debug_locks is 0, it indicates that lockdep
>   > >   > has detected issues and turned itself off. So check this value and
>   > >   > exit with an error if lockdep is detected.
>   > >
>   > > Should we exit with an error? Or with a warning? (GitLab-CI supports that).
>   > > Well, I guess it is serious enough.
>   >
>   > I think we can exit with an error since we check the status at the end
>   > of the tests.
> 
> I mean, we can exit with a specific error and configure this specific error in gitlab-ci to be a warning,
> so the job will be yellow and not red.
> 
> But maybe the lockdep issue should be a strong error.

Yes agree. We can exit with an error for lockdep issue instead of a warning.

> 
>   >
>   > >
>   > > Should we also track on the xfail folder? So we can annotate those errors as well?
>   >
>   > Do you mean reporting this error in expectation files?
> 
> I wonder if there will be cases were we are getting this error and we should ignore it, so in the code
> we should check the xfail files to see if we should exit with an error or ignore it.
> 
> For instance, if we have a case where we are having this error, and it is flaky, we might want to add it
> to the flakes file list.
> 
> Maybe this is not the case, I'm just wondering.


The tests are passing but log shows lockdep warning 
(https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711).

Moreover if the lockdep warning is emitted, lockdep will not continue to 
run and there is no need to check this warning for each tests.
So added the check at the end of the tests.

> 
> 
>   >
>   > > Did you have an entire pipeline with this? To see if everything is still green?
>   >
>   > Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711
>   >
>   > This is a test branch in which I reverted a fix for the lockdep issue.
>   > We see 'WARNING: bad unlock balance detected!' in logs and pipeline is
>   > still green.
> 
> But with your patch, it would red right?

Yes it would fail and the pipeline will be red.

> With the current patch, is the pipeline still all green?

With this current patch, it will fail.
Pipeline link to show lockdep_stats before and after tests,
https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721

Regards,
Vignesh

> 
> Regards,
> Helen
> 
>   >
>   > Regards,
>   > Vignesh
>   >
>   > >
>   > > Helen
>   > >
>   > >   >
>   > >   > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com>
>   > >   > ---
>   > >   >
>   > >   > v1:
>   > >   >  - Pipeline link to show lockdep_stats before and after tests,
>   > >   > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>   > >   >
>   > >   > ---
>   > >   >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
>   > >   >  1 file changed, 11 insertions(+)
>   > >   >
>   > >   > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
>   > >   > index f38836ec837c..d2c043cd8c6a 100755
>   > >   > --- a/drivers/gpu/drm/ci/igt_runner.sh
>   > >   > +++ b/drivers/gpu/drm/ci/igt_runner.sh
>   > >   > @@ -85,6 +85,17 @@ deqp-runner junit \
>   > >   >  --limit 50 \
>   > >   >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
>   > >   >
>   > >   > +# Check if /proc/lockdep_stats exists
>   > >   > +if [ -f /proc/lockdep_stats ]; then
>   > >   > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
>   > >   > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
>   > >   > +    if [ "$debug_locks" -eq 0 ]; then
>   > >   > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
>   > >   > +        cat /proc/lockdep_stats
>   > >   > +        ret=1
>   > >   > +    fi
>   > >   > +fi
>   > >   > +
>   > >   >  # Store the results also in the simpler format used by the runner in ChromeOS CI
>   > >   >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt
>   > >   >
>   > >   > --
>   > >   > 2.43.0
>   > >   >
>   > >   >
>   >
Rob Clark Aug. 14, 2024, 5:41 p.m. UTC | #5
On Wed, Aug 14, 2024 at 2:42 AM Vignesh Raman
<vignesh.raman@collabora.com> wrote:
>
> Hi Helen,
>
> On 14/08/24 01:44, Helen Mae Koike Fornazier wrote:
> >
> >
> >
> >
> > ---- On Tue, 13 Aug 2024 02:26:48 -0300 Vignesh Raman  wrote ---
> >
> >   > Hi Helen,
> >   >
> >   > On 13/08/24 01:47, Helen Mae Koike Fornazier wrote:
> >   > >
> >   > > Hi Vignesh,
> >   > >
> >   > > Thanks for your patch.
> >   > >
> >   > >
> >   > > ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote ---
> >   > >
> >   > >   > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
> >   > >   > This will output warnings when kernel locking errors are encountered
> >   > >   > and will continue executing tests. To detect if lockdep has been
> >   > >   > triggered, check the debug_locks value in /proc/lockdep_stats after
> >   > >   > the tests have run. When debug_locks is 0, it indicates that lockdep
> >   > >   > has detected issues and turned itself off. So check this value and
> >   > >   > exit with an error if lockdep is detected.
> >   > >
> >   > > Should we exit with an error? Or with a warning? (GitLab-CI supports that).
> >   > > Well, I guess it is serious enough.
> >   >
> >   > I think we can exit with an error since we check the status at the end
> >   > of the tests.
> >
> > I mean, we can exit with a specific error and configure this specific error in gitlab-ci to be a warning,
> > so the job will be yellow and not red.
> >
> > But maybe the lockdep issue should be a strong error.
>
> Yes agree. We can exit with an error for lockdep issue instead of a warning.

I think that is too strong, lockdep can warn about things which can
never happen in practice.  (We've never completely solved some of the
things that lockdep complains about in runpm vs shrinker reclaim.)

Surfacing it as a warning is fine.

BR,
-R

> >
> >   >
> >   > >
> >   > > Should we also track on the xfail folder? So we can annotate those errors as well?
> >   >
> >   > Do you mean reporting this error in expectation files?
> >
> > I wonder if there will be cases were we are getting this error and we should ignore it, so in the code
> > we should check the xfail files to see if we should exit with an error or ignore it.
> >
> > For instance, if we have a case where we are having this error, and it is flaky, we might want to add it
> > to the flakes file list.
> >
> > Maybe this is not the case, I'm just wondering.
>
>
> The tests are passing but log shows lockdep warning
> (https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711).
>
> Moreover if the lockdep warning is emitted, lockdep will not continue to
> run and there is no need to check this warning for each tests.
> So added the check at the end of the tests.
>
> >
> >
> >   >
> >   > > Did you have an entire pipeline with this? To see if everything is still green?
> >   >
> >   > Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711
> >   >
> >   > This is a test branch in which I reverted a fix for the lockdep issue.
> >   > We see 'WARNING: bad unlock balance detected!' in logs and pipeline is
> >   > still green.
> >
> > But with your patch, it would red right?
>
> Yes it would fail and the pipeline will be red.
>
> > With the current patch, is the pipeline still all green?
>
> With this current patch, it will fail.
> Pipeline link to show lockdep_stats before and after tests,
> https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>
> Regards,
> Vignesh
>
> >
> > Regards,
> > Helen
> >
> >   >
> >   > Regards,
> >   > Vignesh
> >   >
> >   > >
> >   > > Helen
> >   > >
> >   > >   >
> >   > >   > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com>
> >   > >   > ---
> >   > >   >
> >   > >   > v1:
> >   > >   >  - Pipeline link to show lockdep_stats before and after tests,
> >   > >   > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
> >   > >   >
> >   > >   > ---
> >   > >   >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
> >   > >   >  1 file changed, 11 insertions(+)
> >   > >   >
> >   > >   > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
> >   > >   > index f38836ec837c..d2c043cd8c6a 100755
> >   > >   > --- a/drivers/gpu/drm/ci/igt_runner.sh
> >   > >   > +++ b/drivers/gpu/drm/ci/igt_runner.sh
> >   > >   > @@ -85,6 +85,17 @@ deqp-runner junit \
> >   > >   >  --limit 50 \
> >   > >   >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
> >   > >   >
> >   > >   > +# Check if /proc/lockdep_stats exists
> >   > >   > +if [ -f /proc/lockdep_stats ]; then
> >   > >   > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
> >   > >   > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
> >   > >   > +    if [ "$debug_locks" -eq 0 ]; then
> >   > >   > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
> >   > >   > +        cat /proc/lockdep_stats
> >   > >   > +        ret=1
> >   > >   > +    fi
> >   > >   > +fi
> >   > >   > +
> >   > >   >  # Store the results also in the simpler format used by the runner in ChromeOS CI
> >   > >   >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt
> >   > >   >
> >   > >   > --
> >   > >   > 2.43.0
> >   > >   >
> >   > >   >
> >   >
Vignesh Raman Sept. 13, 2024, 12:19 p.m. UTC | #6
Hi Rob, Helen,

On 14/08/24 23:11, Rob Clark wrote:
> On Wed, Aug 14, 2024 at 2:42 AM Vignesh Raman
> <vignesh.raman@collabora.com> wrote:
>>
>> Hi Helen,
>>
>> On 14/08/24 01:44, Helen Mae Koike Fornazier wrote:
>>>
>>>
>>>
>>>
>>> ---- On Tue, 13 Aug 2024 02:26:48 -0300 Vignesh Raman  wrote ---
>>>
>>>    > Hi Helen,
>>>    >
>>>    > On 13/08/24 01:47, Helen Mae Koike Fornazier wrote:
>>>    > >
>>>    > > Hi Vignesh,
>>>    > >
>>>    > > Thanks for your patch.
>>>    > >
>>>    > >
>>>    > > ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman  wrote ---
>>>    > >
>>>    > >   > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
>>>    > >   > This will output warnings when kernel locking errors are encountered
>>>    > >   > and will continue executing tests. To detect if lockdep has been
>>>    > >   > triggered, check the debug_locks value in /proc/lockdep_stats after
>>>    > >   > the tests have run. When debug_locks is 0, it indicates that lockdep
>>>    > >   > has detected issues and turned itself off. So check this value and
>>>    > >   > exit with an error if lockdep is detected.
>>>    > >
>>>    > > Should we exit with an error? Or with a warning? (GitLab-CI supports that).
>>>    > > Well, I guess it is serious enough.
>>>    >
>>>    > I think we can exit with an error since we check the status at the end
>>>    > of the tests.
>>>
>>> I mean, we can exit with a specific error and configure this specific error in gitlab-ci to be a warning,
>>> so the job will be yellow and not red.
>>>
>>> But maybe the lockdep issue should be a strong error.
>>
>> Yes agree. We can exit with an error for lockdep issue instead of a warning.
> 
> I think that is too strong, lockdep can warn about things which can
> never happen in practice.  (We've never completely solved some of the
> things that lockdep complains about in runpm vs shrinker reclaim.)
> 
> Surfacing it as a warning is fine.

Will send another patch which will exit with an error if lockdep is 
detected and configure it as a warning in GitLab CI.

Regards,
Vignesh

> 
> BR,
> -R
> 
>>>
>>>    >
>>>    > >
>>>    > > Should we also track on the xfail folder? So we can annotate those errors as well?
>>>    >
>>>    > Do you mean reporting this error in expectation files?
>>>
>>> I wonder if there will be cases were we are getting this error and we should ignore it, so in the code
>>> we should check the xfail files to see if we should exit with an error or ignore it.
>>>
>>> For instance, if we have a case where we are having this error, and it is flaky, we might want to add it
>>> to the flakes file list.
>>>
>>> Maybe this is not the case, I'm just wondering.
>>
>>
>> The tests are passing but log shows lockdep warning
>> (https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711).
>>
>> Moreover if the lockdep warning is emitted, lockdep will not continue to
>> run and there is no need to check this warning for each tests.
>> So added the check at the end of the tests.
>>
>>>
>>>
>>>    >
>>>    > > Did you have an entire pipeline with this? To see if everything is still green?
>>>    >
>>>    > Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711
>>>    >
>>>    > This is a test branch in which I reverted a fix for the lockdep issue.
>>>    > We see 'WARNING: bad unlock balance detected!' in logs and pipeline is
>>>    > still green.
>>>
>>> But with your patch, it would red right?
>>
>> Yes it would fail and the pipeline will be red.
>>
>>> With the current patch, is the pipeline still all green?
>>
>> With this current patch, it will fail.
>> Pipeline link to show lockdep_stats before and after tests,
>> https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>>
>> Regards,
>> Vignesh
>>
>>>
>>> Regards,
>>> Helen
>>>
>>>    >
>>>    > Regards,
>>>    > Vignesh
>>>    >
>>>    > >
>>>    > > Helen
>>>    > >
>>>    > >   >
>>>    > >   > Signed-off-by: Vignesh Raman vignesh.raman@collabora.com>
>>>    > >   > ---
>>>    > >   >
>>>    > >   > v1:
>>>    > >   >  - Pipeline link to show lockdep_stats before and after tests,
>>>    > >   > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>>>    > >   >
>>>    > >   > ---
>>>    > >   >  drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
>>>    > >   >  1 file changed, 11 insertions(+)
>>>    > >   >
>>>    > >   > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
>>>    > >   > index f38836ec837c..d2c043cd8c6a 100755
>>>    > >   > --- a/drivers/gpu/drm/ci/igt_runner.sh
>>>    > >   > +++ b/drivers/gpu/drm/ci/igt_runner.sh
>>>    > >   > @@ -85,6 +85,17 @@ deqp-runner junit \
>>>    > >   >  --limit 50 \
>>>    > >   >  --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
>>>    > >   >
>>>    > >   > +# Check if /proc/lockdep_stats exists
>>>    > >   > +if [ -f /proc/lockdep_stats ]; then
>>>    > >   > +    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
>>>    > >   > +    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
>>>    > >   > +    if [ "$debug_locks" -eq 0 ]; then
>>>    > >   > +        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
>>>    > >   > +        cat /proc/lockdep_stats
>>>    > >   > +        ret=1
>>>    > >   > +    fi
>>>    > >   > +fi
>>>    > >   > +
>>>    > >   >  # Store the results also in the simpler format used by the runner in ChromeOS CI
>>>    > >   >  #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt
>>>    > >   >
>>>    > >   > --
>>>    > >   > 2.43.0
>>>    > >   >
>>>    > >   >
>>>    >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index f38836ec837c..d2c043cd8c6a 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -85,6 +85,17 @@  deqp-runner junit \
    --limit 50 \
    --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
 
+# Check if /proc/lockdep_stats exists
+if [ -f /proc/lockdep_stats ]; then
+    # If debug_locks is 0, it indicates lockdep is detected and it turns itself off.
+    debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
+    if [ "$debug_locks" -eq 0 ]; then
+        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
+        cat /proc/lockdep_stats
+        ret=1
+    fi
+fi
+
 # Store the results also in the simpler format used by the runner in ChromeOS CI
 #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt