drm/ttm: soften TTM warnings

Message ID 20210303155757.82497-1-christian.koenig@amd.com
State New, archived
Series drm/ttm: soften TTM warnings

Commit Message

Christian König March 3, 2021, 3:57 p.m. UTC
QXL indeed unrefs pinned BOs and the warnings are spamming people's log files.

Make sure we warn only once until the QXL driver is fixed.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Daniel Vetter March 3, 2021, 5:19 p.m. UTC | #1
On Wed, Mar 3, 2021 at 4:57 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.
>
> Make sure we warn only once until the QXL driver is fixed.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Can you pls add FIXME comments to each that qxl is broken and needs to
be fixed first? Also please add a References: link to the bug report
on lore.kernel.org or wherever it was.

With that: r-b: me
-Daniel

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 20a25660b35b..245fa2c05927 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -136,7 +136,8 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
>         struct ttm_bo_device *bdev = bo->bdev;
>         struct ttm_resource_manager *man;
>
> -       dma_resv_assert_held(bo->base.resv);
> +       if (!bo->deleted)
> +               dma_resv_assert_held(bo->base.resv);
>
>         if (bo->pin_count) {
>                 ttm_bo_del_from_lru(bo);
> @@ -509,7 +510,7 @@ static void ttm_bo_release(struct kref *kref)
>                  * shrinkers, now that they are queued for
>                  * destruction.
>                  */
> -               if (WARN_ON(bo->pin_count)) {
> +               if (WARN_ON_ONCE(bo->pin_count)) {
>                         bo->pin_count = 0;
>                         ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>                 }
> --
> 2.25.1
>
Christian König March 3, 2021, 8:36 p.m. UTC | #2
Am 03.03.21 um 18:19 schrieb Daniel Vetter:
> On Wed, Mar 3, 2021 at 4:57 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.
>>
>> Make sure we warn only once until the QXL driver is fixed.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
> Can you pls add FIXME comments to each that qxl is broken and needs to
> be fixed first? Also please add a References: link to the bug report
> on lore.kernel.org or wherever it was.

Was there a bug report? I only got notifications by mail so far.

Christian.

>
> With that: r-b: me
> -Daniel
>
Daniel Vetter March 3, 2021, 8:48 p.m. UTC | #3
On Wed, Mar 3, 2021 at 9:36 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
>
>
> Am 03.03.21 um 18:19 schrieb Daniel Vetter:
> > On Wed, Mar 3, 2021 at 4:57 PM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.
> >>
> >> Make sure we warn only once until the QXL driver is fixed.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> > Can you pls add FIXME comments to each that qxl is broken and needs to
> > be fixed first? Also please add a References: link to the bug report
> > on lore.kernel.org or wherever it was.
>
> Was there a bug report? I only got notifications by mail so far.

Well that's how bug reports work in the wider kernel community. You
need to fish out the right link from mail archives and link to those.

Yes it's not great, but it's imo better to reference these bug reports
than have nothing at all.
-Daniel

>
> Christian.
>
> >
> > With that: r-b: me
> > -Daniel
> >
>
Gerd Hoffmann March 4, 2021, 2:05 p.m. UTC | #4
On Wed, Mar 03, 2021 at 04:57:57PM +0100, Christian König wrote:
> QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.
> 
> Make sure we warn only once until the QXL driver is fixed.

> -	dma_resv_assert_held(bo->base.resv);
> +	if (!bo->deleted)
> +		dma_resv_assert_held(bo->base.resv);

Hmm?  I'm not aware of qxl having problems with this one.
Did I miss something?

> -		if (WARN_ON(bo->pin_count)) {
> +		if (WARN_ON_ONCE(bo->pin_count)) {

Well, as a temporary thing this is rather pointless: the qxl fix for this one
is already queued in drm-misc-fixes, so this would only land after the
qxl fixes ...

But I think using WARN_ON_ONCE() is a good idea in general, especially
in a code path like this where a single bug can easily cause a flood of
stack traces.
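A minimal userspace sketch of the trade-off being discussed, assuming C99. `sketch_warn_on()`, `sketch_warn_on_once()`, `warn_count` and `release_pinned_bos()` are hypothetical stand-ins, not the kernel macros: the real WARN_ON()/WARN_ON_ONCE() print a full stack trace and track the "already warned" state per call site internally (the exact mechanism depends on architecture and config). The sketch releases a batch of still-pinned BOs and counts how many warnings each mode reports; the fix-up branch runs for every BO either way, only the report is suppressed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually reported; the real macros would dump a
 * stack trace each time instead of bumping a counter. */
static int warn_count;

/* Hypothetical stand-in for WARN_ON(): reports every time the condition
 * is true and returns the condition so the caller can branch on it. */
static bool sketch_warn_on(bool cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical stand-in for WARN_ON_ONCE(): same return value, but only
 * the first hit is reported. *warned models the per-call-site "already
 * warned" state that the kernel keeps internally. */
static bool sketch_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Simulates releasing n BOs that are all still pinned (pin_count == 1)
 * and returns how many of them went through the fix-up branch. */
static int release_pinned_bos(int n, bool once)
{
	static bool warned;  /* one flag for this single call site */
	int fixed_up = 0;

	for (int i = 0; i < n; i++) {
		unsigned int pin_count = 1;
		bool hit = once
			? sketch_warn_on_once(pin_count != 0, &warned, "pin_count != 0")
			: sketch_warn_on(pin_count != 0, "pin_count != 0");

		if (hit)
			fixed_up++;  /* real code: bo->pin_count = 0; move to LRU tail */
	}
	return fixed_up;
}
```

Releasing 100 pinned BOs with `once == false` reports 100 warnings; with `once == true` it reports one, and later calls report nothing more, yet every BO still takes the fix-up branch. That is the tension in this thread: less log spam versus loud, repeated traces that push people to report the underlying driver bug.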

Acked-by: Gerd Hoffmann <kraxel@redhat.com>

take care,
  Gerd
Christian König March 4, 2021, 6:02 p.m. UTC | #5
Am 04.03.21 um 15:05 schrieb Gerd Hoffmann:
> On Wed, Mar 03, 2021 at 04:57:57PM +0100, Christian König wrote:
>> QXL indeed unrefs pinned BOs and the warnings are spamming peoples log files.
>>
>> Make sure we warn only once until the QXL driver is fixed.
>> -	dma_resv_assert_held(bo->base.resv);
>> +	if (!bo->deleted)
>> +		dma_resv_assert_held(bo->base.resv);
> Hmm?  I'm not aware of qxl having problems with this one.
> Did I miss something?

See the mail from Peter: the asserts were triggered when the pin_count
was non-zero on destruction.
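A simplified model of why the patch also guards the lock assertion in ttm_bo_move_to_lru_tail(). It assumes, as in the delayed-destroy branch of ttm_bo_release() touched by the second hunk, that the BO has just been marked `deleted` and that its reservation object is not held by the caller. `struct sketch_bo` and the `sketch_*` functions are hypothetical stand-ins, not the TTM API; LRU membership is reduced to a flag, and `with_patch` switches between the old unconditional dma_resv_assert_held() and the patched `if (!bo->deleted)` check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for struct ttm_buffer_object:
 * only the state the two hunks of the patch care about. */
struct sketch_bo {
	bool deleted;            /* queued for delayed destruction */
	bool resv_held;          /* models "bo->base.resv is locked by us" */
	unsigned int pin_count;
	bool on_lru;             /* models membership in the LRU list */
};

static int assert_hits;

/* Stand-in for dma_resv_assert_held(): complains if the lock is not held. */
static void sketch_resv_assert_held(const struct sketch_bo *bo)
{
	if (!bo->resv_held) {
		assert_hits++;
		fprintf(stderr, "assert: reservation object not held\n");
	}
}

/* Stand-in for ttm_bo_move_to_lru_tail(). with_patch selects the patched
 * "if (!bo->deleted)" guard instead of the unconditional assertion. */
static void sketch_move_to_lru_tail(struct sketch_bo *bo, bool with_patch)
{
	if (!with_patch || !bo->deleted)
		sketch_resv_assert_held(bo);

	/* Pinned BOs are kept off the LRU, unpinned ones go to its tail. */
	bo->on_lru = (bo->pin_count == 0);
}

/* Stand-in for the delayed-destroy branch of ttm_bo_release(): the BO is
 * marked deleted without its reservation being held, and a leaked pin is
 * dropped so that shrinkers can reach the BO. */
static void sketch_release_busy_bo(struct sketch_bo *bo, bool with_patch)
{
	bo->deleted = true;
	if (bo->pin_count) {     /* WARN_ON_ONCE(bo->pin_count) in the patch */
		bo->pin_count = 0;
		sketch_move_to_lru_tail(bo, with_patch);
	}
}
```

In this model, releasing a still-pinned busy BO trips the assertion with the old code and not with the guard, while a BO that is not being deleted is still checked, so the guard only relaxes the locking assertion for BOs already on their way to destruction.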

>
>> -		if (WARN_ON(bo->pin_count)) {
>> +		if (WARN_ON_ONCE(bo->pin_count)) {
> Well, as temporary thing this is rather pointless, qxl fix for this one
> is already queued in drm-misc-fixes so this would only land after the
> qxl fixes ...
>
> But I think using WARN_ON_ONCE() is a good idea in general, especially
> in a code path like this where a single bug can easily cause a flood of
> stack traces.

Well, that flood of stack traces can also be helpful, because it makes
people report this kind of issue immediately.

Anyway, I'm going to keep that WARN_ON_ONCE for a cycle or two, and if I
don't hear any more complaints, I'm going to completely remove this
"feature" and just always warn when we see a non-zero pin_count on
destruction.

Christian.

>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>
> take care,
>    Gerd
>

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 20a25660b35b..245fa2c05927 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -136,7 +136,8 @@  void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
 	struct ttm_bo_device *bdev = bo->bdev;
 	struct ttm_resource_manager *man;
 
-	dma_resv_assert_held(bo->base.resv);
+	if (!bo->deleted)
+		dma_resv_assert_held(bo->base.resv);
 
 	if (bo->pin_count) {
 		ttm_bo_del_from_lru(bo);
@@ -509,7 +510,7 @@  static void ttm_bo_release(struct kref *kref)
 		 * shrinkers, now that they are queued for
 		 * destruction.
 		 */
-		if (WARN_ON(bo->pin_count)) {
+		if (WARN_ON_ONCE(bo->pin_count)) {
 			bo->pin_count = 0;
 			ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
 		}