drm/i915: Kick rcu harder to free objects

Message ID 20220906174609.23494-1-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915: Kick rcu harder to free objects

Commit Message

Ville Syrjälä Sept. 6, 2022, 5:46 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

On gen3 the selftests are pretty much always tripping this:
<4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
<4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]

Looks to be due to the status page object lingering on the
purge_list. Call synchronize_rcu() ahead of the cleanup checks to
make sure all objects have been freed.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 1 +
 1 file changed, 1 insertion(+)
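
For context, i915_gem_drain_freed_objects() with this change would look
roughly as sketched below. Only the lines visible in the diff at the
bottom of the page are certain; the loop condition and the flush_work()
call are assumptions about the surrounding upstream code:

void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
{
	/* Keep looping until no freed objects remain queued. */
	while (atomic_read(&i915->mm.free_count)) {
		flush_work(&i915->mm.free_work);	/* assumed: kick the free worker */
		flush_delayed_work(&i915->bdev.wq);	/* from the diff context */
		rcu_barrier();				/* wait for queued call_rcu() callbacks */
	}
	/* The fix: one extra grace period for stragglers. */
	synchronize_rcu();
}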

Comments

Tvrtko Ursulin Sept. 8, 2022, 12:23 p.m. UTC | #1
On 06/09/2022 18:46, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> On gen3 the selftests are pretty much always tripping this:
> <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]
> 
> Looks to be due to the status page object lingering on the
> purge_list. Call synchronize_rcu() ahead of the cleanup checks to
> make sure all objects have been freed.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0f49ec9d494a..5b61f7ad6473 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>   		flush_delayed_work(&i915->bdev.wq);
>   		rcu_barrier();
>   	}
> +	synchronize_rcu();

Looks a bit suspicious that the loop would not free everything but one
last RCU grace period would. Does it definitely fix the issue in your testing?

Perhaps it's the fact that there is a cond_resched() in __i915_gem_free_objects(),
but then again the free count should reflect the state and keep it looping
in here...

Regards,

Tvrtko
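
For reference, the two RCU primitives being discussed give different
guarantees. A minimal, generic illustration of the deferred-free pattern
being drained (this is not i915 code):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct obj {
	struct rcu_head rcu;
	/* payload */
};

static void obj_free_cb(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct obj, rcu));
}

static void obj_release(struct obj *o)
{
	/* Defer the actual kfree() until after an RCU grace period. */
	call_rcu(&o->rcu, obj_free_cb);
}

/*
 * rcu_barrier():     waits for all call_rcu() callbacks queued before
 *                    this call to finish running; it does not cover
 *                    callbacks queued concurrently, e.g. by a worker
 *                    racing with a drain loop.
 * synchronize_rcu(): waits for one full grace period (all currently
 *                    running RCU readers finish), but does not wait
 *                    for pending callbacks.
 */
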
Ville Syrjälä Sept. 8, 2022, 1:30 p.m. UTC | #2
On Thu, Sep 08, 2022 at 01:23:50PM +0100, Tvrtko Ursulin wrote:
> 
> On 06/09/2022 18:46, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > On gen3 the selftests are pretty much always tripping this:
> > <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
> > <4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]
> > 
> > Looks to be due to the status page object lingering on the
> > purge_list. Call synchronize_rcu() ahead of the cleanup checks to
> > make sure all objects have been freed.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0f49ec9d494a..5b61f7ad6473 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >   		flush_delayed_work(&i915->bdev.wq);
> >   		rcu_barrier();
> >   	}
> > +	synchronize_rcu();
> 
> Looks a bit suspicious that a loop would not free all but one last rcu 
> grace would. Definitely fixes the issue in your testing?

Hard to say definitely with fuzzy stuff like this. But yes, so far
I haven't seen the warn triggering anymore. CI results show the same.

> 
> Perhaps the fact there is a cond_resched in __i915_gem_free_objects, but 
> then again free count should reflect the state and keep it looping in here..
> 
> Regards,
> 
> Tvrtko
Nirmoy Das Sept. 8, 2022, 2:32 p.m. UTC | #3
Hi Ville,


I fixed a similar issue in DII but I couldn't reproduce it in drm

http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.

I wonder if that fixes the problem you are facing; if so, I can send it
to drm.

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7809be3a6840..5438e9277924 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1213,7 +1213,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 {
-	i915_gem_drain_freed_objects(dev_priv);
+	i915_gem_drain_workqueue(dev_priv);
 	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);


Regards,

Nirmoy

On 9/6/2022 7:46 PM, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> On gen3 the selftests are pretty much always tripping this:
> <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]
>
> Looks to be due to the status page object lingering on the
> purge_list. Call synchronize_rcu() ahead of the cleanup checks to
> make sure all objects have been freed.
>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0f49ec9d494a..5b61f7ad6473 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>   		flush_delayed_work(&i915->bdev.wq);
>   		rcu_barrier();
>   	}
> +	synchronize_rcu();
>   }
>   
>   /*
Tvrtko Ursulin Sept. 8, 2022, 2:55 p.m. UTC | #4
On 08/09/2022 15:32, Das, Nirmoy wrote:
> Hi Ville,
> 
> 
> I fixed a similar issue in DII but I couldn't reproduce it in drm
> 
> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
> 
> I wonder if that fixes the problem you are facing then I can send that 
> to drm.
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> b/drivers/gpu/drm/i915/i915_gem.c
> index 7809be3a6840..5438e9277924 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1213,7 +1213,7 @@  void i915_gem_init_early(struct drm_i915_private 
> *dev_priv)
> 
>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>   {
> -    i915_gem_drain_freed_objects(dev_priv);
> +    i915_gem_drain_workqueue(dev_priv);
>       GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>       drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);

Yes, why not; more black magic (count to three), but if it works... :) I
also spy that the general area has been a bit neglected. Like:

i915_gem_driver_remove:
...
   i915_gem_drain_workqueue
   i915_gem_drain_freed_objects

While i915_gem_drain_workqueue:
...
   i915_gem_drain_freed_objects

So i915_gem_drain_freed_objects in i915_gem_driver_remove is redundant 
already.

Should i915_gem_drain_freed_objects be unexported and all callers made to
just call i915_gem_drain_workqueue after your patch? Or, if "drain freed
objects" is considered more self-descriptive, it could be made an alias
for i915_gem_drain_workqueue.
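
For reference, the "count to three" mentioned above is the drain loop in
i915_gem_drain_workqueue(); the following sketch is reconstructed from
contemporaneous i915 sources, so details may differ:

static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
{
	/*
	 * Workers can be armed from RCU callbacks and can rearm
	 * themselves, so a single flush is not enough; three passes
	 * is the driver's pragmatic upper bound for recursive work.
	 */
	int pass = 3;

	do {
		flush_workqueue(i915->wq);
		rcu_barrier();
		i915_gem_drain_freed_objects(i915);
	} while (--pass);

	drain_workqueue(i915->wq);
}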

Regards,

Tvrtko

> 
> 
> Regards,
> 
> Nirmoy
> 
> On 9/6/2022 7:46 PM, Ville Syrjala wrote:
>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>
>> On gen3 the selftests are pretty much always tripping this:
>> <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
>> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at 
>> drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 
>> [i915]
>>
>> Looks to be due to the status page object lingering on the
>> purge_list. Call synchronize_rcu() ahead of the cleanup checks to
>> make sure all objects have been freed.
>>
>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 0f49ec9d494a..5b61f7ad6473 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct 
>> drm_i915_private *i915)
>>           flush_delayed_work(&i915->bdev.wq);
>>           rcu_barrier();
>>       }
>> +    synchronize_rcu();
>>   }
>>   /*
Ville Syrjälä Sept. 8, 2022, 3:11 p.m. UTC | #5
On Thu, Sep 08, 2022 at 04:32:56PM +0200, Das, Nirmoy wrote:
> Hi Ville,
> 
> 
> I fixed a similar issue in DII but I couldn't reproduce it in drm
> 
> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
> 
> I wonder if that fixes the problem you are facing then I can send that 
> to drm.

CI can tell you. It has been complaining about this for ages without
anyone doing anything about it.

> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> b/drivers/gpu/drm/i915/i915_gem.c
> index 7809be3a6840..5438e9277924 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1213,7 +1213,7 @@  void i915_gem_init_early(struct drm_i915_private 
> *dev_priv)
> 
>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>   {
> -    i915_gem_drain_freed_objects(dev_priv);
> +    i915_gem_drain_workqueue(dev_priv);
>       GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>       drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> 
> 
> Regards,
> 
> Nirmoy
> 
> On 9/6/2022 7:46 PM, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > On gen3 the selftests are pretty much always tripping this:
> > <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
> > <4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]
> >
> > Looks to be due to the status page object lingering on the
> > purge_list. Call synchronize_rcu() ahead of the cleanup checks to
> > make sure all objects have been freed.
> >
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0f49ec9d494a..5b61f7ad6473 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >   		flush_delayed_work(&i915->bdev.wq);
> >   		rcu_barrier();
> >   	}
> > +	synchronize_rcu();
> >   }
> >   
> >   /*
Nirmoy Das Sept. 8, 2022, 7:22 p.m. UTC | #6
On 9/8/2022 4:55 PM, Tvrtko Ursulin wrote:
>
> On 08/09/2022 15:32, Das, Nirmoy wrote:
>> Hi Ville,
>>
>>
>> I fixed a similar issue in DII but I couldn't reproduce it in drm
>>
>> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
>>
>> I wonder if that fixes the problem you are facing then I can send 
>> that to drm.
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 7809be3a6840..5438e9277924 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1213,7 +1213,7 @@  void i915_gem_init_early(struct 
>> drm_i915_private *dev_priv)
>>
>>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>   {
>> -    i915_gem_drain_freed_objects(dev_priv);
>> +    i915_gem_drain_workqueue(dev_priv);
>>       GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>       drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>
> Yes why not, more black magic (count to three) but if it works... :) I 
> also spy the general area has been a bit neglected. Like:


Not sure what the correct solution should be here. I wonder if we might
have to change this because of

https://lwn.net/Articles/906975/ ?


>
> i915_gem_driver_remove:
> ...
>   i915_gem_drain_workqueue
>   i915_gem_drain_freed_objects
>
> While i915_gem_drain_workqueue:
> ...
>   i915_gem_drain_freed_objects
>
> So i915_gem_drain_freed_objects in i915_gem_driver_remove is redundant 
> already.
>
> Should i915_gem_drain_freed_objects be unexported and all callers made 
> just call i915_gem_drain_workqueue after your patch? Or if "drain free 
> objects" is considered more self descriptive it could be made as an 
> alias to i915_gem_drain_workqueue.


We are using i915_gem_drain_freed_objects() in many places, and replacing
it with i915_gem_drain_workqueue() might have performance implications.


Nirmoy


>
> Regards,
>
> Tvrtko
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>> On 9/6/2022 7:46 PM, Ville Syrjala wrote:
>>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>
>>> On gen3 the selftests are pretty much always tripping this:
>>> <4> [383.822424] pci 0000:00:02.0: 
>>> drm_WARN_ON(dev_priv->mm.shrink_count)
>>> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at 
>>> drivers/gpu/drm/i915/i915_gem.c:1223 
>>> i915_gem_cleanup_early+0x96/0xb0 [i915]
>>>
>>> Looks to be due to the status page object lingering on the
>>> purge_list. Call synchronize_rcu() ahead of the cleanup checks to
>>> make sure all objects have been freed.
>>>
>>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0f49ec9d494a..5b61f7ad6473 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct 
>>> drm_i915_private *i915)
>>>           flush_delayed_work(&i915->bdev.wq);
>>>           rcu_barrier();
>>>       }
>>> +    synchronize_rcu();
>>>   }
>>>   /*
Nirmoy Das Sept. 8, 2022, 7:34 p.m. UTC | #7
On 9/8/2022 5:11 PM, Ville Syrjälä wrote:
> On Thu, Sep 08, 2022 at 04:32:56PM +0200, Das, Nirmoy wrote:
>> Hi Ville,
>>
>>
>> I fixed a similar issue in DII but I couldn't reproduce it in drm
>>
>> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
>>
>> I wonder if that fixes the problem you are facing then I can send that
>> to drm.
> CI can tell you. It has been complaining about this for ages


Could you please share a URL/failed test name? I must be searching the
wrong hw/test (https://intel-gfx-ci.01.org/tree/drm-tip/fi-ivb-3770.html).


Thanks,

Nirmoy

> without
> anyone doing anything about it.
>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 7809be3a6840..5438e9277924 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1213,7 +1213,7 @@  void i915_gem_init_early(struct drm_i915_private
>> *dev_priv)
>>
>>    void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>    {
>> -    i915_gem_drain_freed_objects(dev_priv);
>> +    i915_gem_drain_workqueue(dev_priv);
>>        GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>        GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>        drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>> On 9/6/2022 7:46 PM, Ville Syrjala wrote:
>>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>
>>> On gen3 the selftests are pretty much always tripping this:
>>> <4> [383.822424] pci 0000:00:02.0: drm_WARN_ON(dev_priv->mm.shrink_count)
>>> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at drivers/gpu/drm/i915/i915_gem.c:1223 i915_gem_cleanup_early+0x96/0xb0 [i915]
>>>
>>> Looks to be due to the status page object lingering on the
>>> purge_list. Call synchronize_rcu() ahead of the cleanup checks to
>>> make sure all objects have been freed.
>>>
>>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0f49ec9d494a..5b61f7ad6473 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>>>    		flush_delayed_work(&i915->bdev.wq);
>>>    		rcu_barrier();
>>>    	}
>>> +	synchronize_rcu();
>>>    }
>>>    
>>>    /*
Ville Syrjälä Sept. 9, 2022, 7:29 a.m. UTC | #8
On Thu, Sep 08, 2022 at 09:34:04PM +0200, Das, Nirmoy wrote:
> 
> On 9/8/2022 5:11 PM, Ville Syrjälä wrote:
> > On Thu, Sep 08, 2022 at 04:32:56PM +0200, Das, Nirmoy wrote:
> >> Hi Ville,
> >>
> >>
> >> I fixed a similar issue in DII but I couldn't reproduce it in drm
> >>
> >> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
> >>
> >> I wonder if that fixes the problem you are facing then I can send that
> >> to drm.
> > CI can tell you. It has been complaining about this for ages
> 
> 
> Could you please share a url/failed test name. I must be searching the 
> wrong hw/test(https://intel-gfx-ci.01.org/tree/drm-tip/fi-ivb-3770.html).

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12102/fi-pnv-d510/igt@i915_selftest@live@requests.html

gen3 hits it nearly 100% of the time in the live selftests.
IIRC it also looked like shard-snb has hit it more
sporadically in some tests.

There's also at least one bug about it
https://gitlab.freedesktop.org/drm/intel/-/issues/4528
which I guess is why CI is able to ignore this, and that
in turn has enabled all the humans to ignore it as well.
Ville Syrjälä Sept. 21, 2022, 7:56 a.m. UTC | #9
On Fri, Sep 09, 2022 at 10:29:55AM +0300, Ville Syrjälä wrote:
> On Thu, Sep 08, 2022 at 09:34:04PM +0200, Das, Nirmoy wrote:
> > 
> > On 9/8/2022 5:11 PM, Ville Syrjälä wrote:
> > > On Thu, Sep 08, 2022 at 04:32:56PM +0200, Das, Nirmoy wrote:
> > >> Hi Ville,
> > >>
> > >>
> > >> I fixed a similar issue in DII but I couldn't reproduce it in drm
> > >>
> > >> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
> > >>
> > >> I wonder if that fixes the problem you are facing then I can send that
> > >> to drm.
> > > CI can tell you. It has been complaining about this for ages
> > 
> > 
> > Could you please share a url/failed test name. I must be searching the 
> > wrong hw/test(https://intel-gfx-ci.01.org/tree/drm-tip/fi-ivb-3770.html).
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12102/fi-pnv-d510/igt@i915_selftest@live@requests.html
> 
> gen3 hits it nearly 100% of the time in the live selftests.
> IIRC it also looked like shard-snb has also hit it more
> sporadically in some tests.
> 
> There's also at least one bug about it
> https://gitlab.freedesktop.org/drm/intel/-/issues/4528
> which I guess is why ci is able to ignore this, and that
> then has enabled all the humans to ignore it as well.

Did we get anywhere with this? I'm getting tired of 
seeing the red in CI.
Nirmoy Das Sept. 22, 2022, 8:39 a.m. UTC | #10
On 9/21/2022 9:56 AM, Ville Syrjälä wrote:
> On Fri, Sep 09, 2022 at 10:29:55AM +0300, Ville Syrjälä wrote:
>> On Thu, Sep 08, 2022 at 09:34:04PM +0200, Das, Nirmoy wrote:
>>> On 9/8/2022 5:11 PM, Ville Syrjälä wrote:
>>>> On Thu, Sep 08, 2022 at 04:32:56PM +0200, Das, Nirmoy wrote:
>>>>> Hi Ville,
>>>>>
>>>>>
>>>>> I fixed a similar issue in DII but I couldn't reproduce it in drm
>>>>>
>>>>> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
>>>>>
>>>>> I wonder if that fixes the problem you are facing then I can send that
>>>>> to drm.
>>>> CI can tell you. It has been complaining about this for ages
>>>
>>> Could you please share a url/failed test name. I must be searching the
>>> wrong hw/test(https://intel-gfx-ci.01.org/tree/drm-tip/fi-ivb-3770.html).
>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12102/fi-pnv-d510/igt@i915_selftest@live@requests.html
>>
>> gen3 hits it nearly 100% of the time in the live selftests.
>> IIRC it also looked like shard-snb has also hit it more
>> sporadically in some tests.
>>
>> There's also at least one bug about it
>> https://gitlab.freedesktop.org/drm/intel/-/issues/4528
>> which I guess is why ci is able to ignore this, and that
>> then has enabled all the humans to ignore it as well.
> Did we get anywhere with this? I'm getting tired of
> seeing the red in CI.

Please review 
https://patchwork.freedesktop.org/patch/501991/?series=108314&rev=1 
Sorry, I should've Cc'd you.

It seems to fix the issue:

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_108314v1/fi-pnv-d510/igt@i915_selftest@live@requests.html


Regards,

Nirmoy


>

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0f49ec9d494a..5b61f7ad6473 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1098,6 +1098,7 @@  void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
 		flush_delayed_work(&i915->bdev.wq);
 		rcu_barrier();
 	}
+	synchronize_rcu();
 }
 
 /*