
drm/i915: get runtime PM reference around GEM set_caching IOCTL

Message ID 20151105115628.GZ669@nuc-i3427.alporthouse.com (mailing list archive)
State New, archived

Commit Message

Chris Wilson Nov. 5, 2015, 11:56 a.m. UTC
On Thu, Nov 05, 2015 at 01:28:32PM +0200, Imre Deak wrote:
> On ke, 2015-11-04 at 20:57 +0000, Chris Wilson wrote:
> > On Wed, Nov 04, 2015 at 09:25:32PM +0200, Imre Deak wrote:
> > > After Damien's D3 fix I started to get runtime suspend residency for the
> > > first time and that revealed a breakage on the set_caching IOCTL path
> > > that accesses the HW but doesn't take an RPM ref. Fix this up.
> > 
> > Why here (and in so many random places) and not around the PTE write
> > itself?
> 
> Imo we should take the RPM ref outside of any of our locks. Otherwise we
> need hacks like we have currently in the runtime suspend handler to work
> around lock inversions. It works now, but we couldn't do the same trick
> if we needed to take struct_mutex for example in the resume handler too
> for some reason.

On the other hand, it leads to hard to track down bugs and improper
documentation. Neither solution is perfect.

Note since intel_runtime_suspend has ample barriers of its own, you can
avoid the struct_mutex if you have a dedicated dev_priv->mm.fault_list.

Something along the lines of:



The tricky part is reviewing the i915_gem_release_mmap() callers to
ensure that they have the right barrier. If you make
i915_gem_release_mmap() assert it holds an rpm ref, and then make the
i915_gem_release_all_mmaps() unlink the node directly that should do the
trick.
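
Very roughly, the shape I have in mind for those two, on top of the
fault_list from the patch below (untested sketch; assert_rpm_wakelock_held()
is a hypothetical helper we would still have to add):

void
i915_gem_release_mmap(struct drm_i915_gem_object *obj)
{
	if (list_empty(&obj->fault_link))
		return;

	/* Callers get here because they are about to touch the HW, so they
	 * must already hold a runtime PM reference; the (hypothetical)
	 * assert makes that contract explicit.
	 */
	assert_rpm_wakelock_held(to_i915(obj->base.dev));

	drm_vma_node_unmap(&obj->base.vma_node,
			   obj->base.dev->anon_inode->i_mapping);
	list_del_init(&obj->fault_link);
}

void
i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
{
	struct drm_i915_gem_object *obj, *next;

	/* Called from intel_runtime_suspend(), so the device is awake by
	 * definition; unlink the nodes directly instead of re-checking.
	 */
	list_for_each_entry_safe(obj, next, &dev_priv->mm.fault_list, fault_link) {
		drm_vma_node_unmap(&obj->base.vma_node,
				   obj->base.dev->anon_inode->i_mapping);
		list_del_init(&obj->fault_link);
	}
}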
-Chris

Comments

Imre Deak Nov. 5, 2015, 10:57 p.m. UTC | #1
On Thu, 2015-11-05 at 11:56 +0000, Chris Wilson wrote:
> On Thu, Nov 05, 2015 at 01:28:32PM +0200, Imre Deak wrote:
> > On ke, 2015-11-04 at 20:57 +0000, Chris Wilson wrote:
> > > On Wed, Nov 04, 2015 at 09:25:32PM +0200, Imre Deak wrote:
> > > > After Damien's D3 fix I started to get runtime suspend residency for the
> > > > first time and that revealed a breakage on the set_caching IOCTL path
> > > > that accesses the HW but doesn't take an RPM ref. Fix this up.
> > > 
> > > Why here (and in so many random places) and not around the PTE write
> > > itself?
> > 
> > Imo we should take the RPM ref outside of any of our locks. Otherwise we
> > need hacks like we have currently in the runtime suspend handler to work
> > around lock inversions. It works now, but we couldn't do the same trick
> > if we needed to take struct_mutex for example in the resume handler too
> > for some reason.
> 
> On the other hand, it leads to hard to track down bugs and improper
> documentation. Neither solution is perfect.

I think I understand the documentation part, not sure how that could be
solved. Perhaps RPM-held asserts right before the point where the HW
needs to be on?

What do you mean by hard to track down bugs? Couldn't that also be
tackled by asserts?
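
Something like the following, perhaps (purely illustrative; the helper and
the wakeref_count field don't exist yet and would need to be added):

/* Hypothetical RPM-held assert: complain if the HW is about to be touched
 * without a runtime PM reference. Assumes the driver keeps its own wakeref
 * count, bumped/dropped by the intel_runtime_pm_get()/put() wrappers.
 */
static inline void assert_rpm_wakelock_held(struct drm_i915_private *dev_priv)
{
	WARN_ONCE(atomic_read(&dev_priv->pm.wakeref_count) == 0,
		  "RPM wakelock ref not held during HW access\n");
}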

> Note since intel_runtime_suspend has ample barriers of its own, you can
> avoid the struct_mutex if you have a dedicated dev_priv->mm.fault_list.
> 
> Something along the lines of:

Ok, looks interesting. But as you said we would have to then make
callers of i915_gem_release_mmap() wake up the device, which isn't
strictly necessary. (and imo as such goes somewhat against the
documentation argument)

--Imre


> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 86734be..fe91ce5 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -180,11 +180,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>         }
>         if (obj->stolen)
>                 seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> -       if (obj->pin_display || obj->fault_mappable) {
> +       if (obj->pin_display || !list_empty(&obj->fault_link)) {
>                 char s[3], *t = s;
>                 if (obj->pin_display)
>                         *t++ = 'p';
> -               if (obj->fault_mappable)
> +               if (!list_empty(&obj->fault_link))
>                         *t++ = 'f';
>                 *t = '\0';
>                 seq_printf(m, " (%s mappable)", s);
> @@ -474,7 +474,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  
>         size = count = mappable_size = mappable_count = 0;
>         list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -               if (obj->fault_mappable) {
> +               if (!list_empty(&obj->fault_link)) {
>                         size += i915_gem_obj_ggtt_size(obj);
>                         ++count;
>                 }
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 1d88745..179594e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1447,28 +1447,10 @@ static int intel_runtime_suspend(struct device *device)
>         DRM_DEBUG_KMS("Suspending device\n");
>  
>         /*
> -        * We could deadlock here in case another thread holding struct_mutex
> -        * calls RPM suspend concurrently, since the RPM suspend will wait
> -        * first for this RPM suspend to finish. In this case the concurrent
> -        * RPM resume will be followed by its RPM suspend counterpart. Still
> -        * for consistency return -EAGAIN, which will reschedule this suspend.
> -        */
> -       if (!mutex_trylock(&dev->struct_mutex)) {
> -               DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
> -               /*
> -                * Bump the expiration timestamp, otherwise the suspend won't
> -                * be rescheduled.
> -                */
> -               pm_runtime_mark_last_busy(device);
> -
> -               return -EAGAIN;
> -       }
> -       /*
>          * We are safe here against re-faults, since the fault handler takes
>          * an RPM reference.
>          */
>         i915_gem_release_all_mmaps(dev_priv);
> -       mutex_unlock(&dev->struct_mutex);
>  
>         intel_suspend_gt_powersave(dev);
>         intel_runtime_pm_disable_interrupts(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 55611d8..5e4904a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1268,6 +1268,8 @@ struct i915_gem_mm {
>          */
>         struct list_head unbound_list;
>  
> +       struct list_head fault_list;
> +
>         /** Usable portion of the GTT for GEM */
>         unsigned long stolen_base; /* limited to low memory (32-bit) */
>  
> @@ -2025,6 +2027,8 @@ struct drm_i915_gem_object {
>  
>         struct list_head batch_pool_link;
>  
> +       struct list_head fault_link;
> +
>         /**
>          * This is set if the object is on the active lists (has pending
>          * rendering and so a non-zero seqno), and is not set if it i s on
> @@ -2069,13 +2073,6 @@ struct drm_i915_gem_object {
>          */
>         unsigned int map_and_fenceable:1;
>  
> -       /**
> -        * Whether the current gtt mapping needs to be mappable (and isn't just
> -        * mappable by accident). Track pin and fault separate for a more
> -        * accurate mappable working set.
> -        */
> -       unsigned int fault_mappable:1;
> -
>         /*
>          * Is the object to be mapped as read-only to the GPU
>          * Only honoured if hardware has relevant pte bit
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 407b6b3..a90d1d8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1814,9 +1814,10 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>                                 break;
>                 }
>  
> -               obj->fault_mappable = true;
> +               if (list_empty(&obj->fault_link))
> +                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
>         } else {
> -               if (!obj->fault_mappable) {
> +               if (list_empty(&obj->fault_link)) {
>                         unsigned long size = min_t(unsigned long,
>                                                    vma->vm_end - vma->vm_start,
>                                                    obj->base.size);
> @@ -1830,7 +1831,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>                                         break;
>                         }
>  
> -                       obj->fault_mappable = true;
> +                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
>                 } else
>                         ret = vm_insert_pfn(vma,
>                                             (unsigned long)vmf->virtual_address,
> @@ -1903,20 +1904,20 @@ out:
>  void
>  i915_gem_release_mmap(struct drm_i915_gem_object *obj)
>  {
> -       if (!obj->fault_mappable)
> +       if (list_empty(&obj->fault_link))
>                 return;
>  
>         drm_vma_node_unmap(&obj->base.vma_node,
>                            obj->base.dev->anon_inode->i_mapping);
> -       obj->fault_mappable = false;
> +       list_del_init(&obj->fault_link);
>  }
>  
>  void
>  i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
>  {
> -       struct drm_i915_gem_object *obj;
> +       struct drm_i915_gem_object *obj, *n;
>  
> -       list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> +       list_for_each_entry_safe(obj, n, &dev_priv->mm.fault_list, fault_link)
>                 i915_gem_release_mmap(obj);
>  }
>  
> @@ -4224,6 +4229,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>         INIT_LIST_HEAD(&obj->obj_exec_link);
>         INIT_LIST_HEAD(&obj->vma_list);
>         INIT_LIST_HEAD(&obj->batch_pool_link);
> +       INIT_LIST_HEAD(&obj->fault_link);
>  
>         obj->ops = ops;
>  
> @@ -4858,6 +4864,7 @@ i915_gem_load(struct drm_device *dev)
>         INIT_LIST_HEAD(&dev_priv->context_list);
>         INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
>         INIT_LIST_HEAD(&dev_priv->mm.bound_list);
> +       INIT_LIST_HEAD(&dev_priv->mm.fault_list);
>         INIT_LIST_HEAD(&dev_priv->mm.fence_list);
>         for (i = 0; i < I915_NUM_RINGS; i++)
>                 init_ring_lists(&dev_priv->ring[i]);
> 
> 
> The tricky part is reviewing the i915_gem_release_mmap() callers to
> ensure that they have the right barrier. If you make
> i915_gem_release_mmap() assert it holds an rpm ref, and then make the
> i915_gem_release_all_mmaps() unlink the node directly that should do the
> trick.
> -Chris
>
Imre Deak Nov. 5, 2015, 11:24 p.m. UTC | #2
On Fri, 2015-11-06 at 00:57 +0200, Imre Deak wrote:
> On Thu, 2015-11-05 at 11:56 +0000, Chris Wilson wrote:
> > On Thu, Nov 05, 2015 at 01:28:32PM +0200, Imre Deak wrote:
> > > On ke, 2015-11-04 at 20:57 +0000, Chris Wilson wrote:
> > > > On Wed, Nov 04, 2015 at 09:25:32PM +0200, Imre Deak wrote:
> > > > > After Damien's D3 fix I started to get runtime suspend residency for the
> > > > > first time and that revealed a breakage on the set_caching IOCTL path
> > > > > that accesses the HW but doesn't take an RPM ref. Fix this up.
> > > > 
> > > > Why here (and in so many random places) and not around the PTE write
> > > > itself?
> > > 
> > > Imo we should take the RPM ref outside of any of our locks. Otherwise we
> > > need hacks like we have currently in the runtime suspend handler to work
> > > around lock inversions. It works now, but we couldn't do the same trick
> > > if we needed to take struct_mutex for example in the resume handler too
> > > for some reason.
> > 
> > On the other hand, it leads to hard to track down bugs and improper
> > documentation. Neither solution is perfect.
> 
> I think I understand the documentation part, not sure how that could be
> solved. Perhaps RPM-held asserts right before the point where the HW
> needs to be on?
> 
> What do you mean by hard to track down bugs? Couldn't that also be
> tackled by asserts?
> 
> > Note since intel_runtime_suspend has ample barriers of its own, you can
> > avoid the struct_mutex if you have a dedicated dev_priv->mm.fault_list.
> > 
> > Something along the lines of:
> 
> Ok, looks interesting. But as you said we would have to then make
> callers of i915_gem_release_mmap() wake up the device, which isn't
> strictly necessary. (and imo as such goes somewhat against the
> documentation argument)

Actually, we could also use pm_runtime_get_noresume(). But I find this
would just complicate things without a real benefit.
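
For reference, the no_resume variant would look roughly like this
(illustrative only, not a real patch):

/* Sketch: block runtime suspend without forcing a resume while the CPU
 * mappings are dropped. Note pm_runtime_get_noresume() does not wait for a
 * suspend that is already in flight, so a barrier would still be needed to
 * make this fully watertight.
 */
static void example_release_mmap_noresume(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
	struct device *device = &dev_priv->dev->pdev->dev;

	pm_runtime_get_noresume(device);
	i915_gem_release_mmap(obj);
	pm_runtime_put(device);
}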

> --Imre
> 
> 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 86734be..fe91ce5 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -180,11 +180,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >         }
> >         if (obj->stolen)
> >                 seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> > -       if (obj->pin_display || obj->fault_mappable) {
> > +       if (obj->pin_display || !list_empty(&obj->fault_link)) {
> >                 char s[3], *t = s;
> >                 if (obj->pin_display)
> >                         *t++ = 'p';
> > -               if (obj->fault_mappable)
> > +               if (!list_empty(&obj->fault_link))
> >                         *t++ = 'f';
> >                 *t = '\0';
> >                 seq_printf(m, " (%s mappable)", s);
> > @@ -474,7 +474,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
> >  
> >         size = count = mappable_size = mappable_count = 0;
> >         list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> > -               if (obj->fault_mappable) {
> > +               if (!list_empty(&obj->fault_link)) {
> >                         size += i915_gem_obj_ggtt_size(obj);
> >                         ++count;
> >                 }
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 1d88745..179594e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1447,28 +1447,10 @@ static int intel_runtime_suspend(struct device *device)
> >         DRM_DEBUG_KMS("Suspending device\n");
> >  
> >         /*
> > -        * We could deadlock here in case another thread holding struct_mutex
> > -        * calls RPM suspend concurrently, since the RPM suspend will wait
> > -        * first for this RPM suspend to finish. In this case the concurrent
> > -        * RPM resume will be followed by its RPM suspend counterpart. Still
> > -        * for consistency return -EAGAIN, which will reschedule this suspend.
> > -        */
> > -       if (!mutex_trylock(&dev->struct_mutex)) {
> > -               DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
> > -               /*
> > -                * Bump the expiration timestamp, otherwise the suspend won't
> > -                * be rescheduled.
> > -                */
> > -               pm_runtime_mark_last_busy(device);
> > -
> > -               return -EAGAIN;
> > -       }
> > -       /*
> >          * We are safe here against re-faults, since the fault handler takes
> >          * an RPM reference.
> >          */
> >         i915_gem_release_all_mmaps(dev_priv);
> > -       mutex_unlock(&dev->struct_mutex);
> >  
> >         intel_suspend_gt_powersave(dev);
> >         intel_runtime_pm_disable_interrupts(dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 55611d8..5e4904a 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1268,6 +1268,8 @@ struct i915_gem_mm {
> >          */
> >         struct list_head unbound_list;
> >  
> > +       struct list_head fault_list;
> > +
> >         /** Usable portion of the GTT for GEM */
> >         unsigned long stolen_base; /* limited to low memory (32-bit) */
> >  
> > @@ -2025,6 +2027,8 @@ struct drm_i915_gem_object {
> >  
> >         struct list_head batch_pool_link;
> >  
> > +       struct list_head fault_link;
> > +
> >         /**
> >          * This is set if the object is on the active lists (has pending
> >          * rendering and so a non-zero seqno), and is not set if it i s on
> > @@ -2069,13 +2073,6 @@ struct drm_i915_gem_object {
> >          */
> >         unsigned int map_and_fenceable:1;
> >  
> > -       /**
> > -        * Whether the current gtt mapping needs to be mappable (and isn't just
> > -        * mappable by accident). Track pin and fault separate for a more
> > -        * accurate mappable working set.
> > -        */
> > -       unsigned int fault_mappable:1;
> > -
> >         /*
> >          * Is the object to be mapped as read-only to the GPU
> >          * Only honoured if hardware has relevant pte bit
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 407b6b3..a90d1d8 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1814,9 +1814,10 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >                                 break;
> >                 }
> >  
> > -               obj->fault_mappable = true;
> > +               if (list_empty(&obj->fault_link))
> > +                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
> >         } else {
> > -               if (!obj->fault_mappable) {
> > +               if (list_empty(&obj->fault_link)) {
> >                         unsigned long size = min_t(unsigned long,
> >                                                    vma->vm_end - vma->vm_start,
> >                                                    obj->base.size);
> > @@ -1830,7 +1831,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >                                         break;
> >                         }
> >  
> > -                       obj->fault_mappable = true;
> > +                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
> >                 } else
> >                         ret = vm_insert_pfn(vma,
> >                                             (unsigned long)vmf->virtual_address,
> > @@ -1903,20 +1904,20 @@ out:
> >  void
> >  i915_gem_release_mmap(struct drm_i915_gem_object *obj)
> >  {
> > -       if (!obj->fault_mappable)
> > +       if (list_empty(&obj->fault_link))
> >                 return;
> >  
> >         drm_vma_node_unmap(&obj->base.vma_node,
> >                            obj->base.dev->anon_inode->i_mapping);
> > -       obj->fault_mappable = false;
> > +       list_del_init(&obj->fault_link);
> >  }
> >  
> >  void
> >  i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
> >  {
> > -       struct drm_i915_gem_object *obj;
> > +       struct drm_i915_gem_object *obj, *n;
> >  
> > -       list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> > +       list_for_each_entry_safe(obj, n, &dev_priv->mm.fault_list, fault_link)
> >                 i915_gem_release_mmap(obj);
> >  }
> >  
> > @@ -4224,6 +4229,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> >         INIT_LIST_HEAD(&obj->obj_exec_link);
> >         INIT_LIST_HEAD(&obj->vma_list);
> >         INIT_LIST_HEAD(&obj->batch_pool_link);
> > +       INIT_LIST_HEAD(&obj->fault_link);
> >  
> >         obj->ops = ops;
> >  
> > @@ -4858,6 +4864,7 @@ i915_gem_load(struct drm_device *dev)
> >         INIT_LIST_HEAD(&dev_priv->context_list);
> >         INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
> >         INIT_LIST_HEAD(&dev_priv->mm.bound_list);
> > +       INIT_LIST_HEAD(&dev_priv->mm.fault_list);
> >         INIT_LIST_HEAD(&dev_priv->mm.fence_list);
> >         for (i = 0; i < I915_NUM_RINGS; i++)
> >                 init_ring_lists(&dev_priv->ring[i]);
> > 
> > 
> > The tricky part is reviewing the i915_gem_release_mmap() callers to
> > ensure that they have the right barrier. If you make
> > i915_gem_release_mmap() assert it holds an rpm ref, and then make the
> > i915_gem_release_all_mmaps() unlink the node directly that should do the
> > trick.
> > -Chris
> > 
Chris Wilson Nov. 6, 2015, 8:54 a.m. UTC | #3
On Fri, Nov 06, 2015 at 12:57:35AM +0200, Imre Deak wrote:
> On Thu, 2015-11-05 at 11:56 +0000, Chris Wilson wrote:
> > On Thu, Nov 05, 2015 at 01:28:32PM +0200, Imre Deak wrote:
> > > On ke, 2015-11-04 at 20:57 +0000, Chris Wilson wrote:
> > > > On Wed, Nov 04, 2015 at 09:25:32PM +0200, Imre Deak wrote:
> > > > > After Damien's D3 fix I started to get runtime suspend residency for the
> > > > > first time and that revealed a breakage on the set_caching IOCTL path
> > > > > that accesses the HW but doesn't take an RPM ref. Fix this up.
> > > > 
> > > > Why here (and in so many random places) and not around the PTE write
> > > > itself?
> > > 
> > > Imo we should take the RPM ref outside of any of our locks. Otherwise we
> > > need hacks like we have currently in the runtime suspend handler to work
> > > around lock inversions. It works now, but we couldn't do the same trick
> > > if we needed to take struct_mutex for example in the resume handler too
> > > for some reason.
> > 
> > On the other hand, it leads to hard to track down bugs and improper
> > documentation. Neither solution is perfect.
> 
> I think I understand the documentation part, not sure how that could be
> solved. Perhaps RPM-held asserts right before the point where the HW
> needs to be on?
> 
> What do you mean by hard to track down bugs? Couldn't that also be
> tackled by asserts?
> 
> > Note since intel_runtime_suspend has ample barriers of its own, you can
> > avoid the struct_mutex if you have a dedicated dev_priv->mm.fault_list.
> > 
> > Something along the lines of:
> 
> Ok, looks interesting. But as you said we would have to then make
> callers of i915_gem_release_mmap() wake up the device, which isn't
> strictly necessary. (and imo as such goes somewhat against the
> documentation argument)

Outside of suspend, we only revoke the CPU PTE mapping when we change
the hardware (as doing so is very expensive). So all callers should
already have a reason (and be holding) the rpm wakelock, the only
complication from my point of view is enforcing that and reviewing that
what I said is true.

From the rpm point of view, this should improve the success of runtime
suspend, and reduce wakelocks.
-Chris
Imre Deak Nov. 9, 2015, 1:09 p.m. UTC | #4
On pe, 2015-11-06 at 08:54 +0000, Chris Wilson wrote:
> On Fri, Nov 06, 2015 at 12:57:35AM +0200, Imre Deak wrote:
> > On Thu, 2015-11-05 at 11:56 +0000, Chris Wilson wrote:
> > > On Thu, Nov 05, 2015 at 01:28:32PM +0200, Imre Deak wrote:
> > > > On ke, 2015-11-04 at 20:57 +0000, Chris Wilson wrote:
> > > > > On Wed, Nov 04, 2015 at 09:25:32PM +0200, Imre Deak wrote:
> > > > > > After Damien's D3 fix I started to get runtime suspend
> > > > > > residency for the
> > > > > > first time and that revealed a breakage on the set_caching
> > > > > > IOCTL path
> > > > > > that accesses the HW but doesn't take an RPM ref. Fix this
> > > > > > up.
> > > > > 
> > > > > Why here (and in so many random places) and not around the
> > > > > PTE write
> > > > > itself?
> > > > 
> > > > Imo we should take the RPM ref outside of any of our locks.
> > > > Otherwise we
> > > > need hacks like we have currently in the runtime suspend
> > > > handler to work
> > > > around lock inversions. It works now, but we couldn't do the
> > > > same trick
> > > > if we needed to take struct_mutex for example in the resume
> > > > handler too
> > > > for some reason.
> > > 
> > > On the other hand, it leads to hard to track down bugs and
> > > improper
> > > documentation. Neither solution is perfect.
> > 
> > I think I understand the documentation part, not sure how that
> > could be
> > solved. Perhaps RPM-held asserts right before the point where the
> > HW
> > needs to be on?
> > 
> > What do you mean by hard to track down bugs? Couldn't that also be
> > tackled by asserts?
> > 
> > > Note since intel_runtime_suspend has ample barriers of its own,
> > > you can
> > > avoid the struct_mutex if you have a dedicated dev_priv
> > > ->mm.fault_list.
> > > 
> > > Something along the lines of:
> > 
> > Ok, looks interesting. But as you said we would have to then make
> > callers of i915_gem_release_mmap() wake up the device, which isn't
> > strictly necessary. (and imo as such goes somewhat against the
> > documentation argument)
> 
> Outside of suspend, we only revoke the CPU PTE mapping when we change
> the hardware (as doing so is very expensive). So all callers should
> already have a reason (and be holding) the rpm wakelock, the only
> complication from my point of view is enforcing that and reviewing
> that
> what I said is true.

Looked through it; it seems only i915_gem_set_tiling() could release
the PTEs without waking up the hardware (if there is no need to unbind the
object). Otherwise it's true that all callers already hold (or should
hold) an RPM ref. To make the set_tiling case work after your
optimization we could wake up the HW unconditionally there, use a
no_resume RPM ref plus an RPM barrier, or add a separate new lock for the
fault list.
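
The first option would be the simplest, something along these lines
(illustrative wrapper, not the actual ioctl code):

/* Sketch of the "wake up the HW unconditionally" option for set_tiling:
 * take a full RPM ref around the part that may revoke the CPU mapping, so
 * the fault list update is always ordered against runtime suspend.
 */
static void example_set_tiling_release_mmap(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;

	intel_runtime_pm_get(dev_priv);
	i915_gem_release_mmap(obj);
	intel_runtime_pm_put(dev_priv);
}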

> From the rpm point of view, this should improve the success of 
> runtime suspend, and reduce wakelocks.

Yes, seems like a worthy optimization, since I assume struct_mutex can
be held for a long time without the need to wake up the hardware.

Are you ok to first have the fix I posted and a similar one for
i915_gem_set_tiling()? And then to follow up with your plan.

--Imre
Chris Wilson Nov. 9, 2015, 1:25 p.m. UTC | #5
On Mon, Nov 09, 2015 at 03:09:18PM +0200, Imre Deak wrote:
> Looked through it, it seems only i915_gem_set_tiling() could release
> the PTE's without waking up the hardware (if no need to unbind the
> object). Otherwise it's true that all callers hold (or should hold)
> already an RPM ref. To fix the set tiling case to work after your
> optimization we could wake up the HW unconditionally there, use a
> no_resume RPM ref plus an RPM barrier or a separate new lock for the fault
> list.

I was suggesting we move to the model where writes through gsm took the
rpm reference itself.
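
I.e. something along these lines (hypothetical wrapper just to show the
shape; the real change would go into the GGTT vfuncs themselves):

/* Sketch: every write through the GSM takes its own RPM reference, so the
 * callers (unbind, set_caching, fence writes, ...) no longer have to.
 */
static void example_ggtt_clear_range(struct i915_address_space *vm,
				     uint64_t start, uint64_t length,
				     bool use_scratch)
{
	struct drm_i915_private *dev_priv = vm->dev->dev_private;

	intel_runtime_pm_get(dev_priv);
	vm->clear_range(vm, start, length, use_scratch); /* the actual PTE write */
	intel_runtime_pm_put(dev_priv);
}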
 
> > From the rpm point of view, this should improve the success of 
> > runtime suspend, and reduce wakelocks.
> 
> Yes, seems like a worthy optimization, since I assume struct_mutex can
> be held for a long time without the need to wake up the hardware.

Admittedly most of the time we hold struct_mutex, the hw will be awake
for other reasons. But there are many times where we do take the
struct_mutex for 10s (if not 100s!) of milliseconds where the hw is
completely idle, and so every chance to reduce usage/contention on
struct_mutex is a little victory.
 
> Are you ok to first have the fix I posted and a similar one for
>  i915_gem_set_tiling()? And then to follow-up with your plan.

Yes, adding the extra reference to that ioctl, juggling with the
struct_mutex and then moving the rpm reference to where it is required
lgtm.
-Chris
Imre Deak Nov. 9, 2015, 1:36 p.m. UTC | #6
On ma, 2015-11-09 at 13:25 +0000, Chris Wilson wrote:
> On Mon, Nov 09, 2015 at 03:09:18PM +0200, Imre Deak wrote:
> > Looked through it, it seems only i915_gem_set_tiling() could
> > release
> > the PTE's without waking up the hardware (if no need to unbind the
> > object). Otherwise it's true that all callers hold (or should hold)
> > already an RPM ref. To fix the set tiling case to work after your
> > optimization we could wake up the HW unconditionally there, use a
> > no_resume RPM ref plus an RPM barrier or a separate new lock for the
> > fault
> > list.
> 
> I was suggesting we move to the model where writes through gsm took
> the
> rpm reference itself.

Yes, but even then you want to have a lock around updating the new
fault list, no? So if we go with your way and push down the RPM ref
where GSM is written, we wouldn't have a lock around the fault_list
update in i915_gem_set_tiling() (via i915_gem_release_mmap()). That's
where I meant we need an extra ref/lock above.
 
> > > From the rpm point of view, this should improve the success of 
> > > runtime suspend, and reduce wakelocks.
> > 
> > Yes, seems like a worthy optimization, since I assume struct_mutex
> > can
> > be held for a long time without the need to wake up the hardware.
> 
> Admittedly most of the time we hold struct_mutex, the hw will be
> awake
> for other reasons. But there are many times where we do take the
> struct_mutex for 10s (if not 100s!) of milliseconds where the hw is
> completely idle, and so every chance to reduce usage/contention on
> struct_mutex is a little victory.
>  
> > Are you ok to first have the fix I posted and a similar one for
> >  i915_gem_set_tiling()? And then to follow-up with your plan.
> 
> Yes, adding the extra reference to that ioctl, juggling with the
> struct_mutex and then moving the rpm reference to where it is
> required
> lgtm.

Ok, will post a new one for set_tiling.

--Imre
Chris Wilson Nov. 9, 2015, 1:43 p.m. UTC | #7
On Mon, Nov 09, 2015 at 03:36:10PM +0200, Imre Deak wrote:
> On ma, 2015-11-09 at 13:25 +0000, Chris Wilson wrote:
> > On Mon, Nov 09, 2015 at 03:09:18PM +0200, Imre Deak wrote:
> > > Looked through it, it seems only i915_gem_set_tiling() could
> > > release
> > > the PTE's without waking up the hardware (if no need to unbind the
> > > object). Otherwise it's true that all callers hold (or should hold)
> > > already an RPM ref. To fix the set tiling case to work after your
> > > optimization we could wake up the HW unconditionally there, use a
> > > no_resume RPM ref+and RPM barrier or a separate new lock for the
> > > fault
> > > list.
> > 
> > I was suggesting we move to the model where writes through gsm took
> > the
> > rpm reference itself.
> 
> Yes, but even then you want to have a lock around updating the new
> fault list, no? So if we go with your way and push down the RPM ref
> where GSM is written, we wouldn't have a lock around the fault_list
> update in i915_gem_set_tiling() (via i915_gem_release_mmap()). That's
> where I meant we need an extra ref/lock above.

Ok, I remember some of the specifics I had in mind about having to do it
inside the if (vma->map_and_fenceable) branch of i915_vma_unbind() as
opposed to being able to push it into ggtt_unbind_vma... Hmm, pushing that
itself down to ggtt_unbind_vma isn't too silly. Equally, moving the
vma->ggtt_view.pages = NULL would also help with symmetry.

Plenty of opportunity here for cleanup that should pay off nicely wrt
rpm handling :)
-Chris
Daniel Vetter Nov. 17, 2015, 10:16 p.m. UTC | #8
On Thu, Nov 5, 2015 at 12:56 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> The tricky part is reviewing the i915_gem_release_mmap() callers to
> ensure that they have the right barrier. If you make
> i915_gem_release_mmap() assert it holds an rpm ref, and then make the
> i915_gem_release_all_mmaps() unlink the node directly that should do the
> trick.

I think the easier solution would be to add a mutex for the
fault_list. We call release_mmap from a lot of places, and for many it
doesn't make sense or would be nontrivial to guarantee that we hold an
rpm reference. The lock won't be a problem as long as it's always
nested within the rpm_get/put and never the other way round (which was
all the trouble with dev->struct_mutex).
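
As a rough sketch of what I mean (fault_lock would be a new mutex, only
ever taken inside an rpm get/put section):

static void example_release_all_mmaps(struct drm_i915_private *dev_priv)
{
	struct drm_i915_gem_object *obj, *next;

	/* fault_lock is never held around intel_runtime_pm_get(), only
	 * inside it (or in paths like this one where suspend is already in
	 * progress), so it cannot recreate the struct_mutex inversion.
	 */
	mutex_lock(&dev_priv->mm.fault_lock);
	list_for_each_entry_safe(obj, next, &dev_priv->mm.fault_list, fault_link)
		i915_gem_release_mmap(obj);
	mutex_unlock(&dev_priv->mm.fault_lock);
}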

Since this is creating noise in the CI system with

pci_pm_runtime_suspend(): intel_runtime_suspend+0x0/0x240 [i915] returns -11

can you please bake this into a proper patch?

Thanks, Daniel
Chris Wilson Nov. 17, 2015, 10:38 p.m. UTC | #9
On Tue, Nov 17, 2015 at 11:16:09PM +0100, Daniel Vetter wrote:
> On Thu, Nov 5, 2015 at 12:56 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > The tricky part is reviewing the i915_gem_release_mmap() callers to
> > ensure that they have the right barrier. If you make
> > i915_gem_release_mmap() assert it holds an rpm ref, and then make the
> > i915_gem_release_all_mmaps() unlink the node directly that should do the
> > trick.
> 
> I think the easier solution would be to add a mutex for the
> fault_list. We call release_mmap from a lot of places,

We don't though. The only times we do are when we're touching hw registers
(or gsm):

set_tiling_ioctl - which also may unbind and so needs rpm
fence_write - which needs rpm to write the reigsters
vma_unbind - which needs rpm to write through the gsm
set_caching - which needs rpm to write through the gsm

and currently by rpm itself.

I think it is certainly reasonable to use the rpm barriers for the
faulted list. The only one where we have to actually ensure we hold rpm
is the set_tiling_ioctl.
-Chris
Imre Deak Nov. 17, 2015, 10:49 p.m. UTC | #10
On Tue, 2015-11-17 at 22:38 +0000, Chris Wilson wrote:
> On Tue, Nov 17, 2015 at 11:16:09PM +0100, Daniel Vetter wrote:
> > On Thu, Nov 5, 2015 at 12:56 PM, Chris Wilson <chris@chris-wilson.c
> > o.uk> wrote:
> > > The tricky part is reviewing the i915_gem_release_mmap() callers
> > > to
> > > ensure that they have the right barrier. If you make
> > > i915_gem_release_mmap() assert it holds an rpm ref, and then make
> > > the
> > > i915_gem_release_all_mmaps() unlink the node directly that should
> > > do the
> > > trick.
> > 
> > I think the easier solution would be to add a mutex for the
> > fault_list. We call release_mmap from a lot of places,
> 
> We don't though. The only times we do are when we're touching hw
> registers
> (or gsm):
> 
> set_tiling_ioctl - which also may unbind and so needs rpm
> fence_write - which needs rpm to write the registers
> vma_unbind - which needs rpm to write through the gsm
> set_caching - which needs rpm to write through the gsm
> 
> and currently by rpm itself.
> 
> I think it is certainly reasonable to use the rpm barriers for the
> faulted list. The only one where we have to actually ensure we hold
> rpm
> is the set_tiling_ioctl.

Btw, since this would be a bigger change, shouldn't we first add the new
RPM wakelock asserts? That's mostly reviewed already anyway; I still
have to check the RPS work, which would need the same handling as the
hang check work (Chris' idea to use rpm_try_get there).

So how about the following until that, which is the correct way in any
case imo:

 void __suspend_report_result(const char *function, void *fn, int ret)
 {
-       if (ret)
+       if (ret == -EBUSY || ret == -EAGAIN)
+               printk(KERN_DEBUG "%s(): %pF returns %d\n", function, fn, ret);
+       else if (ret)
                printk(KERN_ERR "%s(): %pF returns %d\n", function, fn, ret);
 }


--Imre

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 86734be..fe91ce5 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -180,11 +180,11 @@  describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
        }
        if (obj->stolen)
                seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
-       if (obj->pin_display || obj->fault_mappable) {
+       if (obj->pin_display || !list_empty(&obj->fault_link)) {
                char s[3], *t = s;
                if (obj->pin_display)
                        *t++ = 'p';
-               if (obj->fault_mappable)
+               if (!list_empty(&obj->fault_link))
                        *t++ = 'f';
                *t = '\0';
                seq_printf(m, " (%s mappable)", s);
@@ -474,7 +474,7 @@  static int i915_gem_object_info(struct seq_file *m, void* data)
 
        size = count = mappable_size = mappable_count = 0;
        list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-               if (obj->fault_mappable) {
+               if (!list_empty(&obj->fault_link)) {
                        size += i915_gem_obj_ggtt_size(obj);
                        ++count;
                }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 1d88745..179594e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1447,28 +1447,10 @@  static int intel_runtime_suspend(struct device *device)
        DRM_DEBUG_KMS("Suspending device\n");
 
        /*
-        * We could deadlock here in case another thread holding struct_mutex
-        * calls RPM suspend concurrently, since the RPM suspend will wait
-        * first for this RPM suspend to finish. In this case the concurrent
-        * RPM resume will be followed by its RPM suspend counterpart. Still
-        * for consistency return -EAGAIN, which will reschedule this suspend.
-        */
-       if (!mutex_trylock(&dev->struct_mutex)) {
-               DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
-               /*
-                * Bump the expiration timestamp, otherwise the suspend won't
-                * be rescheduled.
-                */
-               pm_runtime_mark_last_busy(device);
-
-               return -EAGAIN;
-       }
-       /*
         * We are safe here against re-faults, since the fault handler takes
         * an RPM reference.
         */
        i915_gem_release_all_mmaps(dev_priv);
-       mutex_unlock(&dev->struct_mutex);
 
        intel_suspend_gt_powersave(dev);
        intel_runtime_pm_disable_interrupts(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55611d8..5e4904a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1268,6 +1268,8 @@  struct i915_gem_mm {
         */
        struct list_head unbound_list;
 
+       struct list_head fault_list;
+
        /** Usable portion of the GTT for GEM */
        unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2025,6 +2027,8 @@  struct drm_i915_gem_object {
 
        struct list_head batch_pool_link;
 
+       struct list_head fault_link;
+
        /**
         * This is set if the object is on the active lists (has pending
         * rendering and so a non-zero seqno), and is not set if it i s on
@@ -2069,13 +2073,6 @@  struct drm_i915_gem_object {
         */
        unsigned int map_and_fenceable:1;
 
-       /**
-        * Whether the current gtt mapping needs to be mappable (and isn't just
-        * mappable by accident). Track pin and fault separate for a more
-        * accurate mappable working set.
-        */
-       unsigned int fault_mappable:1;
-
        /*
         * Is the object to be mapped as read-only to the GPU
         * Only honoured if hardware has relevant pte bit
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 407b6b3..a90d1d8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1814,9 +1814,10 @@  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
                                break;
                }
 
-               obj->fault_mappable = true;
+               if (list_empty(&obj->fault_link))
+                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
        } else {
-               if (!obj->fault_mappable) {
+               if (list_empty(&obj->fault_link)) {
                        unsigned long size = min_t(unsigned long,
                                                   vma->vm_end - vma->vm_start,
                                                   obj->base.size);
@@ -1830,7 +1831,7 @@  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
                                        break;
                        }
 
-                       obj->fault_mappable = true;
+                       list_add(&obj->fault_link, &dev_priv->mm.fault_list);
                } else
                        ret = vm_insert_pfn(vma,
                                            (unsigned long)vmf->virtual_address,
@@ -1903,20 +1904,20 @@  out:
 void
 i915_gem_release_mmap(struct drm_i915_gem_object *obj)
 {
-       if (!obj->fault_mappable)
+       if (list_empty(&obj->fault_link))
                return;
 
        drm_vma_node_unmap(&obj->base.vma_node,
                           obj->base.dev->anon_inode->i_mapping);
-       obj->fault_mappable = false;
+       list_del_init(&obj->fault_link);
 }
 
 void
 i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
 {
-       struct drm_i915_gem_object *obj;
+       struct drm_i915_gem_object *obj, *n;
 
-       list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
+       list_for_each_entry_safe(obj, n, &dev_priv->mm.fault_list, fault_link)
                i915_gem_release_mmap(obj);
 }
 
@@ -4224,6 +4229,7 @@  void i915_gem_object_init(struct drm_i915_gem_object *obj,
        INIT_LIST_HEAD(&obj->obj_exec_link);
        INIT_LIST_HEAD(&obj->vma_list);
        INIT_LIST_HEAD(&obj->batch_pool_link);
+       INIT_LIST_HEAD(&obj->fault_link);
 
        obj->ops = ops;
 
@@ -4858,6 +4864,7 @@  i915_gem_load(struct drm_device *dev)
        INIT_LIST_HEAD(&dev_priv->context_list);
        INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
        INIT_LIST_HEAD(&dev_priv->mm.bound_list);
+       INIT_LIST_HEAD(&dev_priv->mm.fault_list);
        INIT_LIST_HEAD(&dev_priv->mm.fence_list);
        for (i = 0; i < I915_NUM_RINGS; i++)
                init_ring_lists(&dev_priv->ring[i]);