diff mbox series

[1/3] drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init

Message ID 20201007120329.17076-1-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init | expand

Commit Message

Ville Syrjälä Oct. 7, 2020, 12:03 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Currently we leave the cache_level of the initial fb obj
set to NONE. This means on eLLC machines the first pin_to_display()
will try to switch it to WT which requires a vma unbind+bind.
If that happens during the fbdev initialization rcu does not
seem operational which causes the unbind to get stuck. To
most appearances this looks like a dead machine on boot.

Avoid the unbind by already marking the object cache_level
as WT when creating it. We still do an excplicit ggtt pin
which will rewrite the PTEs anyway, so they will match whatever
cache level we set.

Cc: <stable@vger.kernel.org> # v5.7+
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Chris Wilson Oct. 13, 2020, 3:47 p.m. UTC | #1
See subject, s/ininitial/iniital/

Quoting Ville Syrjala (2020-10-07 13:03:27)
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Currently we leave the cache_level of the initial fb obj
> set to NONE. This means on eLLC machines the first pin_to_display()
> will try to switch it to WT which requires a vma unbind+bind.
> If that happens during the fbdev initialization rcu does not
> seem operational which causes the unbind to get stuck. To
> most appearances this looks like a dead machine on boot.
> 
> Avoid the unbind by already marking the object cache_level
> as WT when creating it. We still do an excplicit ggtt pin
> which will rewrite the PTEs anyway, so they will match whatever
> cache level we set.
> 
> Cc: <stable@vger.kernel.org> # v5.7+
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 907e1d155443..00c08600c60a 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -3445,6 +3445,14 @@ initial_plane_vma(struct drm_i915_private *i915,
>         if (IS_ERR(obj))
>                 return NULL;
>  
> +       /*
> +        * Mark it WT ahead of time to avoid changing the
> +        * cache_level during fbdev initialization. The
> +        * unbind there would get stuck waiting for rcu.
> +        */
> +       i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
> +                                           I915_CACHE_WT : I915_CACHE_NONE);

Ok, I've been worrying about whether there were any more side-effects,
but I think it all comes out in the wash. The proof is definitely in the
eating, and we will know soon enough if we break someone's virtual
terminal.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Ville Syrjälä Oct. 13, 2020, 4:12 p.m. UTC | #2
On Tue, Oct 13, 2020 at 04:47:49PM +0100, Chris Wilson wrote:
> See subject, s/ininitial/iniital/
> 
> Quoting Ville Syrjala (2020-10-07 13:03:27)
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Currently we leave the cache_level of the initial fb obj
> > set to NONE. This means on eLLC machines the first pin_to_display()
> > will try to switch it to WT which requires a vma unbind+bind.
> > If that happens during the fbdev initialization rcu does not
> > seem operational which causes the unbind to get stuck. To
> > most appearances this looks like a dead machine on boot.
> > 
> > Avoid the unbind by already marking the object cache_level
> > as WT when creating it. We still do an excplicit ggtt pin
> > which will rewrite the PTEs anyway, so they will match whatever
> > cache level we set.
> > 
> > Cc: <stable@vger.kernel.org> # v5.7+
> > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index 907e1d155443..00c08600c60a 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -3445,6 +3445,14 @@ initial_plane_vma(struct drm_i915_private *i915,
> >         if (IS_ERR(obj))
> >                 return NULL;
> >  
> > +       /*
> > +        * Mark it WT ahead of time to avoid changing the
> > +        * cache_level during fbdev initialization. The
> > +        * unbind there would get stuck waiting for rcu.
> > +        */
> > +       i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
> > +                                           I915_CACHE_WT : I915_CACHE_NONE);
> 
> Ok, I've been worrying about whether there were any more side-effects,
> but I think it all comes out in the wash. The proof is definitely in the
> eating, and we will know soon enough if we break someone's virtual
> terminal.

At least it seems to work on my CFL with eLLC caching enabled.

> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Ta.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 907e1d155443..00c08600c60a 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -3445,6 +3445,14 @@  initial_plane_vma(struct drm_i915_private *i915,
 	if (IS_ERR(obj))
 		return NULL;
 
+	/*
+	 * Mark it WT ahead of time to avoid changing the
+	 * cache_level during fbdev initialization. The
+	 * unbind there would get stuck waiting for rcu.
+	 */
+	i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
+					    I915_CACHE_WT : I915_CACHE_NONE);
+
 	switch (plane_config->tiling) {
 	case I915_TILING_NONE:
 		break;