diff mbox

Revert "drm/i915: Allocate context objects from stolen"

Message ID 1435598915-9347-1-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä June 29, 2015, 5:28 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Stolen gets trashed during hibernation, so storing contexts there
is not a very good idea. On my IVB machines this leads to a totally
dead GPU on resume. A reboot is required to resurrect it. So let's
not store contexts where they will get trampled.

This reverts commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Chris Wilson June 29, 2015, 7:53 p.m. UTC | #1
On Mon, Jun 29, 2015 at 08:28:35PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Stolen gets trashed during hibernation, so storing contexts there
> is not a very good idea. On my IVB machines this leads to a totally
> dead GPU on resume. A reboot is required to resurrect it. So let's
> not store contexts where they will get trampled.

We need to disable use of stolen entirely then. Fortunately ring buffers
are idle (and so trashable), but all plans for actually using stolen are
off the cards then (e.g. the create2 request to allocate from stolen).

I think it is fairer to say that stolen memory is not restored, and to
do so for hibernation of non-volatile objects would require allocation
of ordinary buffers. Ack on the patch in the short term, does a long
term plan of migrating I915_MADV_WILLNEED stolen objects to normal
memory on hibernation sound acceptable? (I don't like how it makes
hibernation special in terms of shutdown/suspend, but meh)

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Chris Wilson June 29, 2015, 8:05 p.m. UTC | #2
On Mon, Jun 29, 2015 at 08:53:12PM +0100, Chris Wilson wrote:
> On Mon, Jun 29, 2015 at 08:28:35PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Stolen gets trashed during hibernation, so storing contexts there
> > is not a very good idea. On my IVB machines this leads to a totally
> > dead GPU on resume. A reboot is required to resurrect it. So let's
> > not store contexts where they will get trampled.
> 
> We need to disable use of stolen entirely then. Fortunately ring buffers
> are idle (and so trashable), but all plans for actually using stolen are
> off the cards then (e.g. the create2 request to allocate from stolen).
> 
> I think it is fairer to say that stolen memory is not restored, and to
> do so for hibernation of non-volatile objects would require allocation
> of ordinary buffers. Ack on the patch in the short term, does a long
> term plan of migrating I915_MADV_WILLNEED stolen objects to normal
> memory on hibernation sound acceptable? (I don't like how it makes
> hibernation special in terms of shutdown/suspend, but meh)

I can get a patch to do the migration ready tomorrow, though it will
certainly be quite a few interesting patches.

What's the timeframe for a fix/revert here? (i.e. did this get merged
for 4.2 or was it 4.3?)
-Chris
akash.goel@intel.com June 30, 2015, 6:37 a.m. UTC | #3
On Mon, 2015-06-29 at 21:05 +0100, Chris Wilson wrote:
> On Mon, Jun 29, 2015 at 08:53:12PM +0100, Chris Wilson wrote:
> > On Mon, Jun 29, 2015 at 08:28:35PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Stolen gets trashed during hibernation, so storing contexts there
> > > is not a very good idea. On my IVB machines this leads to a totally
> > > dead GPU on resume. A reboot is required to resurrect it. So let's
> > > not store contexts where they will get trampled.
> > 
> > We need to disable use of stolen entirely then. Fortunately ring buffers
> > are idle (and so trashable), but all plans for actually using stolen are
> > off the cards then (e.g. the create2 request to allocate from stolen).
> > 
> > I think it is fairer to say that stolen memory is not restored, and to
> > do so for hibernation of non-volatile objects would require allocation
> > of ordinary buffers. Ack on the patch in the short term, does a long
> > term plan of migrating I915_MADV_WILLNEED stolen objects to normal
> > memory on hibernation sound acceptable? (I don't like how it makes
> > hibernation special in terms of shutdown/suspend, but meh)
> 
> I can get a patch to do the migration ready tomorrow, though it will
> certainly be quite a few interesting patches.
> 
Hi Chris,

Thanks for keeping us in the loop. 
As you are aware, that we started working on the patches for utilizing
the Stolen area, based on the placement preference flag passed by the
User (extended gem_create ioctl for it). We are trying to get them
reviewed. Would this use case of hibernation thwart that efforts ?

Are you planning to add the support for taking the back up of the
objects allocated from Stolen memory area, by copying them to shmem at
the time of hibernation. Should the objects be copied back to Stolen
area on resume ? 
Is this something, which we can implement and add a new patch to the
series ? 

Best regards
Akash

> What's the timeframe for a fix/revert here? (i.e. did this get merged
> for 4.2 or was it 4.3?)
> -Chris


>
Chris Wilson June 30, 2015, 8:31 a.m. UTC | #4
On Tue, Jun 30, 2015 at 12:07:06PM +0530, Akash Goel wrote:
> On Mon, 2015-06-29 at 21:05 +0100, Chris Wilson wrote:
> > On Mon, Jun 29, 2015 at 08:53:12PM +0100, Chris Wilson wrote:
> > > On Mon, Jun 29, 2015 at 08:28:35PM +0300, ville.syrjala@linux.intel.com wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > 
> > > > Stolen gets trashed during hibernation, so storing contexts there
> > > > is not a very good idea. On my IVB machines this leads to a totally
> > > > dead GPU on resume. A reboot is required to resurrect it. So let's
> > > > not store contexts where they will get trampled.
> > > 
> > > We need to disable use of stolen entirely then. Fortunately ring buffers
> > > are idle (and so trashable), but all plans for actually using stolen are
> > > off the cards then (e.g. the create2 request to allocate from stolen).
> > > 
> > > I think it is fairer to say that stolen memory is not restored, and to
> > > do so for hibernation of non-volatile objects would require allocation
> > > of ordinary buffers. Ack on the patch in the short term, does a long
> > > term plan of migrating I915_MADV_WILLNEED stolen objects to normal
> > > memory on hibernation sound acceptable? (I don't like how it makes
> > > hibernation special in terms of shutdown/suspend, but meh)
> > 
> > I can get a patch to do the migration ready tomorrow, though it will
> > certainly be quite a few interesting patches.
> > 
> Hi Chris,
> 
> Thanks for keeping us in the loop. 
> As you are aware, that we started working on the patches for utilizing
> the Stolen area, based on the placement preference flag passed by the
> User (extended gem_create ioctl for it). We are trying to get them
> reviewed. Would this use case of hibernation thwart that efforts ?

Yes. Since if we are not careful we can effectively loose the user data
across hibernation. So long as we can allocate fresh storage for those
objects during hibernation (and we can use the madv flags to limit what
we need to copy) then creating stolen objects for general use is fine.
I'll leave the debate as to whether we want to migrate such objects back
to stolen on thaw/first-use for later. (Gut feeling hibernate + stolen
will be so rare that the justification for migrating back will be
small.)
 
> Are you planning to add the support for taking the back up of the
> objects allocated from Stolen memory area, by copying them to shmem at
> the time of hibernation. Should the objects be copied back to Stolen
> area on resume ? 

Yes. Maybe.

> Is this something, which we can implement and add a new patch to the
> series ? 

It's something I need now as an alternative to reverting
context-in-stolen.
-Chris
Jani Nikula June 30, 2015, 8:31 a.m. UTC | #5
On Mon, 29 Jun 2015, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> Stolen gets trashed during hibernation, so storing contexts there
> is not a very good idea. On my IVB machines this leads to a totally
> dead GPU on resume. A reboot is required to resurrect it. So let's
> not store contexts where they will get trampled.
>
> This reverts commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261.

Looks like a *partial* revert of

commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:21:11 2015 +0100

    drm/i915: Allocate context objects from stolen

and the not reverted hunk has been touched since in 

commit 3126a660f352b3fe48125a8a0b4fdbf85935d8bf
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Thu Apr 30 17:30:50 2015 +0300

    drm/i915: checking IS_ERR() instead of NULL

The commit message should say it's a partial revert and not claim
otherwise, and document why not everything was reverted.

The above commits are in Linus' tree.

BR,
Jani.

>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index a7e58a8..5c7ccf4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -157,9 +157,7 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
>  	struct drm_i915_gem_object *obj;
>  	int ret;
>  
> -	obj = i915_gem_object_create_stolen(dev, size);
> -	if (obj == NULL)
> -		obj = i915_gem_alloc_object(dev, size);
> +	obj = i915_gem_alloc_object(dev, size);
>  	if (obj == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> -- 
> 2.3.6
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson June 30, 2015, 9:44 a.m. UTC | #6
On Tue, Jun 30, 2015 at 11:31:54AM +0300, Jani Nikula wrote:
> On Mon, 29 Jun 2015, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > Stolen gets trashed during hibernation, so storing contexts there
> > is not a very good idea. On my IVB machines this leads to a totally
> > dead GPU on resume. A reboot is required to resurrect it. So let's
> > not store contexts where they will get trampled.
> >
> > This reverts commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261.
> 
> Looks like a *partial* revert of
> 
> commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Apr 7 16:21:11 2015 +0100
> 
>     drm/i915: Allocate context objects from stolen
> 
> and the not reverted hunk has been touched since in 
> 
> commit 3126a660f352b3fe48125a8a0b4fdbf85935d8bf
> Author: Dan Carpenter <dan.carpenter@oracle.com>
> Date:   Thu Apr 30 17:30:50 2015 +0300
> 
>     drm/i915: checking IS_ERR() instead of NULL
> 
> The commit message should say it's a partial revert and not claim
> otherwise, and document why not everything was reverted.
> 
> The above commits are in Linus' tree.

The fixes I have planned to migrate context objects on hibernation are
not of the simple variety! Revert now, fix later.

Given Ville's discovery, it also makes

commit 8040513870399f1cb032cb8bc805df5042fedcdf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 15 11:32:29 2012 +0000

    drm/i915: Allocate overlay registers from stolen memory

suspect.
-Chris
Ville Syrjälä June 30, 2015, 10:07 a.m. UTC | #7
On Tue, Jun 30, 2015 at 10:44:15AM +0100, Chris Wilson wrote:
> On Tue, Jun 30, 2015 at 11:31:54AM +0300, Jani Nikula wrote:
> > On Mon, 29 Jun 2015, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >
> > > Stolen gets trashed during hibernation, so storing contexts there
> > > is not a very good idea. On my IVB machines this leads to a totally
> > > dead GPU on resume. A reboot is required to resurrect it. So let's
> > > not store contexts where they will get trampled.
> > >
> > > This reverts commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261.
> > 
> > Looks like a *partial* revert of
> > 
> > commit 149c86e74fe44dcbac5e9f8d145c5fbc5dc21261
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Apr 7 16:21:11 2015 +0100
> > 
> >     drm/i915: Allocate context objects from stolen
> > 
> > and the not reverted hunk has been touched since in 
> > 
> > commit 3126a660f352b3fe48125a8a0b4fdbf85935d8bf
> > Author: Dan Carpenter <dan.carpenter@oracle.com>
> > Date:   Thu Apr 30 17:30:50 2015 +0300
> > 
> >     drm/i915: checking IS_ERR() instead of NULL
> > 
> > The commit message should say it's a partial revert and not claim
> > otherwise, and document why not everything was reverted.
> > 
> > The above commits are in Linus' tree.
> 
> The fixes I have planned to migrate context objects on hibernation are
> not of the simple variety! Revert now, fix later.
> 
> Given Ville's discovery, it also makes
> 
> commit 8040513870399f1cb032cb8bc805df5042fedcdf
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Nov 15 11:32:29 2012 +0000
> 
>     drm/i915: Allocate overlay registers from stolen memory
> 
> suspect.

Hmm. It does look like it might not reinitialize the entire register
block on suspend.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a7e58a8..5c7ccf4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -157,9 +157,7 @@  i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	obj = i915_gem_object_create_stolen(dev, size);
-	if (obj == NULL)
-		obj = i915_gem_alloc_object(dev, size);
+	obj = i915_gem_alloc_object(dev, size);
 	if (obj == NULL)
 		return ERR_PTR(-ENOMEM);