diff mbox

drm/i915: Invalidate the relocation presumed_offsets along the slow path

Message ID 1358266674-28308-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Jan. 15, 2013, 4:17 p.m. UTC
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.

Fixes regression from commit bcf50e2775bbc3101932d8e4ab8c7902aa4163b4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 21 22:07:12 2010 +0000

    drm/i915: Handle pagefaults in execbuffer user relocations

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Daniel Vetter Jan. 15, 2013, 6:21 p.m. UTC | #1
On Tue, Jan 15, 2013 at 04:17:54PM +0000, Chris Wilson wrote:
> In the slow path, we are forced to copy the relocations prior to
> acquiring the struct mutex in order to handle pagefaults. We forgo
> copying the new offsets back into the relocation entries in order to
> prevent a recursive locking bug should we trigger a pagefault whilst
> holding the mutex for the reservations of the execbuffer. Therefore, we
> need to reset the presumed_offsets just in case the objects are rebound
> back into their old locations after relocating for this exexbuffer - if
> that were to happen we would assume the relocations were valid and leave
> the actual pointers to the kernels dangling, instant hang.
> 
> Fixes regression from commit bcf50e2775bbc3101932d8e4ab8c7902aa4163b4
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sun Nov 21 22:07:12 2010 +0000
> 
>     drm/i915: Handle pagefaults in execbuffer user relocations
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@fwll.ch>
> Cc: stable@vger.kernel.org

Awesome piece of debugging!

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4532757..40c062d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -767,6 +767,8 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>  	total = 0;
>  	for (i = 0; i < count; i++) {
>  		struct drm_i915_gem_relocation_entry __user *user_relocs;
> +		u64 invalid_offset = (u64)-1;

I'm a bit uneasy with the semantics here, fearing that a random piece of
userspace ORs in a few flags instead of adding them. Could we align this
to 4096 bytes? Or maybe enshrine 0 as our official invalid reloc ...
-Daniel

> +		int j;
>  
>  		user_relocs = (void __user *)(uintptr_t)exec[i].relocs_ptr;
>  
> @@ -777,6 +779,25 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>  			goto err;
>  		}
>  
> +		/* As we do not update the known relocation offsets after
> +		 * relocating (due to the complexities in lock handling),
> +		 * we need to mark them as invalid now so that we force the
> +		 * relocation processing next time. Just in case the target
> +		 * object is evicted and then rebound into its old
> +		 * presumed_offset before the next execbuffer - if that
> +		 * happened we would make the mistake of assuming that the
> +		 * relocations were valid.
> +		 */
> +		for (j = 0; j < exec[i].relocation_count; j++) {
> +			if (copy_to_user(&user_relocs[j].presumed_offset,
> +					 &invalid_offset,
> +					 sizeof(invalid_offset))) {
> +				ret = -EFAULT;
> +				mutex_lock(&dev->struct_mutex);
> +				goto err;
> +			}
> +		}
> +
>  		reloc_offset[i] = total;
>  		total += exec[i].relocation_count;
>  	}
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Jan. 15, 2013, 6:28 p.m. UTC | #2
On Tue, 15 Jan 2013 19:21:25 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Jan 15, 2013 at 04:17:54PM +0000, Chris Wilson wrote:
> > In the slow path, we are forced to copy the relocations prior to
> > acquiring the struct mutex in order to handle pagefaults. We forgo
> > copying the new offsets back into the relocation entries in order to
> > prevent a recursive locking bug should we trigger a pagefault whilst
> > holding the mutex for the reservations of the execbuffer. Therefore, we
> > need to reset the presumed_offsets just in case the objects are rebound
> > back into their old locations after relocating for this exexbuffer - if
> > that were to happen we would assume the relocations were valid and leave
> > the actual pointers to the kernels dangling, instant hang.
> > 
> > Fixes regression from commit bcf50e2775bbc3101932d8e4ab8c7902aa4163b4
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Sun Nov 21 22:07:12 2010 +0000
> > 
> >     drm/i915: Handle pagefaults in execbuffer user relocations
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@fwll.ch>
> > Cc: stable@vger.kernel.org
> 
> Awesome piece of debugging!
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 4532757..40c062d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -767,6 +767,8 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
> >  	total = 0;
> >  	for (i = 0; i < count; i++) {
> >  		struct drm_i915_gem_relocation_entry __user *user_relocs;
> > +		u64 invalid_offset = (u64)-1;
> 
> I'm a bit uneasy with the semantics here, fearing that a random piece of
> userspace ORs in a few flags instead of adding them. Could we align this
> to 4096 bytes? Or maybe enshrine 0 as our official invalid reloc ...

0 is a valid offset, the location of the stolen fbcon. For the sake of
argument, use (u64)-4096. Even if userspace were suddenly to readback
reloc->presumed_offset (instead of exec->offset) and start or'ing in
flags, it would never match the obj->gtt_offset forcing the relocation
to be recomputed by the kernel.
-Chris
Daniel Vetter Jan. 16, 2013, 9:53 a.m. UTC | #3
On Tue, Jan 15, 2013 at 04:17:54PM +0000, Chris Wilson wrote:
> In the slow path, we are forced to copy the relocations prior to
> acquiring the struct mutex in order to handle pagefaults. We forgo
> copying the new offsets back into the relocation entries in order to
> prevent a recursive locking bug should we trigger a pagefault whilst
> holding the mutex for the reservations of the execbuffer. Therefore, we
> need to reset the presumed_offsets just in case the objects are rebound
> back into their old locations after relocating for this exexbuffer - if
> that were to happen we would assume the relocations were valid and leave
> the actual pointers to the kernels dangling, instant hang.
> 
> Fixes regression from commit bcf50e2775bbc3101932d8e4ab8c7902aa4163b4
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sun Nov 21 22:07:12 2010 +0000
> 
>     drm/i915: Handle pagefaults in execbuffer user relocations
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@fwll.ch>
> Cc: stable@vger.kernel.org

Picked up for -fixes, thanks for the patch.
-Daniel
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4532757..40c062d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -767,6 +767,8 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>  	total = 0;
>  	for (i = 0; i < count; i++) {
>  		struct drm_i915_gem_relocation_entry __user *user_relocs;
> +		u64 invalid_offset = (u64)-1;
> +		int j;
>  
>  		user_relocs = (void __user *)(uintptr_t)exec[i].relocs_ptr;
>  
> @@ -777,6 +779,25 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>  			goto err;
>  		}
>  
> +		/* As we do not update the known relocation offsets after
> +		 * relocating (due to the complexities in lock handling),
> +		 * we need to mark them as invalid now so that we force the
> +		 * relocation processing next time. Just in case the target
> +		 * object is evicted and then rebound into its old
> +		 * presumed_offset before the next execbuffer - if that
> +		 * happened we would make the mistake of assuming that the
> +		 * relocations were valid.
> +		 */
> +		for (j = 0; j < exec[i].relocation_count; j++) {
> +			if (copy_to_user(&user_relocs[j].presumed_offset,
> +					 &invalid_offset,
> +					 sizeof(invalid_offset))) {
> +				ret = -EFAULT;
> +				mutex_lock(&dev->struct_mutex);
> +				goto err;
> +			}
> +		}
> +
>  		reloc_offset[i] = total;
>  		total += exec[i].relocation_count;
>  	}
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4532757..40c062d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -767,6 +767,8 @@  i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 	total = 0;
 	for (i = 0; i < count; i++) {
 		struct drm_i915_gem_relocation_entry __user *user_relocs;
+		u64 invalid_offset = (u64)-1;
+		int j;
 
 		user_relocs = (void __user *)(uintptr_t)exec[i].relocs_ptr;
 
@@ -777,6 +779,25 @@  i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 			goto err;
 		}
 
+		/* As we do not update the known relocation offsets after
+		 * relocating (due to the complexities in lock handling),
+		 * we need to mark them as invalid now so that we force the
+		 * relocation processing next time. Just in case the target
+		 * object is evicted and then rebound into its old
+		 * presumed_offset before the next execbuffer - if that
+		 * happened we would make the mistake of assuming that the
+		 * relocations were valid.
+		 */
+		for (j = 0; j < exec[i].relocation_count; j++) {
+			if (copy_to_user(&user_relocs[j].presumed_offset,
+					 &invalid_offset,
+					 sizeof(invalid_offset))) {
+				ret = -EFAULT;
+				mutex_lock(&dev->struct_mutex);
+				goto err;
+			}
+		}
+
 		reloc_offset[i] = total;
 		total += exec[i].relocation_count;
 	}