diff mbox

[2/2] drm/radeon: fix deadlock when bo is associated to different handle

Message ID 1354039626-19920-2-git-send-email-j.glisse@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jerome Glisse Nov. 27, 2012, 6:07 p.m. UTC
From: Jerome Glisse <jglisse@redhat.com>

There is a rare case, that seems to only happen accross suspend/resume
cycle, where a bo is associated with several different handle. This
lead to a deadlock in ttm buffer reservation path. This could only
happen with flinked(globaly exported) object. Userspace should not
reopen multiple time a globaly exported object.

However the kernel should handle gracefully this corner case and not
keep rejecting the userspace command stream. This is the object of
this patch.

Fix suspend/resume issue where user see following message :
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c | 53 ++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 22 deletions(-)

Comments

Christian König Nov. 28, 2012, 10:27 a.m. UTC | #1
On 27.11.2012 19:07, j.glisse@gmail.com wrote:
> From: Jerome Glisse <jglisse@redhat.com>
>
> There is a rare case, that seems to only happen accross suspend/resume
> cycle, where a bo is associated with several different handle. This
> lead to a deadlock in ttm buffer reservation path. This could only
> happen with flinked(globaly exported) object. Userspace should not
> reopen multiple time a globaly exported object.
>
> However the kernel should handle gracefully this corner case and not
> keep rejecting the userspace command stream. This is the object of
> this patch.
>
> Fix suspend/resume issue where user see following message :
> [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!
>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>

See comment below.

> ---
>   drivers/gpu/drm/radeon/radeon_cs.c | 53 ++++++++++++++++++++++----------------
>   1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index 41672cc..064e64d 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -54,39 +54,48 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>   		return -ENOMEM;
>   	}
>   	for (i = 0; i < p->nrelocs; i++) {
> -		struct drm_radeon_cs_reloc *r;
> -
> +		struct drm_radeon_cs_reloc *reloc;
> +
> +		/* One bo could be associated with several different handle.
> +		 * Only happen for flinked bo that are open several time.
> +		 *
> +		 * FIXME:
> +		 * Maybe we should consider an alternative to idr for gem
> +		 * object to insure a 1:1 uniq mapping btw handle and gem
> +		 * object.
> +		 */
>   		duplicate = false;
> -		r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
> +		reloc = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
> +		p->relocs[i].handle = 0;
> +		p->relocs[i].flags = reloc->flags;
> +		p->relocs[i].gobj = drm_gem_object_lookup(ddev,
> +							  p->filp,
> +							  reloc->handle);
> +		if (p->relocs[i].gobj == NULL) {
> +			DRM_ERROR("gem object lookup failed 0x%x\n",
> +				  reloc->handle);
> +			return -ENOENT;
> +		}
> +		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
> +		p->relocs[i].lobj.bo = p->relocs[i].robj;
> +		p->relocs[i].lobj.wdomain = reloc->write_domain;
> +		p->relocs[i].lobj.rdomain = reloc->read_domains;
> +		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
> +
>   		for (j = 0; j < i; j++) {
> -			if (r->handle == p->relocs[j].handle) {
> +			if (p->relocs[i].lobj.bo == p->relocs[j].lobj.bo) {
>   				p->relocs_ptr[i] = &p->relocs[j];
>   				duplicate = true;
>   				break;
>   			}
>   		}
> +
>   		if (!duplicate) {
> -			p->relocs[i].gobj = drm_gem_object_lookup(ddev,
> -								  p->filp,
> -								  r->handle);
> -			if (p->relocs[i].gobj == NULL) {
> -				DRM_ERROR("gem object lookup failed 0x%x\n",
> -					  r->handle);
> -				return -ENOENT;
> -			}
>   			p->relocs_ptr[i] = &p->relocs[i];
> -			p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
> -			p->relocs[i].lobj.bo = p->relocs[i].robj;
> -			p->relocs[i].lobj.wdomain = r->write_domain;
> -			p->relocs[i].lobj.rdomain = r->read_domains;
> -			p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
> -			p->relocs[i].handle = r->handle;
> -			p->relocs[i].flags = r->flags;
> +			p->relocs[i].handle = reloc->handle;
>   			radeon_bo_list_add_object(&p->relocs[i].lobj,
>   						  &p->validated);
> -
> -		} else
> -			p->relocs[i].handle = 0;

I'm not sure if the memory p->relocs is pointing to is zero initialized, 
so we should at least initialize whatever member we use to find the 
duplicates. Also I think we don't need the handle in this structure any 
more if we don't use it for comparison (but not 100% sure without 
testing it).

> +		}
>   	}
>   	return radeon_bo_list_validate(&p->validated);
>   }
Jerome Glisse Nov. 28, 2012, 3:38 p.m. UTC | #2
On Wed, Nov 28, 2012 at 11:27:27AM +0100, Christian König wrote:
> On 27.11.2012 19:07, j.glisse@gmail.com wrote:
> >From: Jerome Glisse <jglisse@redhat.com>
> >
> >There is a rare case, that seems to only happen accross suspend/resume
> >cycle, where a bo is associated with several different handle. This
> >lead to a deadlock in ttm buffer reservation path. This could only
> >happen with flinked(globaly exported) object. Userspace should not
> >reopen multiple time a globaly exported object.
> >
> >However the kernel should handle gracefully this corner case and not
> >keep rejecting the userspace command stream. This is the object of
> >this patch.
> >
> >Fix suspend/resume issue where user see following message :
> >[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!
> >
> >Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> 
> See comment below.
> 
> >---
> >  drivers/gpu/drm/radeon/radeon_cs.c | 53 ++++++++++++++++++++++----------------
> >  1 file changed, 31 insertions(+), 22 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> >index 41672cc..064e64d 100644
> >--- a/drivers/gpu/drm/radeon/radeon_cs.c
> >+++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >@@ -54,39 +54,48 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> >  		return -ENOMEM;
> >  	}
> >  	for (i = 0; i < p->nrelocs; i++) {
> >-		struct drm_radeon_cs_reloc *r;
> >-
> >+		struct drm_radeon_cs_reloc *reloc;
> >+
> >+		/* One bo could be associated with several different handle.
> >+		 * Only happen for flinked bo that are open several time.
> >+		 *
> >+		 * FIXME:
> >+		 * Maybe we should consider an alternative to idr for gem
> >+		 * object to insure a 1:1 uniq mapping btw handle and gem
> >+		 * object.
> >+		 */
> >  		duplicate = false;
> >-		r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
> >+		reloc = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
> >+		p->relocs[i].handle = 0;
> >+		p->relocs[i].flags = reloc->flags;
> >+		p->relocs[i].gobj = drm_gem_object_lookup(ddev,
> >+							  p->filp,
> >+							  reloc->handle);
> >+		if (p->relocs[i].gobj == NULL) {
> >+			DRM_ERROR("gem object lookup failed 0x%x\n",
> >+				  reloc->handle);
> >+			return -ENOENT;
> >+		}
> >+		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
> >+		p->relocs[i].lobj.bo = p->relocs[i].robj;
> >+		p->relocs[i].lobj.wdomain = reloc->write_domain;
> >+		p->relocs[i].lobj.rdomain = reloc->read_domains;
> >+		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
> >+
> >  		for (j = 0; j < i; j++) {
> >-			if (r->handle == p->relocs[j].handle) {
> >+			if (p->relocs[i].lobj.bo == p->relocs[j].lobj.bo) {
> >  				p->relocs_ptr[i] = &p->relocs[j];
> >  				duplicate = true;
> >  				break;
> >  			}
> >  		}
> >+
> >  		if (!duplicate) {
> >-			p->relocs[i].gobj = drm_gem_object_lookup(ddev,
> >-								  p->filp,
> >-								  r->handle);
> >-			if (p->relocs[i].gobj == NULL) {
> >-				DRM_ERROR("gem object lookup failed 0x%x\n",
> >-					  r->handle);
> >-				return -ENOENT;
> >-			}
> >  			p->relocs_ptr[i] = &p->relocs[i];
> >-			p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
> >-			p->relocs[i].lobj.bo = p->relocs[i].robj;
> >-			p->relocs[i].lobj.wdomain = r->write_domain;
> >-			p->relocs[i].lobj.rdomain = r->read_domains;
> >-			p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
> >-			p->relocs[i].handle = r->handle;
> >-			p->relocs[i].flags = r->flags;
> >+			p->relocs[i].handle = reloc->handle;
> >  			radeon_bo_list_add_object(&p->relocs[i].lobj,
> >  						  &p->validated);
> >-
> >-		} else
> >-			p->relocs[i].handle = 0;
> 
> I'm not sure if the memory p->relocs is pointing to is zero
> initialized, so we should at least initialize whatever member we use
> to find the duplicates. Also I think we don't need the handle in
> this structure any more if we don't use it for comparison (but not
> 100% sure without testing it).

No need to initialize p->relocs[i].lobj.bo which is the one use to find
duplicate. When a duplicate is found p->relocs_ptr[i] points to first
relocation with the duplicate bo. p->relocs[i].lobj.bo is always
initialized before looking for duplicate.

I kept the handle around because its usefull for debuging. But it could
as well be removed and just added back whenever someone is doing debugging.

Cheers,
Jerome

> 
> >+		}
> >  	}
> >  	return radeon_bo_list_validate(&p->validated);
> >  }
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41672cc..064e64d 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -54,39 +54,48 @@  static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		return -ENOMEM;
 	}
 	for (i = 0; i < p->nrelocs; i++) {
-		struct drm_radeon_cs_reloc *r;
-
+		struct drm_radeon_cs_reloc *reloc;
+
+		/* One bo could be associated with several different handle.
+		 * Only happen for flinked bo that are open several time.
+		 *
+		 * FIXME:
+		 * Maybe we should consider an alternative to idr for gem
+		 * object to insure a 1:1 uniq mapping btw handle and gem
+		 * object.
+		 */
 		duplicate = false;
-		r = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
+		reloc = (struct drm_radeon_cs_reloc *)&chunk->kdata[i*4];
+		p->relocs[i].handle = 0;
+		p->relocs[i].flags = reloc->flags;
+		p->relocs[i].gobj = drm_gem_object_lookup(ddev,
+							  p->filp,
+							  reloc->handle);
+		if (p->relocs[i].gobj == NULL) {
+			DRM_ERROR("gem object lookup failed 0x%x\n",
+				  reloc->handle);
+			return -ENOENT;
+		}
+		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
+		p->relocs[i].lobj.bo = p->relocs[i].robj;
+		p->relocs[i].lobj.wdomain = reloc->write_domain;
+		p->relocs[i].lobj.rdomain = reloc->read_domains;
+		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
+
 		for (j = 0; j < i; j++) {
-			if (r->handle == p->relocs[j].handle) {
+			if (p->relocs[i].lobj.bo == p->relocs[j].lobj.bo) {
 				p->relocs_ptr[i] = &p->relocs[j];
 				duplicate = true;
 				break;
 			}
 		}
+
 		if (!duplicate) {
-			p->relocs[i].gobj = drm_gem_object_lookup(ddev,
-								  p->filp,
-								  r->handle);
-			if (p->relocs[i].gobj == NULL) {
-				DRM_ERROR("gem object lookup failed 0x%x\n",
-					  r->handle);
-				return -ENOENT;
-			}
 			p->relocs_ptr[i] = &p->relocs[i];
-			p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
-			p->relocs[i].lobj.bo = p->relocs[i].robj;
-			p->relocs[i].lobj.wdomain = r->write_domain;
-			p->relocs[i].lobj.rdomain = r->read_domains;
-			p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
-			p->relocs[i].handle = r->handle;
-			p->relocs[i].flags = r->flags;
+			p->relocs[i].handle = reloc->handle;
 			radeon_bo_list_add_object(&p->relocs[i].lobj,
 						  &p->validated);
-
-		} else
-			p->relocs[i].handle = 0;
+		}
 	}
 	return radeon_bo_list_validate(&p->validated);
 }