| Message ID | 1461172195-3959-5-git-send-email-John.C.Harrison@Intel.com (mailing list archive) |
|---|---|
| State | New, archived |
On 20-04-16 at 19:09, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> If an execbuff IOCTL call fails for some reason, it would leave the
> request in the client list. The request clean up code would remove
> this but only later on and only after the reference count has dropped
> to zero. The entire sequence is contained within the driver mutex
> lock. However, there is still a hole such that any code which does not
> require the mutex lock could still find the request on the client list
> and start using it. That would lead to broken reference counts, use of
> dangling pointers and all sorts of other nastiness.
>
> The throttle IOCTL in particular does not acquire the mutex and does
> process the client list. And the likely situation of the execbuff
> IOCTL failing is when the system is busy with lots of work
> outstanding. That is exactly the situation where the throttle IOCTL
> would try to wait on a request.
>
> Currently, this hole is tiny - the gap between the reference count
> dropping to zero and the free function being called in response.
> However the next patch in this series enlarges that gap considerably
> by deferring the free function (to remove the need for the mutex lock
> when unreferencing requests).
>
Seems to already have been fixed by

commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 17:35:15 2016 +0100

    drm/i915: Late request cancellations are harmful
On 21/04/2016 08:20, Maarten Lankhorst wrote:
> On 20-04-16 at 19:09, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> [description of the execbuff error-path race snipped]
>>
> Seems to already have been fixed by
>
> commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Apr 13 17:35:15 2016 +0100
>
>     drm/i915: Late request cancellations are harmful
>
I don't believe that patch is the best way to solve the problem.
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1c3a1ca..707bf0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2358,6 +2358,7 @@ static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request);
 
 static inline uint32_t
 i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99e9ddb..74db6a3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1399,8 +1399,7 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_file_private *file_priv = request->file_priv;
 
@@ -2735,7 +2734,7 @@ static void i915_gem_request_free(struct fence *req_fence)
 
 	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
 
-	if (req->file_priv)
+	if (WARN_ON(req->file_priv))
 		i915_gem_request_remove_from_client(req);
 
 	if (ctx) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bb95d27..40d333c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1659,8 +1659,12 @@ err:
 	 * must be freed again. If it was submitted then it is being tracked
 	 * on the active request list and no clean up is required here.
 	 */
-	if (ret && !IS_ERR_OR_NULL(req))
+	if (ret && !IS_ERR_OR_NULL(req)) {
+		if (req->file_priv)
+			i915_gem_request_remove_from_client(req);
+
 		i915_gem_request_cancel(req);
+	}
 
 	mutex_unlock(&dev->struct_mutex);