Message ID | 1435683333-17844-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On 06/30/2015 05:55 PM, Chris Wilson wrote: > The userptr worker allows for a slight race condition where upon there > may two or more threads calling get_user_pages for the same object. When > we have the array of pages, then we serialise the update of the object. > However, the worker should only overwrite the obj->userptr.work pointer > if and only if it is the active one. Currently we clear it for a > secondary worker with the effect that we may rarely force a second > lookup. Secondary worker can fire only if invalidate clears the current one, no? (if (obj->userptr.work == NULL && ...)) It then "cancels" the worker so that the st_set_pages path is avoided. > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c > index 7a5242cd5ea5..cb367d9f7909 100644 > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c > @@ -581,17 +581,17 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) > } > > mutex_lock(&dev->struct_mutex); > - if (obj->userptr.work != &work->work) { > - ret = 0; > - } else if (pinned == num_pages) { > - ret = st_set_pages(&obj->pages, pvec, num_pages); > - if (ret == 0) { > - list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); > - pinned = 0; > + if (obj->userptr.work == &work->work) { > + if (pinned == num_pages) { > + ret = st_set_pages(&obj->pages, pvec, num_pages); > + if (ret == 0) { > + list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); > + pinned = 0; > + } > } > + obj->userptr.work = ERR_PTR(ret); > } > > - obj->userptr.work = ERR_PTR(ret); > obj->userptr.workers--; > drm_gem_object_unreference(&obj->base); > mutex_unlock(&dev->struct_mutex); Previously the canceled worker would allow another worker to be created in case it failed 
(obj->userptr.work != &work->work; ret = 0;) and now it still does since obj->userptr.work remains at NULL from cancellation. Both seem wrong, am I missing the change? Regards, Tvrtko
On Wed, Jul 01, 2015 at 10:48:59AM +0100, Tvrtko Ursulin wrote: > > On 06/30/2015 05:55 PM, Chris Wilson wrote: > >The userptr worker allows for a slight race condition where upon there > >may two or more threads calling get_user_pages for the same object. When > >we have the array of pages, then we serialise the update of the object. > >However, the worker should only overwrite the obj->userptr.work pointer > >if and only if it is the active one. Currently we clear it for a > >secondary worker with the effect that we may rarely force a second > >lookup. > > Secondary worker can fire only if invalidate clears the current one, > no? (if (obj->userptr.work == NULL && ...)) > > It then "cancels" the worker so that the st_set_pages path is avoided. I may have overegged the changelog, but what I did not like here was that we would touch obj->userptr.work when we clearly had lost ownership of that field. > >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >--- > > drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++-------- > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > >diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c > >index 7a5242cd5ea5..cb367d9f7909 100644 > >--- a/drivers/gpu/drm/i915/i915_gem_userptr.c > >+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c > >@@ -581,17 +581,17 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) > > } > > > > mutex_lock(&dev->struct_mutex); > >- if (obj->userptr.work != &work->work) { > >- ret = 0; > >- } else if (pinned == num_pages) { > >- ret = st_set_pages(&obj->pages, pvec, num_pages); > >- if (ret == 0) { > >- list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); > >- pinned = 0; > >+ if (obj->userptr.work == &work->work) { > >+ if (pinned == num_pages) { > >+ ret = st_set_pages(&obj->pages, pvec, num_pages); > >+ if (ret == 0) { > >+ list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); > >+ pinned = 0; > >+ } > > } > >+ 
obj->userptr.work = ERR_PTR(ret); > > } > > > >- obj->userptr.work = ERR_PTR(ret); > > obj->userptr.workers--; > > drm_gem_object_unreference(&obj->base); > > mutex_unlock(&dev->struct_mutex); > > Previously the canceled worker would allow another worker to be > created in case it failed (obj->userptr.work != &work->work; ret = > 0;) and now it still does since obj->userptr.work remains at NULL > from cancellation. > > Both seem wrong, am I missing the change? No, the obj->userptr.work must remain NULL until a new get_pages() because we don't actually know if this worker's gup was before or after the cancellation - mmap_sem vs struct_mutex ordering. -Chris
On 07/01/2015 10:59 AM, Chris Wilson wrote: > On Wed, Jul 01, 2015 at 10:48:59AM +0100, Tvrtko Ursulin wrote: >> >> On 06/30/2015 05:55 PM, Chris Wilson wrote: >>> The userptr worker allows for a slight race condition where upon there >>> may two or more threads calling get_user_pages for the same object. When >>> we have the array of pages, then we serialise the update of the object. >>> However, the worker should only overwrite the obj->userptr.work pointer >>> if and only if it is the active one. Currently we clear it for a >>> secondary worker with the effect that we may rarely force a second >>> lookup. >> >> Secondary worker can fire only if invalidate clears the current one, >> no? (if (obj->userptr.work == NULL && ...)) >> >> It then "cancels" the worker so that the st_set_pages path is avoided. > > I may have overegged the changelog, but what I did not like here was > that we would touch obj->userptr.work when we clearly had lost ownership > of that field. Yes that part makes sense. 
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> --- >>> drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++-------- >>> 1 file changed, 8 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c >>> index 7a5242cd5ea5..cb367d9f7909 100644 >>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c >>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c >>> @@ -581,17 +581,17 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) >>> } >>> >>> mutex_lock(&dev->struct_mutex); >>> - if (obj->userptr.work != &work->work) { >>> - ret = 0; >>> - } else if (pinned == num_pages) { >>> - ret = st_set_pages(&obj->pages, pvec, num_pages); >>> - if (ret == 0) { >>> - list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); >>> - pinned = 0; >>> + if (obj->userptr.work == &work->work) { >>> + if (pinned == num_pages) { >>> + ret = st_set_pages(&obj->pages, pvec, num_pages); >>> + if (ret == 0) { >>> + list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list); >>> + pinned = 0; >>> + } >>> } >>> + obj->userptr.work = ERR_PTR(ret); >>> } >>> >>> - obj->userptr.work = ERR_PTR(ret); >>> obj->userptr.workers--; >>> drm_gem_object_unreference(&obj->base); >>> mutex_unlock(&dev->struct_mutex); >> >> Previously the canceled worker would allow another worker to be >> created in case it failed (obj->userptr.work != &work->work; ret = >> 0;) and now it still does since obj->userptr.work remains at NULL >> from cancellation. >> >> Both seem wrong, am I missing the change? > > No, the obj->userptr.work must remain NULL until a new get_pages() > because we don't actually know if this worker's gup was before or after > the cancellation - mmap_sem vs struct_mutex ordering. No one is not wrong, or no I was not missing the change? I am thinking more and more that we should just mark it canceled forever and not allow get_pages to succeed ever since. Regards, Tvrtko
On Wed, Jul 01, 2015 at 11:58:46AM +0100, Tvrtko Ursulin wrote:
> On 07/01/2015 10:59 AM, Chris Wilson wrote:
> >On Wed, Jul 01, 2015 at 10:48:59AM +0100, Tvrtko Ursulin wrote:
> >>Previously the canceled worker would allow another worker to be
> >>created in case it failed (obj->userptr.work != &work->work; ret =
> >>0;) and now it still does since obj->userptr.work remains at NULL
> >>from cancellation.
> >>
> >>Both seem wrong, am I missing the change?
> >
> >No, the obj->userptr.work must remain NULL until a new get_pages()
> >because we don't actually know if this worker's gup was before or after
> >the cancellation - mmap_sem vs struct_mutex ordering.
>
> No one is not wrong, or no I was not missing the change?

The only change is that we don't change the value of userptr.work if it
is set to something else. The only time it should be different is if it
has been cancelled and so is NULL. The patch just makes a coding error
less damaging, and I think the code is easier to read because of that.

> I am thinking more and more that we should just mark it canceled
> forever and not allow get_pages to succeed ever since.

Yes, I toyed with that yesterday in response to you being able to alias
a GTT mmap address with the userptr after munmap(userptr.ptr). The
problem is that cancel_userptr() is called for any change in the CPU
PTEs, including mprotect() or COW after forking. Both of those are
valid situations where we want to keep the userptr around, but with a
new gup.

It's tricky to know what the right thing to do is. For example, another
quirk is that we can recover a failed get_pages() by repeatedly invoking
it after a new aliasing. Again, I'm not sure if the current behaviour is
a little too lax.
-Chris
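For readers following along, the ownership rule described above ("we don't change the value of userptr.work if it is set to something else") can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_obj, struct model_work and model_worker_finish() are hypothetical names, not the i915 types, and the kernel's single ERR_PTR(ret) store is split here into a pointer plus a separate last_ret field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real i915 types. */
struct model_work { int id; };

struct model_obj {
	struct model_work *work; /* active worker, NULL once cancelled or done */
	int last_ret;            /* models the error code carried by ERR_PTR(ret) */
	int has_pages;           /* stands in for st_set_pages() having succeeded */
	int workers;             /* number of workers still in flight */
};

/*
 * Completion step of a worker, shaped like the patched code: only the
 * worker that still owns obj->work may publish pages or a result.
 */
static void model_worker_finish(struct model_obj *obj,
				struct model_work *self,
				int gup_ret, int pinned_all)
{
	int ret = gup_ret;

	assert(obj->workers > 0);
	if (obj->work == self) {
		if (pinned_all) {
			obj->has_pages = 1;
			ret = 0;
		}
		obj->work = NULL;
		obj->last_ret = ret;
	}
	/* A worker that lost ownership leaves work and last_ret untouched. */
	obj->workers--;
}
```

In this model a stale worker finishing late only decrements the worker count. With the pre-patch shape, the same stale worker would have stored its own result into the field unconditionally.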
On 07/01/2015 12:09 PM, Chris Wilson wrote: > On Wed, Jul 01, 2015 at 11:58:46AM +0100, Tvrtko Ursulin wrote: >> On 07/01/2015 10:59 AM, Chris Wilson wrote: >>> On Wed, Jul 01, 2015 at 10:48:59AM +0100, Tvrtko Ursulin wrote: >>>> Previously the canceled worker would allow another worker to be >>>> created in case it failed (obj->userptr.work != &work->work; ret = >>>> 0;) and now it still does since obj->userptr.work remains at NULL >>> >from cancellation. >>>> >>>> Both seem wrong, am I missing the change? >>> >>> No, the obj->userptr.work must remain NULL until a new get_pages() >>> because we don't actually know if this worker's gup was before or after >>> the cancellation - mmap_sem vs struct_mutex ordering. >> >> No one is not wrong, or no I was not missing the change? > > The only change is that we don't change the value of userptr.work if it > is set to something else. The only time it should be different was if it > had been cancelled and so NULL. The patch just makes it so that a coding > error is less damaging - and I think easier to read because of that. > >> I am thinking more and more that we should just mark it canceled >> forever and not allow get_pages to succeed ever since. > > Yes, I toyed with that yesterday in response to you being able to alias > a GTT mmap address with the userptr after munmap(userptr.ptr). The > problem is that cancel_userptr() is caller for any change in the CPU > PTE's, including mprotect() or cow after forking. Both of those are > valid situations where we want to keep the userptr around, but with a > new gup. Why do we want that? I would be surprised if someone is using it like that. How would it be defined on the GEM handle level even? Regards, Tvrtko
On Wed, Jul 01, 2015 at 01:26:59PM +0100, Tvrtko Ursulin wrote: > > On 07/01/2015 12:09 PM, Chris Wilson wrote: > >On Wed, Jul 01, 2015 at 11:58:46AM +0100, Tvrtko Ursulin wrote: > >>On 07/01/2015 10:59 AM, Chris Wilson wrote: > >>>On Wed, Jul 01, 2015 at 10:48:59AM +0100, Tvrtko Ursulin wrote: > >>>>Previously the canceled worker would allow another worker to be > >>>>created in case it failed (obj->userptr.work != &work->work; ret = > >>>>0;) and now it still does since obj->userptr.work remains at NULL > >>>>from cancellation. > >>>> > >>>>Both seem wrong, am I missing the change? > >>> > >>>No, the obj->userptr.work must remain NULL until a new get_pages() > >>>because we don't actually know if this worker's gup was before or after > >>>the cancellation - mmap_sem vs struct_mutex ordering. > >> > >>No one is not wrong, or no I was not missing the change? > > > >The only change is that we don't change the value of userptr.work if it > >is set to something else. The only time it should be different was if it > >had been cancelled and so NULL. The patch just makes it so that a coding > >error is less damaging - and I think easier to read because of that. > > > >>I am thinking more and more that we should just mark it canceled > >>forever and not allow get_pages to succeed ever since. > > > >Yes, I toyed with that yesterday in response to you being able to alias > >a GTT mmap address with the userptr after munmap(userptr.ptr). The > >problem is that cancel_userptr() is caller for any change in the CPU > >PTE's, including mprotect() or cow after forking. Both of those are > >valid situations where we want to keep the userptr around, but with a > >new gup. > > Why do we want that? I would be surprised if someone is using it > like that. How would it be defined on the GEM handle level even? I would be surprised as well, but it is a race condition we can handle correctly and succinctly. 
The race is just

	bo = userptr(ptr, size);
	set-to-domain(bo);
	mremap(ptr, newptr, size);
	set-to-domain(bo); // or exec(bo);

-Chris
On Tue, Jun 30, 2015 at 05:55:31PM +0100, Chris Wilson wrote:
> The userptr worker allows for a slight race condition where upon there
> may two or more threads calling get_user_pages for the same object. When
> we have the array of pages, then we serialise the update of the object.
> However, the worker should only overwrite the obj->userptr.work pointer
> if and only if it is the active one. Currently we clear it for a
> secondary worker with the effect that we may rarely force a second
> lookup.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Whole series:
Tested-by: Michał Winiarski <michal.winiarski@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 7a5242cd5ea5..cb367d9f7909 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -581,17 +581,17 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
>  	}
>
>  	mutex_lock(&dev->struct_mutex);
> -	if (obj->userptr.work != &work->work) {
> -		ret = 0;
> -	} else if (pinned == num_pages) {
> -		ret = st_set_pages(&obj->pages, pvec, num_pages);
> -		if (ret == 0) {
> -			list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
> -			pinned = 0;
> +	if (obj->userptr.work == &work->work) {
> +		if (pinned == num_pages) {
> +			ret = st_set_pages(&obj->pages, pvec, num_pages);
> +			if (ret == 0) {
> +				list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
> +				pinned = 0;
> +			}
>  		}
> +		obj->userptr.work = ERR_PTR(ret);
>  	}
>
> -	obj->userptr.work = ERR_PTR(ret);
>  	obj->userptr.workers--;
>  	drm_gem_object_unreference(&obj->base);
>  	mutex_unlock(&dev->struct_mutex);
> --
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Jul 03, 2015 at 12:48:03PM +0200, Michał Winiarski wrote:
> On Tue, Jun 30, 2015 at 05:55:31PM +0100, Chris Wilson wrote:
> > The userptr worker allows for a slight race condition where upon there
> > may two or more threads calling get_user_pages for the same object. When
> > we have the array of pages, then we serialise the update of the object.
> > However, the worker should only overwrite the obj->userptr.work pointer
> > if and only if it is the active one. Currently we clear it for a
> > secondary worker with the effect that we may rarely force a second
> > lookup.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> Whole series:
> Tested-by: Michał Winiarski <michal.winiarski@intel.com>

That reminds me there was a refleak in patch 3 if a second
invalidate-range notification arrived before the first's worker had run
(we would take the ref for the active mo, but since the worker was
already queued, it would still only run once and not drop our new ref).
-Chris
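The refleak pattern mentioned above can be illustrated with a small userspace model. This is a sketch under assumptions, not the code from patch 3: struct model_mo and the model_* helpers are hypothetical names, and the "fixed" variant shows one common way to avoid such a leak rather than the fix actually applied in the series. The only kernel behaviour relied on is that queue_work() returns false when the work item is already queued, which model_queue_work() imitates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the notifier object ("mo"); not the i915 type. */
struct model_mo {
	int refcount;
	bool queued;
};

/* Imitates queue_work(): returns false if the work was already queued. */
static bool model_queue_work(struct model_mo *mo)
{
	if (mo->queued)
		return false;
	mo->queued = true;
	return true;
}

/* Leaky variant: takes a reference on every notification. */
static void model_invalidate_leaky(struct model_mo *mo)
{
	mo->refcount++;
	model_queue_work(mo);
}

/* One possible fix: drop the new reference if nothing was queued. */
static void model_invalidate_fixed(struct model_mo *mo)
{
	mo->refcount++;
	if (!model_queue_work(mo))
		mo->refcount--;
}

/* The worker runs once per queueing and drops exactly one reference. */
static void model_run_worker(struct model_mo *mo)
{
	assert(mo->queued);
	mo->queued = false;
	mo->refcount--;
}
```

With two back-to-back notifications and a single worker run, the leaky variant ends one reference above where it started, while the fixed variant returns to its starting count.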
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 7a5242cd5ea5..cb367d9f7909 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -581,17 +581,17 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 	}
 
 	mutex_lock(&dev->struct_mutex);
-	if (obj->userptr.work != &work->work) {
-		ret = 0;
-	} else if (pinned == num_pages) {
-		ret = st_set_pages(&obj->pages, pvec, num_pages);
-		if (ret == 0) {
-			list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
-			pinned = 0;
+	if (obj->userptr.work == &work->work) {
+		if (pinned == num_pages) {
+			ret = st_set_pages(&obj->pages, pvec, num_pages);
+			if (ret == 0) {
+				list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
+				pinned = 0;
+			}
 		}
+		obj->userptr.work = ERR_PTR(ret);
 	}
 
-	obj->userptr.work = ERR_PTR(ret);
 	obj->userptr.workers--;
 	drm_gem_object_unreference(&obj->base);
 	mutex_unlock(&dev->struct_mutex);
The userptr worker allows for a slight race condition whereupon there
may be two or more threads calling get_user_pages for the same object.
Once we have the array of pages, we serialise the update of the object.
However, the worker should overwrite the obj->userptr.work pointer if
and only if it is the active one. Currently we clear it for a secondary
worker, with the effect that we may rarely force a second lookup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)