Message ID | 20160829070834.22296-11-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: > Currently we install a callback for performing poll on a dma-buf, > irrespective of the timeout. This involves taking a spinlock, as well as > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across > multiple threads. > > We can query whether the poll will block prior to installing the > callback to make the busy-query fast. > > Single thread: 60% faster > 8 threads on 4 (+4 HT) cores: 600% faster > > Still not quite the perfect scaling we get with a native busy ioctl, but > poll(dmabuf) is faster due to the quicker lookup of the object and > avoiding drm_ioctl(). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sumit Semwal <sumit.semwal@linaro.org> > Cc: linux-media@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org > Cc: linaro-mm-sig@lists.linaro.org > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Need to strike the r-b here, since Christian König pointed out that objects won't magically switch signalling on. -Daniel > --- > drivers/dma-buf/dma-buf.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > index cf04d249a6a4..c7a7bc579941 100644 > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll) > if (!events) > return 0; > > + if (poll_does_not_wait(poll)) { > + if (events & POLLOUT && > + !reservation_object_test_signaled_rcu(resv, true)) > + events &= ~(POLLOUT | POLLIN); > + > + if (events & POLLIN && > + !reservation_object_test_signaled_rcu(resv, false)) > + events &= ~POLLIN; > + > + return events; > + } > + > retry: > seq = read_seqcount_begin(&resv->seq); > rcu_read_lock(); > -- > 2.9.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote: > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: > > Currently we install a callback for performing poll on a dma-buf, > > irrespective of the timeout. This involves taking a spinlock, as well as > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across > > multiple threads. > > > > We can query whether the poll will block prior to installing the > > callback to make the busy-query fast. > > > > Single thread: 60% faster > > 8 threads on 4 (+4 HT) cores: 600% faster > > > > Still not quite the perfect scaling we get with a native busy ioctl, but > > poll(dmabuf) is faster due to the quicker lookup of the object and > > avoiding drm_ioctl(). > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Sumit Semwal <sumit.semwal@linaro.org> > > Cc: linux-media@vger.kernel.org > > Cc: dri-devel@lists.freedesktop.org > > Cc: linaro-mm-sig@lists.linaro.org > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Need to strike the r-b here, since Christian König pointed out that > objects won't magically switch signalling on. The point being here that we don't even want to switch signaling on! :) Christian's point was that not all fences guarantee forward progress irrespective of whether signaling is enabled or not, and fences are not required to guarantee forward progress without signaling even if they provide an ops->signaled(). -Chris
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote: > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: > > Currently we install a callback for performing poll on a dma-buf, > > irrespective of the timeout. This involves taking a spinlock, as well as > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across > > multiple threads. > > > > We can query whether the poll will block prior to installing the > > callback to make the busy-query fast. > > > > Single thread: 60% faster > > 8 threads on 4 (+4 HT) cores: 600% faster > > > > Still not quite the perfect scaling we get with a native busy ioctl, but > > poll(dmabuf) is faster due to the quicker lookup of the object and > > avoiding drm_ioctl(). > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Sumit Semwal <sumit.semwal@linaro.org> > > Cc: linux-media@vger.kernel.org > > Cc: dri-devel@lists.freedesktop.org > > Cc: linaro-mm-sig@lists.linaro.org > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Need to strike the r-b here, since Christian König pointed out that > objects won't magically switch signalling on. Propagating a flag through to sync_file is trivial, but not through to the dma_buf->resv. Looks like dma-buf will be without a fast busy query, which I guess in the grand scheme of things (i.e. dma-buf itself is not intended to be used as a fence) is not that important. -Chris
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote: > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: > > Currently we install a callback for performing poll on a dma-buf, > > irrespective of the timeout. This involves taking a spinlock, as well as > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across > > multiple threads. > > > > We can query whether the poll will block prior to installing the > > callback to make the busy-query fast. > > > > Single thread: 60% faster > > 8 threads on 4 (+4 HT) cores: 600% faster > > > > Still not quite the perfect scaling we get with a native busy ioctl, but > > poll(dmabuf) is faster due to the quicker lookup of the object and > > avoiding drm_ioctl(). > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Sumit Semwal <sumit.semwal@linaro.org> > > Cc: linux-media@vger.kernel.org > > Cc: dri-devel@lists.freedesktop.org > > Cc: linaro-mm-sig@lists.linaro.org > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Need to strike the r-b here, since Christian König pointed out that > objects won't magically switch signalling on. Oh, it also means that commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9 Author: Jammy Zhou <Jammy.Zhou@amd.com> Date: Wed Jan 21 18:35:47 2015 +0800 reservation: wait only with non-zero timeout specified (v3) When the timeout value passed to reservation_object_wait_timeout_rcu is zero, no wait should be done if the fences are not signaled. Return '1' for idle and '0' for busy if the specified timeout is '0' to keep consistent with the case of non-zero timeout. v2: call fence_put if not signaled in the case of timeout==0 v3: switch to reservation_object_test_signaled_rcu Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> is wrong. And reservation_object_test_signaled_rcu() is unreliable. -Chris
Am 23.09.2016 um 17:20 schrieb Chris Wilson: > On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote: >> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: >>> Currently we install a callback for performing poll on a dma-buf, >>> irrespective of the timeout. This involves taking a spinlock, as well as >>> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across >>> multiple threads. >>> >>> We can query whether the poll will block prior to installing the >>> callback to make the busy-query fast. >>> >>> Single thread: 60% faster >>> 8 threads on 4 (+4 HT) cores: 600% faster >>> >>> Still not quite the perfect scaling we get with a native busy ioctl, but >>> poll(dmabuf) is faster due to the quicker lookup of the object and >>> avoiding drm_ioctl(). >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Sumit Semwal <sumit.semwal@linaro.org> >>> Cc: linux-media@vger.kernel.org >>> Cc: dri-devel@lists.freedesktop.org >>> Cc: linaro-mm-sig@lists.linaro.org >>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> >> Need to strike the r-b here, since Christian König pointed out that >> objects won't magically switch signalling on. > Oh, it also means that > > commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9 > Author: Jammy Zhou <Jammy.Zhou@amd.com> > Date: Wed Jan 21 18:35:47 2015 +0800 > > reservation: wait only with non-zero timeout specified (v3) > > When the timeout value passed to reservation_object_wait_timeout_rcu > is zero, no wait should be done if the fences are not signaled. > > Return '1' for idle and '0' for busy if the specified timeout is '0' > to keep consistent with the case of non-zero timeout. > > v2: call fence_put if not signaled in the case of timeout==0 > > v3: switch to reservation_object_test_signaled_rcu > > Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com> > Reviewed-by: Christian König <christian.koenig@amd.com> > Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com> > Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> > > is wrong. And reservation_object_test_signaled_rcu() is unreliable. Ups indeed, that patch is wrong as well. I suggest that we just enable the signaling in this case as well. Regards, Christian. > -Chris > -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 23, 2016 at 07:59:44PM +0200, Christian König wrote: > Am 23.09.2016 um 17:20 schrieb Chris Wilson: > > On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote: > > > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote: > > > > Currently we install a callback for performing poll on a dma-buf, > > > > irrespective of the timeout. This involves taking a spinlock, as well as > > > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across > > > > multiple threads. > > > > > > > > We can query whether the poll will block prior to installing the > > > > callback to make the busy-query fast. > > > > > > > > Single thread: 60% faster > > > > 8 threads on 4 (+4 HT) cores: 600% faster > > > > > > > > Still not quite the perfect scaling we get with a native busy ioctl, but > > > > poll(dmabuf) is faster due to the quicker lookup of the object and > > > > avoiding drm_ioctl(). > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Sumit Semwal <sumit.semwal@linaro.org> > > > > Cc: linux-media@vger.kernel.org > > > > Cc: dri-devel@lists.freedesktop.org > > > > Cc: linaro-mm-sig@lists.linaro.org > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > > Need to strike the r-b here, since Christian König pointed out that > > > objects won't magically switch signalling on. > > Oh, it also means that > > > > commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9 > > Author: Jammy Zhou <Jammy.Zhou@amd.com> > > Date: Wed Jan 21 18:35:47 2015 +0800 > > > > reservation: wait only with non-zero timeout specified (v3) > > When the timeout value passed to reservation_object_wait_timeout_rcu > > is zero, no wait should be done if the fences are not signaled. > > Return '1' for idle and '0' for busy if the specified timeout is '0' > > to keep consistent with the case of non-zero timeout. > > v2: call fence_put if not signaled in the case of timeout==0 > > v3: switch to reservation_object_test_signaled_rcu > > Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com> > > Reviewed-by: Christian König <christian.koenig@amd.com> > > Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > > Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com> > > Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> > > > > is wrong. And reservation_object_test_signaled_rcu() is unreliable. > > Ups indeed, that patch is wrong as well. > > I suggest that we just enable the signaling in this case as well. Will you/Zhou take care of this corner case? Just so I can't forget about it ;-) Thanks, Daniel
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index cf04d249a6a4..c7a7bc579941 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll) if (!events) return 0; + if (poll_does_not_wait(poll)) { + if (events & POLLOUT && + !reservation_object_test_signaled_rcu(resv, true)) + events &= ~(POLLOUT | POLLIN); + + if (events & POLLIN && + !reservation_object_test_signaled_rcu(resv, false)) + events &= ~POLLIN; + + return events; + } + retry: seq = read_seqcount_begin(&resv->seq); rcu_read_lock();