diff mbox

[Linaro-mm-sig] thoughts of looking at android fences

Message ID CAABpnA-85QdwoYO5Ldz52bxw1_dWaQz541MjgEGZhwdu+Bhtvw@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rom Lemarchand Nov. 7, 2013, 9:11 p.m. UTC
Hi Maarten, I tested your changes and needed the attached patch: behavior
now seems equivalent as android sync. I haven't tested performance.

The issue resolved by this patch happens when i_b < b->num_fences and i_a
>= a->num_fences (or vice versa). Then, pt_a is invalid and so
dereferencing pt_a->context causes a crash.


On Mon, Nov 4, 2013 at 2:31 AM, Maarten Lankhorst <
maarten.lankhorst@canonical.com> wrote:

> op 02-11-13 22:36, Colin Cross schreef:
> > On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst
> > <maarten.lankhorst@canonical.com> wrote:
> >> op 24-10-13 14:13, Maarten Lankhorst schreef:
> >>> So I actually tried to implement it now. I killed all the deprecated
> members and assumed a linear timeline.
> >>> This means that syncpoints can only be added at the end, not in
> between. In particular it means sw_sync
> >>> might be slightly broken.
> >>>
> >>> I only tested it with a simple program I wrote called ufence.c, it's
> in drivers/staging/android/ufence.c in the following tree:
> >>>
> >>> http://cgit.freedesktop.org/~mlankhorst/linux
> >>>
> >>> the "rfc: convert android to fence api" has all the changes from my
> dma-fence proposal to what android would need,
> >>> it also converts the userspace fence api to use the dma-fence api.
> >>>
> >>> sync_pt is implemented as fence too. This meant not having to convert
> all of android right away, though I did make some changes.
> >>> I killed the deprecated members and made all the fence calls forward
> to the sync_timeline_ops. dup and compare are no longer used.
> >>>
> >>> I haven't given this a spin on a full android kernel, only with the
> components that are in mainline kernel under staging and my dumb test
> program.
> >>>
> >>> ~Maarten
> >>>
> >>> PS: The nomenclature is very confusing. I want to rename dma-fence to
> syncpoint, but I want some feedback from the android devs first. :)
> >>>
> >> Come on, any feedback? I want to move the discussion forward.
> >>
> >> ~Maarten
> > I experimented with it a little on a device that uses sync and came
> > across a few bugs:
> > 1.  sync_timeline_signal needs to call __fence_signal on all signaled
> > points on the timeline, not just the first
> > 2.  fence_add_callback doesn't always initialize cb.node
> > 3.  sync_fence_wait should take ms
> > 4.  sync_print_pt status printing was incorrect
> > 5.  there is a deadlock:
> >    sync_print_obj takes obj->child_list_lock
> >    sync_print_pt
> >    fence_is_signaled
> >    fence_signal takes fence->lock == obj->child_list_lock
> > 6.  freeing a timeline before all the fences holding points on that
> > timeline have timed out results in a crash
> >
> > With the attached patch to fix these issues, our libsync and sync_test
> > give the same results as with our sync code.  I haven't tested against
> > the full Android framework yet.
> >
> > The compare op and timeline ordering is critical to the efficiency of
> > sync points on Android.  The compare op is used when merging fences to
> > drop all but the latest point on the same timeline.  This is necessary
> > for example when the same buffer is submitted to the display on
> > multiple frames, like when there is a live wallpaper in the background
> > updating at 60 fps and a static screen of widgets on top of it.  The
> > static widget buffer is submitted on every frame, returning a new
> > fence each time.  The compositor merges the new fence with the fence
> > for the previous buffer, and because they are on the same timeline it
> > merges down to a single point.  I experimented with disabling the
> > merge optimization on a Nexus 10, and found that leaving the screen on
> > running a live wallpaper eventually resulted in 100k outstanding sync
> > points.
>
> Well, here I did the same for dma-fence, can you take a look?
>
> ---
>
> diff --git a/drivers/staging/android/sync.c
> b/drivers/staging/android/sync.c
> index 2c7fd3f2ab23..d1d89f1f8553 100644
> --- a/drivers/staging/android/sync.c
> +++ b/drivers/staging/android/sync.c
> @@ -232,39 +232,62 @@ void sync_fence_install(struct sync_fence *fence,
> int fd)
>  }
>  EXPORT_SYMBOL(sync_fence_install);
>
> +static void sync_fence_add_pt(struct sync_fence *fence, int *i, struct
> fence *pt) {
> +       fence->cbs[*i].sync_pt = pt;
> +       fence->cbs[*i].fence = fence;
> +
> +       if (!fence_add_callback(pt, &fence->cbs[*i].cb,
> fence_check_cb_func)) {
> +               fence_get(pt);
> +               (*i)++;
> +       }
> +}
> +
>  struct sync_fence *sync_fence_merge(const char *name,
>                                     struct sync_fence *a, struct
> sync_fence *b)
>  {
>         int num_fences = a->num_fences + b->num_fences;
>         struct sync_fence *fence;
> -       int i;
> +       int i, i_a, i_b;
>
>         fence = sync_fence_alloc(offsetof(struct sync_fence,
> cbs[num_fences]), name);
>         if (fence == NULL)
>                 return NULL;
>
> -       fence->num_fences = num_fences;
>         atomic_set(&fence->status, num_fences);
>
> -       for (i = 0; i < a->num_fences; ++i) {
> -               struct fence *pt = a->cbs[i].sync_pt;
> -
> -               fence_get(pt);
> -               fence->cbs[i].sync_pt = pt;
> -               fence->cbs[i].fence = fence;
> -               if (fence_add_callback(pt, &fence->cbs[i].cb,
> fence_check_cb_func))
> -                       atomic_dec(&fence->status);
> +       /*
> +        * Assume sync_fence a and b are both ordered and have no
> +        * duplicates with the same context.
> +        *
> +        * If a sync_fence can only be created with sync_fence_merge
> +        * and sync_fence_create, this is a reasonable assumption.
> +        */
> +       for (i = i_a = i_b = 0; i_a < a->num_fences || i_b <
> b->num_fences; ) {
> +               struct fence *pt_a = i_a < a->num_fences ?
> a->cbs[i_a].sync_pt : NULL;
> +               struct fence *pt_b = i_b < b->num_fences ?
> b->cbs[i_b].sync_pt : NULL;
> +
> +               if (!pt_b || pt_a->context < pt_b->context) {
> +                       sync_fence_add_pt(fence, &i, pt_a);
> +
> +                       i_a++;
> +               } else if (!pt_a || pt_a->context > pt_b->context) {
> +                       sync_fence_add_pt(fence, &i, pt_b);
> +
> +                       i_b++;
> +               } else {
> +                       if (pt_a->seqno - pt_b->seqno <= INT_MAX)
> +                               sync_fence_add_pt(fence, &i, pt_a);
> +                       else
> +                               sync_fence_add_pt(fence, &i, pt_b);
> +
> +                       i_a++;
> +                       i_b++;
> +               }
>         }
>
> -       for (i = 0; i < b->num_fences; ++i) {
> -               struct fence *pt = b->cbs[i].sync_pt;
> -
> -               fence_get(pt);
> -               fence->cbs[a->num_fences + i].sync_pt = pt;
> -               fence->cbs[a->num_fences + i].fence = fence;
> -               if (fence_add_callback(pt, &fence->cbs[a->num_fences +
> i].cb, fence_check_cb_func))
> -                       atomic_dec(&fence->status);
> -       }
> +       if (num_fences > i)
> +               atomic_sub(num_fences - i, &fence->status);
> +       fence->num_fences = i;
>
>         sync_fence_debug_add(fence);
>         return fence;
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>

Comments

Maarten Lankhorst Nov. 8, 2013, 10:43 a.m. UTC | #1
op 07-11-13 22:11, Rom Lemarchand schreef:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent as android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and
> i_a >= a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.

Oops, thinko. :) Originally I had it correct by doing this:

+       /*
+        * Assume sync_fence a and b are both ordered and have no
+        * duplicates with the same context.
+        *
+        * If a sync_fence can only be created with sync_fence_merge
+        * and sync_fence_create, this is a reasonable assumption.
+        */
+       for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
+               struct fence *pt_a = a->cbs[i_a].sync_pt;
+               struct fence *pt_b = b->cbs[i_b].sync_pt;
+
+               if (pt_a->context < pt_b->context) {
+                       sync_fence_add_pt(fence, &i, pt_a);
+
+                       i_a++;
+               } else if (pt_a->context > pt_b->context) {
+                       sync_fence_add_pt(fence, &i, pt_b);
+
+                       i_b++;
+               } else {
+                       if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+                               sync_fence_add_pt(fence, &i, pt_a);
+                       else
+                               sync_fence_add_pt(fence, &i, pt_b);
+
+                       i_a++;
+                       i_b++;
+               }
+        }
+
+        /* Add remaining fences from a or b*/
+        for (; i_a < a->num_fences; i_a++)
+               sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
+
+        for (; i_b < b->num_fences; i_b++)
+               sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);

Then I thought I could clean it up by merging it, but that ended up being
more unreadable and crashing... so I guess I'll revert back to this version. :)
Maarten Lankhorst Nov. 8, 2013, 11:43 a.m. UTC | #2
op 07-11-13 22:11, Rom Lemarchand schreef:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent as android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and i_a
>> = a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.
>
Yeah, I pushed my original fix. I intended to keep android userspace behavior the same, and I tried to keep the kernelspace the api same as much as I could. If peformance is the same, or not noticeably worse, would there be any objections on the android side about renaming dma-fence to syncpoint, and getting it in mainline?

~Maarten
Rom Lemarchand Nov. 8, 2013, 2:35 p.m. UTC | #3
Let me run some benchmarks today, talk to people internally, and I'll let
you know.
On Nov 8, 2013 3:43 AM, "Maarten Lankhorst" <maarten.lankhorst@canonical.com>
wrote:

> op 07-11-13 22:11, Rom Lemarchand schreef:
> > Hi Maarten, I tested your changes and needed the attached patch: behavior
> > now seems equivalent as android sync. I haven't tested performance.
> >
> > The issue resolved by this patch happens when i_b < b->num_fences and i_a
> >> = a->num_fences (or vice versa). Then, pt_a is invalid and so
> > dereferencing pt_a->context causes a crash.
> >
> Yeah, I pushed my original fix. I intended to keep android userspace
> behavior the same, and I tried to keep the kernelspace the api same as much
> as I could. If peformance is the same, or not noticeably worse, would there
> be any objections on the android side about renaming dma-fence to
> syncpoint, and getting it in mainline?
>
> ~Maarten
>
Rom Lemarchand Nov. 12, 2013, 1:53 a.m. UTC | #4
I ran some benchmarks and things seem to be running about the same.
No one on our graphics team seemed concerned about the change.

The only concern I heard was about the increased complexity of the new sync
code as opposed to the old sync framework which tried to keep things
straightforward.


On Fri, Nov 8, 2013 at 3:43 AM, Maarten Lankhorst <
maarten.lankhorst@canonical.com> wrote:

> op 07-11-13 22:11, Rom Lemarchand schreef:
> > Hi Maarten, I tested your changes and needed the attached patch: behavior
> > now seems equivalent as android sync. I haven't tested performance.
> >
> > The issue resolved by this patch happens when i_b < b->num_fences and i_a
> >> = a->num_fences (or vice versa). Then, pt_a is invalid and so
> > dereferencing pt_a->context causes a crash.
> >
> Yeah, I pushed my original fix. I intended to keep android userspace
> behavior the same, and I tried to keep the kernelspace the api same as much
> as I could. If peformance is the same, or not noticeably worse, would there
> be any objections on the android side about renaming dma-fence to
> syncpoint, and getting it in mainline?
>
> ~Maarten
>
diff mbox

Patch

From a440530a29682c595ad69b8cbb35c568228a8777 Mon Sep 17 00:00:00 2001
From: Rom Lemarchand <romlem@google.com>
Date: Thu, 7 Nov 2013 11:36:08 -0800
Subject: [PATCH] fix

Change-Id: Ie8a10dc466462835456d12962b378158a917a33e
---
 drivers/staging/android/sync.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 1025857..04af0fe 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -266,11 +266,19 @@  struct sync_fence *sync_fence_merge(const char *name,
 		struct fence *pt_a = a->cbs[i_a].sync_pt;
 		struct fence *pt_b = b->cbs[i_b].sync_pt;
 
-		if (i_b >= b->num_fences || pt_a->context < pt_b->context) {
+		if (i_b >= b->num_fences) {
 			sync_fence_add_pt(fence, &i, pt_a);
 
 			i_a++;
-		} else if (i_a >= a->num_fences || pt_a->context > pt_b->context) {
+		} else if (i_a >= a->num_fences) {
+			sync_fence_add_pt(fence, &i, pt_b);
+
+			i_b++;
+		} else if (pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (pt_a->context > pt_b->context) {
 			sync_fence_add_pt(fence, &i, pt_b);
 
 			i_b++;
-- 
1.8.4.1