diff mbox

dma-buf/sw_sync: Fix timeline/pt overflow cases

Message ID 149866562059.23475.15965626912972737879@mail.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson June 28, 2017, 4 p.m. UTC
Quoting Sean Paul (2017-06-28 16:51:11)
> Protect against long-running processes from overflowing the timeline
> and creating fences that go back in time. While we're at it, avoid
> overflowing while we're incrementing the timeline.
> 
> Signed-off-by: Sean Paul <seanpaul@chromium.org>
> ---
>  drivers/dma-buf/sw_sync.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 69c5ff36e2f9..40934619ed88 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>  
>         spin_lock_irqsave(&obj->child_list_lock, flags);
>  
> -       obj->value += inc;
> +       obj->value += min(inc, ~0x0U - obj->value);

The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);

Better of course would be to report the error,

>         list_for_each_entry_safe(pt, next, &obj->active_list_head,
>                                  active_list) {
> @@ -178,6 +178,11 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size,
>                 return NULL;
>  
>         spin_lock_irqsave(&obj->child_list_lock, flags);
> +       if (value < obj->value) {
> +               spin_unlock_irqrestore(&obj->child_list_lock, flags);
> +               return NULL;
> +       }

Needs a u32 check. if ((int)(value - obj->value) < 0) return some_error;
-Chris

Comments

Sean Paul June 28, 2017, 4:47 p.m. UTC | #1
On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote:
> Quoting Sean Paul (2017-06-28 16:51:11)
> > Protect against long-running processes from overflowing the timeline
> > and creating fences that go back in time. While we're at it, avoid
> > overflowing while we're incrementing the timeline.
> > 
> > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > ---
> >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index 69c5ff36e2f9..40934619ed88 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> >  
> >         spin_lock_irqsave(&obj->child_list_lock, flags);
> >  
> > -       obj->value += inc;
> > +       obj->value += min(inc, ~0x0U - obj->value);
> 
> The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);
> 
Hi Chris,
Thanks for the review.

I don't think that solves the same problem I was trying to solve. The issue is
that android userspace increments value by 0x7fffffff twice in order to ensure
all fences have signaled. This is causing value to overflow and is_signaled will
never be true. With your snippet, the possibility of overflow still exists.

> Better of course would be to report the error,

AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping
value at UINT_MAX will ensure all fences are signaled, and the check below ensures
that fences can't be created beyond that (returning an error at that point in
time).

Sean

> 
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 69c5ff36e2f9..2503cf884018 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -345,6 +345,9 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg)
>         if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
>                 return -EFAULT;
>  
> +       if (value > INT_MAX)
> +               return -EINVAL;
> +
>         sync_timeline_signal(obj, value);
> 
> >  
> >         list_for_each_entry_safe(pt, next, &obj->active_list_head,
> >                                  active_list) {
> > @@ -178,6 +178,11 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size,
> >                 return NULL;
> >  
> >         spin_lock_irqsave(&obj->child_list_lock, flags);
> > +       if (value < obj->value) {
> > +               spin_unlock_irqrestore(&obj->child_list_lock, flags);
> > +               return NULL;
> > +       }
> 
> Needs a u32 check. if ((int)(value - obj->value) < 0) return some_error;
> -Chris
Chris Wilson June 28, 2017, 7:45 p.m. UTC | #2
Quoting Sean Paul (2017-06-28 17:47:24)
> On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote:
> > Quoting Sean Paul (2017-06-28 16:51:11)
> > > Protect against long-running processes from overflowing the timeline
> > > and creating fences that go back in time. While we're at it, avoid
> > > overflowing while we're incrementing the timeline.
> > > 
> > > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > > ---
> > >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > > index 69c5ff36e2f9..40934619ed88 100644
> > > --- a/drivers/dma-buf/sw_sync.c
> > > +++ b/drivers/dma-buf/sw_sync.c
> > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> > >  
> > >         spin_lock_irqsave(&obj->child_list_lock, flags);
> > >  
> > > -       obj->value += inc;
> > > +       obj->value += min(inc, ~0x0U - obj->value);
> > 
> > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);
> > 
> Hi Chris,
> Thanks for the review.
> 
> I don't think that solves the same problem I was trying to solve. The issue is
> that android userspace increments value by 0x7fffffff twice in order to ensure
> all fences have signaled. This is causing value to overflow and is_signaled will
> never be true. With your snippet, the possibility of overflow still exists.
> 
> > Better of course would be to report the error,
> 
> AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping
> value at UINT_MAX will ensure all fences are signaled, and the check below ensures
> that fences can't be created beyond that (returning an error at that point in
> time).

UINT_MAX doesn't imply all fences will be signaled either, the timeline
is supposed to wrap.

The issue is timeline_fence_signaled() is using the wrong test, it
should be return (int)(fence->seqno - parent->value) <= 0; If it helps
extract a little helper from dma_fence_is_later().
-Chris
Sean Paul June 28, 2017, 9 p.m. UTC | #3
On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote:
> Quoting Sean Paul (2017-06-28 17:47:24)
> > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote:
> > > Quoting Sean Paul (2017-06-28 16:51:11)
> > > > Protect against long-running processes from overflowing the timeline
> > > > and creating fences that go back in time. While we're at it, avoid
> > > > overflowing while we're incrementing the timeline.
> > > > 
> > > > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > > > ---
> > > >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > > > index 69c5ff36e2f9..40934619ed88 100644
> > > > --- a/drivers/dma-buf/sw_sync.c
> > > > +++ b/drivers/dma-buf/sw_sync.c
> > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> > > >  
> > > >         spin_lock_irqsave(&obj->child_list_lock, flags);
> > > >  
> > > > -       obj->value += inc;
> > > > +       obj->value += min(inc, ~0x0U - obj->value);
> > > 
> > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);
> > > 
> > Hi Chris,
> > Thanks for the review.
> > 
> > I don't think that solves the same problem I was trying to solve. The issue is
> > that android userspace increments value by 0x7fffffff twice in order to ensure
> > all fences have signaled. This is causing value to overflow and is_signaled will
> > never be true. With your snippet, the possibility of overflow still exists.
> > 
> > > Better of course would be to report the error,
> > 
> > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping
> > value at UINT_MAX will ensure all fences are signaled, and the check below ensures
> > that fences can't be created beyond that (returning an error at that point in
> > time).
> 
> UINT_MAX doesn't imply all fences will be signaled either, the timeline
> is supposed to wrap.
> 
> The issue is timeline_fence_signaled() is using the wrong test, it
> should be return (int)(fence->seqno - parent->value) <= 0; If it helps
> extract a little helper from dma_fence_is_later().

Understood, thank you for clarifying. This still doesn't solve the issue of userspace
jumping the timeline by INT_MAX multiple times. In that case, value will rollover and
even the new signaled() will fail to report.

Sean

> -Chris
Dominik Behr June 28, 2017, 10:17 p.m. UTC | #4
I think the kernel has problems with Android fences which were slowly
broken as they were de-staged:

1. They allowed for fence/timeline value/seqno to overflow/rollover and
that would break only if delta between timeline and earlier unsignaled
fence exceeded 2^31 or so. Android relies on that behavior.
It was also possible to clean-up timeline by incrementing it for instance
by calling timeline_inc(0x7FFFFFFF) twice. This doesn't work anymore
because if no one is actively waiting for fence, active_list is empty and
timeline_inc does not signal fences anymore; just increments timeline value
and allows it to rollover without any effect.

2. They did guarantee that when timeline was destroyed all fences on
timeline were signaled (or error state). Thus, if timeline was destroyed,
or if process that owned the timeline died, processes that depended on it
would not hang. Android also relies on that behavior for cleanup when a
process crashes for example.

3. It seems from some stack traces that I have seen that timeline_inc
signals fence from inside spinlock, which can cause fence_array to call
fence_array_release, which calls flush_work()
e.g.
<4>[  140.762030]  [<ffffffff858939fb>] dump_stack+0x4d/0x63
<4>[  140.762035]  [<ffffffff8568b593>] ___might_sleep+0x149/0x14e
<4>[  140.762039]  [<ffffffff8568b637>] __might_sleep+0x9f/0xa6
<4>[  140.762045]  [<ffffffff8567ef2a>] flush_work+0x39/0x19a
<4>[  140.762049]  [<ffffffff8568eb5c>] ? try_to_wake_up+0x20b/0x21b
<4>[  140.762055]  [<ffffffff85a54cea>] fence_array_release+0x2e/0x63
<4>[  140.762058]  [<ffffffff85a53a65>] fence_release+0x82/0x8e
<4>[  140.762061]  [<ffffffff85a54cba>] fence_put+0x15/0x17
<4>[  140.762065]  [<ffffffff85a54e08>] fence_array_cb_func+0x1f/0x39
<4>[  140.762068]  [<ffffffff85a53881>] fence_signal_locked+0x8e/0xa3
<4>[  140.762072]  [<ffffffff85a55cda>] sync_timeline_signal+0xcd/0x10a
<4>[  140.762075]  [<ffffffff85a5613b>] sw_sync_ioctl+0x159/0x17f


On Wed, Jun 28, 2017 at 2:00 PM, Sean Paul <seanpaul@chromium.org> wrote:

> On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote:
> > Quoting Sean Paul (2017-06-28 17:47:24)
> > > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote:
> > > > Quoting Sean Paul (2017-06-28 16:51:11)
> > > > > Protect against long-running processes from overflowing the
> timeline
> > > > > and creating fences that go back in time. While we're at it, avoid
> > > > > overflowing while we're incrementing the timeline.
> > > > >
> > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > > > > ---
> > > > >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > > > > index 69c5ff36e2f9..40934619ed88 100644
> > > > > --- a/drivers/dma-buf/sw_sync.c
> > > > > +++ b/drivers/dma-buf/sw_sync.c
> > > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct
> sync_timeline *obj, unsigned int inc)
> > > > >
> > > > >         spin_lock_irqsave(&obj->child_list_lock, flags);
> > > > >
> > > > > -       obj->value += inc;
> > > > > +       obj->value += min(inc, ~0x0U - obj->value);
> > > >
> > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);
> > > >
> > > Hi Chris,
> > > Thanks for the review.
> > >
> > > I don't think that solves the same problem I was trying to solve. The
> issue is
> > > that android userspace increments value by 0x7fffffff twice in order
> to ensure
> > > all fences have signaled. This is causing value to overflow and
> is_signaled will
> > > never be true. With your snippet, the possibility of overflow still
> exists.
> > >
> > > > Better of course would be to report the error,
> > >
> > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste.
> Capping
> > > value at UINT_MAX will ensure all fences are signaled, and the check
> below ensures
> > > that fences can't be created beyond that (returning an error at that
> point in
> > > time).
> >
> > UINT_MAX doesn't imply all fences will be signaled either, the timeline
> > is supposed to wrap.
> >
> > The issue is timeline_fence_signaled() is using the wrong test, it
> > should be return (int)(fence->seqno - parent->value) <= 0; If it helps
> > extract a little helper from dma_fence_is_later().
>
> Understood, thank you for clarifying. This still doesn't solve the issue
> of userspace
> jumping the timeline by INT_MAX multiple times. In that case, value will
> rollover and
> even the new signaled() will fail to report.
>
> Sean
>
> > -Chris
>
> --
> Sean Paul, Software Engineer, Google / Chromium OS
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Chris Wilson June 29, 2017, 6:35 a.m. UTC | #5
Quoting Sean Paul (2017-06-28 22:00:02)
> On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote:
> > Quoting Sean Paul (2017-06-28 17:47:24)
> > > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote:
> > > > Quoting Sean Paul (2017-06-28 16:51:11)
> > > > > Protect against long-running processes from overflowing the timeline
> > > > > and creating fences that go back in time. While we're at it, avoid
> > > > > overflowing while we're incrementing the timeline.
> > > > > 
> > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > > > > ---
> > > > >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > > > > index 69c5ff36e2f9..40934619ed88 100644
> > > > > --- a/drivers/dma-buf/sw_sync.c
> > > > > +++ b/drivers/dma-buf/sw_sync.c
> > > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> > > > >  
> > > > >         spin_lock_irqsave(&obj->child_list_lock, flags);
> > > > >  
> > > > > -       obj->value += inc;
> > > > > +       obj->value += min(inc, ~0x0U - obj->value);
> > > > 
> > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX);
> > > > 
> > > Hi Chris,
> > > Thanks for the review.
> > > 
> > > I don't think that solves the same problem I was trying to solve. The issue is
> > > that android userspace increments value by 0x7fffffff twice in order to ensure
> > > all fences have signaled. This is causing value to overflow and is_signaled will
> > > never be true. With your snippet, the possibility of overflow still exists.
> > > 
> > > > Better of course would be to report the error,
> > > 
> > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping
> > > value at UINT_MAX will ensure all fences are signaled, and the check below ensures
> > > that fences can't be created beyond that (returning an error at that point in
> > > time).
> > 
> > UINT_MAX doesn't imply all fences will be signaled either, the timeline
> > is supposed to wrap.
> > 
> > The issue is timeline_fence_signaled() is using the wrong test, it
> > should be return (int)(fence->seqno - parent->value) <= 0; If it helps
> > extract a little helper from dma_fence_is_later().
> 
> Understood, thank you for clarifying. This still doesn't solve the issue of userspace
> jumping the timeline by INT_MAX multiple times. In that case, value will rollover and
> even the new signaled() will fail to report.

Right, all the intermediate fences need to be signaled as the timeline
jumps. That seems to be handled by walking the active_list. However,
that is broken because not all fences are on the active_list and the
active_list itself is not updated atomically!
-Chris
diff mbox

Patch

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 69c5ff36e2f9..2503cf884018 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -345,6 +345,9 @@  static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg)
        if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
                return -EFAULT;
 
+       if (value > INT_MAX)
+               return -EINVAL;
+
        sync_timeline_signal(obj, value);

>