diff mbox series

drm: Block fb changes for async plane updates

Message ID 20181221143324.27679-1-nicholas.kazlauskas@amd.com (mailing list archive)
State New, archived
Headers show
Series drm: Block fb changes for async plane updates | expand

Commit Message

Kazlauskas, Nicholas Dec. 21, 2018, 2:33 p.m. UTC
The behavior of drm_atomic_helper_cleanup_planes differs depending on
whether the commit was an asynchronous update or not.

For a typical (non-async) atomic commit prepare_fb is called on the
new_plane_state and cleanup_fb is called on the old_plane_state.

However, async commits are performed in place and don't swap the state
for the plane. The call to prepare_fb happens on the new_plane_state
and the call to cleanup_fb is also called on the new_plane_state in
this case (since the state hasn't swapped).

This behavior can lead to use-after-free or unpin of an active fb.

Consider the following sequence of events for interleaving fbs:

- Async update, fb1 prepare, fb1 cleanup_fb
- Async update, fb2 prepare, fb2 cleanup_fb
- Non-async update, fb1 prepare, fb2 cleanup_fb
- Async update, fb2 cleanup_fb -> double cleanup, use-after-free

This situation has been observed in practice for a double buffered
cursor when closing an X client. The non-async update occurs because
the new_plane_state->crtc != old_plane_state->crtc which forces the
non-async path to occur.

The simplest fix for this is to block fb updates in
drm_atomic_helper_async_check. This guarantees that the framebuffer
will have previously been prepared and any subsequent async updates
will always call prepare and cleanup_fb like the non-async atomic
commit path would.

Cc: Michel Dänzer <michel@daenzer.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrey Grodzovsky Dec. 21, 2018, 4:29 p.m. UTC | #1
As far as we discussed this internally looks good to me, but obviously 
we need to wait for some feedback from non AMD people.

Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Andrey


On 12/21/2018 09:33 AM, Nicholas Kazlauskas wrote:
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was an asynchronous update or not.
>
> For a typical (non-async) atomic commit prepare_fb is called on the
> new_plane_state and cleanup_fb is called on the old_plane_state.
>
> However, async commits are performed in place and don't swap the state
> for the plane. The call to prepare_fb happens on the new_plane_state
> and the call to cleanup_fb is also called on the new_plane_state in
> this case (since the state hasn't swapped).
>
> This behavior can lead to use-after-free or unpin of an active fb.
>
> Consider the following sequence of events for interleaving fbs:
>
> - Async update, fb1 prepare, fb1 cleanup_fb
> - Async update, fb2 prepare, fb2 cleanup_fb
> - Non-async update, fb1 prepare, fb2 cleanup_fb
> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free
>
> This situation has been observed in practice for a double buffered
> cursor when closing an X client. The non-async update occurs because
> the new_plane_state->crtc != old_plane_state->crtc which forces the
> non-async path to occur.
>
> The simplest fix for this is to block fb updates in
> drm_atomic_helper_async_check. This guarantees that the framebuffer
> will have previously been prepared and any subsequent async updates
> will always call prepare and cleanup_fb like the non-async atomic
> commit path would.
>
> Cc: Michel Dänzer <michel@daenzer.net>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> ---
>   drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 54e2ae614dcc..d2f80bf14f86 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>   		return -EINVAL;
>   
>   	if (!new_plane_state->crtc ||
> -	    old_plane_state->crtc != new_plane_state->crtc)
> +	    old_plane_state->crtc != new_plane_state->crtc ||
> +	    old_plane_state->fb != new_plane_state->fb)
>   		return -EINVAL;
>   
>   	funcs = plane->helper_private;
Daniel Vetter Dec. 24, 2018, 12:15 p.m. UTC | #2
On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote:
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was an asynchronous update or not.
> 
> For a typical (non-async) atomic commit prepare_fb is called on the
> new_plane_state and cleanup_fb is called on the old_plane_state.
> 
> However, async commits are performed in place and don't swap the state
> for the plane. The call to prepare_fb happens on the new_plane_state
> and the call to cleanup_fb is also called on the new_plane_state in
> this case (since the state hasn't swapped).
> 
> This behavior can lead to use-after-free or unpin of an active fb.
> 
> Consider the following sequence of events for interleaving fbs:
> 
> - Async update, fb1 prepare, fb1 cleanup_fb
> - Async update, fb2 prepare, fb2 cleanup_fb
> - Non-async update, fb1 prepare, fb2 cleanup_fb
> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free

I think I see your bug, but I'm completely lost in your description above.

I think this is ok as a short-term gap, but imo better if it's a separate
if condition with a FIXME comment.

Long-term we want to fix this, and I think simplest way to do that is if
we expect drivers to store the old fb in the new_plane_state (and check
that with a WARN_ON like the others). I think that should work.

We probably also need some locking on top, to prevent races with the
cleanup_fb calls done by non-blocking commits, to make sure those clean up
the right fb.
-Daniel

> This situation has been observed in practice for a double buffered
> cursor when closing an X client. The non-async update occurs because
> the new_plane_state->crtc != old_plane_state->crtc which forces the
> non-async path to occur.
> 
> The simplest fix for this is to block fb updates in
> drm_atomic_helper_async_check. This guarantees that the framebuffer
> will have previously been prepared and any subsequent async updates
> will always call prepare and cleanup_fb like the non-async atomic
> commit path would.
> 
> Cc: Michel Dänzer <michel@daenzer.net>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 54e2ae614dcc..d2f80bf14f86 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>  		return -EINVAL;
>  
>  	if (!new_plane_state->crtc ||
> -	    old_plane_state->crtc != new_plane_state->crtc)
> +	    old_plane_state->crtc != new_plane_state->crtc ||
> +	    old_plane_state->fb != new_plane_state->fb)
>  		return -EINVAL;
>  
>  	funcs = plane->helper_private;
> -- 
> 2.17.1
>
Kazlauskas, Nicholas Jan. 2, 2019, 3:02 p.m. UTC | #3
On 12/24/18 7:15 AM, Daniel Vetter wrote:
> On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote:
>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>> whether the commit was an asynchronous update or not.
>>
>> For a typical (non-async) atomic commit prepare_fb is called on the
>> new_plane_state and cleanup_fb is called on the old_plane_state.
>>
>> However, async commits are performed in place and don't swap the state
>> for the plane. The call to prepare_fb happens on the new_plane_state
>> and the call to cleanup_fb is also called on the new_plane_state in
>> this case (since the state hasn't swapped).
>>
>> This behavior can lead to use-after-free or unpin of an active fb.
>>
>> Consider the following sequence of events for interleaving fbs:
>>
>> - Async update, fb1 prepare, fb1 cleanup_fb
>> - Async update, fb2 prepare, fb2 cleanup_fb
>> - Non-async update, fb1 prepare, fb2 cleanup_fb
>> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free
> 
> I think I see your bug, but I'm completely lost in your description above.
> 
> I think this is ok as a short-term gap, but imo better if it's a separate
> if condition with a FIXME comment.
> 
> Long-term we want to fix this, and I think simplest way to do that is if
> we expect drivers to store the old fb in the new_plane_state (and check
> that with a WARN_ON like the others). I think that should work.
> 
> We probably also need some locking on top, to prevent races with the
> cleanup_fb calls done by non-blocking commits, to make sure those clean up
> the right fb.
> -Daniel

Hi Daniel,

The description is supposed to be saying that the wrong fb is being 
cleaned up because in state updates to the plane don't swap the state 
pointer - which is required by the cleanup planes helper function.

But putting that aside, I think what you suggest here would work best 
for "fixing" the problem. Storing the old fb pointer in the new state 
looks kind of odd, but as long as it's documented and there's the 
WARN_ON in the helper I think it's fine. I think I would prefer this 
over a solution that blocks fb changes in the helper altogether.

I don't think any additional drm level locking is necessary here. The 
race with cleanup_fb calls is already something you have to worry about 
when dealing with async commits even if the fb hasn't changed. Most 
drivers have their own internal locks or ref-counting that can handle this.

So to summarize, I think I can post a new patch series that addresses 
this problem by fixing existing async commit usage in vc4 and amdgpu for 
framebuffer changes. The last patch in the series could be the one that 
adds a WARN_ON in the drm_atomic_helper_async_commit helper along with a 
comment indicating that the old fb should be in the new plane state.

Does this sound reasonable to you?

Nicholas Kazlauskas

> 
>> This situation has been observed in practice for a double buffered
>> cursor when closing an X client. The non-async update occurs because
>> the new_plane_state->crtc != old_plane_state->crtc which forces the
>> non-async path to occur.
>>
>> The simplest fix for this is to block fb updates in
>> drm_atomic_helper_async_check. This guarantees that the framebuffer
>> will have previously been prepared and any subsequent async updates
>> will always call prepare and cleanup_fb like the non-async atomic
>> commit path would.
>>
>> Cc: Michel Dänzer <michel@daenzer.net>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>> ---
>>   drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>> index 54e2ae614dcc..d2f80bf14f86 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>>   		return -EINVAL;
>>   
>>   	if (!new_plane_state->crtc ||
>> -	    old_plane_state->crtc != new_plane_state->crtc)
>> +	    old_plane_state->crtc != new_plane_state->crtc ||
>> +	    old_plane_state->fb != new_plane_state->fb)
>>   		return -EINVAL;
>>   
>>   	funcs = plane->helper_private;
>> -- 
>> 2.17.1
>>
>
Daniel Vetter Jan. 7, 2019, 9:51 a.m. UTC | #4
On Wed, Jan 02, 2019 at 03:02:04PM +0000, Kazlauskas, Nicholas wrote:
> On 12/24/18 7:15 AM, Daniel Vetter wrote:
> > On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote:
> >> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> >> whether the commit was an asynchronous update or not.
> >>
> >> For a typical (non-async) atomic commit prepare_fb is called on the
> >> new_plane_state and cleanup_fb is called on the old_plane_state.
> >>
> >> However, async commits are performed in place and don't swap the state
> >> for the plane. The call to prepare_fb happens on the new_plane_state
> >> and the call to cleanup_fb is also called on the new_plane_state in
> >> this case (since the state hasn't swapped).
> >>
> >> This behavior can lead to use-after-free or unpin of an active fb.
> >>
> >> Consider the following sequence of events for interleaving fbs:
> >>
> >> - Async update, fb1 prepare, fb1 cleanup_fb
> >> - Async update, fb2 prepare, fb2 cleanup_fb
> >> - Non-async update, fb1 prepare, fb2 cleanup_fb
> >> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free
> > 
> > I think I see your bug, but I'm completely lost in your description above.
> > 
> > I think this is ok as a short-term gap, but imo better if it's a separate
> > if condition with a FIXME comment.
> > 
> > Long-term we want to fix this, and I think simplest way to do that is if
> > we expect drivers to store the old fb in the new_plane_state (and check
> > that with a WARN_ON like the others). I think that should work.
> > 
> > We probably also need some locking on top, to prevent races with the
> > cleanup_fb calls done by non-blocking commits, to make sure those clean up
> > the right fb.
> > -Daniel
> 
> Hi Daniel,
> 
> The description is supposed to be saying that the wrong fb is being 
> cleaned up because in state updates to the plane don't swap the state 
> pointer - which is required by the cleanup planes helper function.

Yeah I inferred that, but the actual commit message isn't really clear on
this, and the example you've listed only confused me. We need good commit
messages, so that half a year down the road we still know what exactly
we've been thinking :-) Especially in an area that's really tricky, like
synchronization of the nonblocking atomic updates against normal updates
(and each another), which has seen plenty of bugs in the past.

> But putting that aside, I think what you suggest here would work best 
> for "fixing" the problem. Storing the old fb pointer in the new state 
> looks kind of odd, but as long as it's documented and there's the 
> WARN_ON in the helper I think it's fine. I think I would prefer this 
> over a solution that blocks fb changes in the helper altogether.
> 
> I don't think any additional drm level locking is necessary here. The 
> race with cleanup_fb calls is already something you have to worry about 
> when dealing with async commits even if the fb hasn't changed. Most 
> drivers have their own internal locks or ref-counting that can handle this.
> 
> So to summarize, I think I can post a new patch series that addresses 
> this problem by fixing existing async commit usage in vc4 and amdgpu for 
> framebuffer changes. The last patch in the series could be the one that 
> adds a WARN_ON in the drm_atomic_helper_async_commit helper along with a 
> comment indicating that the old fb should be in the new plane state.
> 
> Does this sound reasonable to you?

Let's please do the the minimal bugfixe first (probably cc: stable), and
figure out how to properly fix things up long-term.
-Daniel

> 
> Nicholas Kazlauskas
> 
> > 
> >> This situation has been observed in practice for a double buffered
> >> cursor when closing an X client. The non-async update occurs because
> >> the new_plane_state->crtc != old_plane_state->crtc which forces the
> >> non-async path to occur.
> >>
> >> The simplest fix for this is to block fb updates in
> >> drm_atomic_helper_async_check. This guarantees that the framebuffer
> >> will have previously been prepared and any subsequent async updates
> >> will always call prepare and cleanup_fb like the non-async atomic
> >> commit path would.
> >>
> >> Cc: Michel Dänzer <michel@daenzer.net>
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >> ---
> >>   drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> >> index 54e2ae614dcc..d2f80bf14f86 100644
> >> --- a/drivers/gpu/drm/drm_atomic_helper.c
> >> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> >> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
> >>   		return -EINVAL;
> >>   
> >>   	if (!new_plane_state->crtc ||
> >> -	    old_plane_state->crtc != new_plane_state->crtc)
> >> +	    old_plane_state->crtc != new_plane_state->crtc ||
> >> +	    old_plane_state->fb != new_plane_state->fb)
> >>   		return -EINVAL;
> >>   
> >>   	funcs = plane->helper_private;
> >> -- 
> >> 2.17.1
> >>
> > 
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 54e2ae614dcc..d2f80bf14f86 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1599,7 +1599,8 @@  int drm_atomic_helper_async_check(struct drm_device *dev,
 		return -EINVAL;
 
 	if (!new_plane_state->crtc ||
-	    old_plane_state->crtc != new_plane_state->crtc)
+	    old_plane_state->crtc != new_plane_state->crtc ||
+	    old_plane_state->fb != new_plane_state->fb)
 		return -EINVAL;
 
 	funcs = plane->helper_private;