diff mbox series

[v2] drm: Block fb changes for async plane updates

Message ID 20190107174146.2947-1-nicholas.kazlauskas@amd.com (mailing list archive)
State New, archived
Headers show
Series [v2] drm: Block fb changes for async plane updates | expand

Commit Message

Kazlauskas, Nicholas Jan. 7, 2019, 5:41 p.m. UTC
The prepare_fb call always happens on new_plane_state.

The drm_atomic_helper_cleanup_planes checks to see if
plane state pointer has changed when deciding to call cleanup_fb on
either the new_plane_state or the old_plane_state.

For a non-async atomic commit the state pointer is swapped, so this
helper calls prepare_fb on the new_plane_state and cleanup_fb on the
old_plane_state. This makes sense, since we want to prepare the
framebuffer we are going to use and cleanup the the framebuffer we are
no longer using.

For the async atomic update helpers this differs. The async atomic
update helpers perform in-place updates on the existing state. They call
drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
This means that prepare_fb is called on the new_plane_state and
cleanup_fb is called on the new_plane_state (not the old).

In the case where old_plane_state->fb == new_plane_state->fb then
there should be no behavioral difference between an async update
and a non-async commit. But there are issues that arise when
old_plane_state->fb != new_plane_state->fb.

The first is that the new_plane_state->fb is immediately cleaned up
after it has been prepared, so we're using a fb that we shouldn't
be.

The second occurs during a sequence of async atomic updates and
non-async regular atomic commits. Suppose there are two framebuffers
being interleaved in a double-buffering scenario, fb1 and fb2:

- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2

We call cleanup_fb on fb2 twice in this example scenario, and any
further use will result in use-after-free.

The simple fix to this problem is to block framebuffer changes
in the drm_atomic_helper_async_check function for now.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Sasha Levin Jan. 8, 2019, 12:57 a.m. UTC | #1
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: fef9df8b5945 drm/atomic: initial support for asynchronous plane update.

The bot has tested the following trees: v4.20.0, v4.19.13, v4.14.91.

v4.20.0: Build OK!
v4.19.13: Build OK!
v4.14.91: Build failed! Errors:
    drivers/gpu/drm/drm_atomic_helper.c:1421:6: error: ‘old_plane_state’ undeclared (first use in this function); did you mean ‘drm_plane_state’?
    drivers/gpu/drm/drm_atomic_helper.c:1421:29: error: ‘new_plane_state’ undeclared (first use in this function); did you mean ‘drm_plane_state’?


How should we proceed with this patch?

--
Thanks,
Sasha
kernel test robot via dri-devel Feb. 13, 2019, 9:21 p.m. UTC | #2
On 2/13/19 5:18 PM, Nicholas Kazlauskas wrote:
> The prepare_fb call always happens on new_plane_state.
> 
> The drm_atomic_helper_cleanup_planes checks to see if
> plane state pointer has changed when deciding to call cleanup_fb on
> either the new_plane_state or the old_plane_state.
> 
> For a non-async atomic commit the state pointer is swapped, so this
> helper calls prepare_fb on the new_plane_state and cleanup_fb on the
> old_plane_state. This makes sense, since we want to prepare the
> framebuffer we are going to use and cleanup the the framebuffer we are
> no longer using.
> 
> For the async atomic update helpers this differs. The async atomic
> update helpers perform in-place updates on the existing state. They call
> drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
> This means that prepare_fb is called on the new_plane_state and
> cleanup_fb is called on the new_plane_state (not the old).
> 
> In the case where old_plane_state->fb == new_plane_state->fb then
> there should be no behavioral difference between an async update
> and a non-async commit. But there are issues that arise when
> old_plane_state->fb != new_plane_state->fb.
> 
> The first is that the new_plane_state->fb is immediately cleaned up
> after it has been prepared, so we're using a fb that we shouldn't
> be.
> 
> The second occurs during a sequence of async atomic updates and
> non-async regular atomic commits. Suppose there are two framebuffers
> being interleaved in a double-buffering scenario, fb1 and fb2:
> 
> - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
> - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
> - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
> 
> We call cleanup_fb on fb2 twice in this example scenario, and any
> further use will result in use-after-free.
> 
> The simple fix to this problem is to block framebuffer changes
> in the drm_atomic_helper_async_check function for now.
> 
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: <stable@vger.kernel.org> # v4.14+
> Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 54e2ae614dcc..f4290f6b0c38 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>  	    old_plane_state->crtc != new_plane_state->crtc)
>  		return -EINVAL;
>  
> +	/*
> +	 * FIXME: Since prepare_fb and cleanup_fb are always called on
> +	 * the new_plane_state for async updates we need to block framebuffer
> +	 * changes. This prevents use of a fb that's been cleaned up and
> +	 * double cleanups from occuring.
> +	 */
> +	if (old_plane_state->fb != new_plane_state->fb)
> +		return -EINVAL;
> +
>  	funcs = plane->helper_private;
>  	if (!funcs->atomic_async_update)
>  		return -EINVAL;
> 

Hi,

Thank you for working on this.

I have two points I would like to raise:

1) cursor update is too slow:

I am working on the v8 of
https://lore.kernel.org/patchwork/patch/949264/ "drm/i915: update
cursors asynchronously through atomic".

But with this patch applied, the following tests fails:

cursor-vs-flip-atomic-transitions-varying-size
cursor-vs-flip-toggle
cursor-vs-flip-varying-size

with errors of type:

"CRITICAL: completed 97 cursor updated in a period of 30 flips, we
expect to complete approximately 15360 updates, with the threshold set
at 7680"

I guess it's because what should be async updates are being turned into
non-async updates (which is slower).

Well, I guess we need to work on a proper solution for the FIXME above,
this is just an observation/note.

2) I have another possibly related issue:

From the tests I am making, unless I did some mistake or miss
interpretation (which is possible since I am new to drm), it seems that
there is a problem if an async update re-uses the same fb from a
previous non-async commit, for example:

- Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
- Async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1 #
double prepare fb1

The cursor through atomic patch was "solving" this by bypassing the
prepare call with a:

@@ -13495,6 +13503,9 @@ intel_prepare_plane_fb(struct drm_plane *plane,
        struct drm_i915_gem_object *old_obj =
intel_fb_obj(plane->state->fb);
        int ret;

+       if (new_state->state->async_update)
+               return 0;
+
        if (old_obj) {
                struct drm_crtc_state *crtc_state =
                        drm_atomic_get_new_crtc_state(new_state->state,


Which looks really wrong to me, but without it, igt tests
(plane_cursor_legacy, kms_cursor_legacy) fail, I get lots of
I915_VMA_PIN_OVERFLOW, which suggest double pinning the frame buffers.

I did this really ugly test to bypass the prepare call just if the
buffer is already pinned:
https://gitlab.collabora.com/koike/linux/commit/2d01d4af30347ff43fbac254d8ec305cd6fe4aeb

With this hack igt tests pass.

As I am new to DRM, I would appreciate if someone could confirm that
this is a real bug or please clarify if I am miss interpreting anything.

If this is a real bug, then if someone could give me pointers in how to
properly fix this it would be great.

Thanks a lot
Helen
kernel test robot via dri-devel Feb. 13, 2019, 11:32 p.m. UTC | #3
On 2/13/19 7:21 PM, Helen Koike wrote:
> 
> 
> On 2/13/19 5:18 PM, Nicholas Kazlauskas wrote:
>> The prepare_fb call always happens on new_plane_state.
>>
>> The drm_atomic_helper_cleanup_planes checks to see if
>> plane state pointer has changed when deciding to call cleanup_fb on
>> either the new_plane_state or the old_plane_state.
>>
>> For a non-async atomic commit the state pointer is swapped, so this
>> helper calls prepare_fb on the new_plane_state and cleanup_fb on the
>> old_plane_state. This makes sense, since we want to prepare the
>> framebuffer we are going to use and cleanup the the framebuffer we are
>> no longer using.
>>
>> For the async atomic update helpers this differs. The async atomic
>> update helpers perform in-place updates on the existing state. They call
>> drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
>> This means that prepare_fb is called on the new_plane_state and
>> cleanup_fb is called on the new_plane_state (not the old).
>>
>> In the case where old_plane_state->fb == new_plane_state->fb then
>> there should be no behavioral difference between an async update
>> and a non-async commit. But there are issues that arise when
>> old_plane_state->fb != new_plane_state->fb.
>>
>> The first is that the new_plane_state->fb is immediately cleaned up
>> after it has been prepared, so we're using a fb that we shouldn't
>> be.
>>
>> The second occurs during a sequence of async atomic updates and
>> non-async regular atomic commits. Suppose there are two framebuffers
>> being interleaved in a double-buffering scenario, fb1 and fb2:
>>
>> - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
>> - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
>> - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
>>
>> We call cleanup_fb on fb2 twice in this example scenario, and any
>> further use will result in use-after-free.
>>
>> The simple fix to this problem is to block framebuffer changes
>> in the drm_atomic_helper_async_check function for now.
>>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Cc: <stable@vger.kernel.org> # v4.14+
>> Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
>> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>> ---
>>  drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>> index 54e2ae614dcc..f4290f6b0c38 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>>  	    old_plane_state->crtc != new_plane_state->crtc)
>>  		return -EINVAL;
>>  
>> +	/*
>> +	 * FIXME: Since prepare_fb and cleanup_fb are always called on
>> +	 * the new_plane_state for async updates we need to block framebuffer
>> +	 * changes. This prevents use of a fb that's been cleaned up and
>> +	 * double cleanups from occuring.
>> +	 */
>> +	if (old_plane_state->fb != new_plane_state->fb)
>> +		return -EINVAL;
>> +
>>  	funcs = plane->helper_private;
>>  	if (!funcs->atomic_async_update)
>>  		return -EINVAL;
>>
> 
> Hi,
> 
> Thank you for working on this.
> 
> I have two points I would like to raise:
> 
> 1) cursor update is too slow:
> 
> I am working on the v8 of
> https://lore.kernel.org/patchwork/patch/949264/ "drm/i915: update
> cursors asynchronously through atomic".
> 
> But with this patch applied, the following tests fails:
> 
> cursor-vs-flip-atomic-transitions-varying-size
> cursor-vs-flip-toggle
> cursor-vs-flip-varying-size
> 
> with errors of type:
> 
> "CRITICAL: completed 97 cursor updated in a period of 30 flips, we
> expect to complete approximately 15360 updates, with the threshold set
> at 7680"
> 
> I guess it's because what should be async updates are being turned into
> non-async updates (which is slower).
> 
> Well, I guess we need to work on a proper solution for the FIXME above,
> this is just an observation/note.
> 
> 2) I have another possibly related issue:
> 
> From the tests I am making, unless I did some mistake or miss
> interpretation (which is possible since I am new to drm), it seems that
> there is a problem if an async update re-uses the same fb from a
> previous non-async commit, for example:
> 
> - Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
> - Async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1 #
> double prepare fb1

Sorry, this is solved by this patch, I meant this:

- Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
- Async commit, oldfb = fb1, newfb = fb1, prepare fb1, cleanup fb1
#double prepare fb1

But I was told that async commit should block if there is a pending
commit, so this scenario shouldn't happen, but there is still something
off (please see below).

> 
> The cursor through atomic patch was "solving" this by bypassing the
> prepare call with a:
> 
> @@ -13495,6 +13503,9 @@ intel_prepare_plane_fb(struct drm_plane *plane,
>         struct drm_i915_gem_object *old_obj =
> intel_fb_obj(plane->state->fb);
>         int ret;
> 
> +       if (new_state->state->async_update)
> +               return 0;
> +
>         if (old_obj) {
>                 struct drm_crtc_state *crtc_state =
>                         drm_atomic_get_new_crtc_state(new_state->state,
> 
> 
> Which looks really wrong to me, but without it, igt tests
> (plane_cursor_legacy, kms_cursor_legacy) fail, I get lots of
> I915_VMA_PIN_OVERFLOW, which suggest double pinning the frame buffers.

Please, ignore this error above, when I ran the test this patch (Block
fb changes) wasn't applied when I got this error about pin overflow, but
I still have issues:

With this patch (Block fb changes) applied I get several dmesg failures
on WARN_ON(to_intel_plane_state(state)->vma) in the
intel_plane_destroy_state() function.

https://people.collabora.com/~koike/results-5.0.0-rc1+-v8-tmp/html/problems.html

But I don't get this errors with the hack below (that bypass
intel_prepare_plane_fb if the buffer is already pinned).

I'm still not sure why, I would really appreciate if any one has any
pointers.

Thanks
Helen

> 
> I did this really ugly test to bypass the prepare call just if the
> buffer is already pinned:
> https://gitlab.collabora.com/koike/linux/commit/2d01d4af30347ff43fbac254d8ec305cd6fe4aeb
> 
> With this hack igt tests pass.
> 
> As I am new to DRM, I would appreciate if someone could confirm that
> this is a real bug or please clarify if I am miss interpreting anything.
> 
> If this is a real bug, then if someone could give me pointers in how to
> properly fix this it would be great.
> 
> Thanks a lot
> Helen
>
Daniel Vetter Feb. 14, 2019, 8:21 a.m. UTC | #4
On Wed, Feb 13, 2019 at 09:32:43PM -0200, Helen Koike wrote:
> 
> 
> On 2/13/19 7:21 PM, Helen Koike wrote:
> > 
> > 
> > On 2/13/19 5:18 PM, Nicholas Kazlauskas wrote:
> >> The prepare_fb call always happens on new_plane_state.
> >>
> >> The drm_atomic_helper_cleanup_planes checks to see if
> >> plane state pointer has changed when deciding to call cleanup_fb on
> >> either the new_plane_state or the old_plane_state.
> >>
> >> For a non-async atomic commit the state pointer is swapped, so this
> >> helper calls prepare_fb on the new_plane_state and cleanup_fb on the
> >> old_plane_state. This makes sense, since we want to prepare the
> >> framebuffer we are going to use and cleanup the the framebuffer we are
> >> no longer using.
> >>
> >> For the async atomic update helpers this differs. The async atomic
> >> update helpers perform in-place updates on the existing state. They call
> >> drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
> >> This means that prepare_fb is called on the new_plane_state and
> >> cleanup_fb is called on the new_plane_state (not the old).
> >>
> >> In the case where old_plane_state->fb == new_plane_state->fb then
> >> there should be no behavioral difference between an async update
> >> and a non-async commit. But there are issues that arise when
> >> old_plane_state->fb != new_plane_state->fb.
> >>
> >> The first is that the new_plane_state->fb is immediately cleaned up
> >> after it has been prepared, so we're using a fb that we shouldn't
> >> be.
> >>
> >> The second occurs during a sequence of async atomic updates and
> >> non-async regular atomic commits. Suppose there are two framebuffers
> >> being interleaved in a double-buffering scenario, fb1 and fb2:
> >>
> >> - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
> >> - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
> >> - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
> >>
> >> We call cleanup_fb on fb2 twice in this example scenario, and any
> >> further use will result in use-after-free.
> >>
> >> The simple fix to this problem is to block framebuffer changes
> >> in the drm_atomic_helper_async_check function for now.
> >>
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >> Cc: <stable@vger.kernel.org> # v4.14+
> >> Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
> >> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >> ---
> >>  drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> >> index 54e2ae614dcc..f4290f6b0c38 100644
> >> --- a/drivers/gpu/drm/drm_atomic_helper.c
> >> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> >> @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
> >>  	    old_plane_state->crtc != new_plane_state->crtc)
> >>  		return -EINVAL;
> >>  
> >> +	/*
> >> +	 * FIXME: Since prepare_fb and cleanup_fb are always called on
> >> +	 * the new_plane_state for async updates we need to block framebuffer
> >> +	 * changes. This prevents use of a fb that's been cleaned up and
> >> +	 * double cleanups from occuring.
> >> +	 */
> >> +	if (old_plane_state->fb != new_plane_state->fb)
> >> +		return -EINVAL;
> >> +
> >>  	funcs = plane->helper_private;
> >>  	if (!funcs->atomic_async_update)
> >>  		return -EINVAL;
> >>
> > 
> > Hi,
> > 
> > Thank you for working on this.
> > 
> > I have two points I would like to raise:
> > 
> > 1) cursor update is too slow:
> > 
> > I am working on the v8 of
> > https://lore.kernel.org/patchwork/patch/949264/ "drm/i915: update
> > cursors asynchronously through atomic".
> > 
> > But with this patch applied, the following tests fails:
> > 
> > cursor-vs-flip-atomic-transitions-varying-size
> > cursor-vs-flip-toggle
> > cursor-vs-flip-varying-size
> > 
> > with errors of type:
> > 
> > "CRITICAL: completed 97 cursor updated in a period of 30 flips, we
> > expect to complete approximately 15360 updates, with the threshold set
> > at 7680"
> > 
> > I guess it's because what should be async updates are being turned into
> > non-async updates (which is slower).
> > 
> > Well, I guess we need to work on a proper solution for the FIXME above,
> > this is just an observation/note.
> > 
> > 2) I have another possibly related issue:
> > 
> > From the tests I am making, unless I did some mistake or miss
> > interpretation (which is possible since I am new to drm), it seems that
> > there is a problem if an async update re-uses the same fb from a
> > previous non-async commit, for example:
> > 
> > - Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
> > - Async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1 #
> > double prepare fb1
> 
> Sorry, this is solved by this patch, I meant this:
> 
> - Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
> - Async commit, oldfb = fb1, newfb = fb1, prepare fb1, cleanup fb1
> #double prepare fb1
> 
> But I was told that async commit should block if there is a pending
> commit, so this scenario shouldn't happen, but there is still something
> off (please see below).
> 
> > 
> > The cursor through atomic patch was "solving" this by bypassing the
> > prepare call with a:
> > 
> > @@ -13495,6 +13503,9 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> >         struct drm_i915_gem_object *old_obj =
> > intel_fb_obj(plane->state->fb);
> >         int ret;
> > 
> > +       if (new_state->state->async_update)
> > +               return 0;
> > +
> >         if (old_obj) {
> >                 struct drm_crtc_state *crtc_state =
> >                         drm_atomic_get_new_crtc_state(new_state->state,
> > 
> > 
> > Which looks really wrong to me, but without it, igt tests
> > (plane_cursor_legacy, kms_cursor_legacy) fail, I get lots of
> > I915_VMA_PIN_OVERFLOW, which suggest double pinning the frame buffers.
> 
> Please, ignore this error above, when I ran the test this patch (Block
> fb changes) wasn't applied when I got this error about pin overflow, but
> I still have issues:
> 
> With this patch (Block fb changes) applied I get several dmesg failures
> on WARN_ON(to_intel_plane_state(state)->vma) in the
> intel_plane_destroy_state() function.
> 
> https://people.collabora.com/~koike/results-5.0.0-rc1+-v8-tmp/html/problems.html
> 
> But I don't get this errors with the hack below (that bypass
> intel_prepare_plane_fb if the buffer is already pinned).
> 
> I'm still not sure why, I would really appreciate if any one has any
> pointers.

i915 largely uses its own modeset framework, so there's a good chance for
differences in how we pin/unpin buffers, compared to common atomic
helpers. That's about my best guess.
-Daniel

> 
> Thanks
> Helen
> 
> > 
> > I did this really ugly test to bypass the prepare call just if the
> > buffer is already pinned:
> > https://gitlab.collabora.com/koike/linux/commit/2d01d4af30347ff43fbac254d8ec305cd6fe4aeb
> > 
> > With this hack igt tests pass.
> > 
> > As I am new to DRM, I would appreciate if someone could confirm that
> > this is a real bug or please clarify if I am miss interpreting anything.
> > 
> > If this is a real bug, then if someone could give me pointers in how to
> > properly fix this it would be great.
> > 
> > Thanks a lot
> > Helen
> >
kernel test robot via dri-devel Feb. 14, 2019, 1:37 p.m. UTC | #5
On 2/14/19 6:21 AM, Daniel Vetter wrote:
> On Wed, Feb 13, 2019 at 09:32:43PM -0200, Helen Koike wrote:
>>
>>
>> On 2/13/19 7:21 PM, Helen Koike wrote:
>>>
>>>
>>> On 2/13/19 5:18 PM, Nicholas Kazlauskas wrote:
>>>> The prepare_fb call always happens on new_plane_state.
>>>>
>>>> The drm_atomic_helper_cleanup_planes checks to see if
>>>> plane state pointer has changed when deciding to call cleanup_fb on
>>>> either the new_plane_state or the old_plane_state.
>>>>
>>>> For a non-async atomic commit the state pointer is swapped, so this
>>>> helper calls prepare_fb on the new_plane_state and cleanup_fb on the
>>>> old_plane_state. This makes sense, since we want to prepare the
>>>> framebuffer we are going to use and cleanup the the framebuffer we are
>>>> no longer using.
>>>>
>>>> For the async atomic update helpers this differs. The async atomic
>>>> update helpers perform in-place updates on the existing state. They call
>>>> drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
>>>> This means that prepare_fb is called on the new_plane_state and
>>>> cleanup_fb is called on the new_plane_state (not the old).
>>>>
>>>> In the case where old_plane_state->fb == new_plane_state->fb then
>>>> there should be no behavioral difference between an async update
>>>> and a non-async commit. But there are issues that arise when
>>>> old_plane_state->fb != new_plane_state->fb.
>>>>
>>>> The first is that the new_plane_state->fb is immediately cleaned up
>>>> after it has been prepared, so we're using a fb that we shouldn't
>>>> be.
>>>>
>>>> The second occurs during a sequence of async atomic updates and
>>>> non-async regular atomic commits. Suppose there are two framebuffers
>>>> being interleaved in a double-buffering scenario, fb1 and fb2:
>>>>
>>>> - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
>>>> - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
>>>> - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
>>>>
>>>> We call cleanup_fb on fb2 twice in this example scenario, and any
>>>> further use will result in use-after-free.
>>>>
>>>> The simple fix to this problem is to block framebuffer changes
>>>> in the drm_atomic_helper_async_check function for now.
>>>>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> Cc: <stable@vger.kernel.org> # v4.14+
>>>> Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
>>>> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>>> ---
>>>>  drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
>>>> index 54e2ae614dcc..f4290f6b0c38 100644
>>>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>>>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>>>> @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>>>>  	    old_plane_state->crtc != new_plane_state->crtc)
>>>>  		return -EINVAL;
>>>>  
>>>> +	/*
>>>> +	 * FIXME: Since prepare_fb and cleanup_fb are always called on
>>>> +	 * the new_plane_state for async updates we need to block framebuffer
>>>> +	 * changes. This prevents use of a fb that's been cleaned up and
>>>> +	 * double cleanups from occuring.
>>>> +	 */
>>>> +	if (old_plane_state->fb != new_plane_state->fb)
>>>> +		return -EINVAL;
>>>> +
>>>>  	funcs = plane->helper_private;
>>>>  	if (!funcs->atomic_async_update)
>>>>  		return -EINVAL;
>>>>
>>>
>>> Hi,
>>>
>>> Thank you for working on this.
>>>
>>> I have two points I would like to raise:
>>>
>>> 1) cursor update is too slow:
>>>
>>> I am working on the v8 of
>>> https://lore.kernel.org/patchwork/patch/949264/ "drm/i915: update
>>> cursors asynchronously through atomic".
>>>
>>> But with this patch applied, the following tests fails:
>>>
>>> cursor-vs-flip-atomic-transitions-varying-size
>>> cursor-vs-flip-toggle
>>> cursor-vs-flip-varying-size
>>>
>>> with errors of type:
>>>
>>> "CRITICAL: completed 97 cursor updated in a period of 30 flips, we
>>> expect to complete approximately 15360 updates, with the threshold set
>>> at 7680"
>>>
>>> I guess it's because what should be async updates are being turned into
>>> non-async updates (which is slower).
>>>
>>> Well, I guess we need to work on a proper solution for the FIXME above,
>>> this is just an observation/note.
>>>
>>> 2) I have another possibly related issue:
>>>
>>> From the tests I am making, unless I did some mistake or miss
>>> interpretation (which is possible since I am new to drm), it seems that
>>> there is a problem if an async update re-uses the same fb from a
>>> previous non-async commit, for example:
>>>
>>> - Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
>>> - Async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1 #
>>> double prepare fb1
>>
>> Sorry, this is solved by this patch, I meant this:
>>
>> - Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
>> - Async commit, oldfb = fb1, newfb = fb1, prepare fb1, cleanup fb1
>> #double prepare fb1
>>
>> But I was told that async commit should block if there is a pending
>> commit, so this scenario shouldn't happen, but there is still something
>> off (please see below).
>>
>>>
>>> The cursor through atomic patch was "solving" this by bypassing the
>>> prepare call with a:
>>>
>>> @@ -13495,6 +13503,9 @@ intel_prepare_plane_fb(struct drm_plane *plane,
>>>         struct drm_i915_gem_object *old_obj =
>>> intel_fb_obj(plane->state->fb);
>>>         int ret;
>>>
>>> +       if (new_state->state->async_update)
>>> +               return 0;
>>> +
>>>         if (old_obj) {
>>>                 struct drm_crtc_state *crtc_state =
>>>                         drm_atomic_get_new_crtc_state(new_state->state,
>>>
>>>
>>> Which looks really wrong to me, but without it, igt tests
>>> (plane_cursor_legacy, kms_cursor_legacy) fail, I get lots of
>>> I915_VMA_PIN_OVERFLOW, which suggest double pinning the frame buffers.
>>
>> Please, ignore this error above, when I ran the test this patch (Block
>> fb changes) wasn't applied when I got this error about pin overflow, but
>> I still have issues:
>>
>> With this patch (Block fb changes) applied I get several dmesg failures
>> on WARN_ON(to_intel_plane_state(state)->vma) in the
>> intel_plane_destroy_state() function.
>>
>> https://people.collabora.com/~koike/results-5.0.0-rc1+-v8-tmp/html/problems.html
>>
>> But I don't get this errors with the hack below (that bypass
>> intel_prepare_plane_fb if the buffer is already pinned).
>>
>> I'm still not sure why, I would really appreciate if any one has any
>> pointers.
> 
> i915 largely uses its own modeset framework, so there's a good chance for
> differences in how we pin/unpin buffers, compared to common atomic
> helpers. That's about my best guess.
> -Daniel

Indeed, there was a missing drm_atomic_helper_cleanup_planes() call in
intel_atomic_commit(), now it is working fine.

Thanks and sorry for the noise on this.
Helen

> 
>>
>> Thanks
>> Helen
>>
>>>
>>> I did this really ugly test to bypass the prepare call just if the
>>> buffer is already pinned:
>>> https://gitlab.collabora.com/koike/linux/commit/2d01d4af30347ff43fbac254d8ec305cd6fe4aeb
>>>
>>> With this hack igt tests pass.
>>>
>>> As I am new to DRM, I would appreciate if someone could confirm that
>>> this is a real bug or please clarify if I am miss interpreting anything.
>>>
>>> If this is a real bug, then if someone could give me pointers in how to
>>> properly fix this it would be great.
>>>
>>> Thanks a lot
>>> Helen
>>>
>
Harry Wentland Feb. 14, 2019, 2:13 p.m. UTC | #6
On 2019-01-07 12:41 p.m., Nicholas Kazlauskas wrote:
> The prepare_fb call always happens on new_plane_state.
> 
> The drm_atomic_helper_cleanup_planes checks to see if
> plane state pointer has changed when deciding to call cleanup_fb on
> either the new_plane_state or the old_plane_state.
> 
> For a non-async atomic commit the state pointer is swapped, so this
> helper calls prepare_fb on the new_plane_state and cleanup_fb on the
> old_plane_state. This makes sense, since we want to prepare the
> framebuffer we are going to use and cleanup the the framebuffer we are
> no longer using.
> 
> For the async atomic update helpers this differs. The async atomic
> update helpers perform in-place updates on the existing state. They call
> drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
> This means that prepare_fb is called on the new_plane_state and
> cleanup_fb is called on the new_plane_state (not the old).
> 
> In the case where old_plane_state->fb == new_plane_state->fb then
> there should be no behavioral difference between an async update
> and a non-async commit. But there are issues that arise when
> old_plane_state->fb != new_plane_state->fb.
> 
> The first is that the new_plane_state->fb is immediately cleaned up
> after it has been prepared, so we're using a fb that we shouldn't
> be.
> 
> The second occurs during a sequence of async atomic updates and
> non-async regular atomic commits. Suppose there are two framebuffers
> being interleaved in a double-buffering scenario, fb1 and fb2:
> 
> - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
> - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
> - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
> 
> We call cleanup_fb on fb2 twice in this example scenario, and any
> further use will result in use-after-free.
> 
> The simple fix to this problem is to block framebuffer changes
> in the drm_atomic_helper_async_check function for now.
> 
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: <stable@vger.kernel.org> # v4.14+
> Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

Patch is merged to drm-misc-fixes with Daniel's RB from IRC and Andrey's and my AB.

Harry

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 54e2ae614dcc..f4290f6b0c38 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
>  	    old_plane_state->crtc != new_plane_state->crtc)
>  		return -EINVAL;
>  
> +	/*
> +	 * FIXME: Since prepare_fb and cleanup_fb are always called on
> +	 * the new_plane_state for async updates we need to block framebuffer
> +	 * changes. This prevents use of a fb that's been cleaned up and
> +	 * double cleanups from occuring.
> +	 */
> +	if (old_plane_state->fb != new_plane_state->fb)
> +		return -EINVAL;
> +
>  	funcs = plane->helper_private;
>  	if (!funcs->atomic_async_update)
>  		return -EINVAL;
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 54e2ae614dcc..f4290f6b0c38 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1602,6 +1602,15 @@  int drm_atomic_helper_async_check(struct drm_device *dev,
 	    old_plane_state->crtc != new_plane_state->crtc)
 		return -EINVAL;
 
+	/*
+	 * FIXME: Since prepare_fb and cleanup_fb are always called on
+	 * the new_plane_state for async updates we need to block framebuffer
+	 * changes. This prevents use of a fb that's been cleaned up and
+	 * double cleanups from occuring.
+	 */
+	if (old_plane_state->fb != new_plane_state->fb)
+		return -EINVAL;
+
 	funcs = plane->helper_private;
 	if (!funcs->atomic_async_update)
 		return -EINVAL;