diff mbox series

[2/4] mgag200: Optimize damage clips

Message ID 20220426164108.1051295-3-jfalempe@redhat.com (mailing list archive)
State New, archived
Headers show
Series mgag200: Improve damage handling | expand

Commit Message

Jocelyn Falempe April 26, 2022, 4:41 p.m. UTC
when there are multiple damage clips, previous code merged them into one
big rectangle. As the Matrox memory is very slow, it's faster to copy each
damage clip.

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
---
 drivers/gpu/drm/mgag200/mgag200_mode.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Thomas Zimmermann May 4, 2022, 10:04 a.m. UTC | #1
Hi

Am 26.04.22 um 18:41 schrieb Jocelyn Falempe:
> when there are multiple damage clips, previous code merged them into one
> big rectangle. As the Matrox memory is very slow, it's faster to copy each
> damage clip.
> 
> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>

How do you measure the performance? It seems as if it depends a lot on 
the nature of the screen update.  But maybe using that loop is faster in 
the general case with other drivers as well.

Best regards
Thomas

> ---
>   drivers/gpu/drm/mgag200/mgag200_mode.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
> index cff2e76f3fa0..2bc380a85996 100644
> --- a/drivers/gpu/drm/mgag200/mgag200_mode.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
> @@ -855,10 +855,6 @@ mgag200_handle_damage(struct mga_device *mdev, struct drm_framebuffer *fb,
>   
>   	dst += drm_fb_clip_offset(fb->pitches[0], fb->format, clip);
>   	drm_fb_memcpy_toio(dst, fb->pitches[0], vmap, fb, clip);
> -
> -	/* Always scanout image at VRAM offset 0 */
> -	mgag200_set_startadd(mdev, (u32)0);
> -	mgag200_set_offset(mdev, fb);
>   }
>   
>   static void
> @@ -904,6 +900,9 @@ mgag200_simple_display_pipe_enable(struct drm_simple_display_pipe *pipe,
>   	mgag200_enable_display(mdev);
>   
>   	mgag200_handle_damage(mdev, fb, &fullscreen, &shadow_plane_state->data[0]);
> +	/* Always scanout image at VRAM offset 0 */
> +	mgag200_set_startadd(mdev, (u32)0);
> +	mgag200_set_offset(mdev, fb);
>   }
>   
>   static void
> @@ -959,12 +958,18 @@ mgag200_simple_display_pipe_update(struct drm_simple_display_pipe *pipe,
>   	struct drm_shadow_plane_state *shadow_plane_state = to_drm_shadow_plane_state(state);
>   	struct drm_framebuffer *fb = state->fb;
>   	struct drm_rect damage;
> +	struct drm_atomic_helper_damage_iter iter;
>   
>   	if (!fb)
>   		return;
>   
> -	if (drm_atomic_helper_damage_merged(old_state, state, &damage))
> +	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
> +	drm_atomic_for_each_plane_damage(&iter, &damage) {
>   		mgag200_handle_damage(mdev, fb, &damage, &shadow_plane_state->data[0]);
> +	}
> +	/* Always scanout image at VRAM offset 0 */
> +	mgag200_set_startadd(mdev, (u32)0);
> +	mgag200_set_offset(mdev, fb);
>   }
>   
>   static struct drm_crtc_state *
Jocelyn Falempe May 4, 2022, 11:56 a.m. UTC | #2
On 04/05/2022 12:04, Thomas Zimmermann wrote:
> Hi
> 
> Am 26.04.22 um 18:41 schrieb Jocelyn Falempe:
>> when there are multiple damage clips, previous code merged them into one
>> big rectangle. As the Matrox memory is very slow, it's faster to copy 
>> each
>> damage clip.
>>
>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
> 
> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
> 
> How do you measure the performance? It seems as if it depends a lot on 
> the nature of the screen update.  But maybe using that loop is faster in 
> the general case with other drivers as well.

I used ktime_get_ts64() to measure the time to copy the damage 
rectangles to VRAM.
It was always faster to copy multiple small rectangles than one big. 
(using Gnome shell/Wayland, dragging windows around).
The memory bandwith of mgag200 is so slow, that accessing it in a more 
"linear" way, is clearly not worth.

I didn't save the numbers, but copying the whole frame was around 130ms 
(for 1280x1024x32).

It may also depends on the userspace optimizations (like if a userspace 
send overlapping rectangles).
diff mbox series

Patch

diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index cff2e76f3fa0..2bc380a85996 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -855,10 +855,6 @@  mgag200_handle_damage(struct mga_device *mdev, struct drm_framebuffer *fb,
 
 	dst += drm_fb_clip_offset(fb->pitches[0], fb->format, clip);
 	drm_fb_memcpy_toio(dst, fb->pitches[0], vmap, fb, clip);
-
-	/* Always scanout image at VRAM offset 0 */
-	mgag200_set_startadd(mdev, (u32)0);
-	mgag200_set_offset(mdev, fb);
 }
 
 static void
@@ -904,6 +900,9 @@  mgag200_simple_display_pipe_enable(struct drm_simple_display_pipe *pipe,
 	mgag200_enable_display(mdev);
 
 	mgag200_handle_damage(mdev, fb, &fullscreen, &shadow_plane_state->data[0]);
+	/* Always scanout image at VRAM offset 0 */
+	mgag200_set_startadd(mdev, (u32)0);
+	mgag200_set_offset(mdev, fb);
 }
 
 static void
@@ -959,12 +958,18 @@  mgag200_simple_display_pipe_update(struct drm_simple_display_pipe *pipe,
 	struct drm_shadow_plane_state *shadow_plane_state = to_drm_shadow_plane_state(state);
 	struct drm_framebuffer *fb = state->fb;
 	struct drm_rect damage;
+	struct drm_atomic_helper_damage_iter iter;
 
 	if (!fb)
 		return;
 
-	if (drm_atomic_helper_damage_merged(old_state, state, &damage))
+	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+	drm_atomic_for_each_plane_damage(&iter, &damage) {
 		mgag200_handle_damage(mdev, fb, &damage, &shadow_plane_state->data[0]);
+	}
+	/* Always scanout image at VRAM offset 0 */
+	mgag200_set_startadd(mdev, (u32)0);
+	mgag200_set_offset(mdev, fb);
 }
 
 static struct drm_crtc_state *