[v2,6/9] drm/vkms: Add YUV support

Message ID	20240223-yuv-v2-6-aa6be2827bb7@bootlin.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Louis Chauvet <louis.chauvet@bootlin.com> Date: Fri, 23 Feb 2024 12:37:26 +0100 Subject: [PATCH v2 6/9] drm/vkms: Add YUV support MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240223-yuv-v2-6-aa6be2827bb7@bootlin.com> References: <20240223-yuv-v2-0-aa6be2827bb7@bootlin.com> In-Reply-To: <20240223-yuv-v2-0-aa6be2827bb7@bootlin.com> To: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>, Melissa Wen <melissa.srw@gmail.com>, =?utf-8?q?Ma=C3=ADra_Canal?= <mairacanal@riseup.net>, Haneen Mohammed <hamohammed.sa@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, Maarten Lankhorst <maarten.lankhorst@linux.intel.com>, Maxime Ripard <mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@gmail.com>, arthurgrillo@riseup.net, Jonathan Corbet <corbet@lwn.net> Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, jeremie.dautheribes@bootlin.com, miquel.raynal@bootlin.com, thomas.petazzoni@bootlin.com, Louis Chauvet <louis.chauvet@bootlin.com> Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/vkms: Reimplement line-per-line pixel conversion for plane reading \| expand [v2,0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading [v2,1/9] drm/vkms: Code formatting [v2,2/9] drm/vkms: Use drm_frame directly [v2,3/9] drm/vkms: write/update the documentation for pixel conversion and pixel write functions [v2,4/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions [v2,5/9] drm/vkms: Re-introduce line-per-line composition algorithm [v2,6/9] drm/vkms: Add YUV support [v2,7/9] drm/vkms: Add range and encoding properties to pixel_read function [v2,8/9] drm/vkms: Drop YUV formats TODO [v2,9/9] drm/vkms: Create KUnit tests for YUV conversions

Louis Chauvet Feb. 23, 2024, 11:37 a.m. UTC

From: Arthur Grillo <arthurgrillo@riseup.net>

Add support to the YUV formats bellow:

- NV12
- NV16
- NV24
- NV21
- NV61
- NV42
- YUV420
- YUV422
- YUV444
- YVU420
- YVU422
- YVU444

The conversion matrices of each encoding and range were obtained by
rounding the values of the original conversion matrices multiplied by
2^8. This is done to avoid the use of fixed point operations.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
[Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
callbacks for yuv formats]
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
 drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
 drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
 drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
 5 files changed, 295 insertions(+), 20 deletions(-)

Thomas Zimmermann Feb. 23, 2024, 11:46 a.m. UTC | #1

Am 23.02.24 um 12:37 schrieb Louis Chauvet:
> From: Arthur Grillo <arthurgrillo@riseup.net>
>
> Add support to the YUV formats bellow:
>
> - NV12
> - NV16
> - NV24
> - NV21
> - NV61
> - NV42
> - YUV420
> - YUV422
> - YUV444
> - YVU420
> - YVU422
> - YVU444
>
> The conversion matrices of each encoding and range were obtained by
> rounding the values of the original conversion matrices multiplied by
> 2^8. This is done to avoid the use of fixed point operations.
>
> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> callbacks for yuv formats]
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>   drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>   drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>   drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>   drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>   5 files changed, 295 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index e555bf9c1aee..54fc5161d565 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>   			 * buffer [1]
>   			 */
>   			current_plane->pixel_read_line(
> -				current_plane->frame_info,
> +				current_plane,
>   				x_start,
>   				y_start,
>   				direction,
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index ccc5be009f15..a4f6456cb971 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>   	READ_RIGHT
>   };
>   
> +struct vkms_plane_state;
> +
>   /**
>   <<<<<<< HEAD

Noise

>    * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>    * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>    *  x_end.
>    */
> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>   
>   /**
>    * vkms_plane_state - Driver specific plane state
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 46daea6d3ee9..515c80866a58 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>   	 */
>   	return fb->offsets[plane_index] +
>   	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> +	       (x / drm_format_info_block_height(format, plane_index)) *
> +	       format->char_per_block[plane_index];
>   }
>   
>   /**
> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>   	}
>   }
>   
> +/**
> + * get_subsampling() - Get the subsampling value on a specific direction
> + */
> +static int get_subsampling(const struct drm_format_info *format,
> +			   enum pixel_read_direction direction)
> +{
> +	if (direction == READ_LEFT || direction == READ_RIGHT)
> +		return format->hsub;
> +	else if (direction == READ_DOWN || direction == READ_UP)
> +		return format->vsub;
> +	return 1;
> +}
> +
> +/**
> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> + */
> +static int get_subsampling_offset(const struct drm_format_info *format,
> +				  enum pixel_read_direction direction, int x_start, int y_start)
> +{
> +	if (direction == READ_RIGHT || direction == READ_LEFT)
> +		return x_start;
> +	else if (direction == READ_DOWN || direction == READ_UP)
> +		return y_start;
> +	return 0;
> +}
> +
>   
>   /*
>    * The following  functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> @@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>   	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>   }
>   
> +static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +{
> +	s32 y_16, cb_16, cr_16;
> +	s32 r_16, g_16, b_16;
> +
> +	y_16 = y - y_offset;
> +	cb_16 = cb - 128;
> +	cr_16 = cr - 128;
> +
> +	r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
> +	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> +	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> +
> +	*r = clamp(r_16, 0, 0xffff) >> 8;
> +	*g = clamp(g_16, 0, 0xffff) >> 8;
> +	*b = clamp(b_16, 0, 0xffff) >> 8;
> +}
> +
> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> +			       enum drm_color_encoding encoding, enum drm_color_range range)
> +{
> +	static const s16 bt601_full[3][3] = {
> +		{ 256, 0,   359 },
> +		{ 256, -88, -183 },
> +		{ 256, 454, 0 },
> +	};
> +	static const s16 bt601[3][3] = {
> +		{ 298, 0,    409 },
> +		{ 298, -100, -208 },
> +		{ 298, 516,  0 },
> +	};
> +	static const s16 rec709_full[3][3] = {
> +		{ 256, 0,   408 },
> +		{ 256, -48, -120 },
> +		{ 256, 476, 0 },
> +	};
> +	static const s16 rec709[3][3] = {
> +		{ 298, 0,   459 },
> +		{ 298, -55, -136 },
> +		{ 298, 541, 0 },
> +	};
> +	static const s16 bt2020_full[3][3] = {
> +		{ 256, 0,   377 },
> +		{ 256, -42, -146 },
> +		{ 256, 482, 0 },
> +	};
> +	static const s16 bt2020[3][3] = {
> +		{ 298, 0,   430 },
> +		{ 298, -48, -167 },
> +		{ 298, 548, 0 },
> +	};
> +
> +	u8 r = 0;
> +	u8 g = 0;
> +	u8 b = 0;
> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> +	unsigned int y_offset = full ? 0 : 16;
> +
> +	switch (encoding) {
> +	case DRM_COLOR_YCBCR_BT601:
> +		ycbcr2rgb(full ? bt601_full : bt601,
> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	case DRM_COLOR_YCBCR_BT709:
> +		ycbcr2rgb(full ? rec709_full : rec709,
> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	case DRM_COLOR_YCBCR_BT2020:
> +		ycbcr2rgb(full ? bt2020_full : bt2020,
> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	default:
> +		pr_warn_once("Not supported color encoding\n");
> +		break;
> +	}
> +
> +	argb_u16->r = r * 257;
> +	argb_u16->g = g * 257;
> +	argb_u16->b = b * 257;
> +}
> +
>   /*
>    * The following functions are read_line function for each pixel format supported by VKMS.
>    *
> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>    * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>    */
>   
> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>   			       enum pixel_read_direction direction, int count,
>   			       struct pixel_argb_u16 out_pixel[])
>   {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>   
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>   
>   	while (count) {
>   		u8 *px = (u8 *)src_pixels;
> @@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
>   	}
>   }
>   
> -static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>   			       enum pixel_read_direction direction, int count,
>   			       struct pixel_argb_u16 out_pixel[])
>   {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>   
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>   
>   	while (count) {
>   		u8 *px = (u8 *)src_pixels;
> @@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
>   	}
>   }
>   
> -static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>   				   enum pixel_read_direction direction, int count,
>   				   struct pixel_argb_u16 out_pixel[])
>   {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>   
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>   
>   	while (count) {
>   		u16 *px = (u16 *)src_pixels;
> @@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
>   	}
>   }
>   
> -static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>   				   enum pixel_read_direction direction, int count,
>   				   struct pixel_argb_u16 out_pixel[])
>   {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>   
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>   
>   	while (count) {
>   		u16 *px = (u16 *)src_pixels;
> @@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
>   	}
>   }
>   
> -static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>   			     enum pixel_read_direction direction, int count,
>   			     struct pixel_argb_u16 out_pixel[])
>   {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>   
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>   
>   	while (count) {
>   		u16 *px = (u16 *)src_pixels;
> @@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
>   	}
>   }
>   
> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				      enum pixel_read_direction direction, int count,
> +				      struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> +					  x_start / plane->frame_info->fb->format->hsub,
> +					  y_start / plane->frame_info->fb->format->vsub,
> +					  1);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start); // 0
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = y_plane[0];
> +		yuv_u8.u = uv_plane[0];
> +		yuv_u8.v = uv_plane[1];
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0)
> +			uv_plane += step_uv;
> +	}
> +}
> +
> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				      enum pixel_read_direction direction, int count,
> +				      struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> +					  x_start / plane->frame_info->fb->format->hsub,
> +					  y_start / plane->frame_info->fb->format->vsub,
> +					  1);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = y_plane[0];
> +		yuv_u8.u = vu_plane[1];
> +		yuv_u8.v = vu_plane[0];
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0)
> +			vu_plane += step_vu;
> +	}
> +}
> +
> +static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				 enum pixel_read_direction direction, int count,
> +				 struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *u_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 1);
> +	u8 *v_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 2);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = *y_plane;
> +		yuv_u8.u = *u_plane;
> +		yuv_u8.v = *v_plane;
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0) {
> +			u_plane += step_u;
> +			v_plane += step_v;
> +		}
> +	}
> +}
> +
> +static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				 enum pixel_read_direction direction, int count,
> +				 struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *v_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 1);
> +	u8 *u_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 2);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = *y_plane;
> +		yuv_u8.u = *u_plane;
> +		yuv_u8.v = *v_plane;
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0) {
> +			u_plane += step_u;
> +			v_plane += step_v;
> +		}
> +	}
> +}
> +
>   /*
>    * The following functions take one argb_u16 pixel and convert it to a specific format. The
>    * result is stored in @dst_pixels.
> @@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
>   		return &XRGB16161616_read_line;
>   	case DRM_FORMAT_RGB565:
>   		return &RGB565_read_line;
> +	case DRM_FORMAT_NV12:
> +	case DRM_FORMAT_NV16:
> +	case DRM_FORMAT_NV24:
> +		return &semi_planar_yuv_read_line;
> +	case DRM_FORMAT_NV21:
> +	case DRM_FORMAT_NV61:
> +	case DRM_FORMAT_NV42:
> +		return &semi_planar_yvu_read_line;
> +	case DRM_FORMAT_YUV420:
> +	case DRM_FORMAT_YUV422:
> +	case DRM_FORMAT_YUV444:
> +		return &planar_yuv_read_line;
> +	case DRM_FORMAT_YVU420:
> +	case DRM_FORMAT_YVU422:
> +	case DRM_FORMAT_YVU444:
> +		return &planar_yvu_read_line;
>   	default:
>   		return (pixel_read_line_t)NULL;
>   	}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 8d2bef95ff79..5a3a9e1328d8 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);
>   
>   pixel_write_t get_pixel_write_function(u32 format);
>   
> +struct pixel_yuv_u8 {
> +	u8 y, u, v;
> +};
> +
>   #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 58c1c74742b5..427ca67c60ce 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
>   	DRM_FORMAT_XRGB8888,
>   	DRM_FORMAT_XRGB16161616,
>   	DRM_FORMAT_ARGB16161616,
> -	DRM_FORMAT_RGB565
> +	DRM_FORMAT_RGB565,
> +	DRM_FORMAT_NV12,
> +	DRM_FORMAT_NV16,
> +	DRM_FORMAT_NV24,
> +	DRM_FORMAT_NV21,
> +	DRM_FORMAT_NV61,
> +	DRM_FORMAT_NV42,
> +	DRM_FORMAT_YUV420,
> +	DRM_FORMAT_YUV422,
> +	DRM_FORMAT_YUV444,
> +	DRM_FORMAT_YVU420,
> +	DRM_FORMAT_YVU422,
> +	DRM_FORMAT_YVU444
>   };
>   
>   static struct drm_plane_state *
>

Pekka Paalanen Feb. 26, 2024, 12:19 p.m. UTC | #2

On Fri, 23 Feb 2024 12:37:26 +0100
Louis Chauvet <louis.chauvet@bootlin.com> wrote:

> From: Arthur Grillo <arthurgrillo@riseup.net>
> 
> Add support to the YUV formats bellow:
> 
> - NV12
> - NV16
> - NV24
> - NV21
> - NV61
> - NV42
> - YUV420
> - YUV422
> - YUV444
> - YVU420
> - YVU422
> - YVU444
> 
> The conversion matrices of each encoding and range were obtained by
> rounding the values of the original conversion matrices multiplied by
> 2^8. This is done to avoid the use of fixed point operations.
> 
> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> callbacks for yuv formats]
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>  5 files changed, 295 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index e555bf9c1aee..54fc5161d565 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>  			 * buffer [1]
>  			 */
>  			current_plane->pixel_read_line(
> -				current_plane->frame_info,
> +				current_plane,
>  				x_start,
>  				y_start,
>  				direction,
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index ccc5be009f15..a4f6456cb971 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>  	READ_RIGHT
>  };
>  
> +struct vkms_plane_state;
> +
>  /**
>  <<<<<<< HEAD
>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>   *  x_end.
>   */
> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);

This is the second or third time in this one series changing this type.
Could you not do the change once, in its own patch if possible?

>  
>  /**
>   * vkms_plane_state - Driver specific plane state
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 46daea6d3ee9..515c80866a58 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>  	 */
>  	return fb->offsets[plane_index] +
>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> +	       (x / drm_format_info_block_height(format, plane_index)) *
> +	       format->char_per_block[plane_index];

Shouldn't this be in the patch that added this code in the first place?

>  }
>  
>  /**
> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>  	}
>  }
>  
> +/**
> + * get_subsampling() - Get the subsampling value on a specific direction

subsampling divisor

> + */
> +static int get_subsampling(const struct drm_format_info *format,
> +			   enum pixel_read_direction direction)
> +{
> +	if (direction == READ_LEFT || direction == READ_RIGHT)
> +		return format->hsub;
> +	else if (direction == READ_DOWN || direction == READ_UP)
> +		return format->vsub;
> +	return 1;

In this and the below function, personally I'd prefer switch-case, with
a cannot-happen-scream after the switch, so the compiler can warn about
unhandled enum values.

> +}
> +
> +/**
> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> + */
> +static int get_subsampling_offset(const struct drm_format_info *format,
> +				  enum pixel_read_direction direction, int x_start, int y_start)

'start' values as "increments" for a pixel counter? Is something
misnamed here?

Is it an increment or an offset?

> +{
> +	if (direction == READ_RIGHT || direction == READ_LEFT)
> +		return x_start;
> +	else if (direction == READ_DOWN || direction == READ_UP)
> +		return y_start;
> +	return 0;
> +}
> +
>  
>  /*
>   * The following  functions take pixel data (a, r, g, b, pixel, ...), convert them to the format
> @@ -130,6 +157,87 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>  }
>  
> +static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +{
> +	s32 y_16, cb_16, cr_16;
> +	s32 r_16, g_16, b_16;
> +
> +	y_16 = y - y_offset;
> +	cb_16 = cb - 128;
> +	cr_16 = cr - 128;
> +
> +	r_16 = m[0][0] * y_16 + m[0][1] * cb_16 + m[0][2] * cr_16;
> +	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> +	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> +
> +	*r = clamp(r_16, 0, 0xffff) >> 8;
> +	*g = clamp(g_16, 0, 0xffff) >> 8;
> +	*b = clamp(b_16, 0, 0xffff) >> 8;
> +}
> +
> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> +			       enum drm_color_encoding encoding, enum drm_color_range range)
> +{
> +	static const s16 bt601_full[3][3] = {
> +		{ 256, 0,   359 },
> +		{ 256, -88, -183 },
> +		{ 256, 454, 0 },
> +	};
> +	static const s16 bt601[3][3] = {
> +		{ 298, 0,    409 },
> +		{ 298, -100, -208 },
> +		{ 298, 516,  0 },
> +	};
> +	static const s16 rec709_full[3][3] = {
> +		{ 256, 0,   408 },
> +		{ 256, -48, -120 },
> +		{ 256, 476, 0 },
> +	};
> +	static const s16 rec709[3][3] = {
> +		{ 298, 0,   459 },
> +		{ 298, -55, -136 },
> +		{ 298, 541, 0 },
> +	};
> +	static const s16 bt2020_full[3][3] = {
> +		{ 256, 0,   377 },
> +		{ 256, -42, -146 },
> +		{ 256, 482, 0 },
> +	};
> +	static const s16 bt2020[3][3] = {
> +		{ 298, 0,   430 },
> +		{ 298, -48, -167 },
> +		{ 298, 548, 0 },
> +	};
> +
> +	u8 r = 0;
> +	u8 g = 0;
> +	u8 b = 0;
> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> +	unsigned int y_offset = full ? 0 : 16;
> +
> +	switch (encoding) {
> +	case DRM_COLOR_YCBCR_BT601:
> +		ycbcr2rgb(full ? bt601_full : bt601,

Doing all these conditional again pixel by pixel is probably
inefficient. Just like with the line reading functions, you could pick
the matrix in advance.

> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	case DRM_COLOR_YCBCR_BT709:
> +		ycbcr2rgb(full ? rec709_full : rec709,
> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	case DRM_COLOR_YCBCR_BT2020:
> +		ycbcr2rgb(full ? bt2020_full : bt2020,
> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> +		break;
> +	default:
> +		pr_warn_once("Not supported color encoding\n");
> +		break;
> +	}
> +
> +	argb_u16->r = r * 257;
> +	argb_u16->g = g * 257;
> +	argb_u16->b = b * 257;

I wonder. Using 8-bit fixed point precision seems quite coarse for
8-bit pixel formats, and it's going to be insufficient for higher bit
depths. Was supporting e.g. 10-bit YUV considered? There is even
deeper, too, like DRM_FORMAT_P016.

> +}
> +
>  /*
>   * The following functions are read_line function for each pixel format supported by VKMS.
>   *
> @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>   * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>   */
>  
> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>  			       enum pixel_read_direction direction, int count,
>  			       struct pixel_argb_u16 out_pixel[])
>  {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>  
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);

These are the kind of changes I would not expect to see in a patch
adding YUV support. There are a lot of them, too.

>  
>  	while (count) {
>  		u8 *px = (u8 *)src_pixels;
> @@ -160,13 +268,13 @@ static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
>  	}
>  }
>  
> -static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>  			       enum pixel_read_direction direction, int count,
>  			       struct pixel_argb_u16 out_pixel[])
>  {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>  
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>  
>  	while (count) {
>  		u8 *px = (u8 *)src_pixels;
> @@ -178,13 +286,13 @@ static void XRGB8888_read_line(struct vkms_frame_info *frame_info, int x_start,
>  	}
>  }
>  
> -static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void ARGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>  				   enum pixel_read_direction direction, int count,
>  				   struct pixel_argb_u16 out_pixel[])
>  {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>  
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>  
>  	while (count) {
>  		u16 *px = (u16 *)src_pixels;
> @@ -196,13 +304,13 @@ static void ARGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
>  	}
>  }
>  
> -static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void XRGB16161616_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>  				   enum pixel_read_direction direction, int count,
>  				   struct pixel_argb_u16 out_pixel[])
>  {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>  
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>  
>  	while (count) {
>  		u16 *px = (u16 *)src_pixels;
> @@ -214,13 +322,13 @@ static void XRGB16161616_read_line(struct vkms_frame_info *frame_info, int x_sta
>  	}
>  }
>  
> -static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> +static void RGB565_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
>  			     enum pixel_read_direction direction, int count,
>  			     struct pixel_argb_u16 out_pixel[])
>  {
> -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>  
> -	int step = get_step_1x1(frame_info->fb, direction, 0);
> +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>  
>  	while (count) {
>  		u16 *px = (u16 *)src_pixels;
> @@ -232,6 +340,139 @@ static void RGB565_read_line(struct vkms_frame_info *frame_info, int x_start, in
>  	}
>  }
>  
> +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				      enum pixel_read_direction direction, int count,
> +				      struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> +					  x_start / plane->frame_info->fb->format->hsub,
> +					  y_start / plane->frame_info->fb->format->vsub,
> +					  1);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start); // 0
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = y_plane[0];
> +		yuv_u8.u = uv_plane[0];
> +		yuv_u8.v = uv_plane[1];
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);

Oh, so this was the reason to change the read-line function signature.
Maybe just stash a pointer to the right matrix and the right y_offset
in frame_info instead?

> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0)
> +			uv_plane += step_uv;
> +	}
> +}
> +
> +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				      enum pixel_read_direction direction, int count,
> +				      struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> +					  x_start / plane->frame_info->fb->format->hsub,
> +					  y_start / plane->frame_info->fb->format->vsub,
> +					  1);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = y_plane[0];
> +		yuv_u8.u = vu_plane[1];
> +		yuv_u8.v = vu_plane[0];

You could swap matrix columns instead of writing this whole new
function for UV vs. VU. Just an idea.


Thanks,
pq

> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0)
> +			vu_plane += step_vu;
> +	}
> +}
> +
> +static void planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				 enum pixel_read_direction direction, int count,
> +				 struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *u_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 1);
> +	u8 *v_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 2);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = *y_plane;
> +		yuv_u8.u = *u_plane;
> +		yuv_u8.v = *v_plane;
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0) {
> +			u_plane += step_u;
> +			v_plane += step_v;
> +		}
> +	}
> +}
> +
> +static void planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> +				 enum pixel_read_direction direction, int count,
> +				 struct pixel_argb_u16 out_pixel[])
> +{
> +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> +	u8 *v_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 1);
> +	u8 *u_plane = packed_pixels_addr(plane->frame_info,
> +					 x_start / plane->frame_info->fb->format->hsub,
> +					 y_start / plane->frame_info->fb->format->vsub,
> +					 2);
> +	struct pixel_yuv_u8 yuv_u8;
> +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> +	int step_u = get_step_1x1(plane->frame_info->fb, direction, 1);
> +	int step_v = get_step_1x1(plane->frame_info->fb, direction, 2);
> +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> +							x_start, y_start);
> +
> +	for (int i = 0; i < count; i++) {
> +		yuv_u8.y = *y_plane;
> +		yuv_u8.u = *u_plane;
> +		yuv_u8.v = *v_plane;
> +
> +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> +				   plane->base.base.color_range);
> +		out_pixel += 1;
> +		y_plane += step_y;
> +		if ((i + subsampling_offset + 1) % subsampling == 0) {
> +			u_plane += step_u;
> +			v_plane += step_v;
> +		}
> +	}
> +}
> +
>  /*
>   * The following functions take one argb_u16 pixel and convert it to a specific format. The
>   * result is stored in @dst_pixels.
> @@ -344,6 +585,22 @@ pixel_read_line_t get_pixel_read_line_function(u32 format)
>  		return &XRGB16161616_read_line;
>  	case DRM_FORMAT_RGB565:
>  		return &RGB565_read_line;
> +	case DRM_FORMAT_NV12:
> +	case DRM_FORMAT_NV16:
> +	case DRM_FORMAT_NV24:
> +		return &semi_planar_yuv_read_line;
> +	case DRM_FORMAT_NV21:
> +	case DRM_FORMAT_NV61:
> +	case DRM_FORMAT_NV42:
> +		return &semi_planar_yvu_read_line;
> +	case DRM_FORMAT_YUV420:
> +	case DRM_FORMAT_YUV422:
> +	case DRM_FORMAT_YUV444:
> +		return &planar_yuv_read_line;
> +	case DRM_FORMAT_YVU420:
> +	case DRM_FORMAT_YVU422:
> +	case DRM_FORMAT_YVU444:
> +		return &planar_yvu_read_line;
>  	default:
>  		return (pixel_read_line_t)NULL;
>  	}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 8d2bef95ff79..5a3a9e1328d8 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -9,4 +9,8 @@ pixel_read_line_t get_pixel_read_line_function(u32 format);
>  
>  pixel_write_t get_pixel_write_function(u32 format);
>  
> +struct pixel_yuv_u8 {
> +	u8 y, u, v;
> +};
> +
>  #endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 58c1c74742b5..427ca67c60ce 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -17,7 +17,19 @@ static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
>  	DRM_FORMAT_XRGB16161616,
>  	DRM_FORMAT_ARGB16161616,
> -	DRM_FORMAT_RGB565
> +	DRM_FORMAT_RGB565,
> +	DRM_FORMAT_NV12,
> +	DRM_FORMAT_NV16,
> +	DRM_FORMAT_NV24,
> +	DRM_FORMAT_NV21,
> +	DRM_FORMAT_NV61,
> +	DRM_FORMAT_NV42,
> +	DRM_FORMAT_YUV420,
> +	DRM_FORMAT_YUV422,
> +	DRM_FORMAT_YUV444,
> +	DRM_FORMAT_YVU420,
> +	DRM_FORMAT_YVU422,
> +	DRM_FORMAT_YVU444
>  };
>  
>  static struct drm_plane_state *
>

Louis Chauvet Feb. 27, 2024, 3:02 p.m. UTC | #3

Hi Pekka,

For all the comment related to the conversion part, maybe Arthur have an 
opinion on it, I took his patch as a "black box" (I did not want to 
break (and debug) it).

Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
> On Fri, 23 Feb 2024 12:37:26 +0100
> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
> 
> > From: Arthur Grillo <arthurgrillo@riseup.net>
> > 
> > Add support to the YUV formats bellow:
> > 
> > - NV12
> > - NV16
> > - NV24
> > - NV21
> > - NV61
> > - NV42
> > - YUV420
> > - YUV422
> > - YUV444
> > - YVU420
> > - YVU422
> > - YVU444
> > 
> > The conversion matrices of each encoding and range were obtained by
> > rounding the values of the original conversion matrices multiplied by
> > 2^8. This is done to avoid the use of fixed point operations.
> > 
> > Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> > [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> > callbacks for yuv formats]
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
> >  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
> >  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
> >  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
> >  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
> >  5 files changed, 295 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index e555bf9c1aee..54fc5161d565 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> >  			 * buffer [1]
> >  			 */
> >  			current_plane->pixel_read_line(
> > -				current_plane->frame_info,
> > +				current_plane,
> >  				x_start,
> >  				y_start,
> >  				direction,
> > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > index ccc5be009f15..a4f6456cb971 100644
> > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > @@ -75,6 +75,8 @@ enum pixel_read_direction {
> >  	READ_RIGHT
> >  };
> >  
> > +struct vkms_plane_state;
> > +
> >  /**
> >  <<<<<<< HEAD
> >   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > @@ -87,8 +89,8 @@ enum pixel_read_direction {
> >   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> >   *  x_end.
> >   */
> > -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> > -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> > +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> 
> This is the second or third time in this one series changing this type.
> Could you not do the change once, in its own patch if possible?

Sorry, this is not a change here, but a wrong formatting (missed when 
rebasing).

Do you think that it make sense to re-order my patches and put this 
typedef at the end? This way it is never updated.

> >  
> >  /**
> >   * vkms_plane_state - Driver specific plane state
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 46daea6d3ee9..515c80866a58 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> >  	 */
> >  	return fb->offsets[plane_index] +
> >  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> > +	       (x / drm_format_info_block_height(format, plane_index)) *
> > +	       format->char_per_block[plane_index];
> 
> Shouldn't this be in the patch that added this code in the first place?

Same as above, a wrong formatting, I will remove this change and keep 
everything on one line (even if it's more than 100 chars, it is easier to 
read).

> >  }
> >  
> >  /**
> > @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> >  	}
> >  }
> >  
> > +/**
> > + * get_subsampling() - Get the subsampling value on a specific direction
> 
> subsampling divisor

Thanks for this precision.

> > + */
> > +static int get_subsampling(const struct drm_format_info *format,
> > +			   enum pixel_read_direction direction)
> > +{
> > +	if (direction == READ_LEFT || direction == READ_RIGHT)
> > +		return format->hsub;
> > +	else if (direction == READ_DOWN || direction == READ_UP)
> > +		return format->vsub;
> > +	return 1;
> 
> In this and the below function, personally I'd prefer switch-case, with
> a cannot-happen-scream after the switch, so the compiler can warn about
> unhandled enum values.

As for the previous patch, I did not know about this compiler feature, 
thanks!

> > +}
> > +
> > +/**
> > + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> > + */
> > +static int get_subsampling_offset(const struct drm_format_info *format,
> > +				  enum pixel_read_direction direction, int x_start, int y_start)
> 
> 'start' values as "increments" for a pixel counter? Is something
> misnamed here?
> 
> Is it an increment or an offset?

I don't really know how to name the function. I'm open to suggestions
x_start and y_start are really the coordinate of the starting reading point.

To explain what it does:

When using subsampling, you have to read the next pixel of planes[1..4] 
not at the same "speed" as plane[0]. But I can't only rely on 
"read_pixel_count % subsampling == 0", because it means that the pixel 
incrementation on planes[1..4] may not be aligned with the buffer (if 
hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
for x=2,4,6... not 1,3,5...).

A way to ensure this is to add an "offset" to count, which ensure that the 
count % subsampling == 0 on the correct pixel.

I made an error, the switch case must be (as count is always counting up, 
for "inverted" reading direction a negative number ensure that 
%subsampling == 0 on the correct pixel):

	switch (direction) {
	case READ_UP:
		return -y_start;
	case READ_DOWN:
		return y_start;
	case READ_LEFT:
		return -x_start;
	case READ_RIGHT:
		return x_start;
	}

> > +{
> > +	if (direction == READ_RIGHT || direction == READ_LEFT)
> > +		return x_start;
> > +	else if (direction == READ_DOWN || direction == READ_UP)
> > +		return y_start;
> > +	return 0;
> > +}
> > +

[...]

> > +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> > +			       enum drm_color_encoding encoding, enum drm_color_range range)
> > +{
> > +	static const s16 bt601_full[3][3] = {
> > +		{ 256, 0,   359 },
> > +		{ 256, -88, -183 },
> > +		{ 256, 454, 0 },
> > +	};

[...]

> > +
> > +	u8 r = 0;
> > +	u8 g = 0;
> > +	u8 b = 0;
> > +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> > +	unsigned int y_offset = full ? 0 : 16;
> > +
> > +	switch (encoding) {
> > +	case DRM_COLOR_YCBCR_BT601:
> > +		ycbcr2rgb(full ? bt601_full : bt601,
> 
> Doing all these conditional again pixel by pixel is probably
> inefficient. Just like with the line reading functions, you could pick
> the matrix in advance.

I don't think the performance impact is huge (it's only a pair of if), but 
yes, it's an easy optimization. 

I will create a conversion_matrix structure:

	struct conversion_matrix {
		s16 matrix[3][3];
		u16 y_offset;
	}

I will create a `get_conversion_matrix_to_argb_u16` function to get this 
structure from a format+encoding+range.

I will also add a field `conversion_matrix` in struct vkms_plane_state to 
get this matrix only once per plane setup.


> > +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > +		break;
> > +	case DRM_COLOR_YCBCR_BT709:
> > +		ycbcr2rgb(full ? rec709_full : rec709,
> > +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > +		break;
> > +	case DRM_COLOR_YCBCR_BT2020:
> > +		ycbcr2rgb(full ? bt2020_full : bt2020,
> > +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > +		break;
> > +	default:
> > +		pr_warn_once("Not supported color encoding\n");
> > +		break;
> > +	}
> > +
> > +	argb_u16->r = r * 257;
> > +	argb_u16->g = g * 257;
> > +	argb_u16->b = b * 257;
> 
> I wonder. Using 8-bit fixed point precision seems quite coarse for
> 8-bit pixel formats, and it's going to be insufficient for higher bit
> depths. Was supporting e.g. 10-bit YUV considered? There is even
> deeper, too, like DRM_FORMAT_P016.

It's a good point, as I explained above, I took the conversion part as a 
"black box" to avoid breaking (and debugging) stuff. I think it's easy to 
switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
the float part).

Maybe Arthur have an opinion on this?

Just to be sure, the DRM subsystem don't have such matrix somewhere? It 
can be nice to avoid duplicating them.

> > +}
> > +
> >  /*
> >   * The following functions are read_line function for each pixel format supported by VKMS.
> >   *
> > @@ -142,13 +250,13 @@ static void RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
> >   * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
> >   */
> >  
> > -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, int x_start, int y_start,
> > +static void ARGB8888_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> >  			       enum pixel_read_direction direction, int count,
> >  			       struct pixel_argb_u16 out_pixel[])
> >  {
> > -	u8 *src_pixels = packed_pixels_addr(frame_info, x_start, y_start, 0);
> > +	u8 *src_pixels = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> >  
> > -	int step = get_step_1x1(frame_info->fb, direction, 0);
> > +	int step = get_step_1x1(plane->frame_info->fb, direction, 0);
> 
> These are the kind of changes I would not expect to see in a patch
> adding YUV support. There are a lot of them, too.

I will put it directly this change in PATCHv2 5/9.

[...]

> > +static void semi_planar_yuv_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> > +				      enum pixel_read_direction direction, int count,
> > +				      struct pixel_argb_u16 out_pixel[])
> > +{
> > +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> > +	u8 *uv_plane = packed_pixels_addr(plane->frame_info,
> > +					  x_start / plane->frame_info->fb->format->hsub,
> > +					  y_start / plane->frame_info->fb->format->vsub,
> > +					  1);
> > +	struct pixel_yuv_u8 yuv_u8;
> > +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> > +	int step_uv = get_step_1x1(plane->frame_info->fb, direction, 1);
> > +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> > +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> > +							x_start, y_start); // 0
> > +
> > +	for (int i = 0; i < count; i++) {
> > +		yuv_u8.y = y_plane[0];
> > +		yuv_u8.u = uv_plane[0];
> > +		yuv_u8.v = uv_plane[1];
> > +
> > +		yuv_u8_to_argb_u16(out_pixel, &yuv_u8, plane->base.base.color_encoding,
> > +				   plane->base.base.color_range);
> 
> Oh, so this was the reason to change the read-line function signature.
> Maybe just stash a pointer to the right matrix and the right y_offset
> in frame_info instead?

Yes, that why I changed the signature. I think I will keep this signature 
and put the conversion_matrix inside the vkms_plane_state, for me it make 
more sense to have pixel_read_line and conversion_matrix in the same 
structure.

> > +		out_pixel += 1;
> > +		y_plane += step_y;
> > +		if ((i + subsampling_offset + 1) % subsampling == 0)
> > +			uv_plane += step_uv;
> > +	}
> > +}
> > +
> > +static void semi_planar_yvu_read_line(struct vkms_plane_state *plane, int x_start, int y_start,
> > +				      enum pixel_read_direction direction, int count,
> > +				      struct pixel_argb_u16 out_pixel[])
> > +{
> > +	u8 *y_plane = packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
> > +	u8 *vu_plane = packed_pixels_addr(plane->frame_info,
> > +					  x_start / plane->frame_info->fb->format->hsub,
> > +					  y_start / plane->frame_info->fb->format->vsub,
> > +					  1);
> > +	struct pixel_yuv_u8 yuv_u8;
> > +	int step_y = get_step_1x1(plane->frame_info->fb, direction, 0);
> > +	int step_vu = get_step_1x1(plane->frame_info->fb, direction, 1);
> > +	int subsampling = get_subsampling(plane->frame_info->fb->format, direction);
> > +	int subsampling_offset = get_subsampling_offset(plane->frame_info->fb->format, direction,
> > +							x_start, y_start);
> > +	for (int i = 0; i < count; i++) {
> > +		yuv_u8.y = y_plane[0];
> > +		yuv_u8.u = vu_plane[1];
> > +		yuv_u8.v = vu_plane[0];
> 
> You could swap matrix columns instead of writing this whole new
> function for UV vs. VU. Just an idea.

I was not happy with this duplication too, but I did not think about 
switching columns. That's a good idea, thanks!
 
Kind regards,
Louis Chauvet

[...]

Arthur Grillo Feb. 27, 2024, 8:01 p.m. UTC | #4

On 27/02/24 12:02, Louis Chauvet wrote:
> Hi Pekka,
> 
> For all the comment related to the conversion part, maybe Arthur have an 
> opinion on it, I took his patch as a "black box" (I did not want to 
> break (and debug) it).
> 
> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
>> On Fri, 23 Feb 2024 12:37:26 +0100
>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
>>
>>> From: Arthur Grillo <arthurgrillo@riseup.net>
>>>
>>> Add support to the YUV formats bellow:
>>>
>>> - NV12
>>> - NV16
>>> - NV24
>>> - NV21
>>> - NV61
>>> - NV42
>>> - YUV420
>>> - YUV422
>>> - YUV444
>>> - YVU420
>>> - YVU422
>>> - YVU444
>>>
>>> The conversion matrices of each encoding and range were obtained by
>>> rounding the values of the original conversion matrices multiplied by
>>> 2^8. This is done to avoid the use of fixed point operations.
>>>
>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>> callbacks for yuv formats]
>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>> ---
>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>  5 files changed, 295 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>> index e555bf9c1aee..54fc5161d565 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>  			 * buffer [1]
>>>  			 */
>>>  			current_plane->pixel_read_line(
>>> -				current_plane->frame_info,
>>> +				current_plane,
>>>  				x_start,
>>>  				y_start,
>>>  				direction,
>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>> index ccc5be009f15..a4f6456cb971 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>  	READ_RIGHT
>>>  };
>>>  
>>> +struct vkms_plane_state;
>>> +
>>>  /**
>>>  <<<<<<< HEAD
>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>   *  x_end.
>>>   */
>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>
>> This is the second or third time in this one series changing this type.
>> Could you not do the change once, in its own patch if possible?
> 
> Sorry, this is not a change here, but a wrong formatting (missed when 
> rebasing).
> 
> Do you think that it make sense to re-order my patches and put this 
> typedef at the end? This way it is never updated.
> 
>>>  
>>>  /**
>>>   * vkms_plane_state - Driver specific plane state
>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>> index 46daea6d3ee9..515c80866a58 100644
>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>  	 */
>>>  	return fb->offsets[plane_index] +
>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
>>> +	       format->char_per_block[plane_index];
>>
>> Shouldn't this be in the patch that added this code in the first place?
> 
> Same as above, a wrong formatting, I will remove this change and keep 
> everything on one line (even if it's more than 100 chars, it is easier to 
> read).
> 
>>>  }
>>>  
>>>  /**
>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>  	}
>>>  }
>>>  
>>> +/**
>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>
>> subsampling divisor
> 
> Thanks for this precision.
> 
>>> + */
>>> +static int get_subsampling(const struct drm_format_info *format,
>>> +			   enum pixel_read_direction direction)
>>> +{
>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
>>> +		return format->hsub;
>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>> +		return format->vsub;
>>> +	return 1;
>>
>> In this and the below function, personally I'd prefer switch-case, with
>> a cannot-happen-scream after the switch, so the compiler can warn about
>> unhandled enum values.
> 
> As for the previous patch, I did not know about this compiler feature, 
> thanks!
> 
>>> +}
>>> +
>>> +/**
>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>> + */
>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>> +				  enum pixel_read_direction direction, int x_start, int y_start)
>>
>> 'start' values as "increments" for a pixel counter? Is something
>> misnamed here?
>>
>> Is it an increment or an offset?
> 
> I don't really know how to name the function. I'm open to suggestions
> x_start and y_start are really the coordinate of the starting reading point.
> 
> To explain what it does:
> 
> When using subsampling, you have to read the next pixel of planes[1..4] 
> not at the same "speed" as plane[0]. But I can't only rely on 
> "read_pixel_count % subsampling == 0", because it means that the pixel 
> incrementation on planes[1..4] may not be aligned with the buffer (if 
> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
> for x=2,4,6... not 1,3,5...).
> 
> A way to ensure this is to add an "offset" to count, which ensure that the 
> count % subsampling == 0 on the correct pixel.
> 
> I made an error, the switch case must be (as count is always counting up, 
> for "inverted" reading direction a negative number ensure that 
> %subsampling == 0 on the correct pixel):
> 
> 	switch (direction) {
> 	case READ_UP:
> 		return -y_start;
> 	case READ_DOWN:
> 		return y_start;
> 	case READ_LEFT:
> 		return -x_start;
> 	case READ_RIGHT:
> 		return x_start;
> 	}
> 
>>> +{
>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
>>> +		return x_start;
>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>> +		return y_start;
>>> +	return 0;
>>> +}
>>> +
> 
> [...]
> 
>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
>>> +{
>>> +	static const s16 bt601_full[3][3] = {
>>> +		{ 256, 0,   359 },
>>> +		{ 256, -88, -183 },
>>> +		{ 256, 454, 0 },
>>> +	};
> 
> [...]
> 
>>> +
>>> +	u8 r = 0;
>>> +	u8 g = 0;
>>> +	u8 b = 0;
>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>> +	unsigned int y_offset = full ? 0 : 16;
>>> +
>>> +	switch (encoding) {
>>> +	case DRM_COLOR_YCBCR_BT601:
>>> +		ycbcr2rgb(full ? bt601_full : bt601,
>>
>> Doing all these conditional again pixel by pixel is probably
>> inefficient. Just like with the line reading functions, you could pick
>> the matrix in advance.
> 
> I don't think the performance impact is huge (it's only a pair of if), but 
> yes, it's an easy optimization. 
> 
> I will create a conversion_matrix structure:
> 
> 	struct conversion_matrix {
> 		s16 matrix[3][3];
> 		u16 y_offset;
> 	}
> 
> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
> structure from a format+encoding+range.
> 
> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
> get this matrix only once per plane setup.
> 
> 
>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> +		break;
>>> +	case DRM_COLOR_YCBCR_BT709:
>>> +		ycbcr2rgb(full ? rec709_full : rec709,
>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> +		break;
>>> +	case DRM_COLOR_YCBCR_BT2020:
>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>> +		break;
>>> +	default:
>>> +		pr_warn_once("Not supported color encoding\n");
>>> +		break;
>>> +	}
>>> +
>>> +	argb_u16->r = r * 257;
>>> +	argb_u16->g = g * 257;
>>> +	argb_u16->b = b * 257;
>>
>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>> deeper, too, like DRM_FORMAT_P016.
> 
> It's a good point, as I explained above, I took the conversion part as a 
> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
> the float part).
> 
> Maybe Arthur have an opinion on this?

Yeah, I too don't see why not we could do that. The 8-bit precision was
sufficient for those formats, but as well noted by Pekka this could be a
problem for higher bit depths. I just need to make my terrible python
script spit those values XD.

> Just to be sure, the DRM subsystem don't have such matrix somewhere? It 
> can be nice to avoid duplicating them.

As to my knowledge it does not exist on DRM, I think those are normally
on the hardware itself (*please* correct me if I'm wrong).

But, v4l2 has a similar table on
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my
code based on this), unfortunately it's only 8-bit too.

Best Regards,
~Arthur Grillo

> 
>>> +} + /* * The following functions are read_line function for each
>>> pixel format supported by VKMS. * @@ -142,13 +250,13 @@ static void
>>> RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>>> * [1]:
>>> https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>>> */
>>>  
>>> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info,
>>> int x_start, int y_start, +static void ARGB8888_read_line(struct
>>> vkms_plane_state *plane, int x_start, int y_start, enum
>>> pixel_read_direction direction, int count, struct pixel_argb_u16
>>> out_pixel[]) { -	u8 *src_pixels = packed_pixels_addr(frame_info,
>>> x_start, y_start, 0); +	u8 *src_pixels =
>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>  
>>> -	int step = get_step_1x1(frame_info->fb, direction, 0); +
>>> int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>>
>> These are the kind of changes I would not expect to see in a patch
>> adding YUV support. There are a lot of them, too.
> 
> I will put it directly this change in PATCHv2 5/9.
> 
> [...]
> 
>>> +static void semi_planar_yuv_read_line(struct vkms_plane_state
>>> *plane, int x_start, int y_start, +
>>> enum pixel_read_direction direction, int count, +
>>> struct pixel_argb_u16 out_pixel[]) +{ +	u8 *y_plane =
>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); +
>>> u8 *uv_plane = packed_pixels_addr(plane->frame_info, +
>>> x_start / plane->frame_info->fb->format->hsub, +
>>> y_start / plane->frame_info->fb->format->vsub, +
>>> 1); +	struct pixel_yuv_u8 yuv_u8; +	int step_y =
>>> get_step_1x1(plane->frame_info->fb, direction, 0); +	int
>>> step_uv = get_step_1x1(plane->frame_info->fb, direction, 1); +
>>> int subsampling = get_subsampling(plane->frame_info->fb->format,
>>> direction); +	int subsampling_offset =
>>> get_subsampling_offset(plane->frame_info->fb->format, direction, +
>>> x_start, y_start); // 0 + +	for (int i = 0; i < count; i++) { +
>>> yuv_u8.y = y_plane[0]; +		yuv_u8.u = uv_plane[0]; +
>>> yuv_u8.v = uv_plane[1]; + +		yuv_u8_to_argb_u16(out_pixel,
>>> &yuv_u8, plane->base.base.color_encoding, +
>>> plane->base.base.color_range);
>>
>> Oh, so this was the reason to change the read-line function
>> signature. Maybe just stash a pointer to the right matrix and the
>> right y_offset in frame_info instead?
> 
> Yes, that why I changed the signature. I think I will keep this
> signature and put the conversion_matrix inside the vkms_plane_state,
> for me it make more sense to have pixel_read_line and
> conversion_matrix in the same structure.
> 
>>> +		out_pixel += 1; +		y_plane += step_y; +
>>> if ((i + subsampling_offset + 1) % subsampling == 0) +
>>> uv_plane += step_uv; +	} +} + +static void
>>> semi_planar_yvu_read_line(struct vkms_plane_state *plane, int
>>> x_start, int y_start, +				      enum
>>> pixel_read_direction direction, int count, +
>>> struct pixel_argb_u16 out_pixel[]) +{ +	u8 *y_plane =
>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); +
>>> u8 *vu_plane = packed_pixels_addr(plane->frame_info, +
>>> x_start / plane->frame_info->fb->format->hsub, +
>>> y_start / plane->frame_info->fb->format->vsub, +
>>> 1); +	struct pixel_yuv_u8 yuv_u8; +	int step_y =
>>> get_step_1x1(plane->frame_info->fb, direction, 0); +	int
>>> step_vu = get_step_1x1(plane->frame_info->fb, direction, 1); +
>>> int subsampling = get_subsampling(plane->frame_info->fb->format,
>>> direction); +	int subsampling_offset =
>>> get_subsampling_offset(plane->frame_info->fb->format, direction, +
>>> x_start, y_start); +	for (int i = 0; i < count; i++) { +
>>> yuv_u8.y = y_plane[0]; +		yuv_u8.u = vu_plane[1]; +
>>> yuv_u8.v = vu_plane[0];
>>
>> You could swap matrix columns instead of writing this whole new
>> function for UV vs. VU. Just an idea.
> 
> I was not happy with this duplication too, but I did not think about
> switching columns. That's a good idea, thanks!
>  
> Kind regards, Louis Chauvet
> 
> [...]
>

Arthur Grillo Feb. 29, 2024, 1:52 a.m. UTC | #5

On 27/02/24 17:01, Arthur Grillo wrote:
> 
> 
> On 27/02/24 12:02, Louis Chauvet wrote:
>> Hi Pekka,
>>
>> For all the comment related to the conversion part, maybe Arthur have an 
>> opinion on it, I took his patch as a "black box" (I did not want to 
>> break (and debug) it).
>>
>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :
>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
>>>
>>>> From: Arthur Grillo <arthurgrillo@riseup.net>
>>>>
>>>> Add support to the YUV formats bellow:
>>>>
>>>> - NV12
>>>> - NV16
>>>> - NV24
>>>> - NV21
>>>> - NV61
>>>> - NV42
>>>> - YUV420
>>>> - YUV422
>>>> - YUV444
>>>> - YVU420
>>>> - YVU422
>>>> - YVU444
>>>>
>>>> The conversion matrices of each encoding and range were obtained by
>>>> rounding the values of the original conversion matrices multiplied by
>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>
>>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>> callbacks for yuv formats]
>>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>>> ---
>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>>  5 files changed, 295 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index e555bf9c1aee..54fc5161d565 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>  			 * buffer [1]
>>>>  			 */
>>>>  			current_plane->pixel_read_line(
>>>> -				current_plane->frame_info,
>>>> +				current_plane,
>>>>  				x_start,
>>>>  				y_start,
>>>>  				direction,
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> index ccc5be009f15..a4f6456cb971 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>  	READ_RIGHT
>>>>  };
>>>>  
>>>> +struct vkms_plane_state;
>>>> +
>>>>  /**
>>>>  <<<<<<< HEAD
>>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>   *  x_end.
>>>>   */
>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>
>>> This is the second or third time in this one series changing this type.
>>> Could you not do the change once, in its own patch if possible?
>>
>> Sorry, this is not a change here, but a wrong formatting (missed when 
>> rebasing).
>>
>> Do you think that it make sense to re-order my patches and put this 
>> typedef at the end? This way it is never updated.
>>
>>>>  
>>>>  /**
>>>>   * vkms_plane_state - Driver specific plane state
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> index 46daea6d3ee9..515c80866a58 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>  	 */
>>>>  	return fb->offsets[plane_index] +
>>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
>>>> +	       format->char_per_block[plane_index];
>>>
>>> Shouldn't this be in the patch that added this code in the first place?
>>
>> Same as above, a wrong formatting, I will remove this change and keep 
>> everything on one line (even if it's more than 100 chars, it is easier to 
>> read).
>>
>>>>  }
>>>>  
>>>>  /**
>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>  	}
>>>>  }
>>>>  
>>>> +/**
>>>> + * get_subsampling() - Get the subsampling value on a specific direction
>>>
>>> subsampling divisor
>>
>> Thanks for this precision.
>>
>>>> + */
>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>> +			   enum pixel_read_direction direction)
>>>> +{
>>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
>>>> +		return format->hsub;
>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>> +		return format->vsub;
>>>> +	return 1;
>>>
>>> In this and the below function, personally I'd prefer switch-case, with
>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>> unhandled enum values.
>>
>> As for the previous patch, I did not know about this compiler feature, 
>> thanks!
>>
>>>> +}
>>>> +
>>>> +/**
>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>> + */
>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>> +				  enum pixel_read_direction direction, int x_start, int y_start)
>>>
>>> 'start' values as "increments" for a pixel counter? Is something
>>> misnamed here?
>>>
>>> Is it an increment or an offset?
>>
>> I don't really know how to name the function. I'm open to suggestions
>> x_start and y_start are really the coordinate of the starting reading point.
>>
>> To explain what it does:
>>
>> When using subsampling, you have to read the next pixel of planes[1..4] 
>> not at the same "speed" as plane[0]. But I can't only rely on 
>> "read_pixel_count % subsampling == 0", because it means that the pixel 
>> incrementation on planes[1..4] may not be aligned with the buffer (if 
>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
>> for x=2,4,6... not 1,3,5...).
>>
>> A way to ensure this is to add an "offset" to count, which ensure that the 
>> count % subsampling == 0 on the correct pixel.
>>
>> I made an error, the switch case must be (as count is always counting up, 
>> for "inverted" reading direction a negative number ensure that 
>> %subsampling == 0 on the correct pixel):
>>
>> 	switch (direction) {
>> 	case READ_UP:
>> 		return -y_start;
>> 	case READ_DOWN:
>> 		return y_start;
>> 	case READ_LEFT:
>> 		return -x_start;
>> 	case READ_RIGHT:
>> 		return x_start;
>> 	}
>>
>>>> +{
>>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
>>>> +		return x_start;
>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>> +		return y_start;
>>>> +	return 0;
>>>> +}
>>>> +
>>
>> [...]
>>
>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
>>>> +{
>>>> +	static const s16 bt601_full[3][3] = {
>>>> +		{ 256, 0,   359 },
>>>> +		{ 256, -88, -183 },
>>>> +		{ 256, 454, 0 },
>>>> +	};
>>
>> [...]
>>
>>>> +
>>>> +	u8 r = 0;
>>>> +	u8 g = 0;
>>>> +	u8 b = 0;
>>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>> +	unsigned int y_offset = full ? 0 : 16;
>>>> +
>>>> +	switch (encoding) {
>>>> +	case DRM_COLOR_YCBCR_BT601:
>>>> +		ycbcr2rgb(full ? bt601_full : bt601,
>>>
>>> Doing all these conditional again pixel by pixel is probably
>>> inefficient. Just like with the line reading functions, you could pick
>>> the matrix in advance.
>>
>> I don't think the performance impact is huge (it's only a pair of if), but 
>> yes, it's an easy optimization. 
>>
>> I will create a conversion_matrix structure:
>>
>> 	struct conversion_matrix {
>> 		s16 matrix[3][3];
>> 		u16 y_offset;
>> 	}
>>
>> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
>> structure from a format+encoding+range.
>>
>> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
>> get this matrix only once per plane setup.
>>
>>
>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> +		break;
>>>> +	case DRM_COLOR_YCBCR_BT709:
>>>> +		ycbcr2rgb(full ? rec709_full : rec709,
>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> +		break;
>>>> +	case DRM_COLOR_YCBCR_BT2020:
>>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>> +		break;
>>>> +	default:
>>>> +		pr_warn_once("Not supported color encoding\n");
>>>> +		break;
>>>> +	}
>>>> +
>>>> +	argb_u16->r = r * 257;
>>>> +	argb_u16->g = g * 257;
>>>> +	argb_u16->b = b * 257;
>>>
>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>> deeper, too, like DRM_FORMAT_P016.
>>
>> It's a good point, as I explained above, I took the conversion part as a 
>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
>> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
>> the float part).
>>
>> Maybe Arthur have an opinion on this?
> 
> Yeah, I too don't see why not we could do that. The 8-bit precision was
> sufficient for those formats, but as well noted by Pekka this could be a
> problem for higher bit depths. I just need to make my terrible python
> script spit those values XD.

Finally, I got it working with 32-bit precision.

I basically threw all my untrusted python code away, and started using
the colour python framework suggested by Sebastian[1]. After knowing the
right values (and staring at numbers for hours), I found that with a
little bit of rounding, the conversion works.

Also, while at it, I changed the name rec709 to bt709 to follow the
pattern and added "_full" to the full ranges matrices.

While using the library, I noticed that the red component is wrong on
the color red in one test case.

[1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/

Best Regards,
~Arthur Grillo

---

diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
index f66584549827..4cee3c2d8d84 100644
--- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
+++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
@@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
 			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
 			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
 			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
-			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
+			{"red",   {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
 			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
 			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
 		},
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index e06bbd7c0a67..043f23dbf80d 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
 	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
 }
 
-static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
+#define BIT_DEPTH 32
+
+static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
 {
-	s32 y_16, cb_16, cr_16;
-	s32 r_16, g_16, b_16;
+	s64 y_16, cb_16, cr_16;
+	s64 r_16, g_16, b_16;
 
 	y_16 =  y - y_offset;
 	cb_16 = cb - 128;
@@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
 	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
 	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
 
-	*r = clamp(r_16, 0, 0xffff) >> 8;
-	*g = clamp(g_16, 0, 0xffff) >> 8;
-	*b = clamp(b_16, 0, 0xffff) >> 8;
+	// rounding the values
+	r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
+	g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
+	b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
+
+	r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+	g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+	b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
+
+	*r = r_16 >> BIT_DEPTH;
+	*g = g_16 >> BIT_DEPTH;
+	*b = b_16 >> BIT_DEPTH;
 }
 
 VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
@@ -144,35 +155,40 @@ VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
 					 enum drm_color_encoding encoding,
 					 enum drm_color_range range)
 {
-	static const s16 bt601_full[3][3] = {
-		{256,   0,  359},
-		{256, -88, -183},
-		{256, 454,    0},
+	static const s64 bt601_full[3][3] = {
+		{4294967296, 0, 6021544149},
+		{4294967296, -1478054095, -3067191994},
+		{4294967296, 7610682049, 0},
 	};
-	static const s16 bt601[3][3] = {
-		{298,    0,  409},
-		{298, -100, -208},
-		{298,  516,    0},
+
+	static const s64 bt601_limited[3][3] = {
+		{5020601039, 0, 6881764740},
+		{5020601039, -1689204679, -3505362278},
+		{5020601039, 8697922339, 0},
 	};
-	static const s16 rec709_full[3][3] = {
-		{256,   0,  408},
-		{256, -48, -120},
-		{256, 476,   0 },
+
+	static const s64 bt709_full[3][3] = {
+		{4294967296, 0, 6763714498},
+		{4294967296, -804551626, -2010578443},
+		{4294967296, 7969741314, 0},
 	};
-	static const s16 rec709[3][3] = {
-		{298,   0,  459},
-		{298, -55, -136},
-		{298, 541,    0},
+
+	static const s64 bt709_limited[3][3] = {
+		{5020601039, 0, 7729959424},
+		{5020601039, -919487572, -2297803934},
+		{5020601039, 9108275786, 0},
 	};
-	static const s16 bt2020_full[3][3] = {
-		{256,   0,  377},
-		{256, -42, -146},
-		{256, 482,    0},
+
+	static const s64 bt2020_full[3][3] = {
+		{4294967296, 0, 6333358775},
+		{4294967296, -706750298, -2453942994},
+		{4294967296, 8080551471, 0},
 	};
-	static const s16 bt2020[3][3] = {
-		{298,   0,  430},
-		{298, -48, -167},
-		{298, 548,    0},
+
+	static const s64 bt2020_limited[3][3] = {
+		{5020601039, 0, 7238124312},
+		{5020601039, -807714626, -2804506279},
+		{5020601039, 9234915964, 0},
 	};
 
 	u8 r = 0;
@@ -183,15 +199,15 @@ VISIBLE_IF_KUNIT void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16,
 
 	switch (encoding) {
 	case DRM_COLOR_YCBCR_BT601:
-		ycbcr2rgb(full ? bt601_full : bt601,
+		ycbcr2rgb(full ? bt601_full : bt601_limited,
 			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
 		break;
 	case DRM_COLOR_YCBCR_BT709:
-		ycbcr2rgb(full ? rec709_full : rec709,
+		ycbcr2rgb(full ? bt709_full : bt709_limited,
 			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
 		break;
 	case DRM_COLOR_YCBCR_BT2020:
-		ycbcr2rgb(full ? bt2020_full : bt2020,
+		ycbcr2rgb(full ? bt2020_full : bt2020_limited,
 			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
 		break;
 	default:

---

> 
>> Just to be sure, the DRM subsystem don't have such matrix somewhere? It 
>> can be nice to avoid duplicating them.
> 
> As to my knowledge it does not exist on DRM, I think those are normally
> on the hardware itself (*please* correct me if I'm wrong).
> 
> But, v4l2 has a similar table on
> drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my
> code based on this), unfortunately it's only 8-bit too.
> 
> Best Regards,
> ~Arthur Grillo
> 
>>
>>>> +} + /* * The following functions are read_line function for each
>>>> pixel format supported by VKMS. * @@ -142,13 +250,13 @@ static void
>>>> RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe
>>>> * [1]:
>>>> https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>>>> */
>>>>  
>>>> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info,
>>>> int x_start, int y_start, +static void ARGB8888_read_line(struct
>>>> vkms_plane_state *plane, int x_start, int y_start, enum
>>>> pixel_read_direction direction, int count, struct pixel_argb_u16
>>>> out_pixel[]) { -	u8 *src_pixels = packed_pixels_addr(frame_info,
>>>> x_start, y_start, 0); +	u8 *src_pixels =
>>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0);
>>>>  
>>>> -	int step = get_step_1x1(frame_info->fb, direction, 0); +
>>>> int step = get_step_1x1(plane->frame_info->fb, direction, 0);
>>>
>>> These are the kind of changes I would not expect to see in a patch
>>> adding YUV support. There are a lot of them, too.
>>
>> I will put it directly this change in PATCHv2 5/9.
>>
>> [...]
>>
>>>> +static void semi_planar_yuv_read_line(struct vkms_plane_state
>>>> *plane, int x_start, int y_start, +
>>>> enum pixel_read_direction direction, int count, +
>>>> struct pixel_argb_u16 out_pixel[]) +{ +	u8 *y_plane =
>>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); +
>>>> u8 *uv_plane = packed_pixels_addr(plane->frame_info, +
>>>> x_start / plane->frame_info->fb->format->hsub, +
>>>> y_start / plane->frame_info->fb->format->vsub, +
>>>> 1); +	struct pixel_yuv_u8 yuv_u8; +	int step_y =
>>>> get_step_1x1(plane->frame_info->fb, direction, 0); +	int
>>>> step_uv = get_step_1x1(plane->frame_info->fb, direction, 1); +
>>>> int subsampling = get_subsampling(plane->frame_info->fb->format,
>>>> direction); +	int subsampling_offset =
>>>> get_subsampling_offset(plane->frame_info->fb->format, direction, +
>>>> x_start, y_start); // 0 + +	for (int i = 0; i < count; i++) { +
>>>> yuv_u8.y = y_plane[0]; +		yuv_u8.u = uv_plane[0]; +
>>>> yuv_u8.v = uv_plane[1]; + +		yuv_u8_to_argb_u16(out_pixel,
>>>> &yuv_u8, plane->base.base.color_encoding, +
>>>> plane->base.base.color_range);
>>>
>>> Oh, so this was the reason to change the read-line function
>>> signature. Maybe just stash a pointer to the right matrix and the
>>> right y_offset in frame_info instead?
>>
>> Yes, that why I changed the signature. I think I will keep this
>> signature and put the conversion_matrix inside the vkms_plane_state,
>> for me it make more sense to have pixel_read_line and
>> conversion_matrix in the same structure.
>>
>>>> +		out_pixel += 1; +		y_plane += step_y; +
>>>> if ((i + subsampling_offset + 1) % subsampling == 0) +
>>>> uv_plane += step_uv; +	} +} + +static void
>>>> semi_planar_yvu_read_line(struct vkms_plane_state *plane, int
>>>> x_start, int y_start, +				      enum
>>>> pixel_read_direction direction, int count, +
>>>> struct pixel_argb_u16 out_pixel[]) +{ +	u8 *y_plane =
>>>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); +
>>>> u8 *vu_plane = packed_pixels_addr(plane->frame_info, +
>>>> x_start / plane->frame_info->fb->format->hsub, +
>>>> y_start / plane->frame_info->fb->format->vsub, +
>>>> 1); +	struct pixel_yuv_u8 yuv_u8; +	int step_y =
>>>> get_step_1x1(plane->frame_info->fb, direction, 0); +	int
>>>> step_vu = get_step_1x1(plane->frame_info->fb, direction, 1); +
>>>> int subsampling = get_subsampling(plane->frame_info->fb->format,
>>>> direction); +	int subsampling_offset =
>>>> get_subsampling_offset(plane->frame_info->fb->format, direction, +
>>>> x_start, y_start); +	for (int i = 0; i < count; i++) { +
>>>> yuv_u8.y = y_plane[0]; +		yuv_u8.u = vu_plane[1]; +
>>>> yuv_u8.v = vu_plane[0];
>>>
>>> You could swap matrix columns instead of writing this whole new
>>> function for UV vs. VU. Just an idea.
>>
>> I was not happy with this duplication too, but I did not think about
>> switching columns. That's a good idea, thanks!
>>  
>> Kind regards, Louis Chauvet
>>
>> [...]
>>

Pekka Paalanen Feb. 29, 2024, 12:12 p.m. UTC | #6

On Wed, 28 Feb 2024 22:52:09 -0300
Arthur Grillo <arthurgrillo@riseup.net> wrote:

> On 27/02/24 17:01, Arthur Grillo wrote:
> > 
> > 
> > On 27/02/24 12:02, Louis Chauvet wrote:  
> >> Hi Pekka,
> >>
> >> For all the comment related to the conversion part, maybe Arthur have an 
> >> opinion on it, I took his patch as a "black box" (I did not want to 
> >> break (and debug) it).
> >>
> >> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :  
> >>> On Fri, 23 Feb 2024 12:37:26 +0100
> >>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
> >>>  
> >>>> From: Arthur Grillo <arthurgrillo@riseup.net>
> >>>>
> >>>> Add support to the YUV formats bellow:
> >>>>
> >>>> - NV12
> >>>> - NV16
> >>>> - NV24
> >>>> - NV21
> >>>> - NV61
> >>>> - NV42
> >>>> - YUV420
> >>>> - YUV422
> >>>> - YUV444
> >>>> - YVU420
> >>>> - YVU422
> >>>> - YVU444
> >>>>
> >>>> The conversion matrices of each encoding and range were obtained by
> >>>> rounding the values of the original conversion matrices multiplied by
> >>>> 2^8. This is done to avoid the use of fixed point operations.
> >>>>
> >>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> >>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> >>>> callbacks for yuv formats]
> >>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> >>>> ---
> >>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
> >>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
> >>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
> >>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
> >>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
> >>>>  5 files changed, 295 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> index e555bf9c1aee..54fc5161d565 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> >>>>  			 * buffer [1]
> >>>>  			 */
> >>>>  			current_plane->pixel_read_line(
> >>>> -				current_plane->frame_info,
> >>>> +				current_plane,
> >>>>  				x_start,
> >>>>  				y_start,
> >>>>  				direction,
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> index ccc5be009f15..a4f6456cb971 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> >>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> >>>>  	READ_RIGHT
> >>>>  };
> >>>>  
> >>>> +struct vkms_plane_state;
> >>>> +
> >>>>  /**
> >>>>  <<<<<<< HEAD
> >>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> >>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> >>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> >>>>   *  x_end.
> >>>>   */
> >>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> >>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> >>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> >>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);  
> >>>
> >>> This is the second or third time in this one series changing this type.
> >>> Could you not do the change once, in its own patch if possible?  
> >>
> >> Sorry, this is not a change here, but a wrong formatting (missed when 
> >> rebasing).
> >>
> >> Do you think that it make sense to re-order my patches and put this 
> >> typedef at the end? This way it is never updated.

I'm not sure, I haven't checked how it would change your patches. The
intermediate changes might get a lot uglier?

Just try to fold changes so that you don't need to change something
twice over the series unless there is a good reason to. "How hard would
it be to review this?" is my measure stick.


> >>  
> >>>>  
> >>>>  /**
> >>>>   * vkms_plane_state - Driver specific plane state
> >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> index 46daea6d3ee9..515c80866a58 100644
> >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> >>>>  	 */
> >>>>  	return fb->offsets[plane_index] +
> >>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> >>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> >>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
> >>>> +	       format->char_per_block[plane_index];  
> >>>
> >>> Shouldn't this be in the patch that added this code in the first place?  
> >>
> >> Same as above, a wrong formatting, I will remove this change and keep 
> >> everything on one line (even if it's more than 100 chars, it is easier to 
> >> read).

Personally I agree that readability is more important than strict line
length limits. I'm not sure how the kernel rolls there.

> >>  
> >>>>  }
> >>>>  
> >>>>  /**
> >>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> >>>>  	}
> >>>>  }
> >>>>  
> >>>> +/**
> >>>> + * get_subsampling() - Get the subsampling value on a specific direction  
> >>>
> >>> subsampling divisor  
> >>
> >> Thanks for this precision.
> >>  
> >>>> + */
> >>>> +static int get_subsampling(const struct drm_format_info *format,
> >>>> +			   enum pixel_read_direction direction)
> >>>> +{
> >>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
> >>>> +		return format->hsub;
> >>>> +	else if (direction == READ_DOWN || direction == READ_UP)
> >>>> +		return format->vsub;
> >>>> +	return 1;  
> >>>
> >>> In this and the below function, personally I'd prefer switch-case, with
> >>> a cannot-happen-scream after the switch, so the compiler can warn about
> >>> unhandled enum values.  
> >>
> >> As for the previous patch, I did not know about this compiler feature, 
> >> thanks!
> >>  
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> >>>> + */
> >>>> +static int get_subsampling_offset(const struct drm_format_info *format,
> >>>> +				  enum pixel_read_direction direction, int x_start, int y_start)  
> >>>
> >>> 'start' values as "increments" for a pixel counter? Is something
> >>> misnamed here?
> >>>
> >>> Is it an increment or an offset?  
> >>
> >> I don't really know how to name the function. I'm open to suggestions
> >> x_start and y_start are really the coordinate of the starting reading point.

I looks like it's an offset, so "offset" and "start" are good words.
Then the only misleading piece is the doc:

	"Get the subsampling offset to use when incrementing the pixel counter"

This sounds like the offset is used when incrementing a counter, that
is, counter is increment by offset each time. That's my problem with
this.

Fix just the doc, and it's good, I think.

> >>
> >> To explain what it does:
> >>
> >> When using subsampling, you have to read the next pixel of planes[1..4] 
> >> not at the same "speed" as plane[0]. But I can't only rely on 
> >> "read_pixel_count % subsampling == 0", because it means that the pixel 
> >> incrementation on planes[1..4] may not be aligned with the buffer (if 
> >> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
> >> for x=2,4,6... not 1,3,5...).
> >>
> >> A way to ensure this is to add an "offset" to count, which ensure that the 
> >> count % subsampling == 0 on the correct pixel.

Yes, I think I did get that feeling from the code eventually somehow,
but it wouldn't hurt to explain it in the comment.

"An offset for keeping the chroma siting consistent regardless of
x_start and y_start" maybe?

> >>
> >> I made an error, the switch case must be (as count is always counting up, 
> >> for "inverted" reading direction a negative number ensure that 
> >> %subsampling == 0 on the correct pixel):
> >>
> >> 	switch (direction) {
> >> 	case READ_UP:
> >> 		return -y_start;
> >> 	case READ_DOWN:
> >> 		return y_start;
> >> 	case READ_LEFT:
> >> 		return -x_start;
> >> 	case READ_RIGHT:
> >> 		return x_start;
> >> 	}

Yes, the inverted reading directions are different indeed. I did not
think through if this works also for sub-sampling divisors > 2 which I
don't think are ever used.

Does IGT find this mistake? If not, maybe IGT should be extended.

> >>  
> >>>> +{
> >>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
> >>>> +		return x_start;
> >>>> +	else if (direction == READ_DOWN || direction == READ_UP)
> >>>> +		return y_start;
> >>>> +	return 0;
> >>>> +}
> >>>> +  
> >>
> >> [...]
> >>  
> >>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> >>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
> >>>> +{
> >>>> +	static const s16 bt601_full[3][3] = {
> >>>> +		{ 256, 0,   359 },
> >>>> +		{ 256, -88, -183 },
> >>>> +		{ 256, 454, 0 },
> >>>> +	};  
> >>
> >> [...]
> >>  
> >>>> +
> >>>> +	u8 r = 0;
> >>>> +	u8 g = 0;
> >>>> +	u8 b = 0;
> >>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> >>>> +	unsigned int y_offset = full ? 0 : 16;
> >>>> +
> >>>> +	switch (encoding) {
> >>>> +	case DRM_COLOR_YCBCR_BT601:
> >>>> +		ycbcr2rgb(full ? bt601_full : bt601,  
> >>>
> >>> Doing all these conditional again pixel by pixel is probably
> >>> inefficient. Just like with the line reading functions, you could pick
> >>> the matrix in advance.  
> >>
> >> I don't think the performance impact is huge (it's only a pair of if), but 
> >> yes, it's an easy optimization. 
> >>
> >> I will create a conversion_matrix structure:
> >>
> >> 	struct conversion_matrix {
> >> 		s16 matrix[3][3];
> >> 		u16 y_offset;
> >> 	}

When defining such a struct type, it would be good to document the
matrix layout (which one is row, which one is column), and what the s16
mean (fixed point?).

Try to not mix signed and unsigned types, too. The C implicit type
promotion rules can be surprising. Just make everything signed while
computing, and convert to/from unsigned only for storage.

> >>
> >> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
> >> structure from a format+encoding+range.
> >>
> >> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
> >> get this matrix only once per plane setup.

Alright. Let's see how that works.

> >>
> >>  
> >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> +		break;
> >>>> +	case DRM_COLOR_YCBCR_BT709:
> >>>> +		ycbcr2rgb(full ? rec709_full : rec709,
> >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> +		break;
> >>>> +	case DRM_COLOR_YCBCR_BT2020:
> >>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
> >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> >>>> +		break;
> >>>> +	default:
> >>>> +		pr_warn_once("Not supported color encoding\n");
> >>>> +		break;
> >>>> +	}
> >>>> +
> >>>> +	argb_u16->r = r * 257;
> >>>> +	argb_u16->g = g * 257;
> >>>> +	argb_u16->b = b * 257;  
> >>>
> >>> I wonder. Using 8-bit fixed point precision seems quite coarse for
> >>> 8-bit pixel formats, and it's going to be insufficient for higher bit
> >>> depths. Was supporting e.g. 10-bit YUV considered? There is even
> >>> deeper, too, like DRM_FORMAT_P016.  
> >>
> >> It's a good point, as I explained above, I took the conversion part as a 
> >> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
> >> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
> >> the float part).
> >>
> >> Maybe Arthur have an opinion on this?  
> > 
> > Yeah, I too don't see why not we could do that. The 8-bit precision was
> > sufficient for those formats, but as well noted by Pekka this could be a
> > problem for higher bit depths. I just need to make my terrible python
> > script spit those values XD.  
> 
> Finally, I got it working with 32-bit precision.
> 
> I basically threw all my untrusted python code away, and started using
> the colour python framework suggested by Sebastian[1]. After knowing the
> right values (and staring at numbers for hours), I found that with a
> little bit of rounding, the conversion works.
> 
> Also, while at it, I changed the name rec709 to bt709 to follow the
> pattern and added "_full" to the full ranges matrices.
> 
> While using the library, I noticed that the red component is wrong on
> the color red in one test case.
> 
> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/

That all sounds good. I wish the kernel code contained comments
explaining how exactly you computed those matrices with python/colour.
If the python snippets are not too long, including them verbatim as
code comments would be really nice for both reviewers and posterity.

The same for the VKMS unit tests, too, how you got the expected result
values.

> 
> Best Regards,
> ~Arthur Grillo
> 
> ---
> 
> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> index f66584549827..4cee3c2d8d84 100644
> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
>  			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
>  			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
>  			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> -			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> +			{"red",   {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>  			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
>  			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
>  		},
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index e06bbd7c0a67..043f23dbf80d 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>  }
>  
> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> +#define BIT_DEPTH 32
> +
> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>  {
> -	s32 y_16, cb_16, cr_16;
> -	s32 r_16, g_16, b_16;
> +	s64 y_16, cb_16, cr_16;
> +	s64 r_16, g_16, b_16;
>  
>  	y_16 =  y - y_offset;
>  	cb_16 = cb - 128;
> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
>  	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
>  	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>  
> -	*r = clamp(r_16, 0, 0xffff) >> 8;
> -	*g = clamp(g_16, 0, 0xffff) >> 8;
> -	*b = clamp(b_16, 0, 0xffff) >> 8;
> +	// rounding the values
> +	r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
> +	g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
> +	b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
> +
> +	r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> +	g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> +	b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);

Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?

> +
> +	*r = r_16 >> BIT_DEPTH;
> +	*g = g_16 >> BIT_DEPTH;
> +	*b = b_16 >> BIT_DEPTH;
>  }

...

> >   
> >> Just to be sure, the DRM subsystem don't have such matrix somewhere? It 
> >> can be nice to avoid duplicating them.  
> > 
> > As to my knowledge it does not exist on DRM, I think those are normally
> > on the hardware itself (*please* correct me if I'm wrong).

I couldn't find a matrix type either on a quick glance, but there is
drm_fixed.h for a couple different fixed point formats, it seems. FWIW.
drm_fixed.h didn't feel very appealing for this here.

> > 
> > But, v4l2 has a similar table on
> > drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my
> > code based on this), unfortunately it's only 8-bit too.

Thanks,
pq

Arthur Grillo Feb. 29, 2024, 5:57 p.m. UTC | #7

On 29/02/24 09:12, Pekka Paalanen wrote:
> On Wed, 28 Feb 2024 22:52:09 -0300
> Arthur Grillo <arthurgrillo@riseup.net> wrote:
> 
>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>
>>>
>>> On 27/02/24 12:02, Louis Chauvet wrote:  
>>>> Hi Pekka,
>>>>
>>>> For all the comment related to the conversion part, maybe Arthur have an 
>>>> opinion on it, I took his patch as a "black box" (I did not want to 
>>>> break (and debug) it).
>>>>
>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :  
>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
>>>>>  
>>>>>> From: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>>
>>>>>> Add support to the YUV formats bellow:
>>>>>>
>>>>>> - NV12
>>>>>> - NV16
>>>>>> - NV24
>>>>>> - NV21
>>>>>> - NV61
>>>>>> - NV42
>>>>>> - YUV420
>>>>>> - YUV422
>>>>>> - YUV444
>>>>>> - YVU420
>>>>>> - YVU422
>>>>>> - YVU444
>>>>>>
>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>
>>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>> callbacks for yuv formats]
>>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>>>>  5 files changed, 295 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> index e555bf9c1aee..54fc5161d565 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>>>  			 * buffer [1]
>>>>>>  			 */
>>>>>>  			current_plane->pixel_read_line(
>>>>>> -				current_plane->frame_info,
>>>>>> +				current_plane,
>>>>>>  				x_start,
>>>>>>  				y_start,
>>>>>>  				direction,
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> index ccc5be009f15..a4f6456cb971 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>>>  	READ_RIGHT
>>>>>>  };
>>>>>>  
>>>>>> +struct vkms_plane_state;
>>>>>> +
>>>>>>  /**
>>>>>>  <<<<<<< HEAD
>>>>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>>>   *  x_end.
>>>>>>   */
>>>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);  
>>>>>
>>>>> This is the second or third time in this one series changing this type.
>>>>> Could you not do the change once, in its own patch if possible?  
>>>>
>>>> Sorry, this is not a change here, but a wrong formatting (missed when 
>>>> rebasing).
>>>>
>>>> Do you think that it make sense to re-order my patches and put this 
>>>> typedef at the end? This way it is never updated.
> 
> I'm not sure, I haven't checked how it would change your patches. The
> intermediate changes might get a lot uglier?
> 
> Just try to fold changes so that you don't need to change something
> twice over the series unless there is a good reason to. "How hard would
> it be to review this?" is my measure stick.
> 
> 
>>>>  
>>>>>>  
>>>>>>  /**
>>>>>>   * vkms_plane_state - Driver specific plane state
>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> index 46daea6d3ee9..515c80866a58 100644
>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>>>  	 */
>>>>>>  	return fb->offsets[plane_index] +
>>>>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
>>>>>> +	       format->char_per_block[plane_index];  
>>>>>
>>>>> Shouldn't this be in the patch that added this code in the first place?  
>>>>
>>>> Same as above, a wrong formatting, I will remove this change and keep 
>>>> everything on one line (even if it's more than 100 chars, it is easier to 
>>>> read).
> 
> Personally I agree that readability is more important than strict line
> length limits. I'm not sure how the kernel rolls there.
> 
>>>>  
>>>>>>  }
>>>>>>  
>>>>>>  /**
>>>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>>>  	}
>>>>>>  }
>>>>>>  
>>>>>> +/**
>>>>>> + * get_subsampling() - Get the subsampling value on a specific direction  
>>>>>
>>>>> subsampling divisor  
>>>>
>>>> Thanks for this precision.
>>>>  
>>>>>> + */
>>>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>>>> +			   enum pixel_read_direction direction)
>>>>>> +{
>>>>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
>>>>>> +		return format->hsub;
>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>> +		return format->vsub;
>>>>>> +	return 1;  
>>>>>
>>>>> In this and the below function, personally I'd prefer switch-case, with
>>>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>>>> unhandled enum values.  
>>>>
>>>> As for the previous patch, I did not know about this compiler feature, 
>>>> thanks!
>>>>  
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>>>> + */
>>>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>>>> +				  enum pixel_read_direction direction, int x_start, int y_start)  
>>>>>
>>>>> 'start' values as "increments" for a pixel counter? Is something
>>>>> misnamed here?
>>>>>
>>>>> Is it an increment or an offset?  
>>>>
>>>> I don't really know how to name the function. I'm open to suggestions
>>>> x_start and y_start are really the coordinate of the starting reading point.
> 
> I looks like it's an offset, so "offset" and "start" are good words.
> Then the only misleading piece is the doc:
> 
> 	"Get the subsampling offset to use when incrementing the pixel counter"
> 
> This sounds like the offset is used when incrementing a counter, that
> is, counter is increment by offset each time. That's my problem with
> this.
> 
> Fix just the doc, and it's good, I think.
> 
>>>>
>>>> To explain what it does:
>>>>
>>>> When using subsampling, you have to read the next pixel of planes[1..4] 
>>>> not at the same "speed" as plane[0]. But I can't only rely on 
>>>> "read_pixel_count % subsampling == 0", because it means that the pixel 
>>>> incrementation on planes[1..4] may not be aligned with the buffer (if 
>>>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
>>>> for x=2,4,6... not 1,3,5...).
>>>>
>>>> A way to ensure this is to add an "offset" to count, which ensure that the 
>>>> count % subsampling == 0 on the correct pixel.
> 
> Yes, I think I did get that feeling from the code eventually somehow,
> but it wouldn't hurt to explain it in the comment.
> 
> "An offset for keeping the chroma siting consistent regardless of
> x_start and y_start" maybe?
> 
>>>>
>>>> I made an error, the switch case must be (as count is always counting up, 
>>>> for "inverted" reading direction a negative number ensure that 
>>>> %subsampling == 0 on the correct pixel):
>>>>
>>>> 	switch (direction) {
>>>> 	case READ_UP:
>>>> 		return -y_start;
>>>> 	case READ_DOWN:
>>>> 		return y_start;
>>>> 	case READ_LEFT:
>>>> 		return -x_start;
>>>> 	case READ_RIGHT:
>>>> 		return x_start;
>>>> 	}
> 
> Yes, the inverted reading directions are different indeed. I did not
> think through if this works also for sub-sampling divisors > 2 which I
> don't think are ever used.
> 
> Does IGT find this mistake? If not, maybe IGT should be extended.
> 
>>>>  
>>>>>> +{
>>>>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
>>>>>> +		return x_start;
>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>> +		return y_start;
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +  
>>>>
>>>> [...]
>>>>  
>>>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
>>>>>> +{
>>>>>> +	static const s16 bt601_full[3][3] = {
>>>>>> +		{ 256, 0,   359 },
>>>>>> +		{ 256, -88, -183 },
>>>>>> +		{ 256, 454, 0 },
>>>>>> +	};  
>>>>
>>>> [...]
>>>>  
>>>>>> +
>>>>>> +	u8 r = 0;
>>>>>> +	u8 g = 0;
>>>>>> +	u8 b = 0;
>>>>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>>>> +	unsigned int y_offset = full ? 0 : 16;
>>>>>> +
>>>>>> +	switch (encoding) {
>>>>>> +	case DRM_COLOR_YCBCR_BT601:
>>>>>> +		ycbcr2rgb(full ? bt601_full : bt601,  
>>>>>
>>>>> Doing all these conditional again pixel by pixel is probably
>>>>> inefficient. Just like with the line reading functions, you could pick
>>>>> the matrix in advance.  
>>>>
>>>> I don't think the performance impact is huge (it's only a pair of if), but 
>>>> yes, it's an easy optimization. 
>>>>
>>>> I will create a conversion_matrix structure:
>>>>
>>>> 	struct conversion_matrix {
>>>> 		s16 matrix[3][3];
>>>> 		u16 y_offset;
>>>> 	}
> 
> When defining such a struct type, it would be good to document the
> matrix layout (which one is row, which one is column), and what the s16
> mean (fixed point?).
> 
> Try to not mix signed and unsigned types, too. The C implicit type
> promotion rules can be surprising. Just make everything signed while
> computing, and convert to/from unsigned only for storage.
> 
>>>>
>>>> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
>>>> structure from a format+encoding+range.
>>>>
>>>> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
>>>> get this matrix only once per plane setup.
> 
> Alright. Let's see how that works.
> 
>>>>
>>>>  
>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> +		break;
>>>>>> +	case DRM_COLOR_YCBCR_BT709:
>>>>>> +		ycbcr2rgb(full ? rec709_full : rec709,
>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> +		break;
>>>>>> +	case DRM_COLOR_YCBCR_BT2020:
>>>>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		pr_warn_once("Not supported color encoding\n");
>>>>>> +		break;
>>>>>> +	}
>>>>>> +
>>>>>> +	argb_u16->r = r * 257;
>>>>>> +	argb_u16->g = g * 257;
>>>>>> +	argb_u16->b = b * 257;  
>>>>>
>>>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>>>> deeper, too, like DRM_FORMAT_P016.  
>>>>
>>>> It's a good point, as I explained above, I took the conversion part as a 
>>>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
>>>> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
>>>> the float part).
>>>>
>>>> Maybe Arthur have an opinion on this?  
>>>
>>> Yeah, I too don't see why not we could do that. The 8-bit precision was
>>> sufficient for those formats, but as well noted by Pekka this could be a
>>> problem for higher bit depths. I just need to make my terrible python
>>> script spit those values XD.  
>>
>> Finally, I got it working with 32-bit precision.
>>
>> I basically threw all my untrusted python code away, and started using
>> the colour python framework suggested by Sebastian[1]. After knowing the
>> right values (and staring at numbers for hours), I found that with a
>> little bit of rounding, the conversion works.
>>
>> Also, while at it, I changed the name rec709 to bt709 to follow the
>> pattern and added "_full" to the full ranges matrices.
>>
>> While using the library, I noticed that the red component is wrong on
>> the color red in one test case.
>>
>> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
> 
> That all sounds good. I wish the kernel code contained comments
> explaining how exactly you computed those matrices with python/colour.
> If the python snippets are not too long, including them verbatim as
> code comments would be really nice for both reviewers and posterity.
> 
> The same for the VKMS unit tests, too, how you got the expected result
> values.

Sure, I can do that, the python code is not that long to not have it in
there.

> 
>>
>> Best Regards,
>> ~Arthur Grillo
>>
>> ---
>>
>> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> index f66584549827..4cee3c2d8d84 100644
>> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
>>  			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
>>  			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
>>  			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
>> -			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>> +			{"red",   {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>>  			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
>>  			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
>>  		},
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> index e06bbd7c0a67..043f23dbf80d 100644
>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
>>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>>  }
>>  
>> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>> +#define BIT_DEPTH 32
>> +
>> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>>  {
>> -	s32 y_16, cb_16, cr_16;
>> -	s32 r_16, g_16, b_16;
>> +	s64 y_16, cb_16, cr_16;
>> +	s64 r_16, g_16, b_16;
>>  
>>  	y_16 =  y - y_offset;
>>  	cb_16 = cb - 128;
>> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
>>  	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
>>  	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>>  
>> -	*r = clamp(r_16, 0, 0xffff) >> 8;
>> -	*g = clamp(g_16, 0, 0xffff) >> 8;
>> -	*b = clamp(b_16, 0, 0xffff) >> 8;
>> +	// rounding the values
>> +	r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
>> +	g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
>> +	b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
>> +
>> +	r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>> +	g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>> +	b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> 
> Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?

Basically, the numbers are in this form in hex:

0xsspppppppp

In the end, we only want the 's' bits.

The matrix multiplication is not giving us perfect results, making some
of KUnit test not pass, This is because the values end up a little bit
off. KUnit expects 0xfe, but this functions is returning 0xfd.

I noticed that before shifting the values to get the 's' bytes the
values were a lot close to what is expected, something like:

0xfdfe287312
    ^
So the rounding part adds 1 to this marked 'f' to round a bit the values
(drm_fixed.h does something similar on drm_fixp2int_round).
Like that:

   0xfdfe287312
+  0x0010000000
   ------------
   0xfe0e287312

That's why the BIT_DEPTH - 4.

After that, the values need to be clamped to not get wrong results when
shifting this s64 and casting it to u8. We clamp it to the minimum
allowed value: 0, and to the maximum allowed value, which in this case
is all the (BIT_DEPTH + 8) bits set to 1, The '+ 8' is to account for
the size of the 's' bits.

After writing this, I think that maybe it would be good to add this
explanation as a comment on the code.

> 
>> +
>> +	*r = r_16 >> BIT_DEPTH;
>> +	*g = g_16 >> BIT_DEPTH;
>> +	*b = b_16 >> BIT_DEPTH;
>>  }
> 
> ...
> 
>>>   
>>>> Just to be sure, the DRM subsystem don't have such matrix somewhere? It 
>>>> can be nice to avoid duplicating them.  
>>>
>>> As to my knowledge it does not exist on DRM, I think those are normally
>>> on the hardware itself (*please* correct me if I'm wrong).
> 
> I couldn't find a matrix type either on a quick glance, but there is
> drm_fixed.h for a couple different fixed point formats, it seems. FWIW.
> drm_fixed.h didn't feel very appealing for this here.

Yes, when developing this code I started using drm_fixed.h, but after a
few iterations it was more of a hindrance than a help.

Best Regards,
~Arthur Grillo

> 
>>>
>>> But, v4l2 has a similar table on
>>> drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my
>>> code based on this), unfortunately it's only 8-bit too.
> 
> Thanks,
> pq

Pekka Paalanen March 1, 2024, 11:53 a.m. UTC | #8

On Thu, 29 Feb 2024 14:57:06 -0300
Arthur Grillo <arthurgrillo@riseup.net> wrote:

> On 29/02/24 09:12, Pekka Paalanen wrote:
> > On Wed, 28 Feb 2024 22:52:09 -0300
> > Arthur Grillo <arthurgrillo@riseup.net> wrote:
> >   
> >> On 27/02/24 17:01, Arthur Grillo wrote:  
> >>>
> >>>
> >>> On 27/02/24 12:02, Louis Chauvet wrote:    
> >>>> Hi Pekka,
> >>>>
> >>>> For all the comment related to the conversion part, maybe Arthur have an 
> >>>> opinion on it, I took his patch as a "black box" (I did not want to 
> >>>> break (and debug) it).
> >>>>
> >>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :    
> >>>>> On Fri, 23 Feb 2024 12:37:26 +0100
> >>>>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
> >>>>>    
> >>>>>> From: Arthur Grillo <arthurgrillo@riseup.net>
> >>>>>>
> >>>>>> Add support to the YUV formats bellow:
> >>>>>>
> >>>>>> - NV12
> >>>>>> - NV16
> >>>>>> - NV24
> >>>>>> - NV21
> >>>>>> - NV61
> >>>>>> - NV42
> >>>>>> - YUV420
> >>>>>> - YUV422
> >>>>>> - YUV444
> >>>>>> - YVU420
> >>>>>> - YVU422
> >>>>>> - YVU444
> >>>>>>
> >>>>>> The conversion matrices of each encoding and range were obtained by
> >>>>>> rounding the values of the original conversion matrices multiplied by
> >>>>>> 2^8. This is done to avoid the use of fixed point operations.
> >>>>>>
> >>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> >>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> >>>>>> callbacks for yuv formats]
> >>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
> >>>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
> >>>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
> >>>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
> >>>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
> >>>>>>  5 files changed, 295 insertions(+), 20 deletions(-)

...

> >> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> index f66584549827..4cee3c2d8d84 100644
> >> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
> >> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
> >>  			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
> >>  			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
> >>  			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
> >> -			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >> +			{"red",   {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
> >>  			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
> >>  			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
> >>  		},
> >> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> >> index e06bbd7c0a67..043f23dbf80d 100644
> >> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> >> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> >> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
> >>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> >>  }
> >>  
> >> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >> +#define BIT_DEPTH 32
> >> +
> >> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
> >>  {
> >> -	s32 y_16, cb_16, cr_16;
> >> -	s32 r_16, g_16, b_16;
> >> +	s64 y_16, cb_16, cr_16;
> >> +	s64 r_16, g_16, b_16;
> >>  
> >>  	y_16 =  y - y_offset;
> >>  	cb_16 = cb - 128;
> >> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
> >>  	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
> >>  	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
> >>  
> >> -	*r = clamp(r_16, 0, 0xffff) >> 8;
> >> -	*g = clamp(g_16, 0, 0xffff) >> 8;
> >> -	*b = clamp(b_16, 0, 0xffff) >> 8;
> >> +	// rounding the values
> >> +	r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
> >> +	g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
> >> +	b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
> >> +
> >> +	r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> +	g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
> >> +	b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);  
> > 
> > Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?  
> 
> Basically, the numbers are in this form in hex:
> 
> 0xsspppppppp
> 
> In the end, we only want the 's' bits.
> 
> The matrix multiplication is not giving us perfect results, making some
> of KUnit test not pass, This is because the values end up a little bit
> off. KUnit expects 0xfe, but this functions is returning 0xfd.
> 
> I noticed that before shifting the values to get the 's' bytes the
> values were a lot close to what is expected, something like:
> 
> 0xfdfe287312
>     ^
> So the rounding part adds 1 to this marked 'f' to round a bit the values
> (drm_fixed.h does something similar on drm_fixp2int_round).
> Like that:
> 
>    0xfdfe287312
> +  0x0010000000
>    ------------
>    0xfe0e287312
> 
> That's why the BIT_DEPTH - 4.

I have a hard time deciphering this. There is some sort of strange
combination of UNORM and fixed-point going on here, where you process
the range 0.0 - 255.0 including 32-bit fraction. All variables being
named "_16" does not help, I've no idea what that refers to.

Usually when you have unsigned pixel format type, it's UNORM, that is
an unsigned integer representation that maps to [0.0, 1.0]. When
converting UNORM properly to e.g. fixed-point, you don't have to
consider the UNORM bit depth when computing in fixed-point.

There is a catch: since 0xff maps to 1.0, the divisor is 0xff, and not
a bit shift by 8. This must be taken into account when converting
between different depths of UNORM, or between UNORM and fixed-point.
Converting between different depths of fixed-point does not have this
problem.

If you want to proper rounding, meaning that 0.5 rounds up to 1.0 and
0.4999 rounds down to 0.0 when rounding to integers, you have to add
0.5 before truncating.

So in this case you need to add 0x0100_0000 / 2 = 0x0080_0000, not
0x0010_0000.

I don't understand what drm_fixp2int_round() is even doing. The offset
is not 0.5, it's 0.0000076.

> After that, the values need to be clamped to not get wrong results when
> shifting this s64 and casting it to u8. We clamp it to the minimum
> allowed value: 0, and to the maximum allowed value, which in this case
> is all the (BIT_DEPTH + 8) bits set to 1, The '+ 8' is to account for
> the size of the 's' bits.

Ok. You could also shift with >> BIT_DEPTH first, and then clamp to 0,
255.


Thanks,
pq

> After writing this, I think that maybe it would be good to add this
> explanation as a comment on the code.
> 
> >   
> >> +
> >> +	*r = r_16 >> BIT_DEPTH;
> >> +	*g = g_16 >> BIT_DEPTH;
> >> +	*b = b_16 >> BIT_DEPTH;
> >>  }

Arthur Grillo March 2, 2024, 2:14 p.m. UTC | #9

On 01/03/24 08:53, Pekka Paalanen wrote:
> On Thu, 29 Feb 2024 14:57:06 -0300
> Arthur Grillo <arthurgrillo@riseup.net> wrote:
> 
>> On 29/02/24 09:12, Pekka Paalanen wrote:
>>> On Wed, 28 Feb 2024 22:52:09 -0300
>>> Arthur Grillo <arthurgrillo@riseup.net> wrote:
>>>   
>>>> On 27/02/24 17:01, Arthur Grillo wrote:  
>>>>>
>>>>>
>>>>> On 27/02/24 12:02, Louis Chauvet wrote:    
>>>>>> Hi Pekka,
>>>>>>
>>>>>> For all the comment related to the conversion part, maybe Arthur have an 
>>>>>> opinion on it, I took his patch as a "black box" (I did not want to 
>>>>>> break (and debug) it).
>>>>>>
>>>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :    
>>>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>>>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
>>>>>>>    
>>>>>>>> From: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>>>>
>>>>>>>> Add support to the YUV formats bellow:
>>>>>>>>
>>>>>>>> - NV12
>>>>>>>> - NV16
>>>>>>>> - NV24
>>>>>>>> - NV21
>>>>>>>> - NV61
>>>>>>>> - NV42
>>>>>>>> - YUV420
>>>>>>>> - YUV422
>>>>>>>> - YUV444
>>>>>>>> - YVU420
>>>>>>>> - YVU422
>>>>>>>> - YVU444
>>>>>>>>
>>>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>>>
>>>>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>>>> callbacks for yuv formats]
>>>>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>>>>>>> ---
>>>>>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>>>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>>>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>>>>>>  5 files changed, 295 insertions(+), 20 deletions(-)
> 
> ...
> 
>>>> diff --git a/drivers/gpu/drm/vkms/tests/vkms_format_test.c b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> index f66584549827..4cee3c2d8d84 100644
>>>> --- a/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> +++ b/drivers/gpu/drm/vkms/tests/vkms_format_test.c
>>>> @@ -59,7 +59,7 @@ static struct yuv_u8_to_argb_u16_case yuv_u8_to_argb_u16_cases[] = {
>>>>  			{"white", {0xff, 0x80, 0x80}, {0x0000, 0xffff, 0xffff, 0xffff}},
>>>>  			{"gray",  {0x80, 0x80, 0x80}, {0x0000, 0x8000, 0x8000, 0x8000}},
>>>>  			{"black", {0x00, 0x80, 0x80}, {0x0000, 0x0000, 0x0000, 0x0000}},
>>>> -			{"red",   {0x35, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>>>> +			{"red",   {0x36, 0x63, 0xff}, {0x0000, 0xffff, 0x0000, 0x0000}},
>>>>  			{"green", {0xb6, 0x1e, 0x0c}, {0x0000, 0x0000, 0xffff, 0x0000}},
>>>>  			{"blue",  {0x12, 0xff, 0x74}, {0x0000, 0x0000, 0x0000, 0xffff}},
>>>>  		},
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> index e06bbd7c0a67..043f23dbf80d 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -121,10 +121,12 @@ static void RGB565_to_argb_u16(u8 **src_pixels, struct pixel_argb_u16 *out_pixel
>>>>  	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
>>>>  }
>>>>  
>>>> -static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>>>> +#define BIT_DEPTH 32
>>>> +
>>>> +static void ycbcr2rgb(const s64 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r, u8 *g, u8 *b)
>>>>  {
>>>> -	s32 y_16, cb_16, cr_16;
>>>> -	s32 r_16, g_16, b_16;
>>>> +	s64 y_16, cb_16, cr_16;
>>>> +	s64 r_16, g_16, b_16;
>>>>  
>>>>  	y_16 =  y - y_offset;
>>>>  	cb_16 = cb - 128;
>>>> @@ -134,9 +136,18 @@ static void ycbcr2rgb(const s16 m[3][3], u8 y, u8 cb, u8 cr, u8 y_offset, u8 *r,
>>>>  	g_16 = m[1][0] * y_16 + m[1][1] * cb_16 + m[1][2] * cr_16;
>>>>  	b_16 = m[2][0] * y_16 + m[2][1] * cb_16 + m[2][2] * cr_16;
>>>>  
>>>> -	*r = clamp(r_16, 0, 0xffff) >> 8;
>>>> -	*g = clamp(g_16, 0, 0xffff) >> 8;
>>>> -	*b = clamp(b_16, 0, 0xffff) >> 8;
>>>> +	// rounding the values
>>>> +	r_16 = r_16 + (1LL << (BIT_DEPTH - 4));
>>>> +	g_16 = g_16 + (1LL << (BIT_DEPTH - 4));
>>>> +	b_16 = b_16 + (1LL << (BIT_DEPTH - 4));
>>>> +
>>>> +	r_16 = clamp(r_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>>>> +	g_16 = clamp(g_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);
>>>> +	b_16 = clamp(b_16, 0, (1LL << (BIT_DEPTH + 8)) - 1);  
>>>
>>> Where do the BIT_DEPTH - 4 and BIT_DEPTH + 8 come from?  
>>
>> Basically, the numbers are in this form in hex:
>>
>> 0xsspppppppp
>>
>> In the end, we only want the 's' bits.
>>
>> The matrix multiplication is not giving us perfect results, making some
>> of KUnit test not pass, This is because the values end up a little bit
>> off. KUnit expects 0xfe, but this functions is returning 0xfd.
>>
>> I noticed that before shifting the values to get the 's' bytes the
>> values were a lot close to what is expected, something like:
>>
>> 0xfdfe287312
>>     ^
>> So the rounding part adds 1 to this marked 'f' to round a bit the values
>> (drm_fixed.h does something similar on drm_fixp2int_round).
>> Like that:
>>
>>    0xfdfe287312
>> +  0x0010000000
>>    ------------
>>    0xfe0e287312
>>
>> That's why the BIT_DEPTH - 4.
> 
> I have a hard time deciphering this. There is some sort of strange
> combination of UNORM and fixed-point going on here, where you process
> the range 0.0 - 255.0 including 32-bit fraction. All variables being
> named "_16" does not help, I've no idea what that refers to.

Totally forgot to rename that, sorry.

> 
> Usually when you have unsigned pixel format type, it's UNORM, that is
> an unsigned integer representation that maps to [0.0, 1.0]. When
> converting UNORM properly to e.g. fixed-point, you don't have to
> consider the UNORM bit depth when computing in fixed-point.
> 
> There is a catch: since 0xff maps to 1.0, the divisor is 0xff, and not
> a bit shift by 8. This must be taken into account when converting
> between different depths of UNORM, or between UNORM and fixed-point.
> Converting between different depths of fixed-point does not have this
> problem.
> 
> If you want to proper rounding, meaning that 0.5 rounds up to 1.0 and
> 0.4999 rounds down to 0.0 when rounding to integers, you have to add
> 0.5 before truncating.
> 
> So in this case you need to add 0x0100_0000 / 2 = 0x0080_0000, not
> 0x0010_0000.

Thanks for the explanations, I will try to take all this into account.

> 
> I don't understand what drm_fixp2int_round() is even doing. The offset
> is not 0.5, it's 0.0000076.
> 
>> After that, the values need to be clamped to not get wrong results when
>> shifting this s64 and casting it to u8. We clamp it to the minimum
>> allowed value: 0, and to the maximum allowed value, which in this case
>> is all the (BIT_DEPTH + 8) bits set to 1, The '+ 8' is to account for
>> the size of the 's' bits.
> 
> Ok. You could also shift with >> BIT_DEPTH first, and then clamp to 0,
> 255.

Great idea! This makes more sense.

Best Regards,
~Arthur Grillo

> 
> 
> Thanks,
> pq
> 
>> After writing this, I think that maybe it would be good to add this
>> explanation as a comment on the code.
>>
>>>   
>>>> +
>>>> +	*r = r_16 >> BIT_DEPTH;
>>>> +	*g = g_16 >> BIT_DEPTH;
>>>> +	*b = b_16 >> BIT_DEPTH;
>>>>  }

Louis Chauvet March 4, 2024, 3:28 p.m. UTC | #10

Le 29/02/24 - 14:12, Pekka Paalanen a écrit :
> On Wed, 28 Feb 2024 22:52:09 -0300
> Arthur Grillo <arthurgrillo@riseup.net> wrote:
> 
> > On 27/02/24 17:01, Arthur Grillo wrote:
> > > 
> > > 
> > > On 27/02/24 12:02, Louis Chauvet wrote:  
> > >> Hi Pekka,
> > >>
> > >> For all the comment related to the conversion part, maybe Arthur have an 
> > >> opinion on it, I took his patch as a "black box" (I did not want to 
> > >> break (and debug) it).
> > >>
> > >> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :  
> > >>> On Fri, 23 Feb 2024 12:37:26 +0100
> > >>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
> > >>>  
> > >>>> From: Arthur Grillo <arthurgrillo@riseup.net>
> > >>>>
> > >>>> Add support to the YUV formats bellow:
> > >>>>
> > >>>> - NV12
> > >>>> - NV16
> > >>>> - NV24
> > >>>> - NV21
> > >>>> - NV61
> > >>>> - NV42
> > >>>> - YUV420
> > >>>> - YUV422
> > >>>> - YUV444
> > >>>> - YVU420
> > >>>> - YVU422
> > >>>> - YVU444
> > >>>>
> > >>>> The conversion matrices of each encoding and range were obtained by
> > >>>> rounding the values of the original conversion matrices multiplied by
> > >>>> 2^8. This is done to avoid the use of fixed point operations.
> > >>>>
> > >>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
> > >>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
> > >>>> callbacks for yuv formats]
> > >>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > >>>> ---
> > >>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
> > >>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
> > >>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
> > >>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
> > >>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
> > >>>>  5 files changed, 295 insertions(+), 20 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> index e555bf9c1aee..54fc5161d565 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > >>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
> > >>>>  			 * buffer [1]
> > >>>>  			 */
> > >>>>  			current_plane->pixel_read_line(
> > >>>> -				current_plane->frame_info,
> > >>>> +				current_plane,
> > >>>>  				x_start,
> > >>>>  				y_start,
> > >>>>  				direction,
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> index ccc5be009f15..a4f6456cb971 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > >>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
> > >>>>  	READ_RIGHT
> > >>>>  };
> > >>>>  
> > >>>> +struct vkms_plane_state;
> > >>>> +
> > >>>>  /**
> > >>>>  <<<<<<< HEAD
> > >>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> > >>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
> > >>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
> > >>>>   *  x_end.
> > >>>>   */
> > >>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
> > >>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
> > >>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
> > >>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);  
> > >>>
> > >>> This is the second or third time in this one series changing this type.
> > >>> Could you not do the change once, in its own patch if possible?  
> > >>
> > >> Sorry, this is not a change here, but a wrong formatting (missed when 
> > >> rebasing).
> > >>
> > >> Do you think that it make sense to re-order my patches and put this 
> > >> typedef at the end? This way it is never updated.
> 
> I'm not sure, I haven't checked how it would change your patches. The
> intermediate changes might get a lot uglier?
> 
> Just try to fold changes so that you don't need to change something
> twice over the series unless there is a good reason to. "How hard would
> it be to review this?" is my measure stick.

It will not be uglier, it was just the order I did things. I first cleaned 
the code and created this typedef (PATCHv2 4/9), and then rewrote the 
composition, for which I had to change the typedef.

I also wanted to make my series easy to understand and make clear what is 
my "main contribution" and what are "quality stuff, not related to my 
contribution":
- Prepare things (document existing state, format, typedef)
- Big change (and update related doc, typedef)
- Rebase some other stuff on my big change (YUV)

So yes, some parts are changed twice in preparation step and the "big 
change".

> 
> > >>  
> > >>>>  
> > >>>>  /**
> > >>>>   * vkms_plane_state - Driver specific plane state
> > >>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> index 46daea6d3ee9..515c80866a58 100644
> > >>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > >>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
> > >>>>  	 */
> > >>>>  	return fb->offsets[plane_index] +
> > >>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
> > >>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
> > >>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
> > >>>> +	       format->char_per_block[plane_index];  
> > >>>
> > >>> Shouldn't this be in the patch that added this code in the first place?  
> > >>
> > >> Same as above, a wrong formatting, I will remove this change and keep 
> > >> everything on one line (even if it's more than 100 chars, it is easier to 
> > >> read).
> 
> Personally I agree that readability is more important than strict line
> length limits. I'm not sure how the kernel rolls there.
> 
> > >>  
> > >>>>  }
> > >>>>  
> > >>>>  /**
> > >>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
> > >>>>  	}
> > >>>>  }
> > >>>>  
> > >>>> +/**
> > >>>> + * get_subsampling() - Get the subsampling value on a specific direction  
> > >>>
> > >>> subsampling divisor  
> > >>
> > >> Thanks for this precision.
> > >>  
> > >>>> + */
> > >>>> +static int get_subsampling(const struct drm_format_info *format,
> > >>>> +			   enum pixel_read_direction direction)
> > >>>> +{
> > >>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
> > >>>> +		return format->hsub;
> > >>>> +	else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> +		return format->vsub;
> > >>>> +	return 1;  
> > >>>
> > >>> In this and the below function, personally I'd prefer switch-case, with
> > >>> a cannot-happen-scream after the switch, so the compiler can warn about
> > >>> unhandled enum values.  
> > >>
> > >> As for the previous patch, I did not know about this compiler feature, 
> > >> thanks!
> > >>  
> > >>>> +}
> > >>>> +
> > >>>> +/**
> > >>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
> > >>>> + */
> > >>>> +static int get_subsampling_offset(const struct drm_format_info *format,
> > >>>> +				  enum pixel_read_direction direction, int x_start, int y_start)  
> > >>>
> > >>> 'start' values as "increments" for a pixel counter? Is something
> > >>> misnamed here?
> > >>>
> > >>> Is it an increment or an offset?  
> > >>
> > >> I don't really know how to name the function. I'm open to suggestions
> > >> x_start and y_start are really the coordinate of the starting reading point.
> 
> I looks like it's an offset, so "offset" and "start" are good words.
> Then the only misleading piece is the doc:
> 
> 	"Get the subsampling offset to use when incrementing the pixel counter"
> 
> This sounds like the offset is used when incrementing a counter, that
> is, counter is increment by offset each time. That's my problem with
> this.
> 
> Fix just the doc, and it's good, I think.
> 
> > >>
> > >> To explain what it does:
> > >>
> > >> When using subsampling, you have to read the next pixel of planes[1..4] 
> > >> not at the same "speed" as plane[0]. But I can't only rely on 
> > >> "read_pixel_count % subsampling == 0", because it means that the pixel 
> > >> incrementation on planes[1..4] may not be aligned with the buffer (if 
> > >> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
> > >> for x=2,4,6... not 1,3,5...).
> > >>
> > >> A way to ensure this is to add an "offset" to count, which ensure that the 
> > >> count % subsampling == 0 on the correct pixel.
> 
> Yes, I think I did get that feeling from the code eventually somehow,
> but it wouldn't hurt to explain it in the comment.
> 
> "An offset for keeping the chroma siting consistent regardless of
> x_start and y_start" maybe?

It is better yes, thanks!

> > >>
> > >> I made an error, the switch case must be (as count is always counting up, 
> > >> for "inverted" reading direction a negative number ensure that 
> > >> %subsampling == 0 on the correct pixel):
> > >>
> > >> 	switch (direction) {
> > >> 	case READ_UP:
> > >> 		return -y_start;
> > >> 	case READ_DOWN:
> > >> 		return y_start;
> > >> 	case READ_LEFT:
> > >> 		return -x_start;
> > >> 	case READ_RIGHT:
> > >> 		return x_start;
> > >> 	}
> 
> Yes, the inverted reading directions are different indeed. I did not
> think through if this works also for sub-sampling divisors > 2 which I
> don't think are ever used.

I choosen those values because they should work with any sub-sampling 
divisor.

hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.

> 
> Does IGT find this mistake? If not, maybe IGT should be extended.

No, for two reasons:
- The original version works fine for NV12/16/24 and YUV with *sub <= 2
  (x+n%2 == x-n%2). It only breaks for *sub > 2.
- YUV410/... are not supported by VKMS
- IGT does not test different colors for rotations/translations (at least
  for the tests I tried). I will see if it's possible to add things in 
  kms_rotation_crc/kms_cursor_crc to test more colors format (at least 
  one RGB and one YUV).
 
> > >>  
> > >>>> +{
> > >>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
> > >>>> +		return x_start;
> > >>>> +	else if (direction == READ_DOWN || direction == READ_UP)
> > >>>> +		return y_start;
> > >>>> +	return 0;
> > >>>> +}
> > >>>> +  
> > >>
> > >> [...]
> > >>  
> > >>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
> > >>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
> > >>>> +{
> > >>>> +	static const s16 bt601_full[3][3] = {
> > >>>> +		{ 256, 0,   359 },
> > >>>> +		{ 256, -88, -183 },
> > >>>> +		{ 256, 454, 0 },
> > >>>> +	};  
> > >>
> > >> [...]
> > >>  
> > >>>> +
> > >>>> +	u8 r = 0;
> > >>>> +	u8 g = 0;
> > >>>> +	u8 b = 0;
> > >>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
> > >>>> +	unsigned int y_offset = full ? 0 : 16;
> > >>>> +
> > >>>> +	switch (encoding) {
> > >>>> +	case DRM_COLOR_YCBCR_BT601:
> > >>>> +		ycbcr2rgb(full ? bt601_full : bt601,  
> > >>>
> > >>> Doing all these conditional again pixel by pixel is probably
> > >>> inefficient. Just like with the line reading functions, you could pick
> > >>> the matrix in advance.  
> > >>
> > >> I don't think the performance impact is huge (it's only a pair of if), but 
> > >> yes, it's an easy optimization. 
> > >>
> > >> I will create a conversion_matrix structure:
> > >>
> > >> 	struct conversion_matrix {
> > >> 		s16 matrix[3][3];
> > >> 		u16 y_offset;
> > >> 	}
> 
> When defining such a struct type, it would be good to document the
> matrix layout (which one is row, which one is column), and what the s16
> mean (fixed point?).

Ack

> Try to not mix signed and unsigned types, too. The C implicit type
> promotion rules can be surprising. Just make everything signed while
> computing, and convert to/from unsigned only for storage.

Ack, I will change to signed type.

> > >>
> > >> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
> > >> structure from a format+encoding+range.
> > >>
> > >> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
> > >> get this matrix only once per plane setup.
> 
> Alright. Let's see how that works.
> 
> > >>
> > >>  
> > >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> +		break;
> > >>>> +	case DRM_COLOR_YCBCR_BT709:
> > >>>> +		ycbcr2rgb(full ? rec709_full : rec709,
> > >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> +		break;
> > >>>> +	case DRM_COLOR_YCBCR_BT2020:
> > >>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
> > >>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
> > >>>> +		break;
> > >>>> +	default:
> > >>>> +		pr_warn_once("Not supported color encoding\n");
> > >>>> +		break;
> > >>>> +	}
> > >>>> +
> > >>>> +	argb_u16->r = r * 257;
> > >>>> +	argb_u16->g = g * 257;
> > >>>> +	argb_u16->b = b * 257;  
> > >>>
> > >>> I wonder. Using 8-bit fixed point precision seems quite coarse for
> > >>> 8-bit pixel formats, and it's going to be insufficient for higher bit
> > >>> depths. Was supporting e.g. 10-bit YUV considered? There is even
> > >>> deeper, too, like DRM_FORMAT_P016.  
> > >>
> > >> It's a good point, as I explained above, I took the conversion part as a 
> > >> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
> > >> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
> > >> the float part).
> > >>
> > >> Maybe Arthur have an opinion on this?  
> > > 
> > > Yeah, I too don't see why not we could do that. The 8-bit precision was
> > > sufficient for those formats, but as well noted by Pekka this could be a
> > > problem for higher bit depths. I just need to make my terrible python
> > > script spit those values XD.  
> > 
> > Finally, I got it working with 32-bit precision.
> > 
> > I basically threw all my untrusted python code away, and started using
> > the colour python framework suggested by Sebastian[1]. After knowing the
> > right values (and staring at numbers for hours), I found that with a
> > little bit of rounding, the conversion works.
> > 
> > Also, while at it, I changed the name rec709 to bt709 to follow the
> > pattern and added "_full" to the full ranges matrices.
> > 
> > While using the library, I noticed that the red component is wrong on
> > the color red in one test case.
> > 
> > [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
> 
> That all sounds good. I wish the kernel code contained comments
> explaining how exactly you computed those matrices with python/colour.
> If the python snippets are not too long, including them verbatim as
> code comments would be really nice for both reviewers and posterity.
> 
> The same for the VKMS unit tests, too, how you got the expected result
> values.

I edited the YUV support to have those s64 values.

@arthur, I will submit a v4 with this:
- matrix selection in plane_atomic_update (so it's selected only once)
- s64 numbers for matrix
- avoiding multiple loop implementation by switching matrix columns

Regarding the YUV part, I don't feel confortable adressing Pekka's 
comments, would you mind doing it?

Kind regards,
Louis Chauvet

[...]

Arthur Grillo March 4, 2024, 3:39 p.m. UTC | #11

On 04/03/24 12:28, Louis Chauvet wrote:
> Le 29/02/24 - 14:12, Pekka Paalanen a écrit :
>> On Wed, 28 Feb 2024 22:52:09 -0300
>> Arthur Grillo <arthurgrillo@riseup.net> wrote:
>>
>>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>>
>>>>
>>>> On 27/02/24 12:02, Louis Chauvet wrote:  
>>>>> Hi Pekka,
>>>>>
>>>>> For all the comment related to the conversion part, maybe Arthur have an 
>>>>> opinion on it, I took his patch as a "black box" (I did not want to 
>>>>> break (and debug) it).
>>>>>
>>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :  
>>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>>> Louis Chauvet <louis.chauvet@bootlin.com> wrote:
>>>>>>  
>>>>>>> From: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>>>
>>>>>>> Add support to the YUV formats bellow:
>>>>>>>
>>>>>>> - NV12
>>>>>>> - NV16
>>>>>>> - NV24
>>>>>>> - NV21
>>>>>>> - NV61
>>>>>>> - NV42
>>>>>>> - YUV420
>>>>>>> - YUV422
>>>>>>> - YUV444
>>>>>>> - YVU420
>>>>>>> - YVU422
>>>>>>> - YVU444
>>>>>>>
>>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>>
>>>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
>>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>>> callbacks for yuv formats]
>>>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>>>>>  5 files changed, 295 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> index e555bf9c1aee..54fc5161d565 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>>>>  			 * buffer [1]
>>>>>>>  			 */
>>>>>>>  			current_plane->pixel_read_line(
>>>>>>> -				current_plane->frame_info,
>>>>>>> +				current_plane,
>>>>>>>  				x_start,
>>>>>>>  				y_start,
>>>>>>>  				direction,
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> index ccc5be009f15..a4f6456cb971 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>>>>  	READ_RIGHT
>>>>>>>  };
>>>>>>>  
>>>>>>> +struct vkms_plane_state;
>>>>>>> +
>>>>>>>  /**
>>>>>>>  <<<<<<< HEAD
>>>>>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>>>>   *  x_end.
>>>>>>>   */
>>>>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>>>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>>>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);  
>>>>>>
>>>>>> This is the second or third time in this one series changing this type.
>>>>>> Could you not do the change once, in its own patch if possible?  
>>>>>
>>>>> Sorry, this is not a change here, but a wrong formatting (missed when 
>>>>> rebasing).
>>>>>
>>>>> Do you think that it make sense to re-order my patches and put this 
>>>>> typedef at the end? This way it is never updated.
>>
>> I'm not sure, I haven't checked how it would change your patches. The
>> intermediate changes might get a lot uglier?
>>
>> Just try to fold changes so that you don't need to change something
>> twice over the series unless there is a good reason to. "How hard would
>> it be to review this?" is my measure stick.
> 
> It will not be uglier, it was just the order I did things. I first cleaned 
> the code and created this typedef (PATCHv2 4/9), and then rewrote the 
> composition, for which I had to change the typedef.
> 
> I also wanted to make my series easy to understand and make clear what is 
> my "main contribution" and what are "quality stuff, not related to my 
> contribution":
> - Prepare things (document existing state, format, typedef)
> - Big change (and update related doc, typedef)
> - Rebase some other stuff on my big change (YUV)
> 
> So yes, some parts are changed twice in preparation step and the "big 
> change".
> 
>>
>>>>>  
>>>>>>>  
>>>>>>>  /**
>>>>>>>   * vkms_plane_state - Driver specific plane state
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> index 46daea6d3ee9..515c80866a58 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>>>>  	 */
>>>>>>>  	return fb->offsets[plane_index] +
>>>>>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>>>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>>>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
>>>>>>> +	       format->char_per_block[plane_index];  
>>>>>>
>>>>>> Shouldn't this be in the patch that added this code in the first place?  
>>>>>
>>>>> Same as above, a wrong formatting, I will remove this change and keep 
>>>>> everything on one line (even if it's more than 100 chars, it is easier to 
>>>>> read).
>>
>> Personally I agree that readability is more important than strict line
>> length limits. I'm not sure how the kernel rolls there.
>>
>>>>>  
>>>>>>>  }
>>>>>>>  
>>>>>>>  /**
>>>>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>>>>  	}
>>>>>>>  }
>>>>>>>  
>>>>>>> +/**
>>>>>>> + * get_subsampling() - Get the subsampling value on a specific direction  
>>>>>>
>>>>>> subsampling divisor  
>>>>>
>>>>> Thanks for this precision.
>>>>>  
>>>>>>> + */
>>>>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>>>>> +			   enum pixel_read_direction direction)
>>>>>>> +{
>>>>>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
>>>>>>> +		return format->hsub;
>>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> +		return format->vsub;
>>>>>>> +	return 1;  
>>>>>>
>>>>>> In this and the below function, personally I'd prefer switch-case, with
>>>>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>>>>> unhandled enum values.  
>>>>>
>>>>> As for the previous patch, I did not know about this compiler feature, 
>>>>> thanks!
>>>>>  
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>>>>> + */
>>>>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>>>>> +				  enum pixel_read_direction direction, int x_start, int y_start)  
>>>>>>
>>>>>> 'start' values as "increments" for a pixel counter? Is something
>>>>>> misnamed here?
>>>>>>
>>>>>> Is it an increment or an offset?  
>>>>>
>>>>> I don't really know how to name the function. I'm open to suggestions
>>>>> x_start and y_start are really the coordinate of the starting reading point.
>>
>> I looks like it's an offset, so "offset" and "start" are good words.
>> Then the only misleading piece is the doc:
>>
>> 	"Get the subsampling offset to use when incrementing the pixel counter"
>>
>> This sounds like the offset is used when incrementing a counter, that
>> is, counter is increment by offset each time. That's my problem with
>> this.
>>
>> Fix just the doc, and it's good, I think.
>>
>>>>>
>>>>> To explain what it does:
>>>>>
>>>>> When using subsampling, you have to read the next pixel of planes[1..4] 
>>>>> not at the same "speed" as plane[0]. But I can't only rely on 
>>>>> "read_pixel_count % subsampling == 0", because it means that the pixel 
>>>>> incrementation on planes[1..4] may not be aligned with the buffer (if 
>>>>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
>>>>> for x=2,4,6... not 1,3,5...).
>>>>>
>>>>> A way to ensure this is to add an "offset" to count, which ensure that the 
>>>>> count % subsampling == 0 on the correct pixel.
>>
>> Yes, I think I did get that feeling from the code eventually somehow,
>> but it wouldn't hurt to explain it in the comment.
>>
>> "An offset for keeping the chroma siting consistent regardless of
>> x_start and y_start" maybe?
> 
> It is better yes, thanks!
> 
>>>>>
>>>>> I made an error, the switch case must be (as count is always counting up, 
>>>>> for "inverted" reading direction a negative number ensure that 
>>>>> %subsampling == 0 on the correct pixel):
>>>>>
>>>>> 	switch (direction) {
>>>>> 	case READ_UP:
>>>>> 		return -y_start;
>>>>> 	case READ_DOWN:
>>>>> 		return y_start;
>>>>> 	case READ_LEFT:
>>>>> 		return -x_start;
>>>>> 	case READ_RIGHT:
>>>>> 		return x_start;
>>>>> 	}
>>
>> Yes, the inverted reading directions are different indeed. I did not
>> think through if this works also for sub-sampling divisors > 2 which I
>> don't think are ever used.
> 
> I choosen those values because they should work with any sub-sampling 
> divisor.
> 
> hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.
> 
>>
>> Does IGT find this mistake? If not, maybe IGT should be extended.
> 
> No, for two reasons:
> - The original version works fine for NV12/16/24 and YUV with *sub <= 2
>   (x+n%2 == x-n%2). It only breaks for *sub > 2.
> - YUV410/... are not supported by VKMS
> - IGT does not test different colors for rotations/translations (at least
>   for the tests I tried). I will see if it's possible to add things in 
>   kms_rotation_crc/kms_cursor_crc to test more colors format (at least 
>   one RGB and one YUV).
>  
>>>>>  
>>>>>>> +{
>>>>>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
>>>>>>> +		return x_start;
>>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> +		return y_start;
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +  
>>>>>
>>>>> [...]
>>>>>  
>>>>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>>>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
>>>>>>> +{
>>>>>>> +	static const s16 bt601_full[3][3] = {
>>>>>>> +		{ 256, 0,   359 },
>>>>>>> +		{ 256, -88, -183 },
>>>>>>> +		{ 256, 454, 0 },
>>>>>>> +	};  
>>>>>
>>>>> [...]
>>>>>  
>>>>>>> +
>>>>>>> +	u8 r = 0;
>>>>>>> +	u8 g = 0;
>>>>>>> +	u8 b = 0;
>>>>>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>>>>> +	unsigned int y_offset = full ? 0 : 16;
>>>>>>> +
>>>>>>> +	switch (encoding) {
>>>>>>> +	case DRM_COLOR_YCBCR_BT601:
>>>>>>> +		ycbcr2rgb(full ? bt601_full : bt601,  
>>>>>>
>>>>>> Doing all these conditional again pixel by pixel is probably
>>>>>> inefficient. Just like with the line reading functions, you could pick
>>>>>> the matrix in advance.  
>>>>>
>>>>> I don't think the performance impact is huge (it's only a pair of if), but 
>>>>> yes, it's an easy optimization. 
>>>>>
>>>>> I will create a conversion_matrix structure:
>>>>>
>>>>> 	struct conversion_matrix {
>>>>> 		s16 matrix[3][3];
>>>>> 		u16 y_offset;
>>>>> 	}
>>
>> When defining such a struct type, it would be good to document the
>> matrix layout (which one is row, which one is column), and what the s16
>> mean (fixed point?).
> 
> Ack
> 
>> Try to not mix signed and unsigned types, too. The C implicit type
>> promotion rules can be surprising. Just make everything signed while
>> computing, and convert to/from unsigned only for storage.
> 
> Ack, I will change to signed type.
> 
>>>>>
>>>>> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
>>>>> structure from a format+encoding+range.
>>>>>
>>>>> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
>>>>> get this matrix only once per plane setup.
>>
>> Alright. Let's see how that works.
>>
>>>>>
>>>>>  
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	case DRM_COLOR_YCBCR_BT709:
>>>>>>> +		ycbcr2rgb(full ? rec709_full : rec709,
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	case DRM_COLOR_YCBCR_BT2020:
>>>>>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	default:
>>>>>>> +		pr_warn_once("Not supported color encoding\n");
>>>>>>> +		break;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	argb_u16->r = r * 257;
>>>>>>> +	argb_u16->g = g * 257;
>>>>>>> +	argb_u16->b = b * 257;  
>>>>>>
>>>>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>>>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>>>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>>>>> deeper, too, like DRM_FORMAT_P016.  
>>>>>
>>>>> It's a good point, as I explained above, I took the conversion part as a 
>>>>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
>>>>> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
>>>>> the float part).
>>>>>
>>>>> Maybe Arthur have an opinion on this?  
>>>>
>>>> Yeah, I too don't see why not we could do that. The 8-bit precision was
>>>> sufficient for those formats, but as well noted by Pekka this could be a
>>>> problem for higher bit depths. I just need to make my terrible python
>>>> script spit those values XD.  
>>>
>>> Finally, I got it working with 32-bit precision.
>>>
>>> I basically threw all my untrusted python code away, and started using
>>> the colour python framework suggested by Sebastian[1]. After knowing the
>>> right values (and staring at numbers for hours), I found that with a
>>> little bit of rounding, the conversion works.
>>>
>>> Also, while at it, I changed the name rec709 to bt709 to follow the
>>> pattern and added "_full" to the full ranges matrices.
>>>
>>> While using the library, I noticed that the red component is wrong on
>>> the color red in one test case.
>>>
>>> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>>
>> That all sounds good. I wish the kernel code contained comments
>> explaining how exactly you computed those matrices with python/colour.
>> If the python snippets are not too long, including them verbatim as
>> code comments would be really nice for both reviewers and posterity.
>>
>> The same for the VKMS unit tests, too, how you got the expected result
>> values.
> 
> I edited the YUV support to have those s64 values.
> 
> @arthur, I will submit a v4 with this:
> - matrix selection in plane_atomic_update (so it's selected only once)
> - s64 numbers for matrix
> - avoiding multiple loop implementation by switching matrix columns

This looks good to me.

> 
> Regarding the YUV part, I don't feel confortable adressing Pekka's 
> comments, would you mind doing it?

I'm already doing that, how do you want me to send those changes? I reply to
your series, like a did before?

Best Regards,
~Arthur Grillo

> 
> Kind regards,
> Louis Chauvet
> 
> [...]
>

Louis Chauvet March 4, 2024, 3:48 p.m. UTC | #12

[...]

> > @arthur, I will submit a v4 with this:
> > - matrix selection in plane_atomic_update (so it's selected only once)
> > - s64 numbers for matrix
> > - avoiding multiple loop implementation by switching matrix columns
> 
> This looks good to me.
> 
> > 
> > Regarding the YUV part, I don't feel confortable adressing Pekka's 
> > comments, would you mind doing it?
> 
> I'm already doing that, how do you want me to send those changes? I reply to
> your series, like a did before?

Yes, simply reply to my series, so I can rebase everything on the 
line-by-line work.
 
Kind regards,
Louis Chauvet

> Best Regards,
> ~Arthur Grillo
> 
> > 
> > Kind regards,
> > Louis Chauvet
> > 
> > [...]
> >

Arthur Grillo March 4, 2024, 4:51 p.m. UTC | #13

On 04/03/24 12:48, Louis Chauvet wrote:
> [...]
> 
>>> @arthur, I will submit a v4 with this:
>>> - matrix selection in plane_atomic_update (so it's selected only once)
>>> - s64 numbers for matrix
>>> - avoiding multiple loop implementation by switching matrix columns
>>
>> This looks good to me.
>>
>>>
>>> Regarding the YUV part, I don't feel confortable adressing Pekka's 
>>> comments, would you mind doing it?
>>
>> I'm already doing that, how do you want me to send those changes? I reply to
>> your series, like a did before?
> 
> Yes, simply reply to my series, so I can rebase everything on the 
> line-by-line work.

OK, I will do that.

Best Regards,
~Arthur Grillo

> Kind regards,
> Louis Chauvet
> 
>> Best Regards,
>> ~Arthur Grillo
>>
>>>
>>> Kind regards,
>>> Louis Chauvet
>>>
>>> [...]
>>>
>

Arthur Grillo March 6, 2024, 8:09 p.m. UTC | #14

On 04/03/24 13:51, Arthur Grillo wrote:
> 
> 
> On 04/03/24 12:48, Louis Chauvet wrote:
[...]
>>>
>>>> Regarding the YUV part, I don't feel confortable adressing Pekka's 
>>>> comments, would you mind doing it?
>>>
>>> I'm already doing that, how do you want me to send those changes? I reply to
>>> your series, like a did before?
>>
>> Yes, simply reply to my series, so I can rebase everything on the 
>> line-by-line work.
> 
> OK, I will do that.

Hi,

I know that I said that, but it would be very difficult to that with my
b4 workflow. So, I sent a separate series based on the v4:

https://lore.kernel.org/all/20240306-louis-vkms-conv-v1-0-5bfe7d129fdd@riseup.net/

I hope that it does not difficult things for you.

Best Regards,
~Arthur Grillo

> 
> Best Regards,
> ~Arthur Grillo
> 
>> Kind regards,
>> Louis Chauvet
>>
>>> Best Regards,
>>> ~Arthur Grillo
>>>
>>>>
>>>> Kind regards,
>>>> Louis Chauvet
>>>>
>>>> [...]
>>>>
>>

Louis Chauvet March 7, 2024, 12:03 a.m. UTC | #15

Le 06/03/24 - 17:09, Arthur Grillo a écrit :
> 
> 
> On 04/03/24 13:51, Arthur Grillo wrote:
> > 
> > 
> > On 04/03/24 12:48, Louis Chauvet wrote:
> [...]
> >>>
> >>>> Regarding the YUV part, I don't feel confortable adressing Pekka's 
> >>>> comments, would you mind doing it?
> >>>
> >>> I'm already doing that, how do you want me to send those changes? I reply to
> >>> your series, like a did before?
> >>
> >> Yes, simply reply to my series, so I can rebase everything on the 
> >> line-by-line work.
> > 
> > OK, I will do that.
> 
> Hi,
> 
> I know that I said that, but it would be very difficult to that with my
> b4 workflow. So, I sent a separate series based on the v4:
> 
> https://lore.kernel.org/all/20240306-louis-vkms-conv-v1-0-5bfe7d129fdd@riseup.net/
> 
> I hope that it does not difficult things for you.

Thanks for this work!

I completly understood, and a "real" patch is even better as I 
can fetch them through patchwork. The v5 is (almost, see my comment)
ready, but I want to wait for Pekka's comments/replies on the v4 before 
sending it. 

Kind regards,
Louis Chauvet

> Best Regards,
> ~Arthur Grillo
> 
> > 
> > Best Regards,
> > ~Arthur Grillo
> > 
> >> Kind regards,
> >> Louis Chauvet
> >>
> >>> Best Regards,
> >>> ~Arthur Grillo
> >>>
> >>>>
> >>>> Kind regards,
> >>>> Louis Chauvet
> >>>>
> >>>> [...]
> >>>>
> >>

[v2,6/9] drm/vkms: Add YUV support

Commit Message

Comments

Patch