From patchwork Tue Nov 13 07:45:01 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
X-Patchwork-Id: 10679987
Return-Path: <dri-devel-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C053514E2
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:12 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B01E82A42B
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:12 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id A45B12A437; Tue, 13 Nov 2018 07:45:12 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2E7D72A42B
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:12 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 32A276E2C4;
	Tue, 13 Nov 2018 07:45:09 +0000 (UTC)
X-Original-To: dri-devel@lists.freedesktop.org
Delivered-To: dri-devel@lists.freedesktop.org
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 7DC9F6E2C4;
 Tue, 13 Nov 2018 07:45:07 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 12 Nov 2018 23:45:07 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,498,1534834800"; d="scan'208";a="95881898"
Received: from slisovsk-lenovo-ideapad-720s-13ikb.fi.intel.com
 ([10.237.66.156])
 by FMSMGA003.fm.intel.com with ESMTP; 12 Nov 2018 23:45:05 -0800
From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH xf86-video-intel v8 1/2] sna/gen9+: Split out wm_kernel from
 the sna_composite_op flags
Date: Tue, 13 Nov 2018 09:45:01 +0200
Message-Id: <20181113074502.31063-2-stanislav.lisovskiy@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181113074502.31063-1-stanislav.lisovskiy@intel.com>
References: <20181113074502.31063-1-stanislav.lisovskiy@intel.com>
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Cc: ville.syrjala@intel.com, martin.peres@intel.com,
 dri-devel@lists.freedesktop.org, stanislav.lisovskiy@intel.com
MIME-Version: 1.0
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

With the extra video kernels we already ran out of bits in
the flags. To tackle that let's just split out the
wm_kernel to its own thing.

Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
---
 src/sna/gen9_render.c | 35 ++++++++++++++++++++++-------------
 src/sna/sna_render.h  |  1 +
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/sna/gen9_render.c b/src/sna/gen9_render.c
index 505b98af..eb22b642 100644
--- a/src/sna/gen9_render.c
+++ b/src/sna/gen9_render.c
@@ -226,19 +226,18 @@ static const struct blendinfo {
 
 #define COPY_SAMPLER 0
 #define COPY_VERTEX VERTEX_2s2s
-#define COPY_FLAGS(a) GEN9_SET_FLAGS(COPY_SAMPLER, (a) == GXcopy ? NO_BLEND : CLEAR, GEN9_WM_KERNEL_NOMASK, COPY_VERTEX)
+#define COPY_FLAGS(a) GEN9_SET_FLAGS(COPY_SAMPLER, (a) == GXcopy ? NO_BLEND : CLEAR, COPY_VERTEX)
 
 #define FILL_SAMPLER 1
 #define FILL_VERTEX VERTEX_2s2s
-#define FILL_FLAGS(op, format) GEN9_SET_FLAGS(FILL_SAMPLER, gen9_get_blend((op), false, (format)), GEN9_WM_KERNEL_NOMASK, FILL_VERTEX)
-#define FILL_FLAGS_NOBLEND GEN9_SET_FLAGS(FILL_SAMPLER, NO_BLEND, GEN9_WM_KERNEL_NOMASK, FILL_VERTEX)
+#define FILL_FLAGS(op, format) GEN9_SET_FLAGS(FILL_SAMPLER, gen9_get_blend((op), false, (format)), FILL_VERTEX)
+#define FILL_FLAGS_NOBLEND GEN9_SET_FLAGS(FILL_SAMPLER, NO_BLEND, FILL_VERTEX)
 
 #define GEN9_SAMPLER(f) (((f) >> 20) & 0xfff)
 #define GEN9_BLEND(f) (((f) >> 4) & 0x7ff)
 #define GEN9_READS_DST(f) (((f) >> 15) & 1)
-#define GEN9_KERNEL(f) (((f) >> 16) & 0xf)
 #define GEN9_VERTEX(f) (((f) >> 0) & 0xf)
-#define GEN9_SET_FLAGS(S, B, K, V)  ((S) << 20 | (K) << 16 | (B) | (V))
+#define GEN9_SET_FLAGS(S, B, V)  ((S) << 20 | (B) | (V))
 
 #define OUT_BATCH(v) batch_emit(sna, v)
 #define OUT_BATCH64(v) batch_emit64(sna, v)
@@ -1349,7 +1348,7 @@ gen9_emit_state(struct sna *sna,
 	gen9_emit_cc(sna, GEN9_BLEND(op->u.gen9.flags));
 	gen9_emit_sampler(sna, GEN9_SAMPLER(op->u.gen9.flags));
 	gen9_emit_sf(sna, GEN9_VERTEX(op->u.gen9.flags) >> 2);
-	gen9_emit_wm(sna, GEN9_KERNEL(op->u.gen9.flags));
+	gen9_emit_wm(sna, op->u.gen9.wm_kernel);
 	gen9_emit_vertex_elements(sna, op);
 	gen9_emit_binding_table(sna, wm_binding_table);
 
@@ -1618,7 +1617,7 @@ static int gen9_get_rectangles__flush(struct sna *sna,
 		if (gen9_magic_ca_pass(sna, op)) {
 			gen9_emit_pipe_invalidate(sna);
 			gen9_emit_cc(sna, GEN9_BLEND(op->u.gen9.flags));
-			gen9_emit_wm(sna, GEN9_KERNEL(op->u.gen9.flags));
+			gen9_emit_wm(sna, op->u.gen9.wm_kernel);
 		}
 	}
 
@@ -2548,11 +2547,11 @@ gen9_render_composite(struct sna *sna,
 			       gen9_get_blend(tmp->op,
 					      tmp->has_component_alpha,
 					      tmp->dst.format),
-			       gen9_choose_composite_kernel(tmp->op,
-							    tmp->mask.bo != NULL,
-							    tmp->has_component_alpha,
-							    tmp->is_affine),
 			       gen4_choose_composite_emitter(sna, tmp));
+	tmp->u.gen9.wm_kernel = gen9_choose_composite_kernel(tmp->op,
+							     tmp->mask.bo != NULL,
+							     tmp->has_component_alpha,
+							     tmp->is_affine);
 
 	tmp->blt   = gen9_render_composite_blt;
 	tmp->box   = gen9_render_composite_box;
@@ -2781,8 +2780,9 @@ gen9_render_composite_spans(struct sna *sna,
 					      SAMPLER_FILTER_NEAREST,
 					      SAMPLER_EXTEND_PAD),
 			       gen9_get_blend(tmp->base.op, false, tmp->base.dst.format),
-			       GEN9_WM_KERNEL_OPACITY | !tmp->base.is_affine,
 			       gen4_choose_spans_emitter(sna, tmp));
+	tmp->base.u.gen9.wm_kernel =
+		GEN9_WM_KERNEL_OPACITY | !tmp->base.is_affine;
 
 	tmp->box   = gen9_render_composite_spans_box;
 	tmp->boxes = gen9_render_composite_spans_boxes;
@@ -3045,6 +3045,7 @@ fallback_blt:
 	tmp.need_magic_ca_pass = 0;
 
 	tmp.u.gen9.flags = COPY_FLAGS(alu);
+	tmp.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, tmp.dst.bo);
 	if (!kgem_check_bo(&sna->kgem, tmp.dst.bo, tmp.src.bo, NULL)) {
@@ -3214,6 +3215,7 @@ fallback:
 	op->base.floats_per_rect = 6;
 
 	op->base.u.gen9.flags = COPY_FLAGS(alu);
+	op->base.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, dst_bo);
 	if (!kgem_check_bo(&sna->kgem, dst_bo, src_bo, NULL)) {
@@ -3366,6 +3368,7 @@ gen9_render_fill_boxes(struct sna *sna,
 	tmp.need_magic_ca_pass = false;
 
 	tmp.u.gen9.flags = FILL_FLAGS(op, format);
+	tmp.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, dst_bo);
 	if (!kgem_check_bo(&sna->kgem, dst_bo, NULL)) {
@@ -3552,6 +3555,7 @@ gen9_render_fill(struct sna *sna, uint8_t alu,
 	op->base.floats_per_rect = 6;
 
 	op->base.u.gen9.flags = FILL_FLAGS_NOBLEND;
+	op->base.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, dst_bo);
 	if (!kgem_check_bo(&sna->kgem, dst_bo, NULL)) {
@@ -3637,6 +3641,7 @@ gen9_render_fill_one(struct sna *sna, PixmapPtr dst, struct kgem_bo *bo,
 	tmp.need_magic_ca_pass = false;
 
 	tmp.u.gen9.flags = FILL_FLAGS_NOBLEND;
+	tmp.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, bo);
 	if (!kgem_check_bo(&sna->kgem, bo, NULL)) {
@@ -3723,6 +3728,7 @@ gen9_render_clear(struct sna *sna, PixmapPtr dst, struct kgem_bo *bo)
 	tmp.need_magic_ca_pass = false;
 
 	tmp.u.gen9.flags = FILL_FLAGS_NOBLEND;
+	tmp.u.gen9.wm_kernel = GEN9_WM_KERNEL_NOMASK;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, bo);
 	if (!kgem_check_bo(&sna->kgem, bo, NULL)) {
@@ -3964,8 +3970,8 @@ gen9_render_video(struct sna *sna,
 		GEN9_SET_FLAGS(SAMPLER_OFFSET(filter, SAMPLER_EXTEND_PAD,
 					      SAMPLER_FILTER_NEAREST, SAMPLER_EXTEND_NONE),
 			       NO_BLEND,
-			       select_video_kernel(video, frame),
 			       2);
+	tmp.u.gen9.wm_kernel = select_video_kernel(video, frame);
 	tmp.priv = frame;
 
 	kgem_set_mode(&sna->kgem, KGEM_RENDER, tmp.dst.bo);
@@ -4135,6 +4141,9 @@ static bool gen9_render_setup(struct sna *sna)
 		assert(state->wm_kernel[m][0]|state->wm_kernel[m][1]|state->wm_kernel[m][2]);
 	}
 
+	COMPILE_TIME_ASSERT(GEN9_WM_KERNEL_COUNT <=
+			    1 << (sizeof(((struct sna_composite_op *)NULL)->u.gen9.wm_kernel) * 8));
+
 	COMPILE_TIME_ASSERT(SAMPLER_OFFSET(FILTER_COUNT, EXTEND_COUNT, FILTER_COUNT, EXTEND_COUNT) <= 0x7ff);
 	ss = sna_static_stream_map(&general,
 				   2 * sizeof(*ss) *
diff --git a/src/sna/sna_render.h b/src/sna/sna_render.h
index 6669af9d..a4e5b56a 100644
--- a/src/sna/sna_render.h
+++ b/src/sna/sna_render.h
@@ -151,6 +151,7 @@ struct sna_composite_op {
 
 		struct {
 			uint32_t flags;
+			uint8_t wm_kernel;
 		} gen9;
 	} u;
 

From patchwork Tue Nov 13 07:45:02 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
X-Patchwork-Id: 10679991
Return-Path: <dri-devel-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 832A814E2
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:16 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 725EC2A42B
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:16 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 665672A437; Tue, 13 Nov 2018 07:45:16 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5AD8F2A42B
	for <patchwork-dri-devel@patchwork.kernel.org>;
 Tue, 13 Nov 2018 07:45:15 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 3CB1C6E326;
	Tue, 13 Nov 2018 07:45:12 +0000 (UTC)
X-Original-To: dri-devel@lists.freedesktop.org
Delivered-To: dri-devel@lists.freedesktop.org
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 11AFC6E326;
 Tue, 13 Nov 2018 07:45:10 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 12 Nov 2018 23:45:09 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,498,1534834800"; d="scan'208";a="95881909"
Received: from slisovsk-lenovo-ideapad-720s-13ikb.fi.intel.com
 ([10.237.66.156])
 by FMSMGA003.fm.intel.com with ESMTP; 12 Nov 2018 23:45:07 -0800
From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH xf86-video-intel v8 2/2] sna: Added AYUV format support for
 textured and sprite video adapters.
Date: Tue, 13 Nov 2018 09:45:02 +0200
Message-Id: <20181113074502.31063-3-stanislav.lisovskiy@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181113074502.31063-1-stanislav.lisovskiy@intel.com>
References: <20181113074502.31063-1-stanislav.lisovskiy@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Cc: ville.syrjala@intel.com, martin.peres@intel.com,
 dri-devel@lists.freedesktop.org, stanislav.lisovskiy@intel.com
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

v2: Renamed DRM_FORMAT_XYUV to DRM_FORMAT_XYUV8888.
    Added comment about AYUV byte ordering in Gstreamer.

v3: Removed sna_composite_op flags related change to the separate patch.

v4: Fixed review comments, done code refactoring

v5: Fixed following review comments:
    - Fixed comment in shader code for ayuv kernel.
    - Fixed naming to VIDEO_AYUV_BT601/BT709 for ayuv kernels.
    - Removed duplicate gen9_kernel parameter, left from previous patches
    - Added colorspace handling for new AYUV kernel
    - Fixed naming of sna_copy_packed_data_ayuv to sna_copy_ayuv_data
    - Started using standard bswap_32 function for byte swapping in sna_copy_ayuv_data
    - Removed redundant code in sna_copy_ayuv_data so that it looks more neat
    - Fixed XVIMAGE_AYUV structure initialization to contain proper byte sequence for GST
    - Fixed bogus comment about subsampling for DRM_FORMAT_XYUV8888
    - Fixed AYUV advertisement for all platforms
    - Removed unnecessary RGB888 declaration.

v6:
    - Fixed surface format not to use alpha as supposed
    - Now doing byte swapping always during copy
    - Changed hack, required for GST to work to be at one place
    - Fixed invalid sampling values for XVIMAGE_AYUV
    - Fixed sprite format checking order and images_ayuv definition.

v7:
    - Removed reverse_bytes bool parameter, now swapping bytes
      for XYUV unconditionally both for textured and sprite modes.

v8:
    - Added gen9_images structure, in order to expose AYUV format to
      proper platforms.

Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 src/render_program/Makefile.am                |  2 +
 .../exa_wm_src_sample_argb_ayuv.g8a           | 76 +++++++++++++++++++
 .../exa_wm_src_sample_argb_ayuv.g8b           |  8 ++
 src/sna/gen9_render.c                         | 24 +++++-
 src/sna/sna_render.h                          |  3 +
 src/sna/sna_video.c                           | 72 +++++++++++++++++-
 src/sna/sna_video.h                           | 20 +++++
 src/sna/sna_video_sprite.c                    | 20 ++++-
 src/sna/sna_video_textured.c                  | 19 +++++
 9 files changed, 239 insertions(+), 5 deletions(-)
 create mode 100644 src/render_program/exa_wm_src_sample_argb_ayuv.g8a
 create mode 100644 src/render_program/exa_wm_src_sample_argb_ayuv.g8b

diff --git a/src/render_program/Makefile.am b/src/render_program/Makefile.am
index dc58138f..e35ffa52 100644
--- a/src/render_program/Makefile.am
+++ b/src/render_program/Makefile.am
@@ -196,6 +196,7 @@ INTEL_G7B =				\
 INTEL_G8A =				\
 	exa_wm_src_affine.g8a 		\
 	exa_wm_src_sample_argb.g8a 	\
+	exa_wm_src_sample_argb_ayuv.g8a \
 	exa_wm_src_sample_nv12.g8a 	\
 	exa_wm_src_sample_planar.g8a 	\
 	exa_wm_write.g8a 		\
@@ -205,6 +206,7 @@ INTEL_G8A =				\
 
 INTEL_G8B =				\
 	exa_wm_src_affine.g8b 		\
+	exa_wm_src_sample_argb_ayuv.g8b \
 	exa_wm_src_sample_argb.g8b 	\
 	exa_wm_src_sample_nv12.g8b 	\
 	exa_wm_src_sample_planar.g8b 	\
diff --git a/src/render_program/exa_wm_src_sample_argb_ayuv.g8a b/src/render_program/exa_wm_src_sample_argb_ayuv.g8a
new file mode 100644
index 00000000..c0b84c2e
--- /dev/null
+++ b/src/render_program/exa_wm_src_sample_argb_ayuv.g8a
@@ -0,0 +1,76 @@
+/*
+ * Copyright © 2006 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Wang Zhenyu <zhenyu.z.wang@intel.com>
+ *    Keith Packard <keithp@keithp.com>
+ */
+
+/* Sample the src surface */
+
+include(`exa_wm.g4i')
+
+undefine(`src_msg')
+undefine(`src_msg_ind')
+
+define(`src_msg',       `g65')
+define(`src_msg_ind',   `65')
+
+/* prepare sampler read back gX register, which would be written back to output */
+
+/* use simd16 sampler, param 0 is u, param 1 is v. */
+/* 'payload' loading, assuming tex coord start from g4 */
+
+/* load argb */
+mov (1) g0.8<1>UD	0x00000000UD { align1 mask_disable };
+mov (8) src_msg<1>UD g0<8,8,1>UD { align1 }; /* copy to msg start reg*/
+
+/* src_msg will be copied with g0, as it contains send desc */
+/* emit sampler 'send' cmd */
+send (16) src_msg_ind	/* msg reg index */
+	src_sample_base<1>UW /* readback */
+	null
+	sampler (1,0,F)	/* sampler message description, (binding_table,sampler_index,datatype)
+				/* here(src->dst) we should use src_sampler and src_surface */
+	mlen 5 rlen 8 { align1 };   /* required message len 5, readback len 8 */
+
+/*
+ * Have to change bytes order, because the only
+ * player which supports AYUV format currently is
+ * Gstreamer and it supports in bad way, even though
+ * spec says MSB:AYUV, we get the bytes opposite way.
+ * We swap bytes both for sprite and texture modes during copy.
+ * So here we get argb which then becomes 1bgr.
+ */
+mov (16) src_sample_a<1>UD src_sample_b<1>UD  { align1 };
+mov (16) src_sample_b<1>UD src_sample_g<1>UD  { align1 };
+mov (16) src_sample_g<1>UD src_sample_r<1>UD  { align1 };
+mov (16) src_sample_r<1>UD src_sample_a<1>UD  { align1 };
+mov (16) src_sample_a<1>F 1.0F;
+
+
+
+
+
+
+
+
diff --git a/src/render_program/exa_wm_src_sample_argb_ayuv.g8b b/src/render_program/exa_wm_src_sample_argb_ayuv.g8b
new file mode 100644
index 00000000..f3ac4959
--- /dev/null
+++ b/src/render_program/exa_wm_src_sample_argb_ayuv.g8b
@@ -0,0 +1,8 @@
+   { 0x00000001, 0x2008060c, 0x00000000, 0x00000000 },
+   { 0x00600001, 0x28200208, 0x008d0000, 0x00000000 },
+   { 0x02800031, 0x21c00a48, 0x06000820, 0x0a8c0001 },
+   { 0x00800001, 0x22800208, 0x00200240, 0x00000000 },
+   { 0x00800001, 0x22400208, 0x00200200, 0x00000000 },
+   { 0x00800001, 0x22000208, 0x002001c0, 0x00000000 },
+   { 0x00800001, 0x21c00208, 0x00200280, 0x00000000 },
+   { 0x00800001, 0x22803ee8, 0x38000000, 0x3f800000 },
diff --git a/src/sna/gen9_render.c b/src/sna/gen9_render.c
index eb22b642..90707b1f 100644
--- a/src/sna/gen9_render.c
+++ b/src/sna/gen9_render.c
@@ -129,6 +129,20 @@ static const uint32_t ps_kernel_planar_bt709[][4] = {
 #include "exa_wm_write.g8b"
 };
 
+static const uint32_t ps_kernel_ayuv_bt601[][4] = {
+#include "exa_wm_src_affine.g8b"
+#include "exa_wm_src_sample_argb_ayuv.g8b"
+#include "exa_wm_yuv_rgb_bt601.g8b"
+#include "exa_wm_write.g8b"
+};
+
+static const uint32_t ps_kernel_ayuv_bt709[][4] = {
+#include "exa_wm_src_affine.g8b"
+#include "exa_wm_src_sample_argb_ayuv.g8b"
+#include "exa_wm_yuv_rgb_bt709.g8b"
+#include "exa_wm_write.g8b"
+};
+
 static const uint32_t ps_kernel_nv12_bt709[][4] = {
 #include "exa_wm_src_affine.g8b"
 #include "exa_wm_src_sample_nv12.g8b"
@@ -177,6 +191,8 @@ static const struct wm_kernel_info {
 	KERNEL(VIDEO_PLANAR_BT709, ps_kernel_planar_bt709, 7),
 	KERNEL(VIDEO_NV12_BT709, ps_kernel_nv12_bt709, 7),
 	KERNEL(VIDEO_PACKED_BT709, ps_kernel_packed_bt709, 2),
+	KERNEL(VIDEO_AYUV_BT601, ps_kernel_ayuv_bt601, 2),
+	KERNEL(VIDEO_AYUV_BT709, ps_kernel_ayuv_bt709, 2),
 	KERNEL(VIDEO_RGB, ps_kernel_rgb, 2),
 #endif
 };
@@ -2552,7 +2568,6 @@ gen9_render_composite(struct sna *sna,
 							     tmp->mask.bo != NULL,
 							     tmp->has_component_alpha,
 							     tmp->is_affine);
-
 	tmp->blt   = gen9_render_composite_blt;
 	tmp->box   = gen9_render_composite_box;
 	tmp->boxes = gen9_render_composite_boxes__blt;
@@ -3853,6 +3868,8 @@ static void gen9_emit_video_state(struct sna *sna,
 			src_surf_format[0] = SURFACEFORMAT_B8G8R8X8_UNORM;
 		else if (frame->id == FOURCC_UYVY)
 			src_surf_format[0] = SURFACEFORMAT_YCRCB_SWAPY;
+		else if (is_ayuv_fourcc(frame->id))
+			src_surf_format[0] = SURFACEFORMAT_B8G8R8X8_UNORM;
 		else
 			src_surf_format[0] = SURFACEFORMAT_YCRCB_NORMAL;
 
@@ -3903,6 +3920,11 @@ static unsigned select_video_kernel(const struct sna_video *video,
 	case FOURCC_RGB565:
 		return GEN9_WM_KERNEL_VIDEO_RGB;
 
+	case FOURCC_AYUV:
+		return video->colorspace ?
+			GEN9_WM_KERNEL_VIDEO_AYUV_BT709 :
+			GEN9_WM_KERNEL_VIDEO_AYUV_BT601;
+
 	default:
 		return video->colorspace ?
 			GEN9_WM_KERNEL_VIDEO_PACKED_BT709 :
diff --git a/src/sna/sna_render.h b/src/sna/sna_render.h
index a4e5b56a..891fc905 100644
--- a/src/sna/sna_render.h
+++ b/src/sna/sna_render.h
@@ -617,6 +617,9 @@ enum {
 	GEN9_WM_KERNEL_VIDEO_NV12_BT709,
 	GEN9_WM_KERNEL_VIDEO_PACKED_BT709,
 
+	GEN9_WM_KERNEL_VIDEO_AYUV_BT601,
+	GEN9_WM_KERNEL_VIDEO_AYUV_BT709,
+
 	GEN9_WM_KERNEL_VIDEO_RGB,
 	GEN9_WM_KERNEL_COUNT
 };
diff --git a/src/sna/sna_video.c b/src/sna/sna_video.c
index 55405f81..b51fed96 100644
--- a/src/sna/sna_video.c
+++ b/src/sna/sna_video.c
@@ -59,6 +59,7 @@
 #include "intel_options.h"
 
 #include <xf86xv.h>
+#include <byteswap.h>
 
 #ifdef SNA_XVMC
 #define _SNA_XVMC_SERVER_
@@ -281,6 +282,7 @@ sna_video_frame_set_rotation(struct sna_video *video,
 	} else {
 		switch (frame->id) {
 		case FOURCC_RGB888:
+		case FOURCC_AYUV:
 			if (rotation & (RR_Rotate_90 | RR_Rotate_270)) {
 				frame->pitch[0] = ALIGN((height << 2), align);
 				frame->size = (int)frame->pitch[0] * width;
@@ -584,6 +586,72 @@ sna_copy_packed_data(struct sna_video *video,
 	}
 }
 
+static void
+sna_copy_ayuv_data(struct sna_video *video,
+		   const struct sna_video_frame *frame,
+		   const uint8_t *buf,
+		   uint8_t *dst)
+{
+	int pitch = frame->width << 2;
+	const uint8_t *src, *s;
+	const uint32_t *src_dw;
+	uint32_t *dst_dw = (uint32_t *)dst;
+	int x, y, w, h;
+	int i, j;
+
+	if (video->textured) {
+		/* XXX support copying cropped extents */
+		x = y = 0;
+		w = frame->width;
+		h = frame->height;
+	} else {
+		x = frame->image.x1;
+		y = frame->image.y1;
+		w = frame->image.x2 - frame->image.x1;
+		h = frame->image.y2 - frame->image.y1;
+	}
+
+	src = buf + (y * pitch) + (x << 2);
+	src_dw = (uint32_t *)src;
+
+	switch (frame->rotation) {
+	case RR_Rotate_0:
+		for (i = 0; i < h; i++) {
+			for (j = 0; j < w; j++) {
+				/*
+				 * Have to reverse bytes order, because the only
+				 * player which supports AYUV format currently is
+				 * Gstreamer and it supports in bad way, even though
+				 * spec says MSB:AYUV, we get the bytes opposite way.
+				 */
+				dst_dw[i * w + j] = bswap_32(src_dw[i * w + j]);
+			}
+		}
+		break;
+	case RR_Rotate_90:
+		for (i = 0; i < h; i++) {
+			for (j = 0; j < w; j++) {
+				dst_dw[(w - j - 1) * h + i] = bswap_32(src_dw[i * w + j]);
+			}
+		}
+		break;
+	case RR_Rotate_180:
+		for (i = 0; i < h; i++) {
+			for (j = 0; j < w; j++) {
+				dst_dw[(h - i - 1) * w + w - j - 1] = bswap_32(src_dw[i * w + j]);
+			}
+		}
+		break;
+	case RR_Rotate_270:
+		for (i = 0; i < h; i++) {
+			for (j = 0; j < w; j++) {
+				dst_dw[(w - j - 1) * h + i] = bswap_32(src_dw[i * w + j]);
+			}
+		}
+		break;
+	}
+}
+
 bool
 sna_video_copy_data(struct sna_video *video,
 		    struct sna_video_frame *frame,
@@ -604,7 +672,7 @@ sna_video_copy_data(struct sna_video *video,
 	assert(frame->size);
 
 	/* In the common case, we can simply the upload in a single pwrite */
-	if (frame->rotation == RR_Rotate_0 && !video->tiled) {
+	if (frame->rotation == RR_Rotate_0 && !video->tiled && !is_ayuv_fourcc(frame->id)) {
 		DBG(("%s: unrotated, untiled fast paths: is-planar?=%d\n",
 		     __FUNCTION__, is_planar_fourcc(frame->id)));
 		if (is_nv12_fourcc(frame->id)) {
@@ -709,6 +777,8 @@ use_gtt: /* copy data, must use GTT so that we keep the overlay uncached */
 		sna_copy_nv12_data(video, frame, buf, dst);
 	else if (is_planar_fourcc(frame->id))
 		sna_copy_planar_data(video, frame, buf, dst);
+	else if (is_ayuv_fourcc(frame->id))
+		sna_copy_ayuv_data(video, frame, buf, dst);
 	else
 		sna_copy_packed_data(video, frame, buf, dst);
 
diff --git a/src/sna/sna_video.h b/src/sna/sna_video.h
index bbd3f0fd..a3ffdc0b 100644
--- a/src/sna/sna_video.h
+++ b/src/sna/sna_video.h
@@ -39,6 +39,7 @@ THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define FOURCC_RGB565 ((16 << 24) + ('B' << 16) + ('G' << 8) + 'R')
 #define FOURCC_RGB888 ((24 << 24) + ('B' << 16) + ('G' << 8) + 'R')
 #define FOURCC_NV12 (('2' << 24) + ('1' << 16) + ('V' << 8) + 'N')
+#define FOURCC_AYUV (('V' << 24) + ('U' << 16) + ('Y' << 8) + 'A')
 
 /*
  * Below, a dummy picture type that is used in XvPutImage
@@ -79,6 +80,15 @@ THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 	XvTopToBottom \
 }
 
+#define XVIMAGE_AYUV { \
+	FOURCC_AYUV, XvYUV, LSBFirst, \
+	{'A', 'Y', 'U', 'V', 0x00, 0x00, 0x00, 0x10, 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71}, \
+	32, XvPacked, 1, 0, 0, 0, 0, 8, 8, 8, 1, 1, 1, 1, 1, 1, \
+	{'A', 'Y', 'U', 'V', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, \
+	XvTopToBottom \
+}
+
+
 struct sna_video {
 	struct sna *sna;
 
@@ -189,6 +199,16 @@ static inline int is_nv12_fourcc(int id)
 	}
 }
 
+static inline int is_ayuv_fourcc(int id)
+{
+	switch (id) {
+	case FOURCC_AYUV:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 bool
 sna_video_clip_helper(struct sna_video *video,
 		      struct sna_video_frame *frame,
diff --git a/src/sna/sna_video_sprite.c b/src/sna/sna_video_sprite.c
index 8b7ae8ae..3780dc0e 100644
--- a/src/sna/sna_video_sprite.c
+++ b/src/sna/sna_video_sprite.c
@@ -47,7 +47,7 @@
 #define DRM_FORMAT_YUYV         fourcc_code('Y', 'U', 'Y', 'V') /* [31:0] Cr0:Y1:Cb0:Y0 8:8:8:8 little endian */
 #define DRM_FORMAT_UYVY         fourcc_code('U', 'Y', 'V', 'Y') /* [31:0] Y1:Cr0:Y0:Cb0 8:8:8:8 little endian */
 #define DRM_FORMAT_NV12         fourcc_code('N', 'V', '1', '2') /* 2x2 subsampled Cr:Cb plane */
-
+#define DRM_FORMAT_XYUV8888     fourcc_code('X', 'Y', 'U', 'V') /* [31:0] x:Y:U:V 8:8:8:8 little endian */
 #define has_hw_scaling(sna, video) ((sna)->kgem.gen < 071 || \
 				    (sna)->kgem.gen >= 0110)
 
@@ -79,6 +79,8 @@ static const XvImageRec images_rgb565[] = { XVIMAGE_YUY2, XVIMAGE_UYVY,
 					    XVMC_RGB888, XVMC_RGB565 };
 static const XvImageRec images_nv12[] = { XVIMAGE_YUY2, XVIMAGE_UYVY,
 					  XVIMAGE_NV12, XVMC_RGB888, XVMC_RGB565 };
+static const XvImageRec images_ayuv[] = { XVIMAGE_AYUV, XVIMAGE_YUY2, XVIMAGE_UYVY,
+					  XVIMAGE_NV12, XVMC_RGB888, XVMC_RGB565 };
 static const XvAttributeRec attribs[] = {
 	{ XvSettable | XvGettable, 0, 1, (char *)"XV_COLORSPACE" }, /* BT.601, BT.709 */
 	{ XvSettable | XvGettable, 0, 0xffffff, (char *)"XV_COLORKEY" },
@@ -364,6 +366,10 @@ sna_video_sprite_show(struct sna *sna,
 		case FOURCC_UYVY:
 			f.pixel_format = DRM_FORMAT_UYVY;
 			break;
+		case FOURCC_AYUV:
+			/* i915 doesn't support alpha, so we use XYUV */
+			f.pixel_format = DRM_FORMAT_XYUV8888;
+			break;
 		case FOURCC_YUY2:
 		default:
 			f.pixel_format = DRM_FORMAT_YUYV;
@@ -705,7 +711,12 @@ static int sna_video_sprite_query(ddQueryImageAttributes_ARGS)
 		tmp *= (*h >> 1);
 		size += tmp;
 		break;
-
+	case FOURCC_AYUV:
+		tmp = *w << 2;
+		if (pitches)
+			pitches[0] = tmp;
+		size = *h * tmp;
+		break;
 	default:
 		*w = (*w + 1) & ~1;
 		*h = (*h + 1) & ~1;
@@ -805,7 +816,10 @@ void sna_video_sprite_setup(struct sna *sna, ScreenPtr screen)
 	adaptor->nAttributes = ARRAY_SIZE(attribs);
 	adaptor->pAttributes = (XvAttributeRec *)attribs;
 
-	if (sna_has_sprite_format(sna, DRM_FORMAT_NV12)) {
+	if (sna_has_sprite_format(sna, DRM_FORMAT_XYUV8888)) {
+		adaptor->pImages = (XvImageRec *)images_ayuv;
+		adaptor->nImages = ARRAY_SIZE(images_ayuv);
+	} else if (sna_has_sprite_format(sna, DRM_FORMAT_NV12)) {
 		adaptor->pImages = (XvImageRec *)images_nv12;
 		adaptor->nImages = ARRAY_SIZE(images_nv12);
 	} else if (sna_has_sprite_format(sna, DRM_FORMAT_RGB565)) {
diff --git a/src/sna/sna_video_textured.c b/src/sna/sna_video_textured.c
index a784fe2e..1c7852fb 100644
--- a/src/sna/sna_video_textured.c
+++ b/src/sna/sna_video_textured.c
@@ -71,6 +71,16 @@ static const XvImageRec gen4_Images[] = {
 	XVMC_YUV,
 };
 
+static const XvImageRec gen9_Images[] = {
+	XVIMAGE_YUY2,
+	XVIMAGE_YV12,
+	XVIMAGE_I420,
+	XVIMAGE_NV12,
+	XVIMAGE_UYVY,
+	XVIMAGE_AYUV,
+	XVMC_YUV,
+};
+
 static int sna_video_textured_stop(ddStopVideo_ARGS)
 {
 	struct sna_video *video = port->devPriv.ptr;
@@ -337,6 +347,12 @@ sna_video_textured_query(ddQueryImageAttributes_ARGS)
 			pitches[0] = size;
 		size *= *h;
 		break;
+	case FOURCC_AYUV:
+		size = *w << 2;
+		if (pitches)
+			pitches[0] = size;
+		size *= *h;
+		break;
 	case FOURCC_XVMC:
 		*h = (*h + 1) & ~1;
 		size = sizeof(uint32_t);
@@ -408,6 +424,9 @@ void sna_video_textured_setup(struct sna *sna, ScreenPtr screen)
 	} else if (sna->kgem.gen < 040) {
 		adaptor->nImages = ARRAY_SIZE(gen3_Images);
 		adaptor->pImages = (XvImageRec *)gen3_Images;
+	} else if (sna->kgem.gen >= 0110) {
+		adaptor->nImages = ARRAY_SIZE(gen9_Images);
+		adaptor->pImages = (XvImageRec *)gen9_Images;
 	} else {
 		adaptor->nImages = ARRAY_SIZE(gen4_Images);
 		adaptor->pImages = (XvImageRec *)gen4_Images;