From patchwork Tue Apr 5 14:34:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 407E1C433F5 for ; Tue, 5 Apr 2022 14:34:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9C04A10E6A0; Tue, 5 Apr 2022 14:34:14 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5A34510E660; Tue, 5 Apr 2022 14:34:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169252; x=1680705252; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=FH1vn4P3u+/zwqyrOOmlegQIVQF8V8aUPfEigCgaNUY=; b=EWpY4q26ZYBA2pFDzBu+WuHwRH2nG2Qw+fLzCnkznKU4fqfmUzHS4lTY upN3XKsAEncK+GNYbW3tIw5eq33t5xKjuzrnkPueP6lP5Oc0ny7fzmg5E 7WbphSCfYQ6MCWlkmX9cHBaGmY+3Wu8QeCoNfFoIzeeAma2jy0BifMFdw WECuSt6GmAj040aLpeR53BAMbIT45QgU4IK/LF+arz6Jyd0tUT6Ctat/G ibcrPoP4/S+aHPf7XWZy1vRA6uRjsZmKDtpx48Zs9okKLnvSJLOLFdcUI F3Nvn++hAcJhALmBN4AgcYvzVvaYT63TUchfhochIt0FPqFI4d7izot/P g==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722168" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722168" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:11 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657958962" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:10 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:46 +0530 Message-Id: <20220405143454.16358-2-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 1/9] drm/i915/gt: use engine instance directly for offset X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To make it uniform across copy and clear, use the engine offset directly to calculate the offset in the cmd forming for emit_clear. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/intel_migrate.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 950fd6da146c..9d852a570400 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -613,15 +613,13 @@ intel_context_migrate_copy(struct intel_context *ce, return err; } -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value) +static int emit_clear(struct i915_request *rq, u32 offset, int size, u32 value) { const int ver = GRAPHICS_VER(rq->engine->i915); u32 *cs; GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); - offset += (u64)rq->engine->instance << 32; - cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -631,17 +629,16 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value) *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = lower_32_bits(offset); - *cs++ = upper_32_bits(offset); + *cs++ = offset; + *cs++ = rq->engine->instance; *cs++ = value; *cs++ = MI_NOOP; } else { - GEM_BUG_ON(upper_32_bits(offset)); *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = lower_32_bits(offset); + *cs++ = offset; *cs++ = value; } From patchwork Tue Apr 5 14:34:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801628 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 42E02C433F5 for ; Tue, 5 Apr 2022 14:34:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 29A6810E70A; Tue, 5 Apr 2022 14:34:16 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8538910E6F5; Tue, 5 Apr 2022 14:34:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169253; x=1680705253; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=b0JQ8nwkBzNoQ8uotPUYMhFyJhn8jMd8eYx0G5tzSRA=; b=GzOyR4Sq607gj2x1hG3+jX/x6bKgTd4EfagjBcFEaaWT4wzOoljPGnf2 zJWOXYR1bs407gDcxPkbHCj9H6yRrCSUA+IScYWqm9f2hp1gkWAuCSRP4 Js05ybbCqXk/jvy0XydEQjUOjh/HlG4YWOrU2XGw/URZ5cz47iz5QlaYu 7qv4/vh/lTCj1q/HsFpUWfwca7JYjX/Ea+KVUm3SlKdy+0x2b5L5EdALV 3ipDktYLqGYIb/Jjh0HnJilsjbFp9pCVVkA5sb7nrd4KZuhhdNlAsiOOo boLz6WdJliCe4yUXacdwzZWF03DiQe2Q+qvZuMk771MHqCzRdTkWfWv5Q Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722176" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722176" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:13 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657958974" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:12 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:47 +0530 Message-Id: <20220405143454.16358-3-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 2/9] drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+ X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Use faster XY_FAST_COLOR_BLT cmd on graphics version of 12 and more, for clearing (Zero out) the pages of the newly allocated object. XY_FAST_COLOR_BLT is faster than the older XY_COLOR_BLT. v2: Typo fix at title [Thomas] v3: XY_FAST_COLOR_BLT is used only for FLAT_CCS capable gen12+ Signed-off-by: Ramalingam C Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 5 +++ drivers/gpu/drm/i915/gt/intel_migrate.c | 43 +++++++++++++++++--- 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 4243be030bc1..d1b8c23f7a9e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -206,6 +206,11 @@ #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) +#define XY_FAST_COLOR_BLT_CMD (2 << 29 | 0x44 << 22) +#define XY_FAST_COLOR_BLT_DEPTH_32 (2 << 19) +#define XY_FAST_COLOR_BLT_DW 16 +#define XY_FAST_COLOR_BLT_MOCS_MASK GENMASK(27, 21) +#define XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT 31 #define SRC_COPY_BLT_CMD (2 << 29 | 0x43 << 22) #define GEN9_XY_FAST_COPY_BLT_CMD (2 << 29 | 0x42 << 22) #define XY_SRC_COPY_BLT_CMD (2 << 29 | 0x53 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 9d852a570400..e81f20266f62 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -613,18 +613,51 @@ intel_context_migrate_copy(struct intel_context *ce, return err; } -static int emit_clear(struct i915_request *rq, u32 offset, int size, u32 value) +static int emit_clear(struct i915_request *rq, u32 offset, int size, + u32 value, bool is_lmem) { - const int ver = GRAPHICS_VER(rq->engine->i915); + struct drm_i915_private *i915 = rq->engine->i915; + int mocs = rq->engine->gt->mocs.uc_index << 1; + const int ver = GRAPHICS_VER(i915); + int ring_sz; u32 *cs; GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); - cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); + if (HAS_FLAT_CCS(i915) && ver >= 12) + ring_sz = XY_FAST_COLOR_BLT_DW; + else if (ver >= 8) + ring_sz = 8; + else + ring_sz = 6; + + cs = intel_ring_begin(rq, ring_sz); if (IS_ERR(cs)) return PTR_ERR(cs); - if (ver >= 8) { + if (HAS_FLAT_CCS(i915) && ver >= 12) { + *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 | + (XY_FAST_COLOR_BLT_DW - 2); + *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) | + (PAGE_SIZE - 1); + *cs++ = 0; + *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; + *cs++ = offset; + *cs++ = rq->engine->instance; + *cs++ = !is_lmem << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT; + /* BG7 */ + *cs++ = value; + *cs++ = 0; + *cs++ = 0; + *cs++ = 0; + /* BG11 */ + *cs++ = 0; + *cs++ = 0; + /* BG13 */ + *cs++ = 0; + *cs++ = 0; + *cs++ = 0; + } else if (ver >= 8) { *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; @@ -707,7 +740,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - err = emit_clear(rq, offset, len, value); + err = emit_clear(rq, offset, len, value, is_lmem); /* Arbitration is re-enabled between requests. */ out_rq: From patchwork Tue Apr 5 14:34:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9DEBEC433EF for ; Tue, 5 Apr 2022 14:34:24 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2CFFE10E72F; Tue, 5 Apr 2022 14:34:17 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id F00D410E6F5; Tue, 5 Apr 2022 14:34:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169254; x=1680705254; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=FuMoLwUi2g2fhXN1j5GB5evUKwf4FZkN+vlXf2oenvA=; b=jUz6ObOEZWBWflVOGZaZkh7CdgZZBZAKYh2PvpFNj7C7ALcocSApM1Fk Ff8hs2+eZo02hyJkjHDdvp+2QaZyQqSz+6hhy1bYsEqFDr50kSh1DX1A0 ihE02RMb6/nAaARo1KpeQ0VJ/wmfEYiUtgOJkP3BSZj3vNwKF0wFRbbBz I4Rrvcihsjo3FgWbTcSHV3PvWDEH6BSwwf9WeD9UGrRUSYZws1DPhqpRo rM7HHHKA9mPs7DApCcUOdCAw5Dk4t0Blc8a3lY6TkUopBsXjIa905SISa 1YF372viM8L1aVnKSpfRTQDQrEvEpISTkckRDwl6DlFznUof1Bc7uII91 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722184" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722184" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:14 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657958983" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:13 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:48 +0530 Message-Id: <20220405143454.16358-4-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 3/9] drm/i915/gt: Optimize the migration and clear loop X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Move the static calculations out of the loops for copy and clear. v2: Fix the loss of proper error code on emit_pte Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellstrom (v1) --- drivers/gpu/drm/i915/gt/intel_migrate.c | 34 ++++++++++++------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index e81f20266f62..e0f1c727662e 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -526,6 +526,7 @@ intel_context_migrate_copy(struct intel_context *ce, struct i915_request **out) { struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); + u32 src_offset, dst_offset; struct i915_request *rq; int err; @@ -535,8 +536,18 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + src_offset = 0; + dst_offset = CHUNK_SZ; + if (HAS_64K_PAGES(ce->engine->i915)) { + src_offset = 0; + dst_offset = 0; + if (src_is_lmem) + src_offset = CHUNK_SZ; + if (dst_is_lmem) + dst_offset = 2 * CHUNK_SZ; + } + do { - u32 src_offset, dst_offset; int len; rq = i915_request_create(ce); @@ -564,17 +575,6 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - src_offset = 0; - dst_offset = CHUNK_SZ; - if (HAS_64K_PAGES(ce->engine->i915)) { - src_offset = 0; - dst_offset = 0; - if (src_is_lmem) - src_offset = CHUNK_SZ; - if (dst_is_lmem) - dst_offset = 2 * CHUNK_SZ; - } - len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); if (len <= 0) { @@ -690,6 +690,7 @@ intel_context_migrate_clear(struct intel_context *ce, { struct sgt_dma it = sg_sgt(sg); struct i915_request *rq; + u32 offset; int err; GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); @@ -697,8 +698,11 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + offset = CHUNK_SZ; + do { - u32 offset; int len; rq = i915_request_create(ce); @@ -726,10 +730,6 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - offset = 0; - if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) - offset = CHUNK_SZ; - len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ); if (len <= 0) { err = len; From patchwork Tue Apr 5 14:34:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801630 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3C05FC433EF for ; Tue, 5 Apr 2022 14:34:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CFF5710E715; Tue, 5 Apr 2022 14:34:18 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7AFCF10E724; Tue, 5 Apr 2022 14:34:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169256; x=1680705256; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=KiOpqb8Tu4Ystozq7E4Ufs7roLiyN2gBwozWLFmgXi0=; b=f5y28JzTGSQZUjfy70a+FO5jQkYXWbLJ41kqSGuFm02ZyvmrKyi4fpDd jKnjS+KPoCYPTGWgcp4h2yvN+5/tAr4SxM9uAmGu6DgkTzfCUID5KlOzy Gj7HGulRwj3NAAvg5oAIryZm0E381Sw9nf04VUHm5KCIxOLlTYVHHvuTd DWMf3cBMwUKvcQkv6JtPApz/AroFOmQrXvhNs2fM82UyYMbdC5DLKB84l rWRdsNBC74D5fRqJGqeUS/qUZJwqtDP2ySRg9sTDXSYckLHCf7DG/wc5Z QIA0174bV5PTfGiA2fgZbkMhDEJLkm661QW6JJToFBOTm87PuGqWHsJ8d g==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722194" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722194" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:16 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657958991" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:15 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:49 +0530 Message-Id: <20220405143454.16358-5-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 4/9] drm/i915/gt: Pass the -EINVAL when emit_pte doesn't update any PTE X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When emit_pte doesn't update any PTE with return value as 0, interpret it as -EINVAL. Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index e0f1c727662e..f9f3b0e7ed87 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -577,7 +577,9 @@ intel_context_migrate_copy(struct intel_context *ce, len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); - if (len <= 0) { + if (!len) + err = -EINVAL; + if (len < 0) { err = len; goto out_rq; } From patchwork Tue Apr 5 14:34:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801631 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 160FFC433F5 for ; Tue, 5 Apr 2022 14:34:29 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5B33910E732; Tue, 5 Apr 2022 14:34:19 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id D593310E732; Tue, 5 Apr 2022 14:34:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169257; x=1680705257; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=7QWOcN5xHytWixrkVZERQ4mhV5yhqGT82aDJJnK893E=; b=JU4UDuEczPcg7IKayHLYMO7rbIgnXqtEaIuWQHmQs5RZucQszGCXPs0j VcLtKxSfQesHP3frtN8hGrpzdRsT0i+8k0ZjaxW39MZPCXtXCVDyEAD/0 5R61l/gItZr4J2pSRE+S6GniOJtFU1acWRoBcueLE2LKPVf6iopt7weI5 XQSSaeBvnX+EopJfuu/5ivnetm3giMb3tYRgBbZP+OwVbUTeK7nYrZxrK YElOOuqdj+NvtP93H8ssCQyTHT9vozucga69yojFAYNhc/Z+5zzdeZO70 TfHK+oZA5WDj7xQcIlE6C6kwRLGL7BoxAUnXmMCAQOmwgNPuALW47oxqR g==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722202" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722202" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:17 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657958993" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:16 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:50 +0530 Message-Id: <20220405143454.16358-6-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 5/9] drm/i915/gt: Clear compress metadata for Flat-ccs objects X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer. XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the ccs surface of a lmem memory. So on Flat-CCS capable platform we use XY_CTRL_SURF_COPY_BLT to clear the CCS meta data. v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob] v4: Fixed ccs blk calculation [Ram] Added Kdoc on flat-ccs. v5: GENMASK is used [Matt] mocs fix [Matt] Comments Fix [Matt] Flush address programming [Ram] v6: FLUSH_DW is fixed Few coding style fix v7: Adopting the XY_FAST_COLOR_BLT (Thomas] v8: XY_CTRL_SURF_COPY_BLT for ccs clearing. v9: emit_copy_ccs is used. v10: ctrl_surf cmds are filled in caller itself. [Thomas] only one ctrl surf cmd is used as size of lmem is <=8M [Thomas] Signed-off-by: Ramalingam C Signed-off-by: Ayaz A Siddiqui Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 16 +++ drivers/gpu/drm/i915/gt/intel_migrate.c | 137 ++++++++++++++++++- 2 files changed, 152 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index d1b8c23f7a9e..724ab069ddb6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -154,8 +154,10 @@ #define MI_FLUSH_DW_PROTECTED_MEM_EN (1 << 22) #define MI_FLUSH_DW_STORE_INDEX (1<<21) #define MI_INVALIDATE_TLB (1<<18) +#define MI_FLUSH_DW_CCS (1<<16) #define MI_FLUSH_DW_OP_STOREDW (1<<14) #define MI_FLUSH_DW_OP_MASK (3<<14) +#define MI_FLUSH_DW_LLC (1<<9) #define MI_FLUSH_DW_NOTIFY (1<<8) #define MI_INVALIDATE_BSD (1<<7) #define MI_FLUSH_DW_USE_GTT (1<<2) @@ -204,6 +206,20 @@ #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) +#define XY_CTRL_SURF_INSTR_SIZE 5 +#define MI_FLUSH_DW_SIZE 3 +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | 3) +#define SRC_ACCESS_TYPE_SHIFT 21 +#define DST_ACCESS_TYPE_SHIFT 20 +#define CCS_SIZE_MASK 0x3FF +#define CCS_SIZE_SHIFT 8 +#define XY_CTRL_SURF_MOCS_MASK GENMASK(31, 25) +#define NUM_CCS_BYTES_PER_BLOCK 256 +#define NUM_BYTES_PER_CCS_BYTE 256 +#define NUM_CCS_BLKS_PER_XFER 1024 +#define INDIRECT_ACCESS 0 +#define DIRECT_ACCESS 1 + #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) #define XY_FAST_COLOR_BLT_CMD (2 << 29 | 0x44 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index f9f3b0e7ed87..2446ff70ce45 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -17,6 +17,8 @@ struct insert_pte_data { #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ +#define GET_CCS_BYTES(i915, size) (HAS_FLAT_CCS(i915) ? \ + DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0) static bool engine_supports_migration(struct intel_engine_cs *engine) { if (!engine) @@ -467,6 +469,123 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; } +/** + * DOC: Flat-CCS - Memory compression for Local memory + * + * On Xe-HP and later devices, we use dedicated compression control state (CCS) + * stored in local memory for each surface, to support the 3D and media + * compression formats. + * + * The memory required for the CCS of the entire local memory is 1/256 of the + * local memory size. So before the kernel boot, the required memory is reserved + * for the CCS data and a secure register will be programmed with the CCS base + * address. + * + * Flat CCS data needs to be cleared when a lmem object is allocated. + * And CCS data can be copied in and out of CCS region through + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. + * + * When we exhaust the lmem, if the object's placements support smem, then we can + * directly decompress the compressed lmem object into smem and start using it + * from smem itself. + * + * But when we need to swapout the compressed lmem object into a smem region + * though objects' placement doesn't support smem, then we copy the lmem content + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). + * When the object is referred, lmem content will be swaped in along with + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding + * location. + */ + +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags) +{ + *cmd++ = MI_FLUSH_DW | flags; + *cmd++ = 0; + *cmd++ = 0; + + return cmd; +} + +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size) +{ + u32 num_cmds, num_blks, total_size; + + if (!GET_CCS_BYTES(i915, size)) + return 0; + + /* + * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte + * blocks. one XY_CTRL_SURF_COPY_BLT command can + * transfer upto 1024 blocks. + */ + num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), + NUM_CCS_BYTES_PER_BLOCK); + num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER); + total_size = XY_CTRL_SURF_INSTR_SIZE * num_cmds; + + /* + * Adding a flush before and after XY_CTRL_SURF_COPY_BLT + */ + total_size += 2 * MI_FLUSH_DW_SIZE; + + return total_size; +} + +static int emit_copy_ccs(struct i915_request *rq, + u32 dst_offset, u8 dst_access, + u32 src_offset, u8 src_access, int size) +{ + struct drm_i915_private *i915 = rq->engine->i915; + int mocs = rq->engine->gt->mocs.uc_index << 1; + u32 num_ccs_blks, ccs_ring_size; + u32 *cs; + + ccs_ring_size = calc_ctrl_surf_instr_size(i915, size); + WARN_ON(!ccs_ring_size); + + cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2)); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), + NUM_CCS_BYTES_PER_BLOCK); + GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER); + cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS); + + /* + * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS + * data in and out of the CCS region. + * + * We can copy at most 1024 blocks of 256 bytes using one + * XY_CTRL_SURF_COPY_BLT instruction. + * + * In case we need to copy more than 1024 blocks, we need to add + * another instruction to the same batch buffer. + * + * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS. + * + * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM. + */ + *cs++ = XY_CTRL_SURF_COPY_BLT | + src_access << SRC_ACCESS_TYPE_SHIFT | + dst_access << DST_ACCESS_TYPE_SHIFT | + ((num_ccs_blks - 1) & CCS_SIZE_MASK) << CCS_SIZE_SHIFT; + *cs++ = src_offset; + *cs++ = rq->engine->instance | + FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, mocs); + *cs++ = dst_offset; + *cs++ = rq->engine->instance | + FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, mocs); + + cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS); + if (ccs_ring_size & 1) + *cs++ = MI_NOOP; + + intel_ring_advance(rq, cs); + + return 0; +} + static int emit_copy(struct i915_request *rq, u32 dst_offset, u32 src_offset, int size) { @@ -690,6 +809,7 @@ intel_context_migrate_clear(struct intel_context *ce, u32 value, struct i915_request **out) { + struct drm_i915_private *i915 = ce->engine->i915; struct sgt_dma it = sg_sgt(sg); struct i915_request *rq; u32 offset; @@ -701,7 +821,7 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); offset = 0; - if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + if (HAS_64K_PAGES(i915) && is_lmem) offset = CHUNK_SZ; do { @@ -743,6 +863,21 @@ intel_context_migrate_clear(struct intel_context *ce, goto out_rq; err = emit_clear(rq, offset, len, value, is_lmem); + if (err) + goto out_rq; + + if (HAS_FLAT_CCS(i915) && is_lmem && !value) { + /* + * copy the content of memory into corresponding + * ccs surface + */ + err = emit_copy_ccs(rq, offset, INDIRECT_ACCESS, offset, + DIRECT_ACCESS, len); + if (err) + goto out_rq; + } + + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); /* Arbitration is re-enabled between requests. */ out_rq: From patchwork Tue Apr 5 14:34:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801632 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 39FFAC433F5 for ; Tue, 5 Apr 2022 14:34:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D61BA10E764; Tue, 5 Apr 2022 14:34:21 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5FFC010E764; Tue, 5 Apr 2022 14:34:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169259; x=1680705259; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=tFRynHxMS+RjyyqC9TaSFhKuaHycRvpVH8isMAFz1Ww=; b=HHGIKmZmGifzvTrJ8Wtb17MbejK/rotJ3SsijhOGDKD0aDZz5VpAdWGq aJiT2RhICJ8xzdoVZUX1dvooxmnJ+1QHwE0d74jIOiWRQ2NgaJ80g1VlJ ORx/68bXwyhhaC1ou7yGSVWtRmMhMytyTmjghj8jtYQyTrAxhV9N05WTF hlZW563EAjVOfcCy9zv+oqwdImovnN6pEMdvL69ZdWQJRmHO58yGZWu0O noO1Shz75CmTYvSNKfkyMSSx81LMMalhQMGOMrgoDJuKnOHZUYi9WXROX /Nb3lRrLUcVzKBYG5R9rPUUOktvHzZzlfr/dVcrsN6UPu27/D4BUOMsoH w==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722206" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722206" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:19 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657959009" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:17 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:51 +0530 Message-Id: <20220405143454.16358-7-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 6/9] drm/i915/selftest_migrate: Consider the possible roundup of size X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Consider the possible round up happened at obj size alignment to min_page_size during the obj allocation. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/selftest_migrate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c index c9c4f391c5cc..b5da8b8cd039 100644 --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -152,6 +152,9 @@ static int clear(struct intel_migrate *migrate, if (IS_ERR(obj)) return 0; + /* Consider the rounded up memory too */ + sz = obj->base.size; + for_i915_gem_ww(&ww, err, true) { err = i915_gem_object_lock(obj, &ww); if (err) From patchwork Tue Apr 5 14:34:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801633 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E857EC433EF for ; Tue, 5 Apr 2022 14:34:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BC70210E8DD; Tue, 5 Apr 2022 14:34:22 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2CE5410E8DD; Tue, 5 Apr 2022 14:34:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169261; x=1680705261; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=C7KSgyXS7+TQoLV43yMZNJqRIUfI/4INR6EpCGaMKo4=; b=H3j5gyKeGj6Qp2E9pU/+ApZdLODzOuR1o99grwMmlSeXxSjtQyp7fRaZ udPx+FYjMbGCk3WJzISLmYSJhtXJKAUd/XiSbE3JSKOsGwJO34mRPNa4x u8mJTt6VCfTG7zBCwx9Q9FL369Fy2cH/5Crz3cLMTUy9oJylcxqfm4MEc J/T95XVe4YweUDeZpCCLSVtfcufeA4Xh7ZVrpDs8fDEP4J54Hnm4g4h79 Cu8mpDPSgUBo7+x5/Je4wvR8vfmJaNXtDaQ22HCM2ANRG65yr8i2j9Co3 w6lY1zSzAadH0aZNE19TD1s45uwteifGb+WAAOZKior9yiMqdNmIM+c5W A==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722209" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722209" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:20 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657959028" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:19 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:52 +0530 Message-Id: <20220405143454.16358-8-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 7/9] drm/i915/selftest_migrate: Check CCS meta data clear X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Extend the live migrate selftest, to verify the ccs surface clearing during the Flat-CCS capable lmem obj clear. v2: Look at right places for ccs data [Thomas] Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/selftest_migrate.c | 250 ++++++++++++++++++--- 1 file changed, 222 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c index b5da8b8cd039..8cd9a22054f3 100644 --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -132,6 +132,124 @@ static int copy(struct intel_migrate *migrate, return err; } +static int intel_context_copy_ccs(struct intel_context *ce, + const struct i915_deps *deps, + struct scatterlist *sg, + enum i915_cache_level cache_level, + bool write_to_ccs, + struct i915_request **out) +{ + u8 src_access = write_to_ccs ? DIRECT_ACCESS : INDIRECT_ACCESS; + u8 dst_access = write_to_ccs ? INDIRECT_ACCESS : DIRECT_ACCESS; + struct sgt_dma it = sg_sgt(sg); + struct i915_request *rq; + u32 offset; + int err; + + GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); + *out = NULL; + + GEM_BUG_ON(ce->ring->size < SZ_64K); + + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915)) + offset = CHUNK_SZ; + + do { + int len; + + rq = i915_request_create(ce); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out_ce; + } + + if (deps) { + err = i915_request_await_deps(rq, deps); + if (err) + goto out_rq; + + if (rq->engine->emit_init_breadcrumb) { + err = rq->engine->emit_init_breadcrumb(rq); + if (err) + goto out_rq; + } + + deps = NULL; + } + + /* The PTE updates + clear must not be interrupted. */ + err = emit_no_arbitration(rq); + if (err) + goto out_rq; + + len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ); + if (len <= 0) { + err = len; + goto out_rq; + } + + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + if (err) + goto out_rq; + + err = emit_copy_ccs(rq, offset, dst_access, + offset, src_access, len); + if (err) + goto out_rq; + + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + + /* Arbitration is re-enabled between requests. */ +out_rq: + if (*out) + i915_request_put(*out); + *out = i915_request_get(rq); + i915_request_add(rq); + if (err || !it.sg || !sg_dma_len(it.sg)) + break; + + cond_resched(); + } while (1); + +out_ce: + return err; +} + +static int +intel_migrate_ccs_copy(struct intel_migrate *m, + struct i915_gem_ww_ctx *ww, + const struct i915_deps *deps, + struct scatterlist *sg, + enum i915_cache_level cache_level, + bool write_to_ccs, + struct i915_request **out) +{ + struct intel_context *ce; + int err; + + *out = NULL; + if (!m->context) + return -ENODEV; + + ce = intel_migrate_create_context(m); + if (IS_ERR(ce)) + ce = intel_context_get(m->context); + GEM_BUG_ON(IS_ERR(ce)); + + err = intel_context_pin_ww(ce, ww); + if (err) + goto out; + + err = intel_context_copy_ccs(ce, deps, sg, cache_level, + write_to_ccs, out); + + intel_context_unpin(ce); +out: + intel_context_put(ce); + return err; +} + static int clear(struct intel_migrate *migrate, int (*fn)(struct intel_migrate *migrate, struct i915_gem_ww_ctx *ww, @@ -144,7 +262,8 @@ static int clear(struct intel_migrate *migrate, struct drm_i915_gem_object *obj; struct i915_request *rq; struct i915_gem_ww_ctx ww; - u32 *vaddr; + u32 *vaddr, val = 0; + bool ccs_cap = false; int err = 0; int i; @@ -155,7 +274,12 @@ static int clear(struct intel_migrate *migrate, /* Consider the rounded up memory too */ sz = obj->base.size; + if (HAS_FLAT_CCS(i915) && i915_gem_object_is_lmem(obj)) + ccs_cap = true; + for_i915_gem_ww(&ww, err, true) { + int ccs_bytes, ccs_bytes_per_chunk; + err = i915_gem_object_lock(obj, &ww); if (err) continue; @@ -170,44 +294,114 @@ static int clear(struct intel_migrate *migrate, vaddr[i] = ~i; i915_gem_object_flush_map(obj); - err = fn(migrate, &ww, obj, sz, &rq); - if (!err) - continue; + if (ccs_cap && !val) { + /* Write the obj data into ccs surface */ + err = intel_migrate_ccs_copy(migrate, &ww, NULL, + obj->mm.pages->sgl, + obj->cache_level, + true, &rq); + if (rq && !err) { + if (i915_request_wait(rq, 0, HZ) < 0) { + pr_err("%ps timed out, size: %u\n", + fn, sz); + err = -ETIME; + } + i915_request_put(rq); + rq = NULL; + } + if (err) + continue; + } - if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS) - pr_err("%ps failed, size: %u\n", fn, sz); - if (rq) { - i915_request_wait(rq, 0, HZ); + err = fn(migrate, &ww, obj, val, &rq); + if (rq && !err) { + if (i915_request_wait(rq, 0, HZ) < 0) { + pr_err("%ps timed out, size: %u\n", fn, sz); + err = -ETIME; + } i915_request_put(rq); + rq = NULL; } - i915_gem_object_unpin_map(obj); - } - if (err) - goto err_out; + if (err) + continue; - if (rq) { - if (i915_request_wait(rq, 0, HZ) < 0) { - pr_err("%ps timed out, size: %u\n", fn, sz); - err = -ETIME; + i915_gem_object_flush_map(obj); + + /* Verify the set/clear of the obj mem */ + for (i = 0; !err && i < sz / PAGE_SIZE; i++) { + int x = i * 1024 + + i915_prandom_u32_max_state(1024, prng); + + if (vaddr[x] != val) { + pr_err("%ps failed, (%u != %u), offset: %zu\n", + fn, vaddr[x], val, x * sizeof(u32)); + igt_hexdump(vaddr + i * 1024, 4096); + err = -EINVAL; + } } - i915_request_put(rq); - } + if (err) + continue; - for (i = 0; !err && i < sz / PAGE_SIZE; i++) { - int x = i * 1024 + i915_prandom_u32_max_state(1024, prng); + if (ccs_cap && !val) { + for (i = 0; i < sz / sizeof(u32); i++) + vaddr[i] = ~i; + i915_gem_object_flush_map(obj); + + err = intel_migrate_ccs_copy(migrate, &ww, NULL, + obj->mm.pages->sgl, + obj->cache_level, + false, &rq); + if (rq && !err) { + if (i915_request_wait(rq, 0, HZ) < 0) { + pr_err("%ps timed out, size: %u\n", + fn, sz); + err = -ETIME; + } + i915_request_put(rq); + rq = NULL; + } + if (err) + continue; + + ccs_bytes = GET_CCS_BYTES(i915, sz); + ccs_bytes_per_chunk = GET_CCS_BYTES(i915, CHUNK_SZ); + i915_gem_object_flush_map(obj); + + for (i = 0; !err && i < DIV_ROUND_UP(ccs_bytes, PAGE_SIZE); i++) { + int offset = ((i * PAGE_SIZE) / + ccs_bytes_per_chunk) * CHUNK_SZ / sizeof(u32); + int ccs_bytes_left = (ccs_bytes - i * PAGE_SIZE) / sizeof(u32); + int x = i915_prandom_u32_max_state(min_t(int, 1024, + ccs_bytes_left), prng); + + if (vaddr[offset + x]) { + pr_err("%ps ccs clearing failed, offset: %ld/%d\n", + fn, i * PAGE_SIZE + x * sizeof(u32), ccs_bytes); + igt_hexdump(vaddr + offset, + min_t(int, 4096, + ccs_bytes_left * sizeof(u32))); + err = -EINVAL; + } + } + + if (err) + continue; + } + i915_gem_object_unpin_map(obj); + } - if (vaddr[x] != sz) { - pr_err("%ps failed, size: %u, offset: %zu\n", - fn, sz, x * sizeof(u32)); - igt_hexdump(vaddr + i * 1024, 4096); - err = -EINVAL; + if (err) { + if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS) + pr_err("%ps failed, size: %u\n", fn, sz); + if (rq && err != -EINVAL) { + i915_request_wait(rq, 0, HZ); + i915_request_put(rq); } + + i915_gem_object_unpin_map(obj); } - i915_gem_object_unpin_map(obj); -err_out: i915_gem_object_put(obj); - return err; } From patchwork Tue Apr 5 14:34:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801634 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 263DDC433EF for ; Tue, 5 Apr 2022 14:34:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5809810E8F1; Tue, 5 Apr 2022 14:34:24 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9F24D10E8EC; Tue, 5 Apr 2022 14:34:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169262; x=1680705262; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=Z8cL7sSXIWoLnYTQXrRog7pSrLmf0C6XHEdiF6j31ZU=; b=B0Dfm8XU9bA1i2CRGIJIF9yquLoGh0EGdxzI0zWxYy2Mp2BrpiTrtF5z cpCsPKKuBE8BuhqJJUqWyYGD4H0+VxmbQI8aHIIamTNfMQImaH3FGtcfn TXd36bBQsGjOpFYha2FW1hlKn2zNp8cE7xSB7OrwDwbiEfpWv0Cbb9NZ0 N9P7C94Kg8sZ6LOFCP+RICLA3KIpJf4xxHWfxfuukX0ALp9DJ0mCN9Twj WLLq16B+uwkFibC1HZofMDJzM+4ZVqU3uJZl1jGBYNg+TeFw+zaSz6Y15 Pv6zQij0H/oMd0LVMy10bT6ECnDAddhrLiUktNzySNdtDW5nglTNMHmeM w==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722215" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722215" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:22 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657959036" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:21 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:53 +0530 Message-Id: <20220405143454.16358-9-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 8/9] drm/i915/gem: Add extra pages in ttm_tt for ccs data X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Xe-HP and later devices, dedicated compression control state (CCS) stored in local memory is used for each surface, to support the 3D and media compression formats. The memory required for the CCS of the entire local memory is 1/256 of the local memory size. So before the kernel boot, the required memory is reserved for the CCS data and a secure register will be programmed with the CCS base address So when an object is allocated in local memory, dont need to explicitly allocate the space for ccs data. But when the obj is evicted into the smem, to hold the compression related data along with the obj extra space is needed in smem. i.e obj_size + (obj_size/256). Hence when a smem pages are allocated for an obj with lmem placement possibility we create with the extra pages required for the ccs data for the obj size. v2: Used imperative wording [Thomas] v3: Inflate the pages only when obj's placement is lmem only v4: GEM_BUG_ON if the ttm->num_pages > obj page size [Thomas] Signed-off-by: Ramalingam C cc: Christian Koenig cc: Hellstrom Thomas Reviewed-by: Thomas Hellstrom Reviewed-by: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 30 ++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index a878910a563c..4c25d9b2f138 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -20,6 +20,7 @@ #include "gem/i915_gem_ttm.h" #include "gem/i915_gem_ttm_move.h" #include "gem/i915_gem_ttm_pm.h" +#include "gt/intel_gpu_commands.h" #define I915_TTM_PRIO_PURGE 0 #define I915_TTM_PRIO_NO_PAGES 1 @@ -265,12 +266,33 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = { .release = i915_ttm_tt_release }; +static inline bool +i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj) +{ + bool lmem_placement = false; + int i; + + for (i = 0; i < obj->mm.n_placements; i++) { + /* Compression is not allowed for the objects with smem placement */ + if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM) + return false; + if (!lmem_placement && + obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL) + lmem_placement = true; + } + + return lmem_placement; +} + static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags) { + struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), + bdev); struct ttm_resource_manager *man = ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + unsigned long ccs_pages = 0; enum ttm_caching caching; struct i915_ttm_tt *i915_tt; int ret; @@ -293,7 +315,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, i915_tt->is_shmem = true; } - ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0); + if (HAS_FLAT_CCS(i915) && i915_gem_object_needs_ccs_pages(obj)) + ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size, + NUM_BYTES_PER_CCS_BYTE), + PAGE_SIZE); + + ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages); if (ret) goto err_free; @@ -773,6 +800,7 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj, i915_sg_dma_sizes(rsgt->table.sgl)); } + GEM_BUG_ON(bo->ttm && ((obj->base.size >> PAGE_SHIFT) < bo->ttm->num_pages)); i915_ttm_adjust_lru(obj); return ret; } From patchwork Tue Apr 5 14:34:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramalingam C X-Patchwork-Id: 12801635 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A4E33C433EF for ; Tue, 5 Apr 2022 14:34:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E512810E8B6; Tue, 5 Apr 2022 14:34:26 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2057B10E8EC; Tue, 5 Apr 2022 14:34:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649169264; x=1680705264; h=from:to:subject:date:message-id:in-reply-to:references: mime-version:content-transfer-encoding; bh=GfQ96kOJnsUHcBLMfXYuSuNXdDpqwm8PzatzaVOc7+c=; b=aqzFvH6NFYY60ixtCFmpPmjf2AgMoENJR2As+Fu6X9HpNUUEzZWQNxtu zVsnV7s9+GnbwYiRCYLSppYIRRdpPh/8ospUVA8sM2q0B4knpgfwtUen6 l4TZssO/eNfTcMH2iQk6y9761969tWvqX27yLJoN7BGMn9Quw2DJ0ERZ3 K4ygPlcMOyJnpgSRKwq8MbRjwC/5NKkFR7ZOkw5uoAytPXpNkR7WDHoGK 7+hKDffaSPHSP3WLVeSBV5iK3aZQ4d0SWTlK7h4l0gcMExmmPOZilDtUe NjQdbiI1NrQW7vP4BHA9oFNS8X6H/OO8nh+Cl4GobvlWFSYoXF0L9gyX8 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10307"; a="285722222" X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="285722222" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:23 -0700 X-IronPort-AV: E=Sophos;i="5.90,236,1643702400"; d="scan'208";a="657959043" Received: from ramaling-i9x.iind.intel.com ([10.203.144.108]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 07:34:22 -0700 From: Ramalingam C To: intel-gfx , dri-devel Date: Tue, 5 Apr 2022 20:04:54 +0530 Message-Id: <20220405143454.16358-10-ramalingam.c@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20220405143454.16358-1-ramalingam.c@intel.com> References: <20220405143454.16358-1-ramalingam.c@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v8 9/9] drm/i915/migrate: Evict and restore the flatccs capable lmem obj X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When we are swapping out the local memory obj on flat-ccs capable platform, we need to capture the ccs data too along with main meory and we need to restore it when we are swapping in the content. When lmem object is swapped into a smem obj, smem obj will have the extra pages required to hold the ccs data corresponding to the lmem main memory. So main memory of lmem will be copied into the initial pages of the smem and then ccs data corresponding to the main memory will be copied to the subsequent pages of smem. ccs data is 1/256 of lmem size. Swapin happens exactly in reverse order. First main memory of lmem is restored from the smem's initial pages and the ccs data will be restored from the subsequent pages of smem. Extracting and restoring the CCS data is done through a special cmd called XY_CTRL_SURF_COPY_BLT v2: Fixing the ccs handling v3: Handle the ccs data at same loop as main memory [Thomas] v4: changes for emit_copy_ccs v5: handle non-flat-ccs scenario Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gt/intel_migrate.c | 164 +++++++++++++++++++++++- 1 file changed, 160 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 2446ff70ce45..dd7b89589f20 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -633,6 +633,65 @@ static int emit_copy(struct i915_request *rq, return 0; } +static int scatter_list_length(struct scatterlist *sg) +{ + int len = 0; + + while (sg && sg_dma_len(sg)) { + len += sg_dma_len(sg); + sg = sg_next(sg); + }; + + return len; +} + +static void +calculate_chunk_sz(struct drm_i915_private *i915, bool src_is_lmem, + int *src_sz, int *ccs_sz, u32 bytes_to_cpy, + u32 ccs_bytes_to_cpy) +{ + if (ccs_bytes_to_cpy) { + /* + * We can only copy the ccs data corresponding to + * the CHUNK_SZ of lmem which is + * GET_CCS_BYTES(i915, CHUNK_SZ)) + */ + *ccs_sz = min_t(int, ccs_bytes_to_cpy, GET_CCS_BYTES(i915, CHUNK_SZ)); + + if (!src_is_lmem) + /* + * When CHUNK_SZ is passed all the pages upto CHUNK_SZ + * will be taken for the blt. in Flat-ccs supported + * platform Smem obj will have more pages than required + * for main meory hence limit it to the required size + * for main memory + */ + *src_sz = min_t(int, bytes_to_cpy, CHUNK_SZ); + } else { /* ccs handling is not required */ + *src_sz = CHUNK_SZ; + } +} + +static void get_ccs_sg_sgt(struct sgt_dma *it, u32 bytes_to_cpy) +{ + u32 len; + + do { + GEM_BUG_ON(!it->sg || !sg_dma_len(it->sg)); + len = it->max - it->dma; + if (len > bytes_to_cpy) { + it->dma += bytes_to_cpy; + break; + } + + bytes_to_cpy -= len; + + it->sg = __sg_next(it->sg); + it->dma = sg_dma_address(it->sg); + it->max = it->dma + sg_dma_len(it->sg); + } while (bytes_to_cpy); +} + int intel_context_migrate_copy(struct intel_context *ce, const struct i915_deps *deps, @@ -644,9 +703,15 @@ intel_context_migrate_copy(struct intel_context *ce, bool dst_is_lmem, struct i915_request **out) { - struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); + struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs; + struct drm_i915_private *i915 = ce->engine->i915; + u32 ccs_bytes_to_cpy = 0, bytes_to_cpy; + enum i915_cache_level ccs_cache_level; + int src_sz, dst_sz, ccs_sz; u32 src_offset, dst_offset; + u8 src_access, dst_access; struct i915_request *rq; + bool ccs_is_src; int err; GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); @@ -655,6 +720,38 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + src_sz = scatter_list_length(src); + bytes_to_cpy = src_sz; + + if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) { + src_access = !src_is_lmem && dst_is_lmem; + dst_access = !src_access; + + dst_sz = scatter_list_length(dst); + if (src_is_lmem) { + it_ccs = it_dst; + ccs_cache_level = dst_cache_level; + ccs_is_src = false; + } else if (dst_is_lmem) { + bytes_to_cpy = dst_sz; + it_ccs = it_src; + ccs_cache_level = src_cache_level; + ccs_is_src = true; + } + + /* + * When there is a eviction of ccs needed smem will have the + * extra pages for the ccs data + * + * TO-DO: Want to move the size mismatch check to a WARN_ON, + * but still we have some requests of smem->lmem with same size. + * Need to fix it. + */ + ccs_bytes_to_cpy = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0; + if (ccs_bytes_to_cpy) + get_ccs_sg_sgt(&it_ccs, bytes_to_cpy); + } + src_offset = 0; dst_offset = CHUNK_SZ; if (HAS_64K_PAGES(ce->engine->i915)) { @@ -694,8 +791,11 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; + calculate_chunk_sz(i915, src_is_lmem, &src_sz, &ccs_sz, + bytes_to_cpy, ccs_bytes_to_cpy); + len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, - src_offset, CHUNK_SZ); + src_offset, src_sz); if (!len) err = -EINVAL; if (len < 0) { @@ -716,7 +816,46 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - err = emit_copy(rq, dst_offset, src_offset, len); + err = emit_copy(rq, dst_offset, src_offset, len); + if (err) + goto out_rq; + + bytes_to_cpy -= len; + + if (ccs_bytes_to_cpy) { + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + if (err) + goto out_rq; + + err = emit_pte(rq, &it_ccs, ccs_cache_level, false, + ccs_is_src ? src_offset : dst_offset, + ccs_sz); + + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + if (err) + goto out_rq; + + /* + * Using max of src_sz and dst_sz, as we need to + * pass the lmem size corresponding to the ccs + * blocks we need to handle. + */ + ccs_sz = max_t(int, ccs_is_src ? ccs_sz : src_sz, + ccs_is_src ? dst_sz : ccs_sz); + + err = emit_copy_ccs(rq, dst_offset, dst_access, + src_offset, src_access, ccs_sz); + if (err) + goto out_rq; + + err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); + if (err) + goto out_rq; + + /* Converting back to ccs bytes */ + ccs_sz = GET_CCS_BYTES(rq->engine->i915, ccs_sz); + ccs_bytes_to_cpy -= ccs_sz; + } /* Arbitration is re-enabled between requests. */ out_rq: @@ -724,9 +863,26 @@ intel_context_migrate_copy(struct intel_context *ce, i915_request_put(*out); *out = i915_request_get(rq); i915_request_add(rq); - if (err || !it_src.sg || !sg_dma_len(it_src.sg)) + + if (err) break; + if (!bytes_to_cpy && !ccs_bytes_to_cpy) { + if (src_is_lmem) + WARN_ON(it_src.sg && sg_dma_len(it_src.sg)); + else + WARN_ON(it_dst.sg && sg_dma_len(it_dst.sg)); + break; + } + + if (WARN_ON(!it_src.sg || !sg_dma_len(it_src.sg) || + !it_dst.sg || !sg_dma_len(it_dst.sg) || + (ccs_bytes_to_cpy && (!it_ccs.sg || + !sg_dma_len(it_ccs.sg))))) { + err = -EINVAL; + break; + } + cond_resched(); } while (1);