From patchwork Thu Jul 4 08:18:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nirmoy Das
X-Patchwork-Id: 13723413
From: Nirmoy Das
To: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, Nirmoy Das, Matthew Auld,
 Thomas Hellström, Christian König
Subject: [PATCH v5 1/4] drm/ttm: Add a flag to allow drivers to skip clear-on-free
Date: Thu, 4 Jul 2024 10:18:38 +0200
Message-ID: <20240704081841.30212-1-nirmoy.das@intel.com>
X-Mailer: git-send-email 2.42.0
Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928
List-Id: Direct Rendering Infrastructure - Development

Add TTM_TT_FLAG_CLEARED_ON_FREE, which DRM drivers can set before
releasing backing stores if they want to skip clear-on-free.
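For readers without the TTM sources at hand, the intended behaviour can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: model_pool_give(), model_free_range() and
MODEL_PAGE_SIZE are hypothetical names, memset() stands in for
clear_page()/clear_highpage(), and only the BIT(5) value comes from this
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIT(n) (1u << (n))
/* Bit value taken from this patch; everything else is a toy model. */
#define TTM_TT_FLAG_CLEARED_ON_FREE BIT(5)

#define MODEL_PAGE_SIZE 4096 /* hypothetical, not the kernel's PAGE_SIZE */

/* Hypothetical stand-in for ttm_pool_type_give(): zero unless already cleared. */
static void model_pool_give(unsigned char *pages, unsigned int order, bool cleared)
{
	unsigned int i, num_pages = 1u << order;

	if (!cleared) {
		for (i = 0; i < num_pages; ++i)
			memset(pages + (size_t)i * MODEL_PAGE_SIZE, 0, MODEL_PAGE_SIZE);
	}
	/* The real function would now put the pages on the pool's list. */
}

/* Hypothetical stand-in for the flag test in ttm_pool_free_range(). */
static void model_free_range(uint32_t page_flags, unsigned char *pages,
			     unsigned int order)
{
	bool cleared = page_flags & TTM_TT_FLAG_CLEARED_ON_FREE;

	model_pool_give(pages, order, cleared);
}
```

With the flag unset the model zeroes every page before pooling it; with the
flag set the page contents are left alone, on the assumption that the driver
has already cleared them.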
Cc: Matthew Auld
Cc: Thomas Hellström
Suggested-by: Christian König
Signed-off-by: Nirmoy Das
Reviewed-by: Christian König
---
 drivers/gpu/drm/ttm/ttm_pool.c | 18 +++++++++++-------
 include/drm/ttm/ttm_tt.h       |  6 +++++-
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 6e1fd6985ffc..b78ee7524bcf 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -222,15 +222,18 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
 }
 
 /* Give pages into a specific pool_type */
-static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
+static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p,
+			       bool cleared)
 {
 	unsigned int i, num_pages = 1 << pt->order;
 
-	for (i = 0; i < num_pages; ++i) {
-		if (PageHighMem(p))
-			clear_highpage(p + i);
-		else
-			clear_page(page_address(p + i));
+	if (!cleared) {
+		for (i = 0; i < num_pages; ++i) {
+			if (PageHighMem(p))
+				clear_highpage(p + i);
+			else
+				clear_page(page_address(p + i));
+		}
 	}
 
 	spin_lock(&pt->lock);
@@ -394,6 +397,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
 				pgoff_t start_page, pgoff_t end_page)
 {
 	struct page **pages = &tt->pages[start_page];
+	bool cleared = tt->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE;
 	unsigned int order;
 	pgoff_t i, nr;
 
@@ -407,7 +411,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
 
 		pt = ttm_pool_select_type(pool, caching, order);
 		if (pt)
-			ttm_pool_type_give(pt, *pages);
+			ttm_pool_type_give(pt, *pages, cleared);
 		else
 			ttm_pool_free_page(pool, caching, order, *pages);
 	}
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 2b9d856ff388..cfaf49de2419 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -85,6 +85,9 @@ struct ttm_tt {
 	 * fault handling abuses the DMA api a bit and dma_map_attrs can't be
 	 * used to assure pgprot always matches.
 	 *
+	 * TTM_TT_FLAG_CLEARED_ON_FREE: Set this if a drm driver handles
+	 * clearing backing store
+	 *
 	 * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is
 	 * set by TTM after ttm_tt_populate() has successfully returned, and is
 	 * then unset when TTM calls ttm_tt_unpopulate().
@@ -94,8 +97,9 @@ struct ttm_tt {
 #define TTM_TT_FLAG_EXTERNAL		BIT(2)
 #define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
 #define TTM_TT_FLAG_DECRYPTED		BIT(4)
+#define TTM_TT_FLAG_CLEARED_ON_FREE	BIT(5)
 
-#define TTM_TT_FLAG_PRIV_POPULATED	BIT(5)
+#define TTM_TT_FLAG_PRIV_POPULATED	BIT(6)
 	uint32_t page_flags;
 	/** @num_pages: Number of pages in the page array. */
 	uint32_t num_pages;

From patchwork Thu Jul 4 08:18:39 2024
X-Patchwork-Submitter: Nirmoy Das
X-Patchwork-Id: 13723414
From: Nirmoy Das
To: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, Nirmoy Das, Himal Prasad Ghimiray,
 Matthew Auld, Thomas Hellström
Subject: [PATCH v5 2/4] drm/xe/migrate: Parameterize ccs and bo data clear in xe_migrate_clear()
Date: Thu, 4 Jul 2024 10:18:39 +0200
Message-ID: <20240704081841.30212-2-nirmoy.das@intel.com>
In-Reply-To: <20240704081841.30212-1-nirmoy.das@intel.com>
References: <20240704081841.30212-1-nirmoy.das@intel.com>

Parameterize clearing of CCS and bo data in xe_migrate_clear() so that
higher layers can choose which of the two to clear. A later patch in
this series uses this to clear bo data on igfx as well.

Cc: Himal Prasad Ghimiray
Cc: Matthew Auld
Cc: "Thomas Hellström"
Signed-off-by: Nirmoy Das
---
 drivers/gpu/drm/xe/tests/xe_bo.c      |  3 ++-
 drivers/gpu/drm/xe/tests/xe_migrate.c |  6 +++---
 drivers/gpu/drm/xe/xe_bo.c            | 11 +++++++++--
 drivers/gpu/drm/xe/xe_migrate.c       | 23 +++++++++++++++--------
 drivers/gpu/drm/xe/xe_migrate.h       |  4 +++-
 5 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 9f3c02826464..aea9b64fe04a 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -36,7 +36,8 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 
 	/* Optionally clear bo *and* CCS data in VRAM. */
 	if (clear) {
-		fence = xe_migrate_clear(tile->migrate, bo, bo->ttm.resource);
+		fence = xe_migrate_clear(tile->migrate, bo, bo->ttm.resource,
+					 true, true);
 		if (IS_ERR(fence)) {
 			KUNIT_FAIL(test, "Failed to submit bo clear.\n");
 			return PTR_ERR(fence);
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 962f6438e219..ef2dc34e8297 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -105,7 +105,7 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 	}
 
 	xe_map_memset(xe, &remote->vmap, 0, 0xd0, remote->size);
-	fence = xe_migrate_clear(m, remote, remote->ttm.resource);
+	fence = xe_migrate_clear(m, remote, remote->ttm.resource, true, true);
 	if (!sanity_fence_failed(xe, fence, big ? "Clearing remote big bo" :
 				 "Clearing remote small bo", test)) {
 		retval = xe_map_rd(xe, &remote->vmap, 0, u64);
@@ -279,7 +279,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	kunit_info(test, "Clearing small buffer object\n");
 	xe_map_memset(xe, &tiny->vmap, 0, 0x22, tiny->size);
 	expected = 0;
-	fence = xe_migrate_clear(m, tiny, tiny->ttm.resource);
+	fence = xe_migrate_clear(m, tiny, tiny->ttm.resource, true, true);
 	if (sanity_fence_failed(xe, fence, "Clearing small bo", test))
 		goto out;
 
@@ -300,7 +300,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	kunit_info(test, "Clearing big buffer object\n");
 	xe_map_memset(xe, &big->vmap, 0, 0x11, big->size);
 	expected = 0;
-	fence = xe_migrate_clear(m, big, big->ttm.resource);
+	fence = xe_migrate_clear(m, big, big->ttm.resource, true, true);
 	if (sanity_fence_failed(xe, fence, "Clearing big bo", test))
 		goto out;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 65c696966e96..4d6315d2ae9a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -650,6 +650,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	bool needs_clear;
 	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
 				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
+	int ret = 0;
 
 	/* Bo creation path, moving to system or TT. */
@@ -784,8 +785,14 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 			}
 		}
 	} else {
-		if (move_lacks_source)
-			fence = xe_migrate_clear(migrate, bo, new_mem);
+		if (move_lacks_source) {
+			bool clear_ccs = mem_type_is_vram(new_mem->mem_type) ||
+				handle_system_ccs;
+			bool clear_bo_data = mem_type_is_vram(new_mem->mem_type);
+
+			fence = xe_migrate_clear(migrate, bo, new_mem,
+						 clear_bo_data, clear_ccs);
+		}
 		else
 			fence = xe_migrate_copy(migrate, bo, bo, old_mem,
 						new_mem, handle_system_ccs);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index c9f5673353ee..e0a3f6921572 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -986,9 +986,12 @@ static void emit_clear(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
  * @m: The migration context.
  * @bo: The buffer object @dst is currently bound to.
  * @dst: The dst TTM resource to be cleared.
+ * @clear_bo_data: clear bo data
+ * @clear_ccs: clear ccs metadata
  *
- * Clear the contents of @dst to zero. On flat CCS devices,
- * the CCS metadata is cleared to zero as well on VRAM destinations.
+ * Clear the contents of @dst to zero when @clear_bo_data is set.
+ * On flat CCS devices, the CCS metadata is cleared to zero with @clear_ccs.
+ * Set both, @clear_bo_data and @clear_ccs to clear bo as well as CCS metadata
  * TODO: Eliminate the @bo argument.
  *
  * Return: Pointer to a dma_fence representing the last clear batch, or
@@ -997,18 +1000,22 @@ static void emit_clear(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
  */
 struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 				   struct xe_bo *bo,
-				   struct ttm_resource *dst)
+				   struct ttm_resource *dst,
+				   bool clear_bo_data,
+				   bool clear_ccs)
 {
 	bool clear_vram = mem_type_is_vram(dst->mem_type);
 	struct xe_gt *gt = m->tile->primary_gt;
 	struct xe_device *xe = gt_to_xe(gt);
-	bool clear_system_ccs = (xe_bo_needs_ccs_pages(bo) && !IS_DGFX(xe)) ? true : false;
 	struct dma_fence *fence = NULL;
 	u64 size = bo->size;
 	struct xe_res_cursor src_it;
 	struct ttm_resource *src = dst;
 	int err;
 
+	if (WARN_ON(!clear_bo_data && !clear_ccs))
+		return NULL;
+
 	if (!clear_vram)
 		xe_res_first_sg(xe_bo_sg(bo), 0, bo->size, &src_it);
 	else
@@ -1032,7 +1039,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 		batch_size = 2 +
 			pte_update_size(m, clear_vram, src, &src_it,
 					&clear_L0, &clear_L0_ofs, &clear_L0_pt,
-					clear_system_ccs ? 0 : emit_clear_cmd_len(gt), 0,
+					clear_bo_data ? emit_clear_cmd_len(gt) : 0, 0,
 					avail_pts);
 
 		if (xe_device_has_flat_ccs(xe))
@@ -1054,13 +1061,13 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 		if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it))
 			xe_res_next(&src_it, clear_L0);
 		else
-			emit_pte(m, bb, clear_L0_pt, clear_vram, clear_system_ccs,
+			emit_pte(m, bb, clear_L0_pt, clear_vram, clear_ccs,
 				 &src_it, clear_L0, dst);
 
 		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
 		update_idx = bb->len;
 
-		if (!clear_system_ccs)
+		if (clear_bo_data)
 			emit_clear(gt, bb, clear_L0_ofs, clear_L0, XE_PAGE_SIZE, clear_vram);
 
 		if (xe_device_has_flat_ccs(xe)) {
@@ -1119,7 +1126,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 		return ERR_PTR(err);
 	}
 
-	if (clear_system_ccs)
+	if (clear_ccs)
 		bo->ccs_cleared = true;
 
 	return fence;
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 951f19318ea4..33306cb98dc8 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -90,7 +90,9 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
 
 struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 				   struct xe_bo *bo,
-				   struct ttm_resource *dst);
+				   struct ttm_resource *dst,
+				   bool clear_bo_data,
+				   bool clear_ccs);
 
 struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
 

From patchwork Thu Jul 4 08:18:40 2024
X-Patchwork-Submitter: Nirmoy Das
X-Patchwork-Id: 13723415
From: Nirmoy Das
To: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, Nirmoy Das, Himal Prasad Ghimiray,
 Matthew Auld, Thomas Hellström
Subject: [PATCH v5 3/4] drm/xe/migrate: Clear CCS when clearing bo on xe2
Date: Thu, 4 Jul 2024 10:18:40 +0200
Message-ID: <20240704081841.30212-3-nirmoy.das@intel.com>
In-Reply-To: <20240704081841.30212-1-nirmoy.das@intel.com>
References: <20240704081841.30212-1-nirmoy.das@intel.com>

On Xe2, clearing a bo through uncompressed PTEs also clears the CCS
metadata, so skip emit_copy_ccs() on Xe2 when clearing bo data.

v2: When clearing a BO, the CCS clear happens with all commands as long
    as the PTEs are uncompressed.
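The condition this patch puts around emit_copy_ccs() can be read as a
small predicate. The sketch below is a userspace illustration only:
model_needs_ccs_copy() is a hypothetical helper that does not exist in
the driver, and its arguments stand in for xe_device_has_flat_ccs(xe),
clear_ccs, clear_bo_data and GRAPHICS_VERx100(xe).

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the condition in xe_migrate_clear():
 * an explicit CCS copy is only needed when CCS clearing was requested
 * and the bo data clear does not already cover it (Xe2 and later).
 */
static bool model_needs_ccs_copy(bool has_flat_ccs, bool clear_ccs,
				 bool clear_bo_data,
				 unsigned int graphics_verx100)
{
	return has_flat_ccs && clear_ccs &&
	       !(clear_bo_data && graphics_verx100 >= 2000);
}
```

On a pre-Xe2 device (for example a verx100 of 1270) the predicate stays
true whenever CCS clearing is requested; on Xe2 it is true only for a
CCS-only clear.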
Cc: Himal Prasad Ghimiray
Cc: Matthew Auld
Cc: "Thomas Hellström"
Signed-off-by: Nirmoy Das
---
 drivers/gpu/drm/xe/xe_migrate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index e0a3f6921572..cc8beed2bf8e 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1061,7 +1061,8 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 		if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it))
 			xe_res_next(&src_it, clear_L0);
 		else
-			emit_pte(m, bb, clear_L0_pt, clear_vram, clear_ccs,
+			/* Use uncompressed pte so clear happens in the real memory. */
+			emit_pte(m, bb, clear_L0_pt, clear_vram, false,
 				 &src_it, clear_L0, dst);
 
 		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
@@ -1070,7 +1071,9 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 		if (clear_bo_data)
 			emit_clear(gt, bb, clear_L0_ofs, clear_L0, XE_PAGE_SIZE, clear_vram);
 
-		if (xe_device_has_flat_ccs(xe)) {
+		/* Clearing BO with uncompressed PTE will clear CCS metadata as well on XE2 */
+		if (xe_device_has_flat_ccs(xe) && clear_ccs &&
+		    !(clear_bo_data && GRAPHICS_VERx100(gt_to_xe(gt)) >= 2000)) {
 			emit_copy_ccs(gt, bb, clear_L0_ofs, true,
 				      m->cleared_mem_ofs, false, clear_L0);
 			flush_flags = MI_FLUSH_DW_CCS;

From patchwork Thu Jul 4 08:18:41 2024
X-Patchwork-Submitter: Nirmoy Das
X-Patchwork-Id: 13723416
From: Nirmoy Das
To: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, Nirmoy Das, Himal Prasad Ghimiray,
 Matthew Auld, Thomas Hellström
Subject: [PATCH v5 4/4] drm/xe/lnl: Offload system clear page activity to GPU
Date: Thu, 4 Jul 2024 10:18:41 +0200
Message-ID: <20240704081841.30212-4-nirmoy.das@intel.com>
In-Reply-To: <20240704081841.30212-1-nirmoy.das@intel.com>
References: <20240704081841.30212-1-nirmoy.das@intel.com>

On LNL, because of flat CCS, the driver already creates a migrate job to
clear CCS metadata. Extend that job to also clear system pages using the
GPU. To avoid clearing pages twice, tell TTM to allocate pages without
__GFP_ZERO by clearing the TTM_TT_FLAG_ZERO_ALLOC flag, and set
TTM_TT_FLAG_CLEARED_ON_FREE so that the TTM pool skips its clear-on-free,
since XE now takes care of clearing pages. If a bo is in system placement
and has a CPU map, the GPU clear is skipped for that bo because it has no
dma mapping.

To test the patch, a small test was created that submits a job after
binding buffers of various sizes. It shows good gains for larger
buffers. For smaller buffer sizes the results are not reliable because
they vary a lot.
Without the patch:
sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark
IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64)
Using IGT_SRANDOM=1719237905 for randomisation
Opened device: /dev/dri/card0
Starting subtest: basic-store-benchmark
Starting dynamic subtest: WC
Dynamic subtest WC: SUCCESS (0.000s)
Time taken for size SZ_4K: 9493 us
Time taken for size SZ_2M: 5503 us
Time taken for size SZ_64M: 13016 us
Time taken for size SZ_128M: 29464 us
Time taken for size SZ_256M: 38408 us
Time taken for size SZ_1G: 148758 us
Starting dynamic subtest: WB
Dynamic subtest WB: SUCCESS (0.000s)
Time taken for size SZ_4K: 3889 us
Time taken for size SZ_2M: 6091 us
Time taken for size SZ_64M: 20920 us
Time taken for size SZ_128M: 32394 us
Time taken for size SZ_256M: 61710 us
Time taken for size SZ_1G: 215437 us
Subtest basic-store-benchmark: SUCCESS (0.589s)

With the patch:
sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark
IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64)
Using IGT_SRANDOM=1719238062 for randomisation
Opened device: /dev/dri/card0
Starting subtest: basic-store-benchmark
Starting dynamic subtest: WC
Dynamic subtest WC: SUCCESS (0.000s)
Time taken for size SZ_4K: 11803 us
Time taken for size SZ_2M: 4237 us
Time taken for size SZ_64M: 8649 us
Time taken for size SZ_128M: 14682 us
Time taken for size SZ_256M: 22156 us
Time taken for size SZ_1G: 74457 us
Starting dynamic subtest: WB
Dynamic subtest WB: SUCCESS (0.000s)
Time taken for size SZ_4K: 5129 us
Time taken for size SZ_2M: 12563 us
Time taken for size SZ_64M: 14860 us
Time taken for size SZ_128M: 26064 us
Time taken for size SZ_256M: 47167 us
Time taken for size SZ_1G: 170304 us
Subtest basic-store-benchmark: SUCCESS (0.417s)

With the patch and init_on_alloc=0:
sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark
IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64)
Using IGT_SRANDOM=1719238219 for randomisation
Opened device: /dev/dri/card0
Starting subtest: basic-store-benchmark
Starting dynamic subtest: WC
Dynamic subtest WC: SUCCESS (0.000s)
Time taken for size SZ_4K: 4803 us
Time taken for size SZ_2M: 9212 us
Time taken for size SZ_64M: 9643 us
Time taken for size SZ_128M: 13479 us
Time taken for size SZ_256M: 22429 us
Time taken for size SZ_1G: 83110 us
Starting dynamic subtest: WB
Dynamic subtest WB: SUCCESS (0.000s)
Time taken for size SZ_4K: 4003 us
Time taken for size SZ_2M: 4443 us
Time taken for size SZ_64M: 12960 us
Time taken for size SZ_128M: 13741 us
Time taken for size SZ_256M: 26841 us
Time taken for size SZ_1G: 84746 us
Subtest basic-store-benchmark: SUCCESS (0.290s)

v2: Handle regression on dgfx (Himal)
    Update commit message as no ttm API changes needed.
v3: Fix Kunit test.
v4: Handle data leak on cpu mmap (Thomas)

Cc: Himal Prasad Ghimiray
Cc: Matthew Auld
Cc: "Thomas Hellström"
Signed-off-by: Nirmoy Das
---
 drivers/gpu/drm/xe/xe_bo.c           | 25 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_device.c       |  7 +++++++
 drivers/gpu/drm/xe/xe_device_types.h |  2 ++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 4d6315d2ae9a..b76a44fcf3b1 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -387,6 +387,13 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 		caching = ttm_uncached;
 	}
 
+	/* If the device can support gpu clear pages then set proper ttm
+	 * flag. Zeroed pages are only required for ttm_bo_type_device so
+	 * unwanted data is not leaked to userspace.
+	 */
+	if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear)
+		page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE;
+
 	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages);
 	if (err) {
 		kfree(tt);
@@ -408,6 +415,10 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt,
 	if (tt->page_flags & TTM_TT_FLAG_EXTERNAL)
 		return 0;
 
+	/* Clear TTM_TT_FLAG_ZERO_ALLOC when GPU is set to clear pages */
+	if (tt->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE)
+		tt->page_flags &= ~TTM_TT_FLAG_ZERO_ALLOC;
+
 	err = ttm_pool_alloc(&ttm_dev->pool, tt, ctx);
 	if (err)
 		return err;
@@ -653,6 +664,14 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	int ret = 0;
 
+	/*
+	 * Clear TTM_TT_FLAG_CLEARED_ON_FREE on bo creation path when
+	 * moving to system as the bo doesn't have a dma_mapping.
+	 */
+	if (!old_mem && ttm && !ttm_tt_is_populated(ttm)) {
+		ttm->page_flags &= ~TTM_TT_FLAG_CLEARED_ON_FREE;
+	}
+
 	/* Bo creation path, moving to system or TT. */
 	if ((!old_mem && ttm) && !handle_system_ccs) {
 		if (new_mem->mem_type == XE_PL_TT)
@@ -676,7 +695,8 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 						(!mem_type_is_vram(old_mem_type) && !tt_has_data);
 
 	needs_clear = (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC) ||
-		(!ttm && ttm_bo->type == ttm_bo_type_device);
+		(!ttm && ttm_bo->type == ttm_bo_type_device) ||
+		(ttm && ttm->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE);
 
 	if (new_mem->mem_type == XE_PL_TT) {
 		ret = xe_tt_map_sg(ttm);
@@ -790,6 +810,9 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 				handle_system_ccs;
 			bool clear_bo_data = mem_type_is_vram(new_mem->mem_type);
 
+			if (ttm && (ttm->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE))
+				clear_bo_data |= true;
+
 			fence = xe_migrate_clear(migrate, bo, new_mem,
 						 clear_bo_data, clear_ccs);
 		}
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 03492fbcb8fb..7c682a53f06e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -636,6 +636,13 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		goto err;
 
+	/**
+	 * On iGFX device with flat CCS, we clear CCS metadata, let's extend that
+	 * and use GPU to clear pages as well.
+	 */
+	if (xe_device_has_flat_ccs(xe) && !IS_DGFX(xe))
+		xe->mem.gpu_page_clear = true;
+
 	err = xe_vram_probe(xe);
 	if (err)
 		goto err;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 3bca6d344744..28eaf2ab1f25 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -325,6 +325,8 @@ struct xe_device {
 		struct xe_mem_region vram;
 		/** @mem.sys_mgr: system TTM manager */
 		struct ttm_resource_manager sys_mgr;
+		/** @gpu_page_clear: clear pages offloaded to GPU */
+		bool gpu_page_clear;
 	} mem;
 
 	/** @sriov: device level virtualization data */
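
As a reading aid for the xe_bo.c hunks in patch 4/4, the page_flags
handling can be modelled in userspace as below. This is a sketch under
stated assumptions, not driver code: all model_* names and the
model_bo_type enum are hypothetical, TTM_TT_FLAG_CLEARED_ON_FREE uses
the BIT(5) value from patch 1/4, and TTM_TT_FLAG_ZERO_ALLOC is assumed
to be BIT(1) since its definition is not shown in this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))
#define TTM_TT_FLAG_ZERO_ALLOC      BIT(1) /* assumed value, not in this series */
#define TTM_TT_FLAG_CLEARED_ON_FREE BIT(5) /* value from patch 1/4 */

/* Hypothetical stand-in for ttm_bo_type_device vs. other bo types. */
enum model_bo_type { MODEL_BO_TYPE_DEVICE, MODEL_BO_TYPE_KERNEL };

/* Models xe_ttm_tt_create(): only userspace-visible bos on capable devices opt in. */
static uint32_t model_tt_create_flags(uint32_t page_flags,
				      enum model_bo_type type,
				      bool gpu_page_clear)
{
	if (type == MODEL_BO_TYPE_DEVICE && gpu_page_clear)
		page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE;
	return page_flags;
}

/* Models xe_ttm_tt_populate(): drop ZERO_ALLOC so pages are not zeroed twice. */
static uint32_t model_tt_populate_flags(uint32_t page_flags)
{
	if (page_flags & TTM_TT_FLAG_CLEARED_ON_FREE)
		page_flags &= ~TTM_TT_FLAG_ZERO_ALLOC;
	return page_flags;
}

/* Models xe_bo_move(): bo data is cleared for VRAM or when the GPU owns clearing. */
static bool model_clear_bo_data(uint32_t page_flags, bool dst_is_vram)
{
	return dst_is_vram || (page_flags & TTM_TT_FLAG_CLEARED_ON_FREE);
}
```

The model leaves out the creation-path case in xe_bo_move() where the
flag is dropped again because the bo has no dma mapping.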