From patchwork Wed Aug 21 09:50:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nirmoy Das X-Patchwork-Id: 13771200 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D6FA2C5320E for ; Wed, 21 Aug 2024 10:18:57 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5DA2F10E8BD; Wed, 21 Aug 2024 10:18:57 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="eG50+pai"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) by gabe.freedesktop.org (Postfix) with ESMTPS id 60A7610E8BC; Wed, 21 Aug 2024 10:18:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1724235536; x=1755771536; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=WwZ8Wf+/9YgnIUDxCb37b9Jxq5QSJt4H6stId2C1728=; b=eG50+paiAGfsS3d+B3qzjCj/EDysaiA0WXxV7R/ZWJ83BcXiLvBJRUpr pCclLjEOvfxwnrofKbDIf/8EpsyygsZkAp7SGa19GhQKZ1kErM9FDqgwp BFn9T5P1zl/etJFd5FtiX6v9Ncv5wpSmR25FkkuDyUL/u5xHVXotOEWE8 SSPjvjeGbGBjqejJ+HK3Gwg6xvrBqhQpeOIueqHEP1rKBrmfJFZkYbjcZ M8VPhLuMbIUa3SubWMsBVyOnUTj2r2wa4S3/6Di3qX+cE/5tGsfM+0GuR xpOjeoGS+esoFupOy6c/Kl6qRT48vH3CZd5yQtA4vcmHcZo1zN2ZbNQgg Q==; X-CSE-ConnectionGUID: yzA5wa9ZSeCZo9+wWlg+kg== X-CSE-MsgGUID: 55jCrQ82RVGBXX4lh/7wmQ== X-IronPort-AV: E=McAfee;i="6700,10204,11170"; a="22720555" X-IronPort-AV: E=Sophos;i="6.10,164,1719903600"; d="scan'208";a="22720555" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by orvoesa108.jf.intel.com 
with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Aug 2024 03:18:56 -0700 X-CSE-ConnectionGUID: i3qr9jujTMOgstkcmqXdqg== X-CSE-MsgGUID: TRuOvZDHRbqJchuHNyB85Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,164,1719903600"; d="scan'208";a="65880954" Received: from nirmoyda-desk.igk.intel.com ([10.102.138.190]) by ORVIESA003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Aug 2024 03:18:54 -0700 From: Nirmoy Das To: dri-devel@lists.freedesktop.org Cc: intel-xe@lists.freedesktop.org, Nirmoy Das , =?utf-8?q?Christian_K=C3=B6nig?= , Himal Prasad Ghimiray , Matthew Auld , Matthew Brost , =?utf-8?q?Thomas_Hellstr=C3=B6m?= Subject: [PATCH 1/2] drm/xe/lnl: Only do gpu sys page clear for non-pooled BOs Date: Wed, 21 Aug 2024 11:50:34 +0200 Message-ID: <20240821095035.29083-1-nirmoy.das@intel.com> X-Mailer: git-send-email 2.42.0 MIME-Version: 1.0 Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Currently XE lacks a clear-on-free implementation, so using TTM_TT_FLAG_CLEARED_ON_FREE is invalid. Remove the usage of TTM_TT_FLAG_CLEARED_ON_FREE and limit GPU system page clearing to WB-cached BOs, which are not pooled, so there is no need to return zeroed pages to a pool. 
Without the patch: api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation: UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us Perf tool top 5 entries: 11.16% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page 7.85% api_overhead_be [kernel.kallsyms] [k] cpa_flush 7.59% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res 7.24% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 5.53% api_overhead_be [kernel.kallsyms] [k] lookup_address_in_pgd_attr With the patch: UsmMemoryAllocation(api=l0 type=Host size=4KB) 78.164 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 98880.39 us Perf tool top 5 entries: 25.40% api_overhead_be [kernel.kallsyms] [k] clear_page_erms 9.89% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 4.64% api_overhead_be [kernel.kallsyms] [k] cpa_flush 4.04% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res 3.96% api_overhead_be [kernel.kallsyms] [k] mod_find This is still better than the base case where there was no page clearing offloading. Cc: Christian König Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: Matthew Brost Cc: Thomas Hellström Signed-off-by: Nirmoy Das --- drivers/gpu/drm/xe/xe_bo.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 6ed0e1955215..a18408d5d185 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -283,6 +283,7 @@ struct xe_ttm_tt { struct device *dev; struct sg_table sgt; struct sg_table *sg; + bool clear_system_pages; }; static int xe_tt_map_sg(struct ttm_tt *tt) @@ -397,12 +398,17 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo, } /* - * If the device can support gpu clear system pages then set proper ttm + * If the device can support gpu clear system pages then set proper * flag. 
Zeroed pages are only required for ttm_bo_type_device so * unwanted data is not leaked to userspace. + * + * XE currently does clear-on-alloc so gpu clear will only work on + * non-pooled BO, DRM_XE_GEM_CPU_CACHING_WB otherwise global pool will + * get poisoned with non-zeroed pages. */ - if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear_sys) - page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE; + if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear_sys && + bo->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB) + tt->clear_system_pages = true; err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages); if (err) { @@ -416,8 +422,11 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo, static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt, struct ttm_operation_ctx *ctx) { + struct xe_ttm_tt *xe_tt; int err; + xe_tt = container_of(tt, struct xe_ttm_tt, ttm); + /* * dma-bufs are not populated with pages, and the dma- * addresses are set up when moved to XE_PL_TT. @@ -426,7 +435,7 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt, return 0; /* Clear TTM_TT_FLAG_ZERO_ALLOC when GPU is set to clear system pages */ - if (tt->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE) + if (xe_tt->clear_system_pages) tt->page_flags &= ~TTM_TT_FLAG_ZERO_ALLOC; err = ttm_pool_alloc(&ttm_dev->pool, tt, ctx); @@ -664,6 +673,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, struct ttm_resource *old_mem = ttm_bo->resource; u32 old_mem_type = old_mem ? old_mem->mem_type : XE_PL_SYSTEM; struct ttm_tt *ttm = ttm_bo->ttm; + struct xe_ttm_tt *xe_tt = container_of(ttm, struct xe_ttm_tt, ttm); struct xe_migrate *migrate = NULL; struct dma_fence *fence; bool move_lacks_source; @@ -671,15 +681,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, bool needs_clear; bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) && ttm && ttm_tt_is_populated(ttm)) ? 
true : false; - bool clear_system_pages; + bool clear_system_pages = false; int ret = 0; /* * Clear TTM_TT_FLAG_CLEARED_ON_FREE on bo creation path when * moving to system as the bo doesn't have dma_mapping. */ - if (!old_mem && ttm && !ttm_tt_is_populated(ttm)) - ttm->page_flags &= ~TTM_TT_FLAG_CLEARED_ON_FREE; + if (!old_mem && ttm && !ttm_tt_is_populated(ttm) && + xe_tt->clear_system_pages) + xe_tt->clear_system_pages = false; /* Bo creation path, moving to system or TT. */ if ((!old_mem && ttm) && !handle_system_ccs) { @@ -703,7 +714,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict, move_lacks_source = handle_system_ccs ? (!bo->ccs_cleared) : (!mem_type_is_vram(old_mem_type) && !tt_has_data); - clear_system_pages = ttm && (ttm->page_flags & TTM_TT_FLAG_CLEARED_ON_FREE); + clear_system_pages = ttm && xe_tt->clear_system_pages; needs_clear = (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC) || (!ttm && ttm_bo->type == ttm_bo_type_device) || clear_system_pages; From patchwork Wed Aug 21 09:50:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nirmoy Das X-Patchwork-Id: 13771201 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E576CC5320E for ; Wed, 21 Aug 2024 10:19:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6598710E8BF; Wed, 21 Aug 2024 10:19:01 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="SONdqYpU"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) by 
gabe.freedesktop.org (Postfix) with ESMTPS id 8311310E8BE; Wed, 21 Aug 2024 10:18:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1724235538; x=1755771538; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=676QHHPYwHAmDeDNfPzlPnq60iduL1BnfCns1TM0QqM=; b=SONdqYpUUa5Ncsy+5AP1Rd5rHGXgge02/a0oWjutx09tNBUvkgORjeCe 8+xXCpEvg4AK9WRSAY0KDn3qnS+TpSnhbLHMVNVDIIcY1O0780wHtlPE5 u6zirEma3jjDDcvrcHPYNkMEi12QBUfjgL6efZybhhyIRtseWLnkFrpbM IhtqZ8/A+lXCcTn0jA3/hi5vsAlKCy2TT2F5bjOPSJ357YCUyqCWtMSCv q+gBGNR6OgyjIO8MOMXtD4CiB7KTAlXOSQSCOUDhuKdhkuKU97k1cXObN gWhkLQMySDj59XOrM8QaDig4zK6UxMefGNj9w9y9Bu0EotCM9UCdZaMdz g==; X-CSE-ConnectionGUID: 6oTKOBA4SUa2PBLB7KjrLA== X-CSE-MsgGUID: 3r47IToLRAyrq0d/pNm2uA== X-IronPort-AV: E=McAfee;i="6700,10204,11170"; a="22720564" X-IronPort-AV: E=Sophos;i="6.10,164,1719903600"; d="scan'208";a="22720564" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Aug 2024 03:18:58 -0700 X-CSE-ConnectionGUID: KIyD/EjkSZSK/9UVHA6zBA== X-CSE-MsgGUID: 8PKEwgnNTvOUNg+gxObOww== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,164,1719903600"; d="scan'208";a="65880963" Received: from nirmoyda-desk.igk.intel.com ([10.102.138.190]) by ORVIESA003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Aug 2024 03:18:57 -0700 From: Nirmoy Das To: dri-devel@lists.freedesktop.org Cc: intel-xe@lists.freedesktop.org, Nirmoy Das , =?utf-8?q?Christian_K=C3=B6nig?= , Himal Prasad Ghimiray , Matthew Auld , Matthew Brost , =?utf-8?q?Thomas_Hellstr=C3=B6m?= Subject: [PATCH 2/2] Revert "drm/ttm: Add a flag to allow drivers to skip clear-on-free" Date: Wed, 21 Aug 2024 11:50:35 +0200 Message-ID: <20240821095035.29083-2-nirmoy.das@intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240821095035.29083-1-nirmoy.das@intel.com> References: 
<20240821095035.29083-1-nirmoy.das@intel.com> MIME-Version: 1.0 Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Remove TTM_TT_FLAG_CLEARED_ON_FREE now that XE stopped using this flag. This reverts commit decbfaf06db05fa1f9b33149ebb3c145b44e878f. Cc: Christian König Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: Matthew Brost Cc: Thomas Hellström Signed-off-by: Nirmoy Das Reviewed-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_pool.c | 18 +++++++----------- include/drm/ttm/ttm_tt.h | 6 +----- 2 files changed, 8 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 935ab3cfd046..8504dbe19c1a 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -222,18 +222,15 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr, } /* Give pages into a specific pool_type */ -static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p, - bool cleared) +static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p) { unsigned int i, num_pages = 1 << pt->order; - if (!cleared) { - for (i = 0; i < num_pages; ++i) { - if (PageHighMem(p)) - clear_highpage(p + i); - else - clear_page(page_address(p + i)); - } + for (i = 0; i < num_pages; ++i) { + if (PageHighMem(p)) + clear_highpage(p + i); + else + clear_page(page_address(p + i)); } spin_lock(&pt->lock); @@ -397,7 +394,6 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt, pgoff_t start_page, pgoff_t end_page) { struct page **pages = &tt->pages[start_page]; - bool cleared = tt->page_flags & 
TTM_TT_FLAG_CLEARED_ON_FREE; unsigned int order; pgoff_t i, nr; @@ -411,7 +407,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt, pt = ttm_pool_select_type(pool, caching, order); if (pt) - ttm_pool_type_give(pt, *pages, cleared); + ttm_pool_type_give(pt, *pages); else ttm_pool_free_page(pool, caching, order, *pages); } diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index cfaf49de2419..2b9d856ff388 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -85,9 +85,6 @@ struct ttm_tt { * fault handling abuses the DMA api a bit and dma_map_attrs can't be * used to assure pgprot always matches. * - * TTM_TT_FLAG_CLEARED_ON_FREE: Set this if a drm driver handles - * clearing backing store - * * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is * set by TTM after ttm_tt_populate() has successfully returned, and is * then unset when TTM calls ttm_tt_unpopulate(). @@ -97,9 +94,8 @@ struct ttm_tt { #define TTM_TT_FLAG_EXTERNAL BIT(2) #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3) #define TTM_TT_FLAG_DECRYPTED BIT(4) -#define TTM_TT_FLAG_CLEARED_ON_FREE BIT(5) -#define TTM_TT_FLAG_PRIV_POPULATED BIT(6) +#define TTM_TT_FLAG_PRIV_POPULATED BIT(5) uint32_t page_flags; /** @num_pages: Number of pages in the page array. */ uint32_t num_pages;
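
The revert above restores the rule that every page handed to a pool is cleared first, so a later allocation from the pool never sees stale data. The following is a minimal userspace sketch of that invariant, not the TTM implementation: all `sketch_*` names, the one-slot pool, and the tiny page size are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tiny "page" so the model stays readable. */
#define SKETCH_PAGE_SIZE 64

struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical pool that caches at most one page. */
struct sketch_pool {
	struct sketch_page *slot;
};

/*
 * Give a page back to the pool. The page is cleared unconditionally,
 * mirroring the post-revert ttm_pool_type_give() shown in the diff,
 * which no longer takes a "cleared" argument.
 */
static void sketch_pool_give(struct sketch_pool *pool, struct sketch_page *p)
{
	memset(p->data, 0, sizeof(p->data));
	pool->slot = p;
}

/* Take the cached page out of the pool, or NULL if the pool is empty. */
static struct sketch_page *sketch_pool_take(struct sketch_pool *pool)
{
	struct sketch_page *p = pool->slot;

	pool->slot = NULL;
	return p;
}

/* Return true if every byte of the page is zero. */
static bool sketch_page_is_zeroed(const struct sketch_page *p)
{
	for (size_t i = 0; i < sizeof(p->data); i++) {
		if (p->data[i])
			return false;
	}
	return true;
}
```

A caller would dirty a page, pass it to `sketch_pool_give()`, and then expect `sketch_page_is_zeroed()` to hold for whatever `sketch_pool_take()` returns. A driver that clears only on allocation cannot uphold this for pooled pages, which is why patch 1/2 restricts GPU clearing to BOs whose pages are not pooled.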