From patchwork Wed Jun 1 15:07:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866940 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 89CE9CCA477 for ; Wed, 1 Jun 2022 15:07:56 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 43CE610EB70; Wed, 1 Jun 2022 15:07:53 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5DCAD10EA63; Wed, 1 Jun 2022 15:07:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096071; x=1685632071; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XSlMKMDhjLRlb/Sl+xgxEo8Fhha9Q8xO/z6MnD4PHmY=; b=ltFnD/FdXY8o9i24qvWL1Vkj2UVm7BU3JLx5/VhlOl9WRu3gnU+YI8pb 4j1j2lcIbZRWCzwz7/k1Pqzl/x0SJO6MXlyKNzfLk8zO+C6aKUBIROykS /IEwgDgR/oDNPz0R7KFaksoaLCgxdAR99ylNLuTXPoQ1Q21NwQfoKSbb7 jJEyW2gBfKpC5s1b8ULh2Wase+Va3uwlpFqwULqTq2sLIaI/cwfqM5THM YmEpRgaUcgJWYjBU6xMNuwj/sdlyI0hWQ67HXb16YBXS7bIOsEIxurXWs wwGfXLae+JcMIjoa11ww8KC1YiqwJ8UDxBGEEtZ+Z2FAw2+R3ux2BPg8E w==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="338665525" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="338665525" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465443" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:30 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 1/6] drm/i915/xehp: Use separate sseu init function Date: Wed, 1 Jun 2022 08:07:20 -0700 Message-Id: <20220601150725.521468-2-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Xe_HP has enough fundamental differences from previous platforms that it makes sense to use a separate SSEU init function to keep things straightforward and easy to understand. We'll also add a has_xehp_dss flag to the SSEU structure that will be used by other upcoming changes. v2: - Add has_xehp_dss flag Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_sseu.c | 86 ++++++++++++++++------------ drivers/gpu/drm/i915/gt/intel_sseu.h | 5 ++ 2 files changed, 54 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index fdd25691beda..b5fd479a7b85 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -169,13 +169,43 @@ static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u8 s_en, sseu->eu_total = compute_eu_total(sseu); } -static void gen12_sseu_info_init(struct intel_gt *gt) +static void xehp_sseu_info_init(struct intel_gt *gt) { struct sseu_dev_info *sseu = >->info.sseu; struct intel_uncore *uncore = gt->uncore; u32 g_dss_en, c_dss_en = 0; u16 eu_en = 0; u8 eu_en_fuse; + int eu; + + /* + * The concept of slice has been removed in Xe_HP. To be compatible + * with prior generations, assume a single slice across the entire + * device. Then calculate out the DSS for each workload type within + * that software slice. + */ + intel_sseu_set_info(sseu, 1, 32, 16); + sseu->has_xehp_dss = 1; + + g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE); + c_dss_en = intel_uncore_read(uncore, GEN12_GT_COMPUTE_DSS_ENABLE); + + eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK; + + for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++) + if (eu_en_fuse & BIT(eu)) + eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); + + gen11_compute_sseu_info(sseu, 0x1, g_dss_en, c_dss_en, eu_en); +} + +static void gen12_sseu_info_init(struct intel_gt *gt) +{ + struct sseu_dev_info *sseu = >->info.sseu; + struct intel_uncore *uncore = gt->uncore; + u32 g_dss_en; + u16 eu_en = 0; + u8 eu_en_fuse; u8 s_en; int eu; @@ -183,43 +213,23 @@ static void gen12_sseu_info_init(struct intel_gt *gt) * Gen12 has Dual-Subslices, which behave similarly to 2 gen11 SS. * Instead of splitting these, provide userspace with an array * of DSS to more closely represent the hardware resource. - * - * In addition, the concept of slice has been removed in Xe_HP. - * To be compatible with prior generations, assume a single slice - * across the entire device. Then calculate out the DSS for each - * workload type within that software slice. */ - if (IS_DG2(gt->i915) || IS_XEHPSDV(gt->i915)) - intel_sseu_set_info(sseu, 1, 32, 16); - else - intel_sseu_set_info(sseu, 1, 6, 16); + intel_sseu_set_info(sseu, 1, 6, 16); - /* - * As mentioned above, Xe_HP does not have the concept of a slice. - * Enable one for software backwards compatibility. - */ - if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) - s_en = 0x1; - else - s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) & - GEN11_GT_S_ENA_MASK; + s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) & + GEN11_GT_S_ENA_MASK; g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE); - if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) - c_dss_en = intel_uncore_read(uncore, GEN12_GT_COMPUTE_DSS_ENABLE); /* one bit per pair of EUs */ - if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) - eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK; - else - eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) & - GEN11_EU_DIS_MASK); + eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) & + GEN11_EU_DIS_MASK); for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++) if (eu_en_fuse & BIT(eu)) eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); - gen11_compute_sseu_info(sseu, s_en, g_dss_en, c_dss_en, eu_en); + gen11_compute_sseu_info(sseu, s_en, g_dss_en, 0, eu_en); /* TGL only supports slice-level power gating */ sseu->has_slice_pg = 1; @@ -574,18 +584,20 @@ void intel_sseu_info_init(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; - if (IS_HASWELL(i915)) - hsw_sseu_info_init(gt); - else if (IS_CHERRYVIEW(i915)) - cherryview_sseu_info_init(gt); - else if (IS_BROADWELL(i915)) - bdw_sseu_info_init(gt); - else if (GRAPHICS_VER(i915) == 9) - gen9_sseu_info_init(gt); - else if (GRAPHICS_VER(i915) == 11) - gen11_sseu_info_init(gt); + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) + xehp_sseu_info_init(gt); else if (GRAPHICS_VER(i915) >= 12) gen12_sseu_info_init(gt); + else if (GRAPHICS_VER(i915) >= 11) + gen11_sseu_info_init(gt); + else if (GRAPHICS_VER(i915) >= 9) + gen9_sseu_info_init(gt); + else if (IS_BROADWELL(i915)) + bdw_sseu_info_init(gt); + else if (IS_CHERRYVIEW(i915)) + cherryview_sseu_info_init(gt); + else if (IS_HASWELL(i915)) + hsw_sseu_info_init(gt); } u32 intel_sseu_make_rpcs(struct intel_gt *gt, diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index 5c078df4729c..4a041f9dc490 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -66,6 +66,11 @@ struct sseu_dev_info { u8 has_slice_pg:1; u8 has_subslice_pg:1; u8 has_eu_pg:1; + /* + * For Xe_HP and beyond, the hardware no longer has traditional slices + * so we just report the entire DSS pool under a fake "slice 0." + */ + u8 has_xehp_dss:1; /* Topology fields */ u8 max_slices; From patchwork Wed Jun 1 15:07:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866942 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4A209CCA473 for ; Wed, 1 Jun 2022 15:08:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9B83110EAC8; Wed, 1 Jun 2022 15:07:54 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id A986F10EABD; Wed, 1 Jun 2022 15:07:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096071; x=1685632071; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LTzsqvf5hnmf7u6q16OrYfGngypycfElxXQU2HmIV8k=; b=Oup8/cUJtb0wXh1rS311R4DkjRsx5C+wfqLdiCAlD4tbWYv01F/n18Xs UtsdjKgdWOTai1MNVlNmhx+9CCMGSXUOvNQVvAFCJJwQAD/5sU89vjKBH cMNKNKtzQzkBSU+6Pj9NPAOtuLe0xKk+WDEW5U4Y4jYFrdGdV7NZ40g6a MHo4ZTHkDHMpLSmD0uClmSo8Kw8atobqaQ77rSWeVbU4Xv+R8BpJrMF5V wKjEzqnHce/TWkHtylbnnLEELAqRWA6PDqAwr3S8/URnB0IXgn6ZJuOK3 JQUuDFBlyvFUOWQbdTUVPIMlQSpbfu+9gbYm8/wloMQOjF/UQ1RX1OjXY g==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="338665530" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="338665530" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465445" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:30 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 2/6] drm/i915/xehp: Drop GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK Date: Wed, 1 Jun 2022 08:07:21 -0700 Message-Id: <20220601150725.521468-3-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Slice/subslice/EU information should be obtained via the topology queries provided by the I915_QUERY interface; let's turn off support for the old GETPARAM lookups on Xe_HP and beyond where we can't return meaningful values. The slice mask lookup is meaningless since Xe_HP doesn't support traditional slices (and we make no attempt to return the various new units like gslices, cslices, mslices, etc.) here. The subslice mask lookup is even more problematic; given the distinct masks for geometry vs compute purposes, the combined mask returned here is likely not what userspace would want to act upon anyway. The value is also limited to 32-bits by the nature of the GETPARAM ioctl which is sufficient for the initial Xe_HP platforms, but is unable to convey the larger masks that will be needed on other upcoming platforms. Finally, the value returned here becomes even less meaningful when used on multi-tile platforms where each tile will have its own masks. Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin Acked-by: Lionel Landwerlin # mesa --- drivers/gpu/drm/i915/i915_getparam.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index c12a0adefda5..ac9767c56619 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -148,11 +148,19 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, value = intel_engines_has_context_isolation(i915); break; case I915_PARAM_SLICE_MASK: + /* Not supported from Xe_HP onward; use topology queries */ + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) + return -EINVAL; + value = sseu->slice_mask; if (!value) return -ENODEV; break; case I915_PARAM_SUBSLICE_MASK: + /* Not supported from Xe_HP onward; use topology queries */ + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) + return -EINVAL; + /* Only copy bits from the first slice */ memcpy(&value, sseu->subslice_mask, min(sseu->ss_stride, (u8)sizeof(value))); From patchwork Wed Jun 1 15:07:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866944 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 91260C433EF for ; Wed, 1 Jun 2022 15:08:08 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 09B8D10ED58; Wed, 1 Jun 2022 15:08:00 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id C6B4810EABD; Wed, 1 Jun 2022 15:07:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096072; x=1685632072; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=smFkVac93Nmbl1pQN5zv2H81XDgVi58ovZ6VqIVhs4I=; b=Drhi4/Cy//gdMdnzQHDEPHbqmCFGgwQbtD5S6SfIwPquVbIOPFRnzZdA bAtj1GdsPYE0od3tThYBWAUxfolKwCq9bXFhRJ9VfG1ZNZkufLKhVZm2O cT2cojtVZyYECAxoI7im7d2poXPgUFXlwNvig2mEWh4cY6HFQgcIrxIAF JCMtFAO7PQTumGi34rEXw/Gc9S86Rn8aYLYk+N+i2sldl9gUnwv34BZln OXA/stSQkUWfgL0GaR94IkFmMgE9As5JFScROsedBMkAFbwD0Vok2hxNV RSrmKc2jL9nQGI9VsRlHRmR879YftGN2SG7ahFtIQNKd8r32U58FXdzYs Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="338665532" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="338665532" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465448" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:30 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 3/6] drm/i915/sseu: Simplify gen11+ SSEU handling Date: Wed, 1 Jun 2022 08:07:22 -0700 Message-Id: <20220601150725.521468-4-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Although gen11 and gen12 architectures supported the concept of multiple slices, in practice all the platforms that were actually designed only had a single slice (i.e., note the parameters to 'intel_sseu_set_info' that we pass for each platform). We can simplify the code slightly by dropping the multi-slice logic from gen11+ platforms. v2: - Promote drm_dbg to drm_WARN_ON if the slice fuse register reports unexpected fusing. (Tvrtko) Cc: Tvrtko Ursulin Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_sseu.c | 76 +++++++++++++--------------- 1 file changed, 36 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index b5fd479a7b85..7e5222ad2f98 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -119,52 +119,37 @@ static u16 compute_eu_total(const struct sseu_dev_info *sseu) return total; } -static u32 get_ss_stride_mask(struct sseu_dev_info *sseu, u8 s, u32 ss_en) -{ - u32 ss_mask; - - ss_mask = ss_en >> (s * sseu->max_subslices); - ss_mask &= GENMASK(sseu->max_subslices - 1, 0); - - return ss_mask; -} - -static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u8 s_en, +static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u32 g_ss_en, u32 c_ss_en, u16 eu_en) { - int s, ss; + u32 valid_ss_mask = GENMASK(sseu->max_subslices - 1, 0); + int ss; /* g_ss_en/c_ss_en represent entire subslice mask across all slices */ GEM_BUG_ON(sseu->max_slices * sseu->max_subslices > sizeof(g_ss_en) * BITS_PER_BYTE); - for (s = 0; s < sseu->max_slices; s++) { - if ((s_en & BIT(s)) == 0) - continue; + sseu->slice_mask |= BIT(0); - sseu->slice_mask |= BIT(s); - - /* - * XeHP introduces the concept of compute vs geometry DSS. To - * reduce variation between GENs around subslice usage, store a - * mask for both the geometry and compute enabled masks since - * userspace will need to be able to query these masks - * independently. Also compute a total enabled subslice count - * for the purposes of selecting subslices to use in a - * particular GEM context. - */ - intel_sseu_set_subslices(sseu, s, sseu->compute_subslice_mask, - get_ss_stride_mask(sseu, s, c_ss_en)); - intel_sseu_set_subslices(sseu, s, sseu->geometry_subslice_mask, - get_ss_stride_mask(sseu, s, g_ss_en)); - intel_sseu_set_subslices(sseu, s, sseu->subslice_mask, - get_ss_stride_mask(sseu, s, - g_ss_en | c_ss_en)); + /* + * XeHP introduces the concept of compute vs geometry DSS. To reduce + * variation between GENs around subslice usage, store a mask for both + * the geometry and compute enabled masks since userspace will need to + * be able to query these masks independently. Also compute a total + * enabled subslice count for the purposes of selecting subslices to + * use in a particular GEM context. + */ + intel_sseu_set_subslices(sseu, 0, sseu->compute_subslice_mask, + c_ss_en & valid_ss_mask); + intel_sseu_set_subslices(sseu, 0, sseu->geometry_subslice_mask, + g_ss_en & valid_ss_mask); + intel_sseu_set_subslices(sseu, 0, sseu->subslice_mask, + (g_ss_en | c_ss_en) & valid_ss_mask); + + for (ss = 0; ss < sseu->max_subslices; ss++) + if (intel_sseu_has_subslice(sseu, 0, ss)) + sseu_set_eus(sseu, 0, ss, eu_en); - for (ss = 0; ss < sseu->max_subslices; ss++) - if (intel_sseu_has_subslice(sseu, s, ss)) - sseu_set_eus(sseu, s, ss, eu_en); - } sseu->eu_per_subslice = hweight16(eu_en); sseu->eu_total = compute_eu_total(sseu); } @@ -196,7 +181,7 @@ static void xehp_sseu_info_init(struct intel_gt *gt) if (eu_en_fuse & BIT(eu)) eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); - gen11_compute_sseu_info(sseu, 0x1, g_dss_en, c_dss_en, eu_en); + gen11_compute_sseu_info(sseu, g_dss_en, c_dss_en, eu_en); } static void gen12_sseu_info_init(struct intel_gt *gt) @@ -216,8 +201,13 @@ static void gen12_sseu_info_init(struct intel_gt *gt) */ intel_sseu_set_info(sseu, 1, 6, 16); + /* + * Although gen12 architecture supported multiple slices, TGL, RKL, + * DG1, and ADL only had a single slice. + */ s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) & GEN11_GT_S_ENA_MASK; + drm_WARN_ON(>->i915->drm, s_en != 0x1); g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE); @@ -229,7 +219,7 @@ static void gen12_sseu_info_init(struct intel_gt *gt) if (eu_en_fuse & BIT(eu)) eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); - gen11_compute_sseu_info(sseu, s_en, g_dss_en, 0, eu_en); + gen11_compute_sseu_info(sseu, g_dss_en, 0, eu_en); /* TGL only supports slice-level power gating */ sseu->has_slice_pg = 1; @@ -248,14 +238,20 @@ static void gen11_sseu_info_init(struct intel_gt *gt) else intel_sseu_set_info(sseu, 1, 8, 8); + /* + * Although gen11 architecture supported multiple slices, ICL and + * EHL/JSL only had a single slice in practice. + */ s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) & GEN11_GT_S_ENA_MASK; + drm_WARN_ON(>->i915->drm, s_en != 0x1); + ss_en = ~intel_uncore_read(uncore, GEN11_GT_SUBSLICE_DISABLE); eu_en = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) & GEN11_EU_DIS_MASK); - gen11_compute_sseu_info(sseu, s_en, ss_en, 0, eu_en); + gen11_compute_sseu_info(sseu, ss_en, 0, eu_en); /* ICL has no power gating restrictions. */ sseu->has_slice_pg = 1; From patchwork Wed Jun 1 15:07:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866941 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 576E8CCA478 for ; Wed, 1 Jun 2022 15:08:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 57E2E10EAA4; Wed, 1 Jun 2022 15:07:54 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2D65010EAC8; Wed, 1 Jun 2022 15:07:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096072; x=1685632072; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+5NiJhvS1gJ6TCiPgl8byC9n1HP5VEWC+EQIClcJnHM=; b=nGGXdhvEgAQqM0js+C7RLSrwdCtp3m1TM/5MM7R9UH2nw3FRZ4v6X6Jr 72fyzZEn9zBFOc74dE+10ghIzNzevDiQvfomtbcw8O6DB16WbNDwxTqGK ZAS35N+UIjIFGejJcdNk/Wpm2UYpK0D8zS0LT6EDPcnCXdiIxpQEFIKVx AjUBC7yskv0ZULzrMYHxf7G2a5EOXAZ1h8qa+/SLY2L/nHuAp4y2SpXga QVOIwkwgjJMaogAbsFp7FMS05GcPis4eH3uh36mUkzJsjyQR0Njx1SBLJ c2H3xqS1jX/C1XVF9WGwW8NEr6Wp54cM1we1HVraIqPLIp+ju6ZDxvk6s A==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="338665534" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="338665534" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:32 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465451" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 4/6] drm/i915/sseu: Don't try to store EU mask internally in UAPI format Date: Wed, 1 Jun 2022 08:07:23 -0700 Message-Id: <20220601150725.521468-5-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Storing the EU mask internally in the same format the I915_QUERY topology queries use makes the final copy_to_user() a bit simpler, but makes the rest of the driver's SSEU more complicated and harder to follow. Let's switch to an internal representation that's more natural: Xe_HP platforms will be a simple array of u16 masks, whereas pre-Xe_HP platforms will be a two-dimensional array, indexed by [slice][subslice]. We'll convert to the uapi format only when the query uapi is called. v2: - Drop has_common_ss_eumask. We waste some space repeating identical EU masks for every single DSS, but the code is simpler without it. (Tvrtko) v3: - Mask down EUs passed to sseu_set_eus at the callsite rather than inside the function. (Tvrtko) - Eliminate sseu->eu_stride and calculate it when needed. (Tvrtko) Cc: Tvrtko Ursulin Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_sseu.c | 88 ++++++++++++++++++---------- drivers/gpu/drm/i915/gt/intel_sseu.h | 10 +++- drivers/gpu/drm/i915/i915_query.c | 13 ++-- 3 files changed, 73 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index 7e5222ad2f98..285cfd758bdc 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -19,8 +19,6 @@ void intel_sseu_set_info(struct sseu_dev_info *sseu, u8 max_slices, sseu->ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices); GEM_BUG_ON(sseu->ss_stride > GEN_MAX_SUBSLICE_STRIDE); - sseu->eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice); - GEM_BUG_ON(sseu->eu_stride > GEN_MAX_EU_STRIDE); } unsigned int @@ -78,47 +76,77 @@ intel_sseu_subslices_per_slice(const struct sseu_dev_info *sseu, u8 slice) return hweight32(intel_sseu_get_subslices(sseu, slice)); } -static int sseu_eu_idx(const struct sseu_dev_info *sseu, int slice, - int subslice) -{ - int slice_stride = sseu->max_subslices * sseu->eu_stride; - - return slice * slice_stride + subslice * sseu->eu_stride; -} - static u16 sseu_get_eus(const struct sseu_dev_info *sseu, int slice, int subslice) { - int i, offset = sseu_eu_idx(sseu, slice, subslice); - u16 eu_mask = 0; - - for (i = 0; i < sseu->eu_stride; i++) - eu_mask |= - ((u16)sseu->eu_mask[offset + i]) << (i * BITS_PER_BYTE); - - return eu_mask; + if (sseu->has_xehp_dss) { + WARN_ON(slice > 0); + return sseu->eu_mask.xehp[subslice]; + } else { + return sseu->eu_mask.hsw[slice][subslice]; + } } static void sseu_set_eus(struct sseu_dev_info *sseu, int slice, int subslice, u16 eu_mask) { - int i, offset = sseu_eu_idx(sseu, slice, subslice); - - for (i = 0; i < sseu->eu_stride; i++) - sseu->eu_mask[offset + i] = - (eu_mask >> (BITS_PER_BYTE * i)) & 0xff; + GEM_WARN_ON(eu_mask && __fls(eu_mask) >= sseu->max_eus_per_subslice); + if (sseu->has_xehp_dss) { + GEM_WARN_ON(slice > 0); + sseu->eu_mask.xehp[subslice] = eu_mask; + } else { + sseu->eu_mask.hsw[slice][subslice] = eu_mask; + } } static u16 compute_eu_total(const struct sseu_dev_info *sseu) { - u16 i, total = 0; + int s, ss, total = 0; - for (i = 0; i < ARRAY_SIZE(sseu->eu_mask); i++) - total += hweight8(sseu->eu_mask[i]); + for (s = 0; s < sseu->max_slices; s++) + for (ss = 0; ss < sseu->max_subslices; ss++) + if (sseu->has_xehp_dss) + total += hweight16(sseu->eu_mask.xehp[ss]); + else + total += hweight16(sseu->eu_mask.hsw[s][ss]); return total; } +/** + * intel_sseu_copy_eumask_to_user - Copy EU mask into a userspace buffer + * @to: Pointer to userspace buffer to copy to + * @sseu: SSEU structure containing EU mask to copy + * + * Copies the EU mask to a userspace buffer in the format expected by + * the query ioctl's topology queries. + * + * Returns the result of the copy_to_user() operation. + */ +int intel_sseu_copy_eumask_to_user(void __user *to, + const struct sseu_dev_info *sseu) +{ + u8 eu_mask[GEN_SS_MASK_SIZE * GEN_MAX_EU_STRIDE] = {}; + int eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice); + int len = sseu->max_slices * sseu->max_subslices * eu_stride; + int s, ss, i; + + for (s = 0; s < sseu->max_slices; s++) { + for (ss = 0; ss < sseu->max_subslices; ss++) { + int uapi_offset = + s * sseu->max_subslices * eu_stride + + ss * eu_stride; + u16 mask = sseu_get_eus(sseu, s, ss); + + for (i = 0; i < eu_stride; i++) + eu_mask[uapi_offset + i] = + (mask >> (BITS_PER_BYTE * i)) & 0xff; + } + } + + return copy_to_user(to, eu_mask, len); +} + static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u32 g_ss_en, u32 c_ss_en, u16 eu_en) { @@ -278,7 +306,7 @@ static void cherryview_sseu_info_init(struct intel_gt *gt) CHV_FGT_EU_DIS_SS0_R1_SHIFT) << 4); subslice_mask |= BIT(0); - sseu_set_eus(sseu, 0, 0, ~disabled_mask); + sseu_set_eus(sseu, 0, 0, ~disabled_mask & 0xFF); } if (!(fuse & CHV_FGT_DISABLE_SS1)) { @@ -289,7 +317,7 @@ static void cherryview_sseu_info_init(struct intel_gt *gt) CHV_FGT_EU_DIS_SS1_R1_SHIFT) << 4); subslice_mask |= BIT(1); - sseu_set_eus(sseu, 0, 1, ~disabled_mask); + sseu_set_eus(sseu, 0, 1, ~disabled_mask & 0xFF); } intel_sseu_set_subslices(sseu, 0, sseu->subslice_mask, subslice_mask); @@ -362,7 +390,7 @@ static void gen9_sseu_info_init(struct intel_gt *gt) eu_disabled_mask = (eu_disable >> (ss * 8)) & eu_mask; - sseu_set_eus(sseu, s, ss, ~eu_disabled_mask); + sseu_set_eus(sseu, s, ss, ~eu_disabled_mask & eu_mask); eu_per_ss = sseu->max_eus_per_subslice - hweight8(eu_disabled_mask); @@ -475,7 +503,7 @@ static void bdw_sseu_info_init(struct intel_gt *gt) eu_disabled_mask = eu_disable[s] >> (ss * sseu->max_eus_per_subslice); - sseu_set_eus(sseu, s, ss, ~eu_disabled_mask); + sseu_set_eus(sseu, s, ss, ~eu_disabled_mask & 0xFF); n_disabled = hweight8(eu_disabled_mask); diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index 4a041f9dc490..ffa375e68959 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -57,7 +57,11 @@ struct sseu_dev_info { u8 subslice_mask[GEN_SS_MASK_SIZE]; u8 geometry_subslice_mask[GEN_SS_MASK_SIZE]; u8 compute_subslice_mask[GEN_SS_MASK_SIZE]; - u8 eu_mask[GEN_SS_MASK_SIZE * GEN_MAX_EU_STRIDE]; + union { + u16 hsw[GEN_MAX_HSW_SLICES][GEN_MAX_SS_PER_HSW_SLICE]; + u16 xehp[GEN_MAX_DSS]; + } eu_mask; + u16 eu_total; u8 eu_per_subslice; u8 min_eu_in_pool; @@ -78,7 +82,6 @@ struct sseu_dev_info { u8 max_eus_per_subslice; u8 ss_stride; - u8 eu_stride; }; /* @@ -150,4 +153,7 @@ void intel_sseu_print_topology(struct drm_i915_private *i915, u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice); +int intel_sseu_copy_eumask_to_user(void __user *to, + const struct sseu_dev_info *sseu); + #endif /* __INTEL_SSEU_H__ */ diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 7584cec53d5d..89c475d525b8 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -35,6 +35,7 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, { struct drm_i915_query_topology_info topo; u32 slice_length, subslice_length, eu_length, total_length; + int eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice); int ret; BUILD_BUG_ON(sizeof(u8) != sizeof(sseu->slice_mask)); @@ -44,7 +45,7 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, slice_length = sizeof(sseu->slice_mask); subslice_length = sseu->max_slices * sseu->ss_stride; - eu_length = sseu->max_slices * sseu->max_subslices * sseu->eu_stride; + eu_length = sseu->max_slices * sseu->max_subslices * eu_stride; total_length = sizeof(topo) + slice_length + subslice_length + eu_length; @@ -61,7 +62,7 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, topo.subslice_offset = slice_length; topo.subslice_stride = sseu->ss_stride; topo.eu_offset = slice_length + subslice_length; - topo.eu_stride = sseu->eu_stride; + topo.eu_stride = eu_stride; if (copy_to_user(u64_to_user_ptr(query_item->data_ptr), &topo, sizeof(topo))) @@ -76,10 +77,10 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, subslice_mask, subslice_length)) return -EFAULT; - if (copy_to_user(u64_to_user_ptr(query_item->data_ptr + - sizeof(topo) + - slice_length + subslice_length), - sseu->eu_mask, eu_length)) + if (intel_sseu_copy_eumask_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(topo) + + slice_length + subslice_length), + sseu)) return -EFAULT; return total_length; From patchwork Wed Jun 1 15:07:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866943 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 557BBCCA477 for ; Wed, 1 Jun 2022 15:08:06 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EABCD10ECD7; Wed, 1 Jun 2022 15:07:57 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 65B0810EAA4; Wed, 1 Jun 2022 15:07:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096073; x=1685632073; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DGM9Hl1DOCQvvDdWBF3WWWftRA03SUFGaIERhhVe0Zs=; b=Mf07zHMj/JTgNRveUnfzMjdwEUXn4h5/5eYeOHGIhaBuskSask95JwOf nheXCnYLm8VETm8NUj3XSHaFCm/gWzPAiLTA6tfH8GuKifzASQ+OTxCdD tydnnbmBDRBWJhn/5POhc5IF+WyJGIvN44fb417tI4p/RE4pnb3meTLnv 14CNgSakLeR66K8kEpauFNb/PXUK9l2Vijj8cyVSYGnkyH1rhRvxa42G1 M1FyWnyANHsfyjunDSDA2qsmo4MvEszm6HHYrRFoDIONScPIuFPj6RBN/ ARlHyLxdZFa5xppeFfEEDHkU2+9omjsp9uybyJHNWNQj2z04LYdJ8g0mj w==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="338665536" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="338665536" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:32 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465456" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 5/6] drm/i915/sseu: Disassociate internal subslice mask representation from uapi Date: Wed, 1 Jun 2022 08:07:24 -0700 Message-Id: <20220601150725.521468-6-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" As with EU masks, it's easier to store subslice/DSS masks internally in a format that's more natural for the driver to work with, and then only covert into the u8[] uapi form when the query ioctl is invoked. Since the hardware design changed significantly with Xe_HP, we'll use a union to choose between the old "hsw-style" subslice masks or the newer xehp mask. HSW-style masks will be stored in an array of u8's, indexed by slice (there's never more than 6 subslices per slice on older platforms). For Xe_HP and beyond where slices no longer exist, we only need a single bitmask. However we already know that this mask is eventually going to grow too large for a simple u64 to hold, so we'll represent it in a manner that can be operated on by the utilities in linux/bitmap.h. v2: - Fix typo: BIT(s) -> BIT(ss) in gen9_sseu_device_status() v3: - Eliminate sseu->ss_stride and just calculate the stride while specifically handling uapi. (Tvrtko) - Use BITMAP_BITS() macro to refer to size of masks rather than passing I915_MAX_SS_FUSE_BITS directly. (Tvrtko) - Report compute/geometry DSS masks separately when dumping Xe_HP SSEU info. (Tvrtko) - Restore dropped range checks to intel_sseu_has_subslice(). (Tvrtko) v4: - Make the bitmap size macro check the size of the .xehp field rather than the containing union. (Tvrtko) - Don't add GEM_BUG_ON() intel_sseu_has_subslice()'s check for whether slice or subslice ID exceed sseu->max_[sub]slices; various loops in the driver are expected to exceed these, so we should just silently return 'false.' v5: - Move XEHP_BITMAP_BITS() to the header so that we can also replace a usage of I915_MAX_SS_FUSE_BITS in one of the inline functions. (Bala) - Change the local variable in intel_slicemask_from_xehp_dssmask() from u16 to 'unsigned long' to make it a bit more future-proof. Cc: Tvrtko Ursulin Cc: Balasubramani Vivekanandan Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +- drivers/gpu/drm/i915/gt/intel_gt.c | 12 +- drivers/gpu/drm/i915/gt/intel_sseu.c | 261 +++++++++++-------- drivers/gpu/drm/i915/gt/intel_sseu.h | 79 ++++-- drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c | 30 +-- drivers/gpu/drm/i915/gt/intel_workarounds.c | 24 +- drivers/gpu/drm/i915/i915_getparam.c | 3 +- drivers/gpu/drm/i915/i915_query.c | 13 +- 9 files changed, 243 insertions(+), 188 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index ab4c5ab28e4d..a3bb73f5d53b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1875,6 +1875,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt, { const struct sseu_dev_info *device = >->info.sseu; struct drm_i915_private *i915 = gt->i915; + unsigned int dev_subslice_mask = intel_sseu_get_hsw_subslices(device, 0); /* No zeros in any field. */ if (!user->slice_mask || !user->subslice_mask || @@ -1901,7 +1902,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt, if (user->slice_mask & ~device->slice_mask) return -EINVAL; - if (user->subslice_mask & ~device->subslice_mask[0]) + if (user->subslice_mask & ~dev_subslice_mask) return -EINVAL; if (user->max_eus_per_subslice > device->max_eus_per_subslice) @@ -1915,7 +1916,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt, /* Part specific restrictions. */ if (GRAPHICS_VER(i915) == 11) { unsigned int hw_s = hweight8(device->slice_mask); - unsigned int hw_ss_per_s = hweight8(device->subslice_mask[0]); + unsigned int hw_ss_per_s = hweight8(dev_subslice_mask); unsigned int req_s = hweight8(context->slice_mask); unsigned int req_ss = hweight8(context->subslice_mask); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 1adbf34c3632..f0acf8518a51 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -674,8 +674,8 @@ static void engine_mask_apply_compute_fuses(struct intel_gt *gt) if (GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) return; - ccs_mask = intel_slicemask_from_dssmask(intel_sseu_get_compute_subslices(&info->sseu), - ss_per_ccs); + ccs_mask = intel_slicemask_from_xehp_dssmask(info->sseu.compute_subslice_mask, + ss_per_ccs); /* * If all DSS in a quadrant are fused off, the corresponding CCS * engine is not available for use. diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 0a3931c011c6..ddfb98f70489 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -133,13 +133,6 @@ static const struct intel_mmio_range dg2_lncf_steering_table[] = { {}, }; -static u16 slicemask(struct intel_gt *gt, int count) -{ - u64 dss_mask = intel_sseu_get_subslices(>->info.sseu, 0); - - return intel_slicemask_from_dssmask(dss_mask, count); -} - int intel_gt_init_mmio(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; @@ -155,9 +148,12 @@ int intel_gt_init_mmio(struct intel_gt *gt) */ if (HAS_MSLICES(i915)) { gt->info.mslice_mask = - slicemask(gt, GEN_DSS_PER_MSLICE) | + intel_slicemask_from_xehp_dssmask(gt->info.sseu.subslice_mask, + GEN_DSS_PER_MSLICE); + gt->info.mslice_mask |= (intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) & GEN12_MEML3_EN_MASK); + if (!gt->info.mslice_mask) /* should be impossible! */ drm_warn(&i915->drm, "mslice mask all zero!\n"); } diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index 285cfd758bdc..38069330d828 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -16,9 +16,6 @@ void intel_sseu_set_info(struct sseu_dev_info *sseu, u8 max_slices, sseu->max_slices = max_slices; sseu->max_subslices = max_subslices; sseu->max_eus_per_subslice = max_eus_per_subslice; - - sseu->ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices); - GEM_BUG_ON(sseu->ss_stride > GEN_MAX_SUBSLICE_STRIDE); } unsigned int @@ -26,54 +23,24 @@ intel_sseu_subslice_total(const struct sseu_dev_info *sseu) { unsigned int i, total = 0; - for (i = 0; i < ARRAY_SIZE(sseu->subslice_mask); i++) - total += hweight8(sseu->subslice_mask[i]); - - return total; -} - -static u32 -sseu_get_subslices(const struct sseu_dev_info *sseu, - const u8 *subslice_mask, u8 slice) -{ - int i, offset = slice * sseu->ss_stride; - u32 mask = 0; - - GEM_BUG_ON(slice >= sseu->max_slices); - - for (i = 0; i < sseu->ss_stride; i++) - mask |= (u32)subslice_mask[offset + i] << i * BITS_PER_BYTE; - - return mask; -} - -u32 intel_sseu_get_subslices(const struct sseu_dev_info *sseu, u8 slice) -{ - return sseu_get_subslices(sseu, sseu->subslice_mask, slice); -} + if (sseu->has_xehp_dss) + return bitmap_weight(sseu->subslice_mask.xehp, + XEHP_BITMAP_BITS(sseu->subslice_mask)); -static u32 sseu_get_geometry_subslices(const struct sseu_dev_info *sseu) -{ - return sseu_get_subslices(sseu, sseu->geometry_subslice_mask, 0); -} + for (i = 0; i < ARRAY_SIZE(sseu->subslice_mask.hsw); i++) + total += hweight8(sseu->subslice_mask.hsw[i]); -u32 intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu) -{ - return sseu_get_subslices(sseu, sseu->compute_subslice_mask, 0); -} - -void intel_sseu_set_subslices(struct sseu_dev_info *sseu, int slice, - u8 *subslice_mask, u32 ss_mask) -{ - int offset = slice * sseu->ss_stride; - - memcpy(&subslice_mask[offset], &ss_mask, sseu->ss_stride); + return total; } unsigned int -intel_sseu_subslices_per_slice(const struct sseu_dev_info *sseu, u8 slice) +intel_sseu_get_hsw_subslices(const struct sseu_dev_info *sseu, u8 slice) { - return hweight32(intel_sseu_get_subslices(sseu, slice)); + WARN_ON(sseu->has_xehp_dss); + if (WARN_ON(slice >= sseu->max_slices)) + return 0; + + return sseu->subslice_mask.hsw[slice]; } static u16 sseu_get_eus(const struct sseu_dev_info *sseu, int slice, @@ -147,32 +114,66 @@ int intel_sseu_copy_eumask_to_user(void __user *to, return copy_to_user(to, eu_mask, len); } +/** + * intel_sseu_copy_ssmask_to_user - Copy subslice mask into a userspace buffer + * @to: Pointer to userspace buffer to copy to + * @sseu: SSEU structure containing subslice mask to copy + * + * Copies the subslice mask to a userspace buffer in the format expected by + * the query ioctl's topology queries. + * + * Returns the result of the copy_to_user() operation. + */ +int intel_sseu_copy_ssmask_to_user(void __user *to, + const struct sseu_dev_info *sseu) +{ + u8 ss_mask[GEN_SS_MASK_SIZE] = {}; + int ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices); + int len = sseu->max_slices * ss_stride; + int s, ss, i; + + for (s = 0; s < sseu->max_slices; s++) { + for (ss = 0; ss < sseu->max_subslices; ss++) { + i = s * ss_stride * BITS_PER_BYTE + ss; + + if (!intel_sseu_has_subslice(sseu, s, ss)) + continue; + + ss_mask[i / BITS_PER_BYTE] |= BIT(i % BITS_PER_BYTE); + } + } + + return copy_to_user(to, ss_mask, len); +} + static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, - u32 g_ss_en, u32 c_ss_en, u16 eu_en) + u32 ss_en, u16 eu_en) { u32 valid_ss_mask = GENMASK(sseu->max_subslices - 1, 0); int ss; - /* g_ss_en/c_ss_en represent entire subslice mask across all slices */ - GEM_BUG_ON(sseu->max_slices * sseu->max_subslices > - sizeof(g_ss_en) * BITS_PER_BYTE); + sseu->slice_mask |= BIT(0); + sseu->subslice_mask.hsw[0] = ss_en & valid_ss_mask; + + for (ss = 0; ss < sseu->max_subslices; ss++) + if (intel_sseu_has_subslice(sseu, 0, ss)) + sseu_set_eus(sseu, 0, ss, eu_en); + + sseu->eu_per_subslice = hweight16(eu_en); + sseu->eu_total = compute_eu_total(sseu); +} + +static void xehp_compute_sseu_info(struct sseu_dev_info *sseu, + u16 eu_en) +{ + int ss; sseu->slice_mask |= BIT(0); - /* - * XeHP introduces the concept of compute vs geometry DSS. To reduce - * variation between GENs around subslice usage, store a mask for both - * the geometry and compute enabled masks since userspace will need to - * be able to query these masks independently. Also compute a total - * enabled subslice count for the purposes of selecting subslices to - * use in a particular GEM context. - */ - intel_sseu_set_subslices(sseu, 0, sseu->compute_subslice_mask, - c_ss_en & valid_ss_mask); - intel_sseu_set_subslices(sseu, 0, sseu->geometry_subslice_mask, - g_ss_en & valid_ss_mask); - intel_sseu_set_subslices(sseu, 0, sseu->subslice_mask, - (g_ss_en | c_ss_en) & valid_ss_mask); + bitmap_or(sseu->subslice_mask.xehp, + sseu->compute_subslice_mask.xehp, + sseu->geometry_subslice_mask.xehp, + XEHP_BITMAP_BITS(sseu->subslice_mask)); for (ss = 0; ss < sseu->max_subslices; ss++) if (intel_sseu_has_subslice(sseu, 0, ss)) @@ -182,11 +183,31 @@ static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, sseu->eu_total = compute_eu_total(sseu); } +static void +xehp_load_dss_mask(struct intel_uncore *uncore, + intel_sseu_ss_mask_t *ssmask, + int numregs, + ...) +{ + va_list argp; + u32 fuse_val[I915_MAX_SS_FUSE_REGS] = {}; + int i; + + if (WARN_ON(numregs > I915_MAX_SS_FUSE_REGS)) + numregs = I915_MAX_SS_FUSE_REGS; + + va_start(argp, numregs); + for (i = 0; i < numregs; i++) + fuse_val[i] = intel_uncore_read(uncore, va_arg(argp, i915_reg_t)); + va_end(argp); + + bitmap_from_arr32(ssmask->xehp, fuse_val, numregs * 32); +} + static void xehp_sseu_info_init(struct intel_gt *gt) { struct sseu_dev_info *sseu = >->info.sseu; struct intel_uncore *uncore = gt->uncore; - u32 g_dss_en, c_dss_en = 0; u16 eu_en = 0; u8 eu_en_fuse; int eu; @@ -200,8 +221,10 @@ static void xehp_sseu_info_init(struct intel_gt *gt) intel_sseu_set_info(sseu, 1, 32, 16); sseu->has_xehp_dss = 1; - g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE); - c_dss_en = intel_uncore_read(uncore, GEN12_GT_COMPUTE_DSS_ENABLE); + xehp_load_dss_mask(uncore, &sseu->geometry_subslice_mask, 1, + GEN12_GT_GEOMETRY_DSS_ENABLE); + xehp_load_dss_mask(uncore, &sseu->compute_subslice_mask, 1, + GEN12_GT_COMPUTE_DSS_ENABLE); eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK; @@ -209,7 +232,7 @@ static void xehp_sseu_info_init(struct intel_gt *gt) if (eu_en_fuse & BIT(eu)) eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); - gen11_compute_sseu_info(sseu, g_dss_en, c_dss_en, eu_en); + xehp_compute_sseu_info(sseu, eu_en); } static void gen12_sseu_info_init(struct intel_gt *gt) @@ -247,7 +270,7 @@ static void gen12_sseu_info_init(struct intel_gt *gt) if (eu_en_fuse & BIT(eu)) eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); - gen11_compute_sseu_info(sseu, g_dss_en, 0, eu_en); + gen11_compute_sseu_info(sseu, g_dss_en, eu_en); /* TGL only supports slice-level power gating */ sseu->has_slice_pg = 1; @@ -279,7 +302,7 @@ static void gen11_sseu_info_init(struct intel_gt *gt) eu_en = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) & GEN11_EU_DIS_MASK); - gen11_compute_sseu_info(sseu, ss_en, 0, eu_en); + gen11_compute_sseu_info(sseu, ss_en, eu_en); /* ICL has no power gating restrictions. */ sseu->has_slice_pg = 1; @@ -291,7 +314,6 @@ static void cherryview_sseu_info_init(struct intel_gt *gt) { struct sseu_dev_info *sseu = >->info.sseu; u32 fuse; - u8 subslice_mask = 0; fuse = intel_uncore_read(gt->uncore, CHV_FUSE_GT); @@ -305,7 +327,7 @@ static void cherryview_sseu_info_init(struct intel_gt *gt) (((fuse & CHV_FGT_EU_DIS_SS0_R1_MASK) >> CHV_FGT_EU_DIS_SS0_R1_SHIFT) << 4); - subslice_mask |= BIT(0); + sseu->subslice_mask.hsw[0] |= BIT(0); sseu_set_eus(sseu, 0, 0, ~disabled_mask & 0xFF); } @@ -316,12 +338,10 @@ static void cherryview_sseu_info_init(struct intel_gt *gt) (((fuse & CHV_FGT_EU_DIS_SS1_R1_MASK) >> CHV_FGT_EU_DIS_SS1_R1_SHIFT) << 4); - subslice_mask |= BIT(1); + sseu->subslice_mask.hsw[0] |= BIT(1); sseu_set_eus(sseu, 0, 1, ~disabled_mask & 0xFF); } - intel_sseu_set_subslices(sseu, 0, sseu->subslice_mask, subslice_mask); - sseu->eu_total = compute_eu_total(sseu); /* @@ -376,8 +396,7 @@ static void gen9_sseu_info_init(struct intel_gt *gt) /* skip disabled slice */ continue; - intel_sseu_set_subslices(sseu, s, sseu->subslice_mask, - subslice_mask); + sseu->subslice_mask.hsw[s] = subslice_mask; eu_disable = intel_uncore_read(uncore, GEN9_EU_DISABLE(s)); for (ss = 0; ss < sseu->max_subslices; ss++) { @@ -434,8 +453,8 @@ static void gen9_sseu_info_init(struct intel_gt *gt) sseu->has_eu_pg = sseu->eu_per_subslice > 2; if (IS_GEN9_LP(i915)) { -#define IS_SS_DISABLED(ss) (!(sseu->subslice_mask[0] & BIT(ss))) - info->has_pooled_eu = hweight8(sseu->subslice_mask[0]) == 3; +#define IS_SS_DISABLED(ss) (!(sseu->subslice_mask.hsw[0] & BIT(ss))) + info->has_pooled_eu = hweight8(sseu->subslice_mask.hsw[0]) == 3; sseu->min_eu_in_pool = 0; if (info->has_pooled_eu) { @@ -489,8 +508,7 @@ static void bdw_sseu_info_init(struct intel_gt *gt) /* skip disabled slice */ continue; - intel_sseu_set_subslices(sseu, s, sseu->subslice_mask, - subslice_mask); + sseu->subslice_mask.hsw[s] = subslice_mask; for (ss = 0; ss < sseu->max_subslices; ss++) { u8 eu_disabled_mask; @@ -587,8 +605,7 @@ static void hsw_sseu_info_init(struct intel_gt *gt) sseu->eu_per_subslice); for (s = 0; s < sseu->max_slices; s++) { - intel_sseu_set_subslices(sseu, s, sseu->subslice_mask, - subslice_mask); + sseu->subslice_mask.hsw[s] = subslice_mask; for (ss = 0; ss < sseu->max_subslices; ss++) { sseu_set_eus(sseu, s, ss, @@ -677,7 +694,7 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt, */ if (GRAPHICS_VER(i915) == 11 && slices == 1 && - subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) { + subslices > min_t(u8, 4, hweight8(sseu->subslice_mask.hsw[0]) / 2)) { GEM_BUG_ON(subslices & 1); subslice_pg = false; @@ -743,14 +760,29 @@ void intel_sseu_dump(const struct sseu_dev_info *sseu, struct drm_printer *p) { int s; - drm_printf(p, "slice total: %u, mask=%04x\n", - hweight8(sseu->slice_mask), sseu->slice_mask); - drm_printf(p, "subslice total: %u\n", intel_sseu_subslice_total(sseu)); - for (s = 0; s < sseu->max_slices; s++) { - drm_printf(p, "slice%d: %u subslices, mask=%08x\n", - s, intel_sseu_subslices_per_slice(sseu, s), - intel_sseu_get_subslices(sseu, s)); + if (sseu->has_xehp_dss) { + drm_printf(p, "subslice total: %u\n", + intel_sseu_subslice_total(sseu)); + drm_printf(p, "geometry dss mask=%*pb\n", + XEHP_BITMAP_BITS(sseu->geometry_subslice_mask), + sseu->geometry_subslice_mask.xehp); + drm_printf(p, "compute dss mask=%*pb\n", + XEHP_BITMAP_BITS(sseu->compute_subslice_mask), + sseu->compute_subslice_mask.xehp); + } else { + drm_printf(p, "slice total: %u, mask=%04x\n", + hweight8(sseu->slice_mask), sseu->slice_mask); + drm_printf(p, "subslice total: %u\n", + intel_sseu_subslice_total(sseu)); + + for (s = 0; s < sseu->max_slices; s++) { + u8 ss_mask = sseu->subslice_mask.hsw[s]; + + drm_printf(p, "slice%d: %u subslices, mask=%08x\n", + s, hweight8(ss_mask), ss_mask); + } } + drm_printf(p, "EU total: %u\n", sseu->eu_total); drm_printf(p, "EU per subslice: %u\n", sseu->eu_per_subslice); drm_printf(p, "has slice power gating: %s\n", @@ -767,9 +799,10 @@ static void sseu_print_hsw_topology(const struct sseu_dev_info *sseu, int s, ss; for (s = 0; s < sseu->max_slices; s++) { + u8 ss_mask = sseu->subslice_mask.hsw[s]; + drm_printf(p, "slice%d: %u subslice(s) (0x%08x):\n", - s, intel_sseu_subslices_per_slice(sseu, s), - intel_sseu_get_subslices(sseu, s)); + s, hweight8(ss_mask), ss_mask); for (ss = 0; ss < sseu->max_subslices; ss++) { u16 enabled_eus = sseu_get_eus(sseu, s, ss); @@ -783,16 +816,14 @@ static void sseu_print_hsw_topology(const struct sseu_dev_info *sseu, static void sseu_print_xehp_topology(const struct sseu_dev_info *sseu, struct drm_printer *p) { - u32 g_dss_mask = sseu_get_geometry_subslices(sseu); - u32 c_dss_mask = intel_sseu_get_compute_subslices(sseu); int dss; for (dss = 0; dss < sseu->max_subslices; dss++) { u16 enabled_eus = sseu_get_eus(sseu, 0, dss); drm_printf(p, "DSS_%02d: G:%3s C:%3s, %2u EUs (0x%04hx)\n", dss, - str_yes_no(g_dss_mask & BIT(dss)), - str_yes_no(c_dss_mask & BIT(dss)), + str_yes_no(test_bit(dss, sseu->geometry_subslice_mask.xehp)), + str_yes_no(test_bit(dss, sseu->compute_subslice_mask.xehp)), hweight16(enabled_eus), enabled_eus); } } @@ -810,20 +841,44 @@ void intel_sseu_print_topology(struct drm_i915_private *i915, } } -u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice) +void intel_sseu_print_ss_info(const char* type, + const struct sseu_dev_info *sseu, + struct seq_file *m) +{ + int s; + + if (sseu->has_xehp_dss) { + seq_printf(m, " %s Geometry DSS: %u\n", type, + bitmap_weight(sseu->geometry_subslice_mask.xehp, + XEHP_BITMAP_BITS(sseu->geometry_subslice_mask))); + seq_printf(m, " %s Compute DSS: %u\n", type, + bitmap_weight(sseu->compute_subslice_mask.xehp, + XEHP_BITMAP_BITS(sseu->compute_subslice_mask))); + } else { + for (s = 0; s < fls(sseu->slice_mask); s++) + seq_printf(m, " %s Slice%i subslices: %u\n", type, + s, hweight8(sseu->subslice_mask.hsw[s])); + } +} + +u16 intel_slicemask_from_xehp_dssmask(intel_sseu_ss_mask_t dss_mask, + int dss_per_slice) { - u16 slice_mask = 0; + intel_sseu_ss_mask_t per_slice_mask = {}; + unsigned long slice_mask = 0; int i; - WARN_ON(sizeof(dss_mask) * 8 / dss_per_slice > 8 * sizeof(slice_mask)); + WARN_ON(DIV_ROUND_UP(XEHP_BITMAP_BITS(dss_mask), dss_per_slice) > + 8 * sizeof(slice_mask)); - for (i = 0; dss_mask; i++) { - if (dss_mask & GENMASK(dss_per_slice - 1, 0)) + bitmap_fill(per_slice_mask.xehp, dss_per_slice); + for (i = 0; !bitmap_empty(dss_mask.xehp, XEHP_BITMAP_BITS(dss_mask)); i++) { + if (bitmap_intersects(dss_mask.xehp, per_slice_mask.xehp, dss_per_slice)) slice_mask |= BIT(i); - dss_mask >>= dss_per_slice; + bitmap_shift_right(dss_mask.xehp, dss_mask.xehp, dss_per_slice, + XEHP_BITMAP_BITS(dss_mask)); } return slice_mask; } - diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index ffa375e68959..61abcab5a76b 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -25,12 +25,16 @@ struct drm_printer; /* * Maximum number of subslices that can exist within a HSW-style slice. This * is only relevant to pre-Xe_HP platforms (Xe_HP and beyond use the - * GEN_MAX_DSS value below). + * I915_MAX_SS_FUSE_BITS value below). */ #define GEN_MAX_SS_PER_HSW_SLICE 6 -/* Maximum number of DSS on newer platforms (Xe_HP and beyond). */ -#define GEN_MAX_DSS 32 +/* + * Maximum number of 32-bit registers used by hardware to express the + * enabled/disabled subslices. + */ +#define I915_MAX_SS_FUSE_REGS 1 +#define I915_MAX_SS_FUSE_BITS (I915_MAX_SS_FUSE_REGS * 32) /* Maximum number of EUs that can exist within a subslice or DSS. */ #define GEN_MAX_EUS_PER_SS 16 @@ -38,7 +42,7 @@ struct drm_printer; #define SSEU_MAX(a, b) ((a) > (b) ? (a) : (b)) /* The maximum number of bits needed to express each subslice/DSS independently */ -#define GEN_SS_MASK_SIZE SSEU_MAX(GEN_MAX_DSS, \ +#define GEN_SS_MASK_SIZE SSEU_MAX(I915_MAX_SS_FUSE_BITS, \ GEN_MAX_HSW_SLICES * GEN_MAX_SS_PER_HSW_SLICE) #define GEN_SSEU_STRIDE(max_entries) DIV_ROUND_UP(max_entries, BITS_PER_BYTE) @@ -49,17 +53,26 @@ struct drm_printer; #define GEN_DSS_PER_CSLICE 8 #define GEN_DSS_PER_MSLICE 8 -#define GEN_MAX_GSLICES (GEN_MAX_DSS / GEN_DSS_PER_GSLICE) -#define GEN_MAX_CSLICES (GEN_MAX_DSS / GEN_DSS_PER_CSLICE) +#define GEN_MAX_GSLICES (I915_MAX_SS_FUSE_BITS / GEN_DSS_PER_GSLICE) +#define GEN_MAX_CSLICES (I915_MAX_SS_FUSE_BITS / GEN_DSS_PER_CSLICE) + +typedef union { + u8 hsw[GEN_MAX_HSW_SLICES]; + + /* Bitmap compatible with linux/bitmap.h; may exceed size of u64 */ + unsigned long xehp[BITS_TO_LONGS(I915_MAX_SS_FUSE_BITS)]; +} intel_sseu_ss_mask_t; + +#define XEHP_BITMAP_BITS(mask) ((int)BITS_PER_TYPE(typeof(mask.xehp))) struct sseu_dev_info { u8 slice_mask; - u8 subslice_mask[GEN_SS_MASK_SIZE]; - u8 geometry_subslice_mask[GEN_SS_MASK_SIZE]; - u8 compute_subslice_mask[GEN_SS_MASK_SIZE]; + intel_sseu_ss_mask_t subslice_mask; + intel_sseu_ss_mask_t geometry_subslice_mask; + intel_sseu_ss_mask_t compute_subslice_mask; union { u16 hsw[GEN_MAX_HSW_SLICES][GEN_MAX_SS_PER_HSW_SLICE]; - u16 xehp[GEN_MAX_DSS]; + u16 xehp[I915_MAX_SS_FUSE_BITS]; } eu_mask; u16 eu_total; @@ -80,8 +93,6 @@ struct sseu_dev_info { u8 max_slices; u8 max_subslices; u8 max_eus_per_subslice; - - u8 ss_stride; }; /* @@ -99,7 +110,7 @@ intel_sseu_from_device_info(const struct sseu_dev_info *sseu) { struct intel_sseu value = { .slice_mask = sseu->slice_mask, - .subslice_mask = sseu->subslice_mask[0], + .subslice_mask = sseu->subslice_mask.hsw[0], .min_eus_per_subslice = sseu->max_eus_per_subslice, .max_eus_per_subslice = sseu->max_eus_per_subslice, }; @@ -111,18 +122,28 @@ static inline bool intel_sseu_has_subslice(const struct sseu_dev_info *sseu, int slice, int subslice) { - u8 mask; - int ss_idx = subslice / BITS_PER_BYTE; - if (slice >= sseu->max_slices || subslice >= sseu->max_subslices) return false; - GEM_BUG_ON(ss_idx >= sseu->ss_stride); - - mask = sseu->subslice_mask[slice * sseu->ss_stride + ss_idx]; + if (sseu->has_xehp_dss) + return test_bit(subslice, sseu->subslice_mask.xehp); + else + return sseu->subslice_mask.hsw[slice] & BIT(subslice); +} - return mask & BIT(subslice % BITS_PER_BYTE); +/* + * Used to obtain the index of the first DSS. Can start searching from the + * beginning of a specific dss group (e.g., gslice, cslice, etc.) if + * groupsize and groupnum are non-zero. + */ +static inline unsigned int +intel_sseu_find_first_xehp_dss(const struct sseu_dev_info *sseu, int groupsize, + int groupnum) +{ + return find_next_bit(sseu->subslice_mask.xehp, + XEHP_BITMAP_BITS(sseu->subslice_mask), + groupnum * groupsize); } void intel_sseu_set_info(struct sseu_dev_info *sseu, u8 max_slices, @@ -132,14 +153,10 @@ unsigned int intel_sseu_subslice_total(const struct sseu_dev_info *sseu); unsigned int -intel_sseu_subslices_per_slice(const struct sseu_dev_info *sseu, u8 slice); +intel_sseu_get_hsw_subslices(const struct sseu_dev_info *sseu, u8 slice); -u32 intel_sseu_get_subslices(const struct sseu_dev_info *sseu, u8 slice); - -u32 intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu); - -void intel_sseu_set_subslices(struct sseu_dev_info *sseu, int slice, - u8 *subslice_mask, u32 ss_mask); +intel_sseu_ss_mask_t +intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu); void intel_sseu_info_init(struct intel_gt *gt); @@ -151,9 +168,15 @@ void intel_sseu_print_topology(struct drm_i915_private *i915, const struct sseu_dev_info *sseu, struct drm_printer *p); -u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice); +u16 intel_slicemask_from_xehp_dssmask(intel_sseu_ss_mask_t dss_mask, int dss_per_slice); int intel_sseu_copy_eumask_to_user(void __user *to, const struct sseu_dev_info *sseu); +int intel_sseu_copy_ssmask_to_user(void __user *to, + const struct sseu_dev_info *sseu); + +void intel_sseu_print_ss_info(const char* type, + const struct sseu_dev_info *sseu, + struct seq_file *m); #endif /* __INTEL_SSEU_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c b/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c index 2d5d011e01db..c2ee5e1826b5 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c @@ -4,6 +4,7 @@ * Copyright © 2020 Intel Corporation */ +#include #include #include "i915_drv.h" @@ -11,14 +12,6 @@ #include "intel_gt_regs.h" #include "intel_sseu_debugfs.h" -static void sseu_copy_subslices(const struct sseu_dev_info *sseu, - int slice, u8 *to_mask) -{ - int offset = slice * sseu->ss_stride; - - memcpy(&to_mask[offset], &sseu->subslice_mask[offset], sseu->ss_stride); -} - static void cherryview_sseu_device_status(struct intel_gt *gt, struct sseu_dev_info *sseu) { @@ -41,7 +34,7 @@ static void cherryview_sseu_device_status(struct intel_gt *gt, continue; sseu->slice_mask = BIT(0); - sseu->subslice_mask[0] |= BIT(ss); + sseu->subslice_mask.hsw[0] |= BIT(ss); eu_cnt = ((sig1[ss] & CHV_EU08_PG_ENABLE) ? 0 : 2) + ((sig1[ss] & CHV_EU19_PG_ENABLE) ? 0 : 2) + ((sig1[ss] & CHV_EU210_PG_ENABLE) ? 0 : 2) + @@ -92,7 +85,7 @@ static void gen11_sseu_device_status(struct intel_gt *gt, continue; sseu->slice_mask |= BIT(s); - sseu_copy_subslices(&info->sseu, s, sseu->subslice_mask); + sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s]; for (ss = 0; ss < info->sseu.max_subslices; ss++) { unsigned int eu_cnt; @@ -147,21 +140,17 @@ static void gen9_sseu_device_status(struct intel_gt *gt, sseu->slice_mask |= BIT(s); if (IS_GEN9_BC(gt->i915)) - sseu_copy_subslices(&info->sseu, s, - sseu->subslice_mask); + sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s]; for (ss = 0; ss < info->sseu.max_subslices; ss++) { unsigned int eu_cnt; - u8 ss_idx = s * info->sseu.ss_stride + - ss / BITS_PER_BYTE; if (IS_GEN9_LP(gt->i915)) { if (!(s_reg[s] & (GEN9_PGCTL_SS_ACK(ss)))) /* skip disabled subslice */ continue; - sseu->subslice_mask[ss_idx] |= - BIT(ss % BITS_PER_BYTE); + sseu->subslice_mask.hsw[s] |= BIT(ss); } eu_cnt = eu_reg[2 * s + ss / 2] & eu_mask[ss % 2]; @@ -188,8 +177,7 @@ static void bdw_sseu_device_status(struct intel_gt *gt, if (sseu->slice_mask) { sseu->eu_per_subslice = info->sseu.eu_per_subslice; for (s = 0; s < fls(sseu->slice_mask); s++) - sseu_copy_subslices(&info->sseu, s, - sseu->subslice_mask); + sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s]; sseu->eu_total = sseu->eu_per_subslice * intel_sseu_subslice_total(sseu); @@ -208,7 +196,6 @@ static void i915_print_sseu_info(struct seq_file *m, const struct sseu_dev_info *sseu) { const char *type = is_available_info ? "Available" : "Enabled"; - int s; seq_printf(m, " %s Slice Mask: %04x\n", type, sseu->slice_mask); @@ -216,10 +203,7 @@ static void i915_print_sseu_info(struct seq_file *m, hweight8(sseu->slice_mask)); seq_printf(m, " %s Subslice Total: %u\n", type, intel_sseu_subslice_total(sseu)); - for (s = 0; s < fls(sseu->slice_mask); s++) { - seq_printf(m, " %s Slice%i subslices: %u\n", type, - s, intel_sseu_subslices_per_slice(sseu, s)); - } + intel_sseu_print_ss_info(type, sseu, m); seq_printf(m, " %s EU Total: %u\n", type, sseu->eu_total); seq_printf(m, " %s EU Per Subslice: %u\n", type, diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index a604bc7c0701..6e875d4f5f65 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -950,8 +950,8 @@ gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal) * on s/ss combo, the read should be done with read_subslice_reg. */ slice = ffs(sseu->slice_mask) - 1; - GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask)); - subslice = ffs(intel_sseu_get_subslices(sseu, slice)); + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask.hsw)); + subslice = ffs(intel_sseu_get_hsw_subslices(sseu, slice)); GEM_BUG_ON(!subslice); subslice--; @@ -1089,11 +1089,10 @@ static void icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) { const struct sseu_dev_info *sseu = >->info.sseu; - unsigned int slice, subslice; + unsigned int subslice; GEM_BUG_ON(GRAPHICS_VER(gt->i915) < 11); GEM_BUG_ON(hweight8(sseu->slice_mask) > 1); - slice = 0; /* * Although a platform may have subslices, we need to always steer @@ -1104,7 +1103,7 @@ icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) * one of the higher subslices, we run the risk of reading back 0's or * random garbage. */ - subslice = __ffs(intel_sseu_get_subslices(sseu, slice)); + subslice = __ffs(intel_sseu_get_hsw_subslices(sseu, 0)); /* * If the subslice we picked above also steers us to a valid L3 bank, @@ -1114,7 +1113,7 @@ icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) if (gt->info.l3bank_mask & BIT(subslice)) gt->steering_table[L3BANK] = NULL; - __add_mcr_wa(gt, wal, slice, subslice); + __add_mcr_wa(gt, wal, 0, subslice); } static void @@ -1122,7 +1121,6 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) { const struct sseu_dev_info *sseu = >->info.sseu; unsigned long slice, subslice = 0, slice_mask = 0; - u64 dss_mask = 0; u32 lncf_mask = 0; int i; @@ -1153,8 +1151,8 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) */ /* Find the potential gslice candidates */ - dss_mask = intel_sseu_get_subslices(sseu, 0); - slice_mask = intel_slicemask_from_dssmask(dss_mask, GEN_DSS_PER_GSLICE); + slice_mask = intel_slicemask_from_xehp_dssmask(sseu->subslice_mask, + GEN_DSS_PER_GSLICE); /* * Find the potential LNCF candidates. Either LNCF within a valid @@ -1179,9 +1177,8 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) } slice = __ffs(slice_mask); - subslice = __ffs(dss_mask >> (slice * GEN_DSS_PER_GSLICE)); + subslice = intel_sseu_find_first_xehp_dss(sseu, GEN_DSS_PER_GSLICE, slice); WARN_ON(subslice > GEN_DSS_PER_GSLICE); - WARN_ON(dss_mask >> (slice * GEN_DSS_PER_GSLICE) == 0); __add_mcr_wa(gt, wal, slice, subslice); @@ -2062,9 +2059,8 @@ engine_fake_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) static bool needs_wa_1308578152(struct intel_engine_cs *engine) { - u64 dss_mask = intel_sseu_get_subslices(&engine->gt->info.sseu, 0); - - return (dss_mask & GENMASK(GEN_DSS_PER_GSLICE - 1, 0)) == 0; + return intel_sseu_find_first_xehp_dss(&engine->gt->info.sseu, 0, 0) > + GEN_DSS_PER_GSLICE; } static void diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index ac9767c56619..6fd15b39570c 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -162,8 +162,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, return -EINVAL; /* Only copy bits from the first slice */ - memcpy(&value, sseu->subslice_mask, - min(sseu->ss_stride, (u8)sizeof(value))); + value = intel_sseu_get_hsw_subslices(sseu, 0); if (!value) return -ENODEV; break; diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 89c475d525b8..0094f67c63f2 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -31,10 +31,11 @@ static int copy_query_item(void *query_hdr, size_t query_sz, static int fill_topology_info(const struct sseu_dev_info *sseu, struct drm_i915_query_item *query_item, - const u8 *subslice_mask) + intel_sseu_ss_mask_t subslice_mask) { struct drm_i915_query_topology_info topo; u32 slice_length, subslice_length, eu_length, total_length; + int ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices); int eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice); int ret; @@ -44,7 +45,7 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, return -ENODEV; slice_length = sizeof(sseu->slice_mask); - subslice_length = sseu->max_slices * sseu->ss_stride; + subslice_length = sseu->max_slices * ss_stride; eu_length = sseu->max_slices * sseu->max_subslices * eu_stride; total_length = sizeof(topo) + slice_length + subslice_length + eu_length; @@ -60,7 +61,7 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, topo.max_eus_per_subslice = sseu->max_eus_per_subslice; topo.subslice_offset = slice_length; - topo.subslice_stride = sseu->ss_stride; + topo.subslice_stride = ss_stride; topo.eu_offset = slice_length + subslice_length; topo.eu_stride = eu_stride; @@ -72,9 +73,9 @@ static int fill_topology_info(const struct sseu_dev_info *sseu, &sseu->slice_mask, slice_length)) return -EFAULT; - if (copy_to_user(u64_to_user_ptr(query_item->data_ptr + - sizeof(topo) + slice_length), - subslice_mask, subslice_length)) + if (intel_sseu_copy_ssmask_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(topo) + slice_length), + sseu)) return -EFAULT; if (intel_sseu_copy_eumask_to_user(u64_to_user_ptr(query_item->data_ptr + From patchwork Wed Jun 1 15:07:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matt Roper X-Patchwork-Id: 12866931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6249BCCA477 for ; Wed, 1 Jun 2022 15:07:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 89DB610EA78; Wed, 1 Jun 2022 15:07:36 +0000 (UTC) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by gabe.freedesktop.org (Postfix) with ESMTPS id DD83810EA63; Wed, 1 Jun 2022 15:07:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1654096055; x=1685632055; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6aqW4bvZQ6KPwm3e2wHZec+Jvlz11MZyvWzJTklU1rs=; b=WqZVaWKGCKzRym5T+h0/QWRpkqniaA1/5uUeZiWiTXa1B3sVVWfKTZjv rXxWz9oIYgRbAjPmEfjHqodLovu24+C4QUYgBBKranRwjwQkJQ2soEF0p 5yf20N0VUR1hBrMENqxNCLofagThcmv82wb1Kg5H7Je5fX6j2ZryVKAXl 1XUf7s47XIpZBg+/Fsc3xFXB05FhefpIHM2YYASty60ejNd85cFG6xgjB Lgz/QRuAHYAEG9lC/0Nqv8Ql224fd3IcgsLYmIJXMTILIpXnVBBBeQ5fC vzPrks7eKzpCvQVLu3eXDl8ScnBez3ivJX8KDMWJJG9aPBrz6FBv6zntD Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10365"; a="257688445" X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="257688445" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:32 -0700 X-IronPort-AV: E=Sophos;i="5.91,268,1647327600"; d="scan'208";a="667465458" Received: from mdroper-desk1.fm.intel.com ([10.1.27.134]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Jun 2022 08:07:31 -0700 From: Matt Roper To: intel-gfx@lists.freedesktop.org Subject: [PATCH v5 6/6] drm/i915/pvc: Add SSEU changes Date: Wed, 1 Jun 2022 08:07:25 -0700 Message-Id: <20220601150725.521468-7-matthew.d.roper@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220601150725.521468-1-matthew.d.roper@intel.com> References: <20220601150725.521468-1-matthew.d.roper@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: balasubramani.vivekanandan@intel.com, dri-devel@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" PVC splits the mask of enabled DSS over two registers. It also changes the meaning of the EU fuse register such that each bit represents a single EU rather than a pair of EUs. Signed-off-by: Matt Roper Acked-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_sseu.c | 31 ++++++++++++++++++------ drivers/gpu/drm/i915/gt/intel_sseu.h | 2 +- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_pci.c | 3 ++- drivers/gpu/drm/i915/intel_device_info.h | 1 + 6 files changed, 31 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 58e9b464d564..6aa1ceaa8d27 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -561,6 +561,7 @@ #define GEN11_GT_VEBOX_DISABLE_MASK (0x0f << GEN11_GT_VEBOX_DISABLE_SHIFT) #define GEN12_GT_COMPUTE_DSS_ENABLE _MMIO(0x9144) +#define XEHPC_GT_COMPUTE_DSS_ENABLE_EXT _MMIO(0x9148) #define GEN6_UCGCTL1 _MMIO(0x9400) #define GEN6_GAMUNIT_CLOCK_GATE_DISABLE (1 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index 38069330d828..5cc1896a6cb9 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -210,27 +210,44 @@ static void xehp_sseu_info_init(struct intel_gt *gt) struct intel_uncore *uncore = gt->uncore; u16 eu_en = 0; u8 eu_en_fuse; + int num_compute_regs, num_geometry_regs; int eu; + if (IS_PONTEVECCHIO(gt->i915)) { + num_geometry_regs = 0; + num_compute_regs = 2; + } else { + num_geometry_regs = 1; + num_compute_regs = 1; + } + /* * The concept of slice has been removed in Xe_HP. To be compatible * with prior generations, assume a single slice across the entire * device. Then calculate out the DSS for each workload type within * that software slice. */ - intel_sseu_set_info(sseu, 1, 32, 16); + intel_sseu_set_info(sseu, 1, + 32 * max(num_geometry_regs, num_compute_regs), + 16); sseu->has_xehp_dss = 1; - xehp_load_dss_mask(uncore, &sseu->geometry_subslice_mask, 1, + xehp_load_dss_mask(uncore, &sseu->geometry_subslice_mask, + num_geometry_regs, GEN12_GT_GEOMETRY_DSS_ENABLE); - xehp_load_dss_mask(uncore, &sseu->compute_subslice_mask, 1, - GEN12_GT_COMPUTE_DSS_ENABLE); + xehp_load_dss_mask(uncore, &sseu->compute_subslice_mask, + num_compute_regs, + GEN12_GT_COMPUTE_DSS_ENABLE, + XEHPC_GT_COMPUTE_DSS_ENABLE_EXT); eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK; - for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++) - if (eu_en_fuse & BIT(eu)) - eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); + if (HAS_ONE_EU_PER_FUSE_BIT(gt->i915)) + eu_en = eu_en_fuse; + else + for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++) + if (eu_en_fuse & BIT(eu)) + eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1); xehp_compute_sseu_info(sseu, eu_en); } diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h index 61abcab5a76b..b86f23f5b3a1 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.h +++ b/drivers/gpu/drm/i915/gt/intel_sseu.h @@ -33,7 +33,7 @@ struct drm_printer; * Maximum number of 32-bit registers used by hardware to express the * enabled/disabled subslices. */ -#define I915_MAX_SS_FUSE_REGS 1 +#define I915_MAX_SS_FUSE_REGS 2 #define I915_MAX_SS_FUSE_BITS (I915_MAX_SS_FUSE_REGS * 32) /* Maximum number of EUs that can exist within a subslice or DSS. */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bc1a7ff19463..c3854b8a014f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1359,6 +1359,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_3D_PIPELINE(i915) (INTEL_INFO(i915)->has_3d_pipeline) +#define HAS_ONE_EU_PER_FUSE_BIT(i915) (INTEL_INFO(i915)->has_one_eu_per_fuse_bit) + /* i915_gem.c */ void i915_gem_init_early(struct drm_i915_private *dev_priv); void i915_gem_cleanup_early(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 61386d1bb07b..047a6e326031 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1089,7 +1089,8 @@ static const struct intel_device_info ats_m_info = { XE_HP_FEATURES, \ .dma_mask_size = 52, \ .has_3d_pipeline = 0, \ - .has_l3_ccs_read = 1 + .has_l3_ccs_read = 1, \ + .has_one_eu_per_fuse_bit = 1 __maybe_unused static const struct intel_device_info pvc_info = { diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 4e1c80966ab5..346f17f2dce8 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -158,6 +158,7 @@ enum intel_ppgtt_type { func(has_logical_ring_elsq); \ func(has_media_ratio_mode); \ func(has_mslices); \ + func(has_one_eu_per_fuse_bit); \ func(has_pooled_eu); \ func(has_pxp); \ func(has_rc6); \