From patchwork Tue Aug 30 20:34:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Chancellor X-Patchwork-Id: 12959851 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 88D82ECAAA1 for ; Tue, 30 Aug 2022 20:34:55 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EAA5610E010; Tue, 30 Aug 2022 20:34:46 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0B7D010E12F; Tue, 30 Aug 2022 20:34:40 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 85E3B61522; Tue, 30 Aug 2022 20:34:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1AF78C433B5; Tue, 30 Aug 2022 20:34:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661891678; bh=iBw4U+2fzQ5oGWVGdweyLaKJ4ZBymbB1s/nzHDkcubE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=C9yyiI5AmObsBBLDO7ulJ/djy8IYj8uepUVXr0sWEFUJjAFdPLlg278nOr4OLgHrR L6Hd/Mnwk8JXKIpdC203E9DBMz1UmJPGUZltNv9yk3qt7CM9G1AlKvfYpBEmzDnVqR id8ewui4HTJ9LU7gu0vx9s+L4aSVzC2SZyuzVO3UjiBWQoSsyaN4HUL8dvfeGbUPg/ Q/M8214uEQtnADZ08pHQa2XfZujaYJ4F13B9AbtRvnVsmVDVdo2vX5SrrLcgYTNA/u AO7Xyc1JYEix7oqLU+eQe7CPv+wcu40Gcr+XYn5uhdD5Ft6dO5pZq8gHyOGIyeWI3S ZP64lYbdPrPxg== From: Nathan Chancellor To: Harry Wentland , Leo Li , Rodrigo Siqueira , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" Subject: [PATCH 1/5] drm/amd/display: Reduce number of arguments of dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport() Date: Tue, 30 Aug 2022 13:34:05 -0700 Message-Id: <20220830203409.3491379-2-nathan@kernel.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220830203409.3491379-1-nathan@kernel.org> References: <20220830203409.3491379-1-nathan@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Rix , llvm@lists.linux.dev, Nick Desaulniers , patches@lists.linux.dev, dri-devel@lists.freedesktop.org, Nathan Chancellor , amd-gfx@lists.freedesktop.org, Sudip Mukherjee Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Most of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer created at the top of dml32_ModeSupportAndSystemConfigurationFull(). This reduces the total amount of stack space that dml32_ModeSupportAndSystemConfigurationFull() uses by 216 bytes with LLVM 16 (2152 -> 1936), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1721:6: error: stack frame size (2152) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Additionally, while modifying the arguments to dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(), use 'v' consistently, instead of 'v' mixed with 'mode_lib->vba'. Link: https://github.com/ClangBuiltLinux/linux/issues/1681 Reported-by: "Sudip Mukherjee (Codethink)" Signed-off-by: Nathan Chancellor --- .../dc/dml/dcn32/display_mode_vba_32.c | 118 ++------- .../dc/dml/dcn32/display_mode_vba_util_32.c | 248 ++++++++---------- .../dc/dml/dcn32/display_mode_vba_util_32.h | 33 +-- 3 files changed, 140 insertions(+), 259 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c index d8014bfbc3fe..7da90fba95fc 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c @@ -1163,58 +1163,28 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters.SMNLatency = mode_lib->vba.SMNLatency; dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( - mode_lib->vba.USRRetrainingRequiredFinal, - mode_lib->vba.UsesMALLForPStateChange, - mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb], - mode_lib->vba.NumberOfActiveSurfaces, - mode_lib->vba.MaxLineBufferLines, - mode_lib->vba.LineBufferSizeFinal, - mode_lib->vba.WritebackInterfaceBufferSize, - mode_lib->vba.DCFCLK, - mode_lib->vba.ReturnBW, - mode_lib->vba.SynchronizeTimingsFinal, - mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal, - mode_lib->vba.DRRDisplay, - v->dpte_group_bytes, - v->meta_row_height, - v->meta_row_height_chroma, + v, + v->PrefetchModePerState[v->VoltageLevel][v->maxMpcComb], + v->DCFCLK, + v->ReturnBW, v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters, - mode_lib->vba.WritebackChunkSize, - mode_lib->vba.SOCCLK, + v->SOCCLK, v->DCFCLKDeepSleep, - mode_lib->vba.DETBufferSizeY, - mode_lib->vba.DETBufferSizeC, - mode_lib->vba.SwathHeightY, - mode_lib->vba.SwathHeightC, - mode_lib->vba.LBBitPerPixel, + v->DETBufferSizeY, + v->DETBufferSizeC, + v->SwathHeightY, + v->SwathHeightC, v->SwathWidthY, v->SwathWidthC, - mode_lib->vba.HRatio, - mode_lib->vba.HRatioChroma, - mode_lib->vba.vtaps, - mode_lib->vba.VTAPsChroma, - mode_lib->vba.VRatio, - mode_lib->vba.VRatioChroma, - mode_lib->vba.HTotal, - mode_lib->vba.VTotal, - mode_lib->vba.VActive, - mode_lib->vba.PixelClock, - mode_lib->vba.BlendingAndTiming, - mode_lib->vba.DPPPerPlane, + v->DPPPerPlane, v->BytePerPixelDETY, v->BytePerPixelDETC, v->DSTXAfterScaler, v->DSTYAfterScaler, - mode_lib->vba.WritebackEnable, - mode_lib->vba.WritebackPixelFormat, - mode_lib->vba.WritebackDestinationWidth, - mode_lib->vba.WritebackDestinationHeight, - mode_lib->vba.WritebackSourceHeight, v->UnboundedRequestEnabled, v->CompressedBufferSizeInkByte, /* Output */ - &v->Watermark, &v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.dummy_dramchange_support, v->MaxActiveDRAMClockChangeLatencySupported, v->SubViewportLinesNeededInMALL, @@ -3559,62 +3529,32 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l { dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( - mode_lib->vba.USRRetrainingRequiredFinal, - mode_lib->vba.UsesMALLForPStateChange, - mode_lib->vba.PrefetchModePerState[i][j], - mode_lib->vba.NumberOfActiveSurfaces, - mode_lib->vba.MaxLineBufferLines, - mode_lib->vba.LineBufferSizeFinal, - mode_lib->vba.WritebackInterfaceBufferSize, - mode_lib->vba.DCFCLKState[i][j], - mode_lib->vba.ReturnBWPerState[i][j], - mode_lib->vba.SynchronizeTimingsFinal, - mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal, - mode_lib->vba.DRRDisplay, - mode_lib->vba.dpte_group_bytes, - mode_lib->vba.meta_row_height, - mode_lib->vba.meta_row_height_chroma, + v, + v->PrefetchModePerState[i][j], + v->DCFCLKState[i][j], + v->ReturnBWPerState[i][j], v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.mSOCParameters, - mode_lib->vba.WritebackChunkSize, - mode_lib->vba.SOCCLKPerState[i], - mode_lib->vba.ProjectedDCFCLKDeepSleep[i][j], - mode_lib->vba.DETBufferSizeYThisState, - mode_lib->vba.DETBufferSizeCThisState, - mode_lib->vba.SwathHeightYThisState, - mode_lib->vba.SwathHeightCThisState, - mode_lib->vba.LBBitPerPixel, - mode_lib->vba.SwathWidthYThisState, // 24 - mode_lib->vba.SwathWidthCThisState, - mode_lib->vba.HRatio, - mode_lib->vba.HRatioChroma, - mode_lib->vba.vtaps, - mode_lib->vba.VTAPsChroma, - mode_lib->vba.VRatio, - mode_lib->vba.VRatioChroma, - mode_lib->vba.HTotal, - mode_lib->vba.VTotal, - mode_lib->vba.VActive, - mode_lib->vba.PixelClock, - mode_lib->vba.BlendingAndTiming, - mode_lib->vba.NoOfDPPThisState, - mode_lib->vba.BytePerPixelInDETY, - mode_lib->vba.BytePerPixelInDETC, + v->SOCCLKPerState[i], + v->ProjectedDCFCLKDeepSleep[i][j], + v->DETBufferSizeYThisState, + v->DETBufferSizeCThisState, + v->SwathHeightYThisState, + v->SwathHeightCThisState, + v->SwathWidthYThisState, // 24 + v->SwathWidthCThisState, + v->NoOfDPPThisState, + v->BytePerPixelInDETY, + v->BytePerPixelInDETC, v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTXAfterScaler, v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTYAfterScaler, - mode_lib->vba.WritebackEnable, - mode_lib->vba.WritebackPixelFormat, - mode_lib->vba.WritebackDestinationWidth, - mode_lib->vba.WritebackDestinationHeight, - mode_lib->vba.WritebackSourceHeight, - mode_lib->vba.UnboundedRequestEnabledThisState, - mode_lib->vba.CompressedBufferSizeInkByteThisState, + v->UnboundedRequestEnabledThisState, + v->CompressedBufferSizeInkByteThisState, /* Output */ - &mode_lib->vba.Watermark, // Store the values in vba - &mode_lib->vba.DRAMClockChangeSupport[i][j], + &v->DRAMClockChangeSupport[i][j], &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single2[0], // double *MaxActiveDRAMClockChangeLatencySupported &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_integer[0], // Long SubViewportLinesNeededInMALL[] - &mode_lib->vba.FCLKChangeSupport[i][j], + &v->FCLKChangeSupport[i][j], &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single2[1], // double *MinActiveFCLKChangeLatencySupported &mode_lib->vba.USRRetrainingSupport[i][j], mode_lib->vba.ActiveDRAMClockChangeLatencyMarginPerState[i][j]); diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c index dc501ee7d01a..6c7b748ee62b 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c @@ -4209,58 +4209,28 @@ void dml32_CalculateFlipSchedule( } // CalculateFlipSchedule void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( - bool USRRetrainingRequiredFinal, - enum dm_use_mall_for_pstate_change_mode UseMALLForPStateChange[], + struct vba_vars_st *v, unsigned int PrefetchMode, - unsigned int NumberOfActiveSurfaces, - unsigned int MaxLineBufferLines, - unsigned int LineBufferSize, - unsigned int WritebackInterfaceBufferSize, double DCFCLK, double ReturnBW, - bool SynchronizeTimingsFinal, - bool SynchronizeDRRDisplaysForUCLKPStateChangeFinal, - bool DRRDisplay[], - unsigned int dpte_group_bytes[], - unsigned int meta_row_height[], - unsigned int meta_row_height_chroma[], SOCParametersList mmSOCParameters, - unsigned int WritebackChunkSize, double SOCCLK, double DCFClkDeepSleep, unsigned int DETBufferSizeY[], unsigned int DETBufferSizeC[], unsigned int SwathHeightY[], unsigned int SwathHeightC[], - unsigned int LBBitPerPixel[], double SwathWidthY[], double SwathWidthC[], - double HRatio[], - double HRatioChroma[], - unsigned int VTaps[], - unsigned int VTapsChroma[], - double VRatio[], - double VRatioChroma[], - unsigned int HTotal[], - unsigned int VTotal[], - unsigned int VActive[], - double PixelClock[], - unsigned int BlendingAndTiming[], unsigned int DPPPerSurface[], double BytePerPixelDETY[], double BytePerPixelDETC[], double DSTXAfterScaler[], double DSTYAfterScaler[], - bool WritebackEnable[], - enum source_format_class WritebackPixelFormat[], - double WritebackDestinationWidth[], - double WritebackDestinationHeight[], - double WritebackSourceHeight[], bool UnboundedRequestEnabled, unsigned int CompressedBufferSizeInkByte, /* Output */ - Watermarks *Watermark, enum clock_change_support *DRAMClockChangeSupport, double MaxActiveDRAMClockChangeLatencySupported[], unsigned int SubViewportLinesNeededInMALL[], @@ -4302,136 +4272,136 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( unsigned int LBLatencyHidingSourceLinesY[DC__NUM_DPP__MAX]; unsigned int LBLatencyHidingSourceLinesC[DC__NUM_DPP__MAX]; - Watermark->UrgentWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency; - Watermark->USRRetrainingWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency + v->Watermark.UrgentWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency; + v->Watermark.USRRetrainingWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency + mmSOCParameters.USRRetrainingLatency + mmSOCParameters.SMNLatency; - Watermark->DRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + Watermark->UrgentWatermark; - Watermark->FCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + Watermark->UrgentWatermark; - Watermark->StutterExitWatermark = mmSOCParameters.SRExitTime + mmSOCParameters.ExtraLatency + v->Watermark.DRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + v->Watermark.UrgentWatermark; + v->Watermark.FCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + v->Watermark.UrgentWatermark; + v->Watermark.StutterExitWatermark = mmSOCParameters.SRExitTime + mmSOCParameters.ExtraLatency + 10 / DCFClkDeepSleep; - Watermark->StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitTime + mmSOCParameters.ExtraLatency + v->Watermark.StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitTime + mmSOCParameters.ExtraLatency + 10 / DCFClkDeepSleep; - Watermark->Z8StutterExitWatermark = mmSOCParameters.SRExitZ8Time + mmSOCParameters.ExtraLatency + v->Watermark.Z8StutterExitWatermark = mmSOCParameters.SRExitZ8Time + mmSOCParameters.ExtraLatency + 10 / DCFClkDeepSleep; - Watermark->Z8StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitZ8Time + v->Watermark.Z8StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitZ8Time + mmSOCParameters.ExtraLatency + 10 / DCFClkDeepSleep; #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: UrgentLatency = %f\n", __func__, mmSOCParameters.UrgentLatency); dml_print("DML::%s: ExtraLatency = %f\n", __func__, mmSOCParameters.ExtraLatency); dml_print("DML::%s: DRAMClockChangeLatency = %f\n", __func__, mmSOCParameters.DRAMClockChangeLatency); - dml_print("DML::%s: UrgentWatermark = %f\n", __func__, Watermark->UrgentWatermark); - dml_print("DML::%s: USRRetrainingWatermark = %f\n", __func__, Watermark->USRRetrainingWatermark); - dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, Watermark->DRAMClockChangeWatermark); - dml_print("DML::%s: FCLKChangeWatermark = %f\n", __func__, Watermark->FCLKChangeWatermark); - dml_print("DML::%s: StutterExitWatermark = %f\n", __func__, Watermark->StutterExitWatermark); - dml_print("DML::%s: StutterEnterPlusExitWatermark = %f\n", __func__, Watermark->StutterEnterPlusExitWatermark); - dml_print("DML::%s: Z8StutterExitWatermark = %f\n", __func__, Watermark->Z8StutterExitWatermark); + dml_print("DML::%s: UrgentWatermark = %f\n", __func__, v->Watermark.UrgentWatermark); + dml_print("DML::%s: USRRetrainingWatermark = %f\n", __func__, v->Watermark.USRRetrainingWatermark); + dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, v->Watermark.DRAMClockChangeWatermark); + dml_print("DML::%s: FCLKChangeWatermark = %f\n", __func__, v->Watermark.FCLKChangeWatermark); + dml_print("DML::%s: StutterExitWatermark = %f\n", __func__, v->Watermark.StutterExitWatermark); + dml_print("DML::%s: StutterEnterPlusExitWatermark = %f\n", __func__, v->Watermark.StutterEnterPlusExitWatermark); + dml_print("DML::%s: Z8StutterExitWatermark = %f\n", __func__, v->Watermark.Z8StutterExitWatermark); dml_print("DML::%s: Z8StutterEnterPlusExitWatermark = %f\n", - __func__, Watermark->Z8StutterEnterPlusExitWatermark); + __func__, v->Watermark.Z8StutterEnterPlusExitWatermark); #endif TotalActiveWriteback = 0; - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - if (WritebackEnable[k] == true) + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + if (v->WritebackEnable[k] == true) TotalActiveWriteback = TotalActiveWriteback + 1; } if (TotalActiveWriteback <= 1) { - Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency; + v->Watermark.WritebackUrgentWatermark = mmSOCParameters.WritebackLatency; } else { - Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency - + WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; + v->Watermark.WritebackUrgentWatermark = mmSOCParameters.WritebackLatency + + v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; } - if (USRRetrainingRequiredFinal) - Watermark->WritebackUrgentWatermark = Watermark->WritebackUrgentWatermark + if (v->USRRetrainingRequiredFinal) + v->Watermark.WritebackUrgentWatermark = v->Watermark.WritebackUrgentWatermark + mmSOCParameters.USRRetrainingLatency; if (TotalActiveWriteback <= 1) { - Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + v->Watermark.WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + mmSOCParameters.WritebackLatency; - Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + v->Watermark.WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + mmSOCParameters.WritebackLatency; } else { - Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency - + mmSOCParameters.WritebackLatency + WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; - Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency - + mmSOCParameters.WritebackLatency + WritebackChunkSize * 1024 / 32 / SOCCLK; + v->Watermark.WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + + mmSOCParameters.WritebackLatency + v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; + v->Watermark.WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + + mmSOCParameters.WritebackLatency + v->WritebackChunkSize * 1024 / 32 / SOCCLK; } - if (USRRetrainingRequiredFinal) - Watermark->WritebackDRAMClockChangeWatermark = Watermark->WritebackDRAMClockChangeWatermark + if (v->USRRetrainingRequiredFinal) + v->Watermark.WritebackDRAMClockChangeWatermark = v->Watermark.WritebackDRAMClockChangeWatermark + mmSOCParameters.USRRetrainingLatency; - if (USRRetrainingRequiredFinal) - Watermark->WritebackFCLKChangeWatermark = Watermark->WritebackFCLKChangeWatermark + if (v->USRRetrainingRequiredFinal) + v->Watermark.WritebackFCLKChangeWatermark = v->Watermark.WritebackFCLKChangeWatermark + mmSOCParameters.USRRetrainingLatency; #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: WritebackDRAMClockChangeWatermark = %f\n", - __func__, Watermark->WritebackDRAMClockChangeWatermark); - dml_print("DML::%s: WritebackFCLKChangeWatermark = %f\n", __func__, Watermark->WritebackFCLKChangeWatermark); - dml_print("DML::%s: WritebackUrgentWatermark = %f\n", __func__, Watermark->WritebackUrgentWatermark); - dml_print("DML::%s: USRRetrainingRequiredFinal = %d\n", __func__, USRRetrainingRequiredFinal); + __func__, v->Watermark.WritebackDRAMClockChangeWatermark); + dml_print("DML::%s: WritebackFCLKChangeWatermark = %f\n", __func__, v->Watermark.WritebackFCLKChangeWatermark); + dml_print("DML::%s: WritebackUrgentWatermark = %f\n", __func__, v->Watermark.WritebackUrgentWatermark); + dml_print("DML::%s: v->USRRetrainingRequiredFinal = %d\n", __func__, v->USRRetrainingRequiredFinal); dml_print("DML::%s: USRRetrainingLatency = %f\n", __func__, mmSOCParameters.USRRetrainingLatency); #endif - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - TotalPixelBW = TotalPixelBW + DPPPerSurface[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k] + - SwathWidthC[k] * BytePerPixelDETC[k] * VRatioChroma[k]) / (HTotal[k] / PixelClock[k]); + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + TotalPixelBW = TotalPixelBW + DPPPerSurface[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k] + + SwathWidthC[k] * BytePerPixelDETC[k] * v->VRatioChroma[k]) / (v->HTotal[k] / v->PixelClock[k]); } - for (k = 0; k < NumberOfActiveSurfaces; ++k) { + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { - LBLatencyHidingSourceLinesY[k] = dml_min((double) MaxLineBufferLines, dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(HRatio[k], 1.0)), 1)) - (VTaps[k] - 1); - LBLatencyHidingSourceLinesC[k] = dml_min((double) MaxLineBufferLines, dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(HRatioChroma[k], 1.0)), 1)) - (VTapsChroma[k] - 1); + LBLatencyHidingSourceLinesY[k] = dml_min((double) v->MaxLineBufferLines, dml_floor(v->LineBufferSizeFinal / v->LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(v->HRatio[k], 1.0)), 1)) - (v->vtaps[k] - 1); + LBLatencyHidingSourceLinesC[k] = dml_min((double) v->MaxLineBufferLines, dml_floor(v->LineBufferSizeFinal / v->LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(v->HRatioChroma[k], 1.0)), 1)) - (v->VTAPsChroma[k] - 1); #ifdef __DML_VBA_DEBUG__ - dml_print("DML::%s: k=%d, MaxLineBufferLines = %d\n", __func__, k, MaxLineBufferLines); - dml_print("DML::%s: k=%d, LineBufferSize = %d\n", __func__, k, LineBufferSize); - dml_print("DML::%s: k=%d, LBBitPerPixel = %d\n", __func__, k, LBBitPerPixel[k]); - dml_print("DML::%s: k=%d, HRatio = %f\n", __func__, k, HRatio[k]); - dml_print("DML::%s: k=%d, VTaps = %d\n", __func__, k, VTaps[k]); + dml_print("DML::%s: k=%d, v->MaxLineBufferLines = %d\n", __func__, k, v->MaxLineBufferLines); + dml_print("DML::%s: k=%d, v->LineBufferSizeFinal = %d\n", __func__, k, v->LineBufferSizeFinal); + dml_print("DML::%s: k=%d, v->LBBitPerPixel = %d\n", __func__, k, v->LBBitPerPixel[k]); + dml_print("DML::%s: k=%d, v->HRatio = %f\n", __func__, k, v->HRatio[k]); + dml_print("DML::%s: k=%d, v->vtaps = %d\n", __func__, k, v->vtaps[k]); #endif - EffectiveLBLatencyHidingY = LBLatencyHidingSourceLinesY[k] / VRatio[k] * (HTotal[k] / PixelClock[k]); - EffectiveLBLatencyHidingC = LBLatencyHidingSourceLinesC[k] / VRatioChroma[k] * (HTotal[k] / PixelClock[k]); + EffectiveLBLatencyHidingY = LBLatencyHidingSourceLinesY[k] / v->VRatio[k] * (v->HTotal[k] / v->PixelClock[k]); + EffectiveLBLatencyHidingC = LBLatencyHidingSourceLinesC[k] / v->VRatioChroma[k] * (v->HTotal[k] / v->PixelClock[k]); EffectiveDETBufferSizeY = DETBufferSizeY[k]; if (UnboundedRequestEnabled) { EffectiveDETBufferSizeY = EffectiveDETBufferSizeY + CompressedBufferSizeInkByte * 1024 - * (SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k]) - / (HTotal[k] / PixelClock[k]) / TotalPixelBW; + * (SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k]) + / (v->HTotal[k] / v->PixelClock[k]) / TotalPixelBW; } LinesInDETY[k] = (double) EffectiveDETBufferSizeY / BytePerPixelDETY[k] / SwathWidthY[k]; LinesInDETYRoundedDownToSwath[k] = dml_floor(LinesInDETY[k], SwathHeightY[k]); - FullDETBufferingTimeY = LinesInDETYRoundedDownToSwath[k] * (HTotal[k] / PixelClock[k]) / VRatio[k]; + FullDETBufferingTimeY = LinesInDETYRoundedDownToSwath[k] * (v->HTotal[k] / v->PixelClock[k]) / v->VRatio[k]; ActiveClockChangeLatencyHidingY = EffectiveLBLatencyHidingY + FullDETBufferingTimeY - - (DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k] / PixelClock[k]; + - (DSTXAfterScaler[k] / v->HTotal[k] + DSTYAfterScaler[k]) * v->HTotal[k] / v->PixelClock[k]; - if (NumberOfActiveSurfaces > 1) { + if (v->NumberOfActiveSurfaces > 1) { ActiveClockChangeLatencyHidingY = ActiveClockChangeLatencyHidingY - - (1 - 1 / NumberOfActiveSurfaces) * SwathHeightY[k] * HTotal[k] - / PixelClock[k] / VRatio[k]; + - (1 - 1 / v->NumberOfActiveSurfaces) * SwathHeightY[k] * v->HTotal[k] + / v->PixelClock[k] / v->VRatio[k]; } if (BytePerPixelDETC[k] > 0) { LinesInDETC[k] = DETBufferSizeC[k] / BytePerPixelDETC[k] / SwathWidthC[k]; LinesInDETCRoundedDownToSwath[k] = dml_floor(LinesInDETC[k], SwathHeightC[k]); - FullDETBufferingTimeC = LinesInDETCRoundedDownToSwath[k] * (HTotal[k] / PixelClock[k]) - / VRatioChroma[k]; + FullDETBufferingTimeC = LinesInDETCRoundedDownToSwath[k] * (v->HTotal[k] / v->PixelClock[k]) + / v->VRatioChroma[k]; ActiveClockChangeLatencyHidingC = EffectiveLBLatencyHidingC + FullDETBufferingTimeC - - (DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k] - / PixelClock[k]; - if (NumberOfActiveSurfaces > 1) { + - (DSTXAfterScaler[k] / v->HTotal[k] + DSTYAfterScaler[k]) * v->HTotal[k] + / v->PixelClock[k]; + if (v->NumberOfActiveSurfaces > 1) { ActiveClockChangeLatencyHidingC = ActiveClockChangeLatencyHidingC - - (1 - 1 / NumberOfActiveSurfaces) * SwathHeightC[k] * HTotal[k] - / PixelClock[k] / VRatioChroma[k]; + - (1 - 1 / v->NumberOfActiveSurfaces) * SwathHeightC[k] * v->HTotal[k] + / v->PixelClock[k] / v->VRatioChroma[k]; } ActiveClockChangeLatencyHiding = dml_min(ActiveClockChangeLatencyHidingY, ActiveClockChangeLatencyHidingC); @@ -4439,24 +4409,24 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( ActiveClockChangeLatencyHiding = ActiveClockChangeLatencyHidingY; } - ActiveDRAMClockChangeLatencyMargin[k] = ActiveClockChangeLatencyHiding - Watermark->UrgentWatermark - - Watermark->DRAMClockChangeWatermark; - ActiveFCLKChangeLatencyMargin[k] = ActiveClockChangeLatencyHiding - Watermark->UrgentWatermark - - Watermark->FCLKChangeWatermark; - USRRetrainingLatencyMargin[k] = ActiveClockChangeLatencyHiding - Watermark->USRRetrainingWatermark; - - if (WritebackEnable[k]) { - WritebackLatencyHiding = WritebackInterfaceBufferSize * 1024 - / (WritebackDestinationWidth[k] * WritebackDestinationHeight[k] - / (WritebackSourceHeight[k] * HTotal[k] / PixelClock[k]) * 4); - if (WritebackPixelFormat[k] == dm_444_64) + ActiveDRAMClockChangeLatencyMargin[k] = ActiveClockChangeLatencyHiding - v->Watermark.UrgentWatermark + - v->Watermark.DRAMClockChangeWatermark; + ActiveFCLKChangeLatencyMargin[k] = ActiveClockChangeLatencyHiding - v->Watermark.UrgentWatermark + - v->Watermark.FCLKChangeWatermark; + USRRetrainingLatencyMargin[k] = ActiveClockChangeLatencyHiding - v->Watermark.USRRetrainingWatermark; + + if (v->WritebackEnable[k]) { + WritebackLatencyHiding = v->WritebackInterfaceBufferSize * 1024 + / (v->WritebackDestinationWidth[k] * v->WritebackDestinationHeight[k] + / (v->WritebackSourceHeight[k] * v->HTotal[k] / v->PixelClock[k]) * 4); + if (v->WritebackPixelFormat[k] == dm_444_64) WritebackLatencyHiding = WritebackLatencyHiding / 2; WritebackDRAMClockChangeLatencyMargin = WritebackLatencyHiding - - Watermark->WritebackDRAMClockChangeWatermark; + - v->Watermark.WritebackDRAMClockChangeWatermark; WritebackFCLKChangeLatencyMargin = WritebackLatencyHiding - - Watermark->WritebackFCLKChangeWatermark; + - v->Watermark.WritebackFCLKChangeWatermark; ActiveDRAMClockChangeLatencyMargin[k] = dml_min(ActiveDRAMClockChangeLatencyMargin[k], WritebackFCLKChangeLatencyMargin); @@ -4464,22 +4434,22 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( WritebackDRAMClockChangeLatencyMargin); } MaxActiveDRAMClockChangeLatencySupported[k] = - (UseMALLForPStateChange[k] == dm_use_mall_pstate_change_phantom_pipe) ? + (v->UsesMALLForPStateChange[k] == dm_use_mall_pstate_change_phantom_pipe) ? 0 : (ActiveDRAMClockChangeLatencyMargin[k] + mmSOCParameters.DRAMClockChangeLatency); } - for (i = 0; i < NumberOfActiveSurfaces; ++i) { - for (j = 0; j < NumberOfActiveSurfaces; ++j) { + for (i = 0; i < v->NumberOfActiveSurfaces; ++i) { + for (j = 0; j < v->NumberOfActiveSurfaces; ++j) { if (i == j || - (BlendingAndTiming[i] == i && BlendingAndTiming[j] == i) || - (BlendingAndTiming[j] == j && BlendingAndTiming[i] == j) || - (BlendingAndTiming[i] == BlendingAndTiming[j] && BlendingAndTiming[i] != i) || - (SynchronizeTimingsFinal && PixelClock[i] == PixelClock[j] && - HTotal[i] == HTotal[j] && VTotal[i] == VTotal[j] && - VActive[i] == VActive[j]) || (SynchronizeDRRDisplaysForUCLKPStateChangeFinal && - (DRRDisplay[i] || DRRDisplay[j]))) { + (v->BlendingAndTiming[i] == i && v->BlendingAndTiming[j] == i) || + (v->BlendingAndTiming[j] == j && v->BlendingAndTiming[i] == j) || + (v->BlendingAndTiming[i] == v->BlendingAndTiming[j] && v->BlendingAndTiming[i] != i) || + (v->SynchronizeTimingsFinal && v->PixelClock[i] == v->PixelClock[j] && + v->HTotal[i] == v->HTotal[j] && v->VTotal[i] == v->VTotal[j] && + v->VActive[i] == v->VActive[j]) || (v->SynchronizeDRRDisplaysForUCLKPStateChangeFinal && + (v->DRRDisplay[i] || v->DRRDisplay[j]))) { SynchronizedSurfaces[i][j] = true; } else { SynchronizedSurfaces[i][j] = false; @@ -4487,8 +4457,8 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( } } - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + if ((v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && (!FoundFirstSurfaceWithMinActiveFCLKChangeMargin || ActiveFCLKChangeLatencyMargin[k] < MinActiveFCLKChangeMargin)) { FoundFirstSurfaceWithMinActiveFCLKChangeMargin = true; @@ -4500,9 +4470,9 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( *MinActiveFCLKChangeLatencySupported = MinActiveFCLKChangeMargin + mmSOCParameters.FCLKChangeLatency; SameTimingForFCLKChange = true; - for (k = 0; k < NumberOfActiveSurfaces; ++k) { + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { if (!SynchronizedSurfaces[k][SurfaceWithMinActiveFCLKChangeMargin]) { - if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && + if ((v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && (SameTimingForFCLKChange || ActiveFCLKChangeLatencyMargin[k] < SecondMinActiveFCLKChangeMarginOneDisplayInVBLank)) { @@ -4522,17 +4492,17 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( } *USRRetrainingSupport = true; - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + if ((v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) && (USRRetrainingLatencyMargin[k] < 0)) { *USRRetrainingSupport = false; } } - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - if (UseMALLForPStateChange[k] != dm_use_mall_pstate_change_full_frame && - UseMALLForPStateChange[k] != dm_use_mall_pstate_change_sub_viewport && - UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe && + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + if (v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_full_frame && + v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_sub_viewport && + v->UsesMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe && ActiveDRAMClockChangeLatencyMargin[k] < 0) { if (PrefetchMode > 0) { DRAMClockChangeSupportNumber = 2; @@ -4546,10 +4516,10 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( } } - for (k = 0; k < NumberOfActiveSurfaces; ++k) { - if (UseMALLForPStateChange[k] == dm_use_mall_pstate_change_full_frame) + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { + if (v->UsesMALLForPStateChange[k] == dm_use_mall_pstate_change_full_frame) DRAMClockChangeMethod = 1; - else if (UseMALLForPStateChange[k] == dm_use_mall_pstate_change_sub_viewport) + else if (v->UsesMALLForPStateChange[k] == dm_use_mall_pstate_change_sub_viewport) DRAMClockChangeMethod = 2; } @@ -4576,16 +4546,16 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( *DRAMClockChangeSupport = dm_dram_clock_change_unsupported; } - for (k = 0; k < NumberOfActiveSurfaces; ++k) { + for (k = 0; k < v->NumberOfActiveSurfaces; ++k) { unsigned int dst_y_pstate; unsigned int src_y_pstate_l; unsigned int src_y_pstate_c; unsigned int src_y_ahead_l, src_y_ahead_c, sub_vp_lines_l, sub_vp_lines_c; - dst_y_pstate = dml_ceil((mmSOCParameters.DRAMClockChangeLatency + mmSOCParameters.UrgentLatency) / (HTotal[k] / PixelClock[k]), 1); - src_y_pstate_l = dml_ceil(dst_y_pstate * VRatio[k], SwathHeightY[k]); + dst_y_pstate = dml_ceil((mmSOCParameters.DRAMClockChangeLatency + mmSOCParameters.UrgentLatency) / (v->HTotal[k] / v->PixelClock[k]), 1); + src_y_pstate_l = dml_ceil(dst_y_pstate * v->VRatio[k], SwathHeightY[k]); src_y_ahead_l = dml_floor(DETBufferSizeY[k] / BytePerPixelDETY[k] / SwathWidthY[k], SwathHeightY[k]) + LBLatencyHidingSourceLinesY[k]; - sub_vp_lines_l = src_y_pstate_l + src_y_ahead_l + meta_row_height[k]; + sub_vp_lines_l = src_y_pstate_l + src_y_ahead_l + v->meta_row_height[k]; #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: k=%d, DETBufferSizeY = %d\n", __func__, k, DETBufferSizeY[k]); @@ -4596,21 +4566,21 @@ dml_print("DML::%s: k=%d, LBLatencyHidingSourceLinesY = %d\n", __func__, k, LBL dml_print("DML::%s: k=%d, dst_y_pstate = %d\n", __func__, k, dst_y_pstate); dml_print("DML::%s: k=%d, src_y_pstate_l = %d\n", __func__, k, src_y_pstate_l); dml_print("DML::%s: k=%d, src_y_ahead_l = %d\n", __func__, k, src_y_ahead_l); -dml_print("DML::%s: k=%d, meta_row_height = %d\n", __func__, k, meta_row_height[k]); +dml_print("DML::%s: k=%d, v->meta_row_height = %d\n", __func__, k, v->meta_row_height[k]); dml_print("DML::%s: k=%d, sub_vp_lines_l = %d\n", __func__, k, sub_vp_lines_l); #endif SubViewportLinesNeededInMALL[k] = sub_vp_lines_l; if (BytePerPixelDETC[k] > 0) { - src_y_pstate_c = dml_ceil(dst_y_pstate * VRatioChroma[k], SwathHeightC[k]); + src_y_pstate_c = dml_ceil(dst_y_pstate * v->VRatioChroma[k], SwathHeightC[k]); src_y_ahead_c = dml_floor(DETBufferSizeC[k] / BytePerPixelDETC[k] / SwathWidthC[k], SwathHeightC[k]) + LBLatencyHidingSourceLinesC[k]; - sub_vp_lines_c = src_y_pstate_c + src_y_ahead_c + meta_row_height_chroma[k]; + sub_vp_lines_c = src_y_pstate_c + src_y_ahead_c + v->meta_row_height_chroma[k]; SubViewportLinesNeededInMALL[k] = dml_max(sub_vp_lines_l, sub_vp_lines_c); #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: k=%d, src_y_pstate_c = %d\n", __func__, k, src_y_pstate_c); dml_print("DML::%s: k=%d, src_y_ahead_c = %d\n", __func__, k, src_y_ahead_c); -dml_print("DML::%s: k=%d, meta_row_height_chroma = %d\n", __func__, k, meta_row_height_chroma[k]); +dml_print("DML::%s: k=%d, v->meta_row_height_chroma = %d\n", __func__, k, v->meta_row_height_chroma[k]); dml_print("DML::%s: k=%d, sub_vp_lines_c = %d\n", __func__, k, sub_vp_lines_c); #endif } diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h index 626f6605e2d5..7cdf0418830a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h @@ -30,6 +30,7 @@ #include "os_types.h" #include "../dc_features.h" #include "../display_mode_structs.h" +#include "dml/display_mode_vba.h" unsigned int dml32_dscceComputeDelay( unsigned int bpc, @@ -808,58 +809,28 @@ void dml32_CalculateFlipSchedule( bool *ImmediateFlipSupportedForPipe); void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( - bool USRRetrainingRequiredFinal, - enum dm_use_mall_for_pstate_change_mode UseMALLForPStateChange[], + struct vba_vars_st *v, unsigned int PrefetchMode, - unsigned int NumberOfActiveSurfaces, - unsigned int MaxLineBufferLines, - unsigned int LineBufferSize, - unsigned int WritebackInterfaceBufferSize, double DCFCLK, double ReturnBW, - bool SynchronizeTimingsFinal, - bool SynchronizeDRRDisplaysForUCLKPStateChangeFinal, - bool DRRDisplay[], - unsigned int dpte_group_bytes[], - unsigned int meta_row_height[], - unsigned int meta_row_height_chroma[], SOCParametersList mmSOCParameters, - unsigned int WritebackChunkSize, double SOCCLK, double DCFClkDeepSleep, unsigned int DETBufferSizeY[], unsigned int DETBufferSizeC[], unsigned int SwathHeightY[], unsigned int SwathHeightC[], - unsigned int LBBitPerPixel[], double SwathWidthY[], double SwathWidthC[], - double HRatio[], - double HRatioChroma[], - unsigned int VTaps[], - unsigned int VTapsChroma[], - double VRatio[], - double VRatioChroma[], - unsigned int HTotal[], - unsigned int VTotal[], - unsigned int VActive[], - double PixelClock[], - unsigned int BlendingAndTiming[], unsigned int DPPPerSurface[], double BytePerPixelDETY[], double BytePerPixelDETC[], double DSTXAfterScaler[], double DSTYAfterScaler[], - bool WritebackEnable[], - enum source_format_class WritebackPixelFormat[], - double WritebackDestinationWidth[], - double WritebackDestinationHeight[], - double WritebackSourceHeight[], bool UnboundedRequestEnabled, unsigned int CompressedBufferSizeInkByte, /* Output */ - Watermarks *Watermark, enum clock_change_support *DRAMClockChangeSupport, double MaxActiveDRAMClockChangeLatencySupported[], unsigned int SubViewportLinesNeededInMALL[], From patchwork Tue Aug 30 20:34:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Chancellor X-Patchwork-Id: 12959853 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 15F61ECAAD5 for ; Tue, 30 Aug 2022 20:35:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 653AD10E12F; Tue, 30 Aug 2022 20:34:51 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2784C10E12D; Tue, 30 Aug 2022 20:34:43 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id A8644B81D81; Tue, 30 Aug 2022 20:34:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8EE97C4347C; Tue, 30 Aug 2022 20:34:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661891680; bh=T13qHrNnUxbUDWhssZ9m9lTfcGa79RhWW97V8xxFWaM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LjiIV8l7VhSwbPBCzkgygncthngy5r/WJggyO1hXlGHM/gtDojn0lydXa5cqXhQN8 zw77USOxemcReUbyMW8MIRiUC5WMKxWbQiC6Pxi2SaX1zIBSLHKbBtvJ/TJZuSp+Sp fzpaSDH91npRAS1HcHWeQrg9wOBMG8Qqo4Mc9plBXlO5LWxhbTaAY39V2FD98OW4C1 aJCvXkpaozmBV0OKkl9Q8HsGiTRmEsnrkQYEjc8V9PLDiyL9kb2CYWTYvs1tPnD4KA sU54FxZAWOMyjK0zFguxkxRPZI9YQ4pT3YZQDKFvLk42ksMU6C1s8LLCJNjnHDHzP9 fJi1LrdD8x+Iw== From: Nathan Chancellor To: Harry Wentland , Leo Li , Rodrigo Siqueira , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" Subject: [PATCH 2/5] drm/amd/display: Reduce number of arguments of dml32_CalculatePrefetchSchedule() Date: Tue, 30 Aug 2022 13:34:06 -0700 Message-Id: <20220830203409.3491379-3-nathan@kernel.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220830203409.3491379-1-nathan@kernel.org> References: <20220830203409.3491379-1-nathan@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Rix , llvm@lists.linux.dev, Nick Desaulniers , patches@lists.linux.dev, dri-devel@lists.freedesktop.org, Nathan Chancellor , amd-gfx@lists.freedesktop.org, Sudip Mukherjee Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Several of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer. This reduces the total amount of stack space that dml32_ModeSupportAndSystemConfigurationFull() uses by 208 bytes with LLVM 16 (1936 -> 1728), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1721:6: error: stack frame size (2152) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Additionally, while modifying the arguments to dml32_CalculatePrefetchSchedule(), use 'v' consistently, instead of 'v' mixed with 'mode_lib->vba'. Link: https://github.com/ClangBuiltLinux/linux/issues/1681 Reported-by: "Sudip Mukherjee (Codethink)" Signed-off-by: Nathan Chancellor --- .../dc/dml/dcn32/display_mode_vba_32.c | 118 +++++++----------- .../dc/dml/dcn32/display_mode_vba_util_32.c | 75 +++++------ .../dc/dml/dcn32/display_mode_vba_util_32.h | 18 +-- 3 files changed, 78 insertions(+), 133 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c index 7da90fba95fc..58c6cc58583a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c @@ -755,30 +755,18 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.myPipe.BytePerPixelY = v->BytePerPixelY[k]; v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.myPipe.BytePerPixelC = v->BytePerPixelC[k]; v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.myPipe.ProgressiveToInterlaceUnitInOPP = mode_lib->vba.ProgressiveToInterlaceUnitInOPP; - v->ErrorResult[k] = dml32_CalculatePrefetchSchedule(v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.HostVMInefficiencyFactor, - &v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.myPipe, v->DSCDelay[k], - mode_lib->vba.DPPCLKDelaySubtotal + mode_lib->vba.DPPCLKDelayCNVCFormater, - mode_lib->vba.DPPCLKDelaySCL, - mode_lib->vba.DPPCLKDelaySCLLBOnly, - mode_lib->vba.DPPCLKDelayCNVCCursor, - mode_lib->vba.DISPCLKDelaySubtotal, - (unsigned int) (v->SwathWidthY[k] / mode_lib->vba.HRatio[k]), - mode_lib->vba.OutputFormat[k], - mode_lib->vba.MaxInterDCNTileRepeaters, + v->ErrorResult[k] = dml32_CalculatePrefetchSchedule( + v, + k, + v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.HostVMInefficiencyFactor, + &v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.myPipe, + v->DSCDelay[k], + (unsigned int) (v->SwathWidthY[k] / v->HRatio[k]), dml_min(v->VStartupLines, v->MaxVStartupLines[k]), v->MaxVStartupLines[k], - mode_lib->vba.GPUVMMaxPageTableLevels, - mode_lib->vba.GPUVMEnable, - mode_lib->vba.HostVMEnable, - mode_lib->vba.HostVMMaxNonCachedPageTableLevels, - mode_lib->vba.HostVMMinPageSize, - mode_lib->vba.DynamicMetadataEnable[k], - mode_lib->vba.DynamicMetadataVMEnabled, - mode_lib->vba.DynamicMetadataLinesBeforeActiveRequired[k], - mode_lib->vba.DynamicMetadataTransmittedBytes[k], v->UrgentLatency, v->UrgentExtraLatency, - mode_lib->vba.TCalc, + v->TCalc, v->PDEAndMetaPTEBytesFrame[k], v->MetaRowByte[k], v->PixelPTEBytesPerRow[k], @@ -792,8 +780,8 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman v->MaxNumSwathC[k], v->swath_width_luma_ub[k], v->swath_width_chroma_ub[k], - mode_lib->vba.SwathHeightY[k], - mode_lib->vba.SwathHeightC[k], + v->SwathHeightY[k], + v->SwathHeightC[k], TWait, /* Output */ &v->DSTXAfterScaler[k], @@ -3230,63 +3218,47 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l mode_lib->vba.NoTimeForPrefetch[i][j][k] = dml32_CalculatePrefetchSchedule( + v, + k, v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.HostVMInefficiencyFactor, &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.myPipe, - mode_lib->vba.DSCDelayPerState[i][k], - mode_lib->vba.DPPCLKDelaySubtotal + - mode_lib->vba.DPPCLKDelayCNVCFormater, - mode_lib->vba.DPPCLKDelaySCL, - mode_lib->vba.DPPCLKDelaySCLLBOnly, - mode_lib->vba.DPPCLKDelayCNVCCursor, - mode_lib->vba.DISPCLKDelaySubtotal, - mode_lib->vba.SwathWidthYThisState[k] / - mode_lib->vba.HRatio[k], - mode_lib->vba.OutputFormat[k], - mode_lib->vba.MaxInterDCNTileRepeaters, - dml_min(mode_lib->vba.MaxVStartup, - mode_lib->vba.MaximumVStartup[i][j][k]), - mode_lib->vba.MaximumVStartup[i][j][k], - mode_lib->vba.GPUVMMaxPageTableLevels, - mode_lib->vba.GPUVMEnable, mode_lib->vba.HostVMEnable, - mode_lib->vba.HostVMMaxNonCachedPageTableLevels, - mode_lib->vba.HostVMMinPageSize, - mode_lib->vba.DynamicMetadataEnable[k], - mode_lib->vba.DynamicMetadataVMEnabled, - mode_lib->vba.DynamicMetadataLinesBeforeActiveRequired[k], - mode_lib->vba.DynamicMetadataTransmittedBytes[k], - mode_lib->vba.UrgLatency[i], - mode_lib->vba.ExtraLatency, - mode_lib->vba.TimeCalc, - mode_lib->vba.PDEAndMetaPTEBytesPerFrame[i][j][k], - mode_lib->vba.MetaRowBytes[i][j][k], - mode_lib->vba.DPTEBytesPerRow[i][j][k], - mode_lib->vba.PrefetchLinesY[i][j][k], - mode_lib->vba.SwathWidthYThisState[k], - mode_lib->vba.PrefillY[k], - mode_lib->vba.MaxNumSwY[k], - mode_lib->vba.PrefetchLinesC[i][j][k], - mode_lib->vba.SwathWidthCThisState[k], - mode_lib->vba.PrefillC[k], - mode_lib->vba.MaxNumSwC[k], - mode_lib->vba.swath_width_luma_ub_this_state[k], - mode_lib->vba.swath_width_chroma_ub_this_state[k], - mode_lib->vba.SwathHeightYThisState[k], - mode_lib->vba.SwathHeightCThisState[k], mode_lib->vba.TWait, + v->DSCDelayPerState[i][k], + v->SwathWidthYThisState[k] / v->HRatio[k], + dml_min(v->MaxVStartup, v->MaximumVStartup[i][j][k]), + v->MaximumVStartup[i][j][k], + v->UrgLatency[i], + v->ExtraLatency, + v->TimeCalc, + v->PDEAndMetaPTEBytesPerFrame[i][j][k], + v->MetaRowBytes[i][j][k], + v->DPTEBytesPerRow[i][j][k], + v->PrefetchLinesY[i][j][k], + v->SwathWidthYThisState[k], + v->PrefillY[k], + v->MaxNumSwY[k], + v->PrefetchLinesC[i][j][k], + v->SwathWidthCThisState[k], + v->PrefillC[k], + v->MaxNumSwC[k], + v->swath_width_luma_ub_this_state[k], + v->swath_width_chroma_ub_this_state[k], + v->SwathHeightYThisState[k], + v->SwathHeightCThisState[k], v->TWait, /* Output */ &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTXAfterScaler[k], &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTYAfterScaler[k], - &mode_lib->vba.LineTimesForPrefetch[k], - &mode_lib->vba.PrefetchBW[k], - &mode_lib->vba.LinesForMetaPTE[k], - &mode_lib->vba.LinesForMetaAndDPTERow[k], - &mode_lib->vba.VRatioPreY[i][j][k], - &mode_lib->vba.VRatioPreC[i][j][k], - &mode_lib->vba.RequiredPrefetchPixelDataBWLuma[0][0][k], - &mode_lib->vba.RequiredPrefetchPixelDataBWChroma[0][0][k], - &mode_lib->vba.NoTimeForDynamicMetadata[i][j][k], - &mode_lib->vba.Tno_bw[k], - &mode_lib->vba.prefetch_vmrow_bw[k], + &v->LineTimesForPrefetch[k], + &v->PrefetchBW[k], + &v->LinesForMetaPTE[k], + &v->LinesForMetaAndDPTERow[k], + &v->VRatioPreY[i][j][k], + &v->VRatioPreC[i][j][k], + &v->RequiredPrefetchPixelDataBWLuma[0][0][k], + &v->RequiredPrefetchPixelDataBWChroma[0][0][k], + &v->NoTimeForDynamicMetadata[i][j][k], + &v->Tno_bw[k], + &v->prefetch_vmrow_bw[k], &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single[0], // double *Tdmdl_vm &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single[1], // double *Tdmdl &v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single[2], // double *TSetup diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c index 6c7b748ee62b..61be183a4036 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c @@ -3366,28 +3366,14 @@ double dml32_CalculateExtraLatency( } // CalculateExtraLatency bool dml32_CalculatePrefetchSchedule( + struct vba_vars_st *v, + unsigned int k, double HostVMInefficiencyFactor, DmlPipe *myPipe, unsigned int DSCDelay, - double DPPCLKDelaySubtotalPlusCNVCFormater, - double DPPCLKDelaySCL, - double DPPCLKDelaySCLLBOnly, - double DPPCLKDelayCNVCCursor, - double DISPCLKDelaySubtotal, unsigned int DPP_RECOUT_WIDTH, - enum output_format_class OutputFormat, - unsigned int MaxInterDCNTileRepeaters, unsigned int VStartup, unsigned int MaxVStartup, - unsigned int GPUVMPageTableLevels, - bool GPUVMEnable, - bool HostVMEnable, - unsigned int HostVMMaxNonCachedPageTableLevels, - double HostVMMinPageSize, - bool DynamicMetadataEnable, - bool DynamicMetadataVMEnabled, - int DynamicMetadataLinesBeforeActiveRequired, - unsigned int DynamicMetadataTransmittedBytes, double UrgentLatency, double UrgentExtraLatency, double TCalc, @@ -3428,6 +3414,7 @@ bool dml32_CalculatePrefetchSchedule( double *VUpdateWidthPix, double *VReadyOffsetPix) { + double DPPCLKDelaySubtotalPlusCNVCFormater = v->DPPCLKDelaySubtotal + v->DPPCLKDelayCNVCFormater; bool MyError = false; unsigned int DPPCycles, DISPCLKCycles; double DSTTotalPixelsAfterScaler; @@ -3464,27 +3451,27 @@ bool dml32_CalculatePrefetchSchedule( double Tsw_est1 = 0; double Tsw_est3 = 0; - if (GPUVMEnable == true && HostVMEnable == true) - HostVMDynamicLevelsTrips = HostVMMaxNonCachedPageTableLevels; + if (v->GPUVMEnable == true && v->HostVMEnable == true) + HostVMDynamicLevelsTrips = v->HostVMMaxNonCachedPageTableLevels; else HostVMDynamicLevelsTrips = 0; #ifdef __DML_VBA_DEBUG__ - dml_print("DML::%s: GPUVMEnable = %d\n", __func__, GPUVMEnable); - dml_print("DML::%s: GPUVMPageTableLevels = %d\n", __func__, GPUVMPageTableLevels); + dml_print("DML::%s: v->GPUVMEnable = %d\n", __func__, v->GPUVMEnable); + dml_print("DML::%s: v->GPUVMMaxPageTableLevels = %d\n", __func__, v->GPUVMMaxPageTableLevels); dml_print("DML::%s: DCCEnable = %d\n", __func__, myPipe->DCCEnable); - dml_print("DML::%s: HostVMEnable=%d HostVMInefficiencyFactor=%f\n", - __func__, HostVMEnable, HostVMInefficiencyFactor); + dml_print("DML::%s: v->HostVMEnable=%d HostVMInefficiencyFactor=%f\n", + __func__, v->HostVMEnable, HostVMInefficiencyFactor); #endif dml32_CalculateVUpdateAndDynamicMetadataParameters( - MaxInterDCNTileRepeaters, + v->MaxInterDCNTileRepeaters, myPipe->Dppclk, myPipe->Dispclk, myPipe->DCFClkDeepSleep, myPipe->PixelClock, myPipe->HTotal, myPipe->VBlank, - DynamicMetadataTransmittedBytes, - DynamicMetadataLinesBeforeActiveRequired, + v->DynamicMetadataTransmittedBytes[k], + v->DynamicMetadataLinesBeforeActiveRequired[k], myPipe->InterlaceEnable, myPipe->ProgressiveToInterlaceUnitInOPP, TSetup, @@ -3499,19 +3486,19 @@ bool dml32_CalculatePrefetchSchedule( LineTime = myPipe->HTotal / myPipe->PixelClock; trip_to_mem = UrgentLatency; - Tvm_trips = UrgentExtraLatency + trip_to_mem * (GPUVMPageTableLevels * (HostVMDynamicLevelsTrips + 1) - 1); + Tvm_trips = UrgentExtraLatency + trip_to_mem * (v->GPUVMMaxPageTableLevels * (HostVMDynamicLevelsTrips + 1) - 1); - if (DynamicMetadataVMEnabled == true) + if (v->DynamicMetadataVMEnabled == true) *Tdmdl = TWait + Tvm_trips + trip_to_mem; else *Tdmdl = TWait + UrgentExtraLatency; #ifdef __DML_VBA_ALLOW_DELTA__ - if (DynamicMetadataEnable == false) + if (v->DynamicMetadataEnable[k] == false) *Tdmdl = 0.0; #endif - if (DynamicMetadataEnable == true) { + if (v->DynamicMetadataEnable[k] == true) { if (VStartup * LineTime < *TSetup + *Tdmdl + Tdmbf + Tdmec + Tdmsks) { *NotEnoughTimeForDynamicMetadata = true; #ifdef __DML_VBA_DEBUG__ @@ -3531,17 +3518,17 @@ bool dml32_CalculatePrefetchSchedule( *NotEnoughTimeForDynamicMetadata = false; } - *Tdmdl_vm = (DynamicMetadataEnable == true && DynamicMetadataVMEnabled == true && - GPUVMEnable == true ? TWait + Tvm_trips : 0); + *Tdmdl_vm = (v->DynamicMetadataEnable[k] == true && v->DynamicMetadataVMEnabled == true && + v->GPUVMEnable == true ? TWait + Tvm_trips : 0); if (myPipe->ScalerEnabled) - DPPCycles = DPPCLKDelaySubtotalPlusCNVCFormater + DPPCLKDelaySCL; + DPPCycles = DPPCLKDelaySubtotalPlusCNVCFormater + v->DPPCLKDelaySCL; else - DPPCycles = DPPCLKDelaySubtotalPlusCNVCFormater + DPPCLKDelaySCLLBOnly; + DPPCycles = DPPCLKDelaySubtotalPlusCNVCFormater + v->DPPCLKDelaySCLLBOnly; - DPPCycles = DPPCycles + myPipe->NumberOfCursors * DPPCLKDelayCNVCCursor; + DPPCycles = DPPCycles + myPipe->NumberOfCursors * v->DPPCLKDelayCNVCCursor; - DISPCLKCycles = DISPCLKDelaySubtotal; + DISPCLKCycles = v->DISPCLKDelaySubtotal; if (myPipe->Dppclk == 0.0 || myPipe->Dispclk == 0.0) return true; @@ -3567,7 +3554,7 @@ bool dml32_CalculatePrefetchSchedule( dml_print("DML::%s: DSTXAfterScaler: %d\n", __func__, *DSTXAfterScaler); #endif - if (OutputFormat == dm_420 || (myPipe->InterlaceEnable && myPipe->ProgressiveToInterlaceUnitInOPP)) + if (v->OutputFormat[k] == dm_420 || (myPipe->InterlaceEnable && myPipe->ProgressiveToInterlaceUnitInOPP)) *DSTYAfterScaler = 1; else *DSTYAfterScaler = 0; @@ -3584,13 +3571,13 @@ bool dml32_CalculatePrefetchSchedule( Tr0_trips = trip_to_mem * (HostVMDynamicLevelsTrips + 1); - if (GPUVMEnable == true) { + if (v->GPUVMEnable == true) { Tvm_trips_rounded = dml_ceil(4.0 * Tvm_trips / LineTime, 1.0) / 4.0 * LineTime; Tr0_trips_rounded = dml_ceil(4.0 * Tr0_trips / LineTime, 1.0) / 4.0 * LineTime; - if (GPUVMPageTableLevels >= 3) { + if (v->GPUVMMaxPageTableLevels >= 3) { *Tno_bw = UrgentExtraLatency + trip_to_mem * - (double) ((GPUVMPageTableLevels - 2) * (HostVMDynamicLevelsTrips + 1) - 1); - } else if (GPUVMPageTableLevels == 1 && myPipe->DCCEnable != true) { + (double) ((v->GPUVMMaxPageTableLevels - 2) * (HostVMDynamicLevelsTrips + 1) - 1); + } else if (v->GPUVMMaxPageTableLevels == 1 && myPipe->DCCEnable != true) { Tr0_trips_rounded = dml_ceil(4.0 * UrgentExtraLatency / LineTime, 1.0) / 4.0 * LineTime; // VBA_ERROR *Tno_bw = UrgentExtraLatency; @@ -3625,7 +3612,7 @@ bool dml32_CalculatePrefetchSchedule( min_Lsw = dml_max(min_Lsw, 1.0); Lsw_oto = dml_ceil(4.0 * dml_max(prefetch_sw_bytes / prefetch_bw_oto / LineTime, min_Lsw), 1.0) / 4.0; - if (GPUVMEnable == true) { + if (v->GPUVMEnable == true) { Tvm_oto = dml_max3( Tvm_trips, *Tno_bw + PDEAndMetaPTEBytesFrame * HostVMInefficiencyFactor / prefetch_bw_oto, @@ -3633,7 +3620,7 @@ bool dml32_CalculatePrefetchSchedule( } else Tvm_oto = LineTime / 4.0; - if ((GPUVMEnable == true || myPipe->DCCEnable == true)) { + if ((v->GPUVMEnable == true || myPipe->DCCEnable == true)) { Tr0_oto = dml_max4( Tr0_trips, (MetaRowByte + PixelPTEBytesPerRow * HostVMInefficiencyFactor) / prefetch_bw_oto, @@ -3836,7 +3823,7 @@ bool dml32_CalculatePrefetchSchedule( #endif if (prefetch_bw_equ > 0) { - if (GPUVMEnable == true) { + if (v->GPUVMEnable == true) { Tvm_equ = dml_max3(*Tno_bw + PDEAndMetaPTEBytesFrame * HostVMInefficiencyFactor / prefetch_bw_equ, Tvm_trips, LineTime / 4); @@ -3844,7 +3831,7 @@ bool dml32_CalculatePrefetchSchedule( Tvm_equ = LineTime / 4; } - if ((GPUVMEnable == true || myPipe->DCCEnable == true)) { + if ((v->GPUVMEnable == true || myPipe->DCCEnable == true)) { Tr0_equ = dml_max4((MetaRowByte + PixelPTEBytesPerRow * HostVMInefficiencyFactor) / prefetch_bw_equ, Tr0_trips, (LineTime - Tvm_equ) / 2, LineTime / 4); diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h index 7cdf0418830a..3dbc9cf46aad 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h @@ -714,28 +714,14 @@ double dml32_CalculateExtraLatency( unsigned int HostVMMaxNonCachedPageTableLevels); bool dml32_CalculatePrefetchSchedule( + struct vba_vars_st *v, + unsigned int k, double HostVMInefficiencyFactor, DmlPipe *myPipe, unsigned int DSCDelay, - double DPPCLKDelaySubtotalPlusCNVCFormater, - double DPPCLKDelaySCL, - double DPPCLKDelaySCLLBOnly, - double DPPCLKDelayCNVCCursor, - double DISPCLKDelaySubtotal, unsigned int DPP_RECOUT_WIDTH, - enum output_format_class OutputFormat, - unsigned int MaxInterDCNTileRepeaters, unsigned int VStartup, unsigned int MaxVStartup, - unsigned int GPUVMPageTableLevels, - bool GPUVMEnable, - bool HostVMEnable, - unsigned int HostVMMaxNonCachedPageTableLevels, - double HostVMMinPageSize, - bool DynamicMetadataEnable, - bool DynamicMetadataVMEnabled, - int DynamicMetadataLinesBeforeActiveRequired, - unsigned int DynamicMetadataTransmittedBytes, double UrgentLatency, double UrgentExtraLatency, double TCalc, From patchwork Tue Aug 30 20:34:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Chancellor X-Patchwork-Id: 12959854 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DD0E8C65C0D for ; Tue, 30 Aug 2022 20:35:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4A62B10E13B; Tue, 30 Aug 2022 20:35:13 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by gabe.freedesktop.org (Postfix) with ESMTPS id BB72710E12D; Tue, 30 Aug 2022 20:34:44 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2CE7FB81DC7; Tue, 30 Aug 2022 20:34:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EBD44C433C1; Tue, 30 Aug 2022 20:34:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661891681; bh=16t0x2n4Flcre9YXo43Tz/IS7eY4V2y0TnxxbYQOx5c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CJlxlLuGxc/WkcMBLHraAy0fUtzKz2FbQfw++2LSpS9pOeWZi2F7CxULnTk4FpLMw ef0SRW/mO9BJDenOC6FmS8p/QrGT7PHLLFE72yJcaV8AySugkxIe3A+27mjEzvHEgd 5AadXSbGt4AXDOjjMGpVE4foLT6pwf6xEtzNDxDuS0cEUXIzBzqOl7TMQ8Iss0HoCi /ISws03LLn0p1kQ9uycJTQZCjSXYJ1idmARMLnLXTeJ6d9uSBXdUg7/gYEf1ETU6wY 8Fng9mWec1tP0dThoC8U1ueY3geOQcx6wFFwpk2jfZvUIVU3xg4bQs9A9i0hGZcQqH G7P8JRqbyDzCw== From: Nathan Chancellor To: Harry Wentland , Leo Li , Rodrigo Siqueira , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" Subject: [PATCH 3/5] drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport() Date: Tue, 30 Aug 2022 13:34:07 -0700 Message-Id: <20220830203409.3491379-4-nathan@kernel.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220830203409.3491379-1-nathan@kernel.org> References: <20220830203409.3491379-1-nathan@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Rix , llvm@lists.linux.dev, Nick Desaulniers , patches@lists.linux.dev, dri-devel@lists.freedesktop.org, Nathan Chancellor , amd-gfx@lists.freedesktop.org, Sudip Mukherjee Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Most of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer. This reduces the total amount of stack space that dml31_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with LLVM 16 (2216 -> 1976), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame size (2216) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Link: https://github.com/ClangBuiltLinux/linux/issues/1681 Reported-by: "Sudip Mukherjee (Codethink)" Signed-off-by: Nathan Chancellor --- .../dc/dml/dcn31/display_mode_vba_31.c | 248 ++++-------------- 1 file changed, 52 insertions(+), 196 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c index d63b4209b14c..21c74ee1deec 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c @@ -311,64 +311,28 @@ static void CalculateVupdateAndDynamicMetadataParameters( static void CalculateWatermarksAndDRAMSpeedChangeSupport( struct display_mode_lib *mode_lib, unsigned int PrefetchMode, - unsigned int NumberOfActivePlanes, - unsigned int MaxLineBufferLines, - unsigned int LineBufferSize, - unsigned int WritebackInterfaceBufferSize, double DCFCLK, double ReturnBW, - bool SynchronizedVBlank, - unsigned int dpte_group_bytes[], - unsigned int MetaChunkSize, double UrgentLatency, double ExtraLatency, - double WritebackLatency, - double WritebackChunkSize, double SOCCLK, - double DRAMClockChangeLatency, - double SRExitTime, - double SREnterPlusExitTime, - double SRExitZ8Time, - double SREnterPlusExitZ8Time, double DCFCLKDeepSleep, unsigned int DETBufferSizeY[], unsigned int DETBufferSizeC[], unsigned int SwathHeightY[], unsigned int SwathHeightC[], - unsigned int LBBitPerPixel[], double SwathWidthY[], double SwathWidthC[], - double HRatio[], - double HRatioChroma[], - unsigned int vtaps[], - unsigned int VTAPsChroma[], - double VRatio[], - double VRatioChroma[], - unsigned int HTotal[], - double PixelClock[], - unsigned int BlendingAndTiming[], unsigned int DPPPerPlane[], double BytePerPixelDETY[], double BytePerPixelDETC[], - double DSTXAfterScaler[], - double DSTYAfterScaler[], - bool WritebackEnable[], - enum source_format_class WritebackPixelFormat[], - double WritebackDestinationWidth[], - double WritebackDestinationHeight[], - double WritebackSourceHeight[], bool UnboundedRequestEnabled, int unsigned CompressedBufferSizeInkByte, enum clock_change_support *DRAMClockChangeSupport, - double *UrgentWatermark, - double *WritebackUrgentWatermark, - double *DRAMClockChangeWatermark, - double *WritebackDRAMClockChangeWatermark, double *StutterExitWatermark, double *StutterEnterPlusExitWatermark, double *Z8StutterExitWatermark, - double *Z8StutterEnterPlusExitWatermark, - double *MinActiveDRAMClockChangeLatencySupported); + double *Z8StutterEnterPlusExitWatermark); static void CalculateDCFCLKDeepSleep( struct display_mode_lib *mode_lib, @@ -3017,64 +2981,28 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman CalculateWatermarksAndDRAMSpeedChangeSupport( mode_lib, PrefetchMode, - v->NumberOfActivePlanes, - v->MaxLineBufferLines, - v->LineBufferSize, - v->WritebackInterfaceBufferSize, v->DCFCLK, v->ReturnBW, - v->SynchronizedVBlank, - v->dpte_group_bytes, - v->MetaChunkSize, v->UrgentLatency, v->UrgentExtraLatency, - v->WritebackLatency, - v->WritebackChunkSize, v->SOCCLK, - v->DRAMClockChangeLatency, - v->SRExitTime, - v->SREnterPlusExitTime, - v->SRExitZ8Time, - v->SREnterPlusExitZ8Time, v->DCFCLKDeepSleep, v->DETBufferSizeY, v->DETBufferSizeC, v->SwathHeightY, v->SwathHeightC, - v->LBBitPerPixel, v->SwathWidthY, v->SwathWidthC, - v->HRatio, - v->HRatioChroma, - v->vtaps, - v->VTAPsChroma, - v->VRatio, - v->VRatioChroma, - v->HTotal, - v->PixelClock, - v->BlendingAndTiming, v->DPPPerPlane, v->BytePerPixelDETY, v->BytePerPixelDETC, - v->DSTXAfterScaler, - v->DSTYAfterScaler, - v->WritebackEnable, - v->WritebackPixelFormat, - v->WritebackDestinationWidth, - v->WritebackDestinationHeight, - v->WritebackSourceHeight, v->UnboundedRequestEnabled, v->CompressedBufferSizeInkByte, &DRAMClockChangeSupport, - &v->UrgentWatermark, - &v->WritebackUrgentWatermark, - &v->DRAMClockChangeWatermark, - &v->WritebackDRAMClockChangeWatermark, &v->StutterExitWatermark, &v->StutterEnterPlusExitWatermark, &v->Z8StutterExitWatermark, - &v->Z8StutterEnterPlusExitWatermark, - &v->MinActiveDRAMClockChangeLatencySupported); + &v->Z8StutterEnterPlusExitWatermark); for (k = 0; k < v->NumberOfActivePlanes; ++k) { if (v->WritebackEnable[k] == true) { @@ -5384,64 +5312,28 @@ void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l CalculateWatermarksAndDRAMSpeedChangeSupport( mode_lib, v->PrefetchModePerState[i][j], - v->NumberOfActivePlanes, - v->MaxLineBufferLines, - v->LineBufferSize, - v->WritebackInterfaceBufferSize, v->DCFCLKState[i][j], v->ReturnBWPerState[i][j], - v->SynchronizedVBlank, - v->dpte_group_bytes, - v->MetaChunkSize, v->UrgLatency[i], v->ExtraLatency, - v->WritebackLatency, - v->WritebackChunkSize, v->SOCCLKPerState[i], - v->DRAMClockChangeLatency, - v->SRExitTime, - v->SREnterPlusExitTime, - v->SRExitZ8Time, - v->SREnterPlusExitZ8Time, v->ProjectedDCFCLKDeepSleep[i][j], v->DETBufferSizeYThisState, v->DETBufferSizeCThisState, v->SwathHeightYThisState, v->SwathHeightCThisState, - v->LBBitPerPixel, v->SwathWidthYThisState, v->SwathWidthCThisState, - v->HRatio, - v->HRatioChroma, - v->vtaps, - v->VTAPsChroma, - v->VRatio, - v->VRatioChroma, - v->HTotal, - v->PixelClock, - v->BlendingAndTiming, v->NoOfDPPThisState, v->BytePerPixelInDETY, v->BytePerPixelInDETC, - v->DSTXAfterScaler, - v->DSTYAfterScaler, - v->WritebackEnable, - v->WritebackPixelFormat, - v->WritebackDestinationWidth, - v->WritebackDestinationHeight, - v->WritebackSourceHeight, UnboundedRequestEnabledThisState, CompressedBufferSizeInkByteThisState, &v->DRAMClockChangeSupport[i][j], - &v->UrgentWatermark, - &v->WritebackUrgentWatermark, - &v->DRAMClockChangeWatermark, - &v->WritebackDRAMClockChangeWatermark, - &dummy, &dummy, &dummy, &dummy, - &v->MinActiveDRAMClockChangeLatencySupported); + &dummy); } } @@ -5566,64 +5458,28 @@ void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l static void CalculateWatermarksAndDRAMSpeedChangeSupport( struct display_mode_lib *mode_lib, unsigned int PrefetchMode, - unsigned int NumberOfActivePlanes, - unsigned int MaxLineBufferLines, - unsigned int LineBufferSize, - unsigned int WritebackInterfaceBufferSize, double DCFCLK, double ReturnBW, - bool SynchronizedVBlank, - unsigned int dpte_group_bytes[], - unsigned int MetaChunkSize, double UrgentLatency, double ExtraLatency, - double WritebackLatency, - double WritebackChunkSize, double SOCCLK, - double DRAMClockChangeLatency, - double SRExitTime, - double SREnterPlusExitTime, - double SRExitZ8Time, - double SREnterPlusExitZ8Time, double DCFCLKDeepSleep, unsigned int DETBufferSizeY[], unsigned int DETBufferSizeC[], unsigned int SwathHeightY[], unsigned int SwathHeightC[], - unsigned int LBBitPerPixel[], double SwathWidthY[], double SwathWidthC[], - double HRatio[], - double HRatioChroma[], - unsigned int vtaps[], - unsigned int VTAPsChroma[], - double VRatio[], - double VRatioChroma[], - unsigned int HTotal[], - double PixelClock[], - unsigned int BlendingAndTiming[], unsigned int DPPPerPlane[], double BytePerPixelDETY[], double BytePerPixelDETC[], - double DSTXAfterScaler[], - double DSTYAfterScaler[], - bool WritebackEnable[], - enum source_format_class WritebackPixelFormat[], - double WritebackDestinationWidth[], - double WritebackDestinationHeight[], - double WritebackSourceHeight[], bool UnboundedRequestEnabled, int unsigned CompressedBufferSizeInkByte, enum clock_change_support *DRAMClockChangeSupport, - double *UrgentWatermark, - double *WritebackUrgentWatermark, - double *DRAMClockChangeWatermark, - double *WritebackDRAMClockChangeWatermark, double *StutterExitWatermark, double *StutterEnterPlusExitWatermark, double *Z8StutterExitWatermark, - double *Z8StutterEnterPlusExitWatermark, - double *MinActiveDRAMClockChangeLatencySupported) + double *Z8StutterEnterPlusExitWatermark) { struct vba_vars_st *v = &mode_lib->vba; double EffectiveLBLatencyHidingY; @@ -5643,103 +5499,103 @@ static void CalculateWatermarksAndDRAMSpeedChangeSupport( double TotalPixelBW = 0.0; int k, j; - *UrgentWatermark = UrgentLatency + ExtraLatency; + v->UrgentWatermark = UrgentLatency + ExtraLatency; #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: UrgentLatency = %f\n", __func__, UrgentLatency); dml_print("DML::%s: ExtraLatency = %f\n", __func__, ExtraLatency); - dml_print("DML::%s: UrgentWatermark = %f\n", __func__, *UrgentWatermark); + dml_print("DML::%s: UrgentWatermark = %f\n", __func__, v->UrgentWatermark); #endif - *DRAMClockChangeWatermark = DRAMClockChangeLatency + *UrgentWatermark; + v->DRAMClockChangeWatermark = v->DRAMClockChangeLatency + v->UrgentWatermark; #ifdef __DML_VBA_DEBUG__ - dml_print("DML::%s: DRAMClockChangeLatency = %f\n", __func__, DRAMClockChangeLatency); - dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, *DRAMClockChangeWatermark); + dml_print("DML::%s: v->DRAMClockChangeLatency = %f\n", __func__, v->DRAMClockChangeLatency ); + dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, v->DRAMClockChangeWatermark); #endif v->TotalActiveWriteback = 0; - for (k = 0; k < NumberOfActivePlanes; ++k) { - if (WritebackEnable[k] == true) { + for (k = 0; k < v->NumberOfActivePlanes; ++k) { + if (v->WritebackEnable[k] == true) { v->TotalActiveWriteback = v->TotalActiveWriteback + 1; } } if (v->TotalActiveWriteback <= 1) { - *WritebackUrgentWatermark = WritebackLatency; + v->WritebackUrgentWatermark = v->WritebackLatency; } else { - *WritebackUrgentWatermark = WritebackLatency + WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; + v->WritebackUrgentWatermark = v->WritebackLatency + v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; } if (v->TotalActiveWriteback <= 1) { - *WritebackDRAMClockChangeWatermark = DRAMClockChangeLatency + WritebackLatency; + v->WritebackDRAMClockChangeWatermark = v->DRAMClockChangeLatency + v->WritebackLatency; } else { - *WritebackDRAMClockChangeWatermark = DRAMClockChangeLatency + WritebackLatency + WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; + v->WritebackDRAMClockChangeWatermark = v->DRAMClockChangeLatency + v->WritebackLatency + v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK; } - for (k = 0; k < NumberOfActivePlanes; ++k) { + for (k = 0; k < v->NumberOfActivePlanes; ++k) { TotalPixelBW = TotalPixelBW - + DPPPerPlane[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k] + SwathWidthC[k] * BytePerPixelDETC[k] * VRatioChroma[k]) - / (HTotal[k] / PixelClock[k]); + + DPPPerPlane[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k] + SwathWidthC[k] * BytePerPixelDETC[k] * v->VRatioChroma[k]) + / (v->HTotal[k] / v->PixelClock[k]); } - for (k = 0; k < NumberOfActivePlanes; ++k) { + for (k = 0; k < v->NumberOfActivePlanes; ++k) { double EffectiveDETBufferSizeY = DETBufferSizeY[k]; v->LBLatencyHidingSourceLinesY = dml_min( - (double) MaxLineBufferLines, - dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(HRatio[k], 1.0)), 1)) - (vtaps[k] - 1); + (double) v->MaxLineBufferLines, + dml_floor(v->LineBufferSize / v->LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(v->HRatio[k], 1.0)), 1)) - (v->vtaps[k] - 1); v->LBLatencyHidingSourceLinesC = dml_min( - (double) MaxLineBufferLines, - dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(HRatioChroma[k], 1.0)), 1)) - (VTAPsChroma[k] - 1); + (double) v->MaxLineBufferLines, + dml_floor(v->LineBufferSize / v->LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(v->HRatioChroma[k], 1.0)), 1)) - (v->VTAPsChroma[k] - 1); - EffectiveLBLatencyHidingY = v->LBLatencyHidingSourceLinesY / VRatio[k] * (HTotal[k] / PixelClock[k]); + EffectiveLBLatencyHidingY = v->LBLatencyHidingSourceLinesY / v->VRatio[k] * (v->HTotal[k] / v->PixelClock[k]); - EffectiveLBLatencyHidingC = v->LBLatencyHidingSourceLinesC / VRatioChroma[k] * (HTotal[k] / PixelClock[k]); + EffectiveLBLatencyHidingC = v->LBLatencyHidingSourceLinesC / v->VRatioChroma[k] * (v->HTotal[k] / v->PixelClock[k]); if (UnboundedRequestEnabled) { EffectiveDETBufferSizeY = EffectiveDETBufferSizeY - + CompressedBufferSizeInkByte * 1024 * SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k] / (HTotal[k] / PixelClock[k]) / TotalPixelBW; + + CompressedBufferSizeInkByte * 1024 * SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k] / (v->HTotal[k] / v->PixelClock[k]) / TotalPixelBW; } LinesInDETY[k] = (double) EffectiveDETBufferSizeY / BytePerPixelDETY[k] / SwathWidthY[k]; LinesInDETYRoundedDownToSwath[k] = dml_floor(LinesInDETY[k], SwathHeightY[k]); - FullDETBufferingTimeY = LinesInDETYRoundedDownToSwath[k] * (HTotal[k] / PixelClock[k]) / VRatio[k]; + FullDETBufferingTimeY = LinesInDETYRoundedDownToSwath[k] * (v->HTotal[k] / v->PixelClock[k]) / v->VRatio[k]; if (BytePerPixelDETC[k] > 0) { LinesInDETC = v->DETBufferSizeC[k] / BytePerPixelDETC[k] / SwathWidthC[k]; LinesInDETCRoundedDownToSwath = dml_floor(LinesInDETC, SwathHeightC[k]); - FullDETBufferingTimeC = LinesInDETCRoundedDownToSwath * (HTotal[k] / PixelClock[k]) / VRatioChroma[k]; + FullDETBufferingTimeC = LinesInDETCRoundedDownToSwath * (v->HTotal[k] / v->PixelClock[k]) / v->VRatioChroma[k]; } else { LinesInDETC = 0; FullDETBufferingTimeC = 999999; } ActiveDRAMClockChangeLatencyMarginY = EffectiveLBLatencyHidingY + FullDETBufferingTimeY - - ((double) DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k] / PixelClock[k] - *UrgentWatermark - *DRAMClockChangeWatermark; + - ((double) v->DSTXAfterScaler[k] / v->HTotal[k] + v->DSTYAfterScaler[k]) * v->HTotal[k] / v->PixelClock[k] - v->UrgentWatermark - v->DRAMClockChangeWatermark; - if (NumberOfActivePlanes > 1) { + if (v->NumberOfActivePlanes > 1) { ActiveDRAMClockChangeLatencyMarginY = ActiveDRAMClockChangeLatencyMarginY - - (1 - 1.0 / NumberOfActivePlanes) * SwathHeightY[k] * HTotal[k] / PixelClock[k] / VRatio[k]; + - (1 - 1.0 / v->NumberOfActivePlanes) * SwathHeightY[k] * v->HTotal[k] / v->PixelClock[k] / v->VRatio[k]; } if (BytePerPixelDETC[k] > 0) { ActiveDRAMClockChangeLatencyMarginC = EffectiveLBLatencyHidingC + FullDETBufferingTimeC - - ((double) DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k] / PixelClock[k] - *UrgentWatermark - *DRAMClockChangeWatermark; + - ((double) v->DSTXAfterScaler[k] / v->HTotal[k] + v->DSTYAfterScaler[k]) * v->HTotal[k] / v->PixelClock[k] - v->UrgentWatermark - v->DRAMClockChangeWatermark; - if (NumberOfActivePlanes > 1) { + if (v->NumberOfActivePlanes > 1) { ActiveDRAMClockChangeLatencyMarginC = ActiveDRAMClockChangeLatencyMarginC - - (1 - 1.0 / NumberOfActivePlanes) * SwathHeightC[k] * HTotal[k] / PixelClock[k] / VRatioChroma[k]; + - (1 - 1.0 / v->NumberOfActivePlanes) * SwathHeightC[k] * v->HTotal[k] / v->PixelClock[k] / v->VRatioChroma[k]; } v->ActiveDRAMClockChangeLatencyMargin[k] = dml_min(ActiveDRAMClockChangeLatencyMarginY, ActiveDRAMClockChangeLatencyMarginC); } else { v->ActiveDRAMClockChangeLatencyMargin[k] = ActiveDRAMClockChangeLatencyMarginY; } - if (WritebackEnable[k] == true) { - WritebackDRAMClockChangeLatencyHiding = WritebackInterfaceBufferSize * 1024 - / (WritebackDestinationWidth[k] * WritebackDestinationHeight[k] / (WritebackSourceHeight[k] * HTotal[k] / PixelClock[k]) * 4); - if (WritebackPixelFormat[k] == dm_444_64) { + if (v->WritebackEnable[k] == true) { + WritebackDRAMClockChangeLatencyHiding = v->WritebackInterfaceBufferSize * 1024 + / (v->WritebackDestinationWidth[k] * v->WritebackDestinationHeight[k] / (v->WritebackSourceHeight[k] * v->HTotal[k] / v->PixelClock[k]) * 4); + if (v->WritebackPixelFormat[k] == dm_444_64) { WritebackDRAMClockChangeLatencyHiding = WritebackDRAMClockChangeLatencyHiding / 2; } WritebackDRAMClockChangeLatencyMargin = WritebackDRAMClockChangeLatencyHiding - v->WritebackDRAMClockChangeWatermark; @@ -5749,14 +5605,14 @@ static void CalculateWatermarksAndDRAMSpeedChangeSupport( v->MinActiveDRAMClockChangeMargin = 999999; PlaneWithMinActiveDRAMClockChangeMargin = 0; - for (k = 0; k < NumberOfActivePlanes; ++k) { + for (k = 0; k < v->NumberOfActivePlanes; ++k) { if (v->ActiveDRAMClockChangeLatencyMargin[k] < v->MinActiveDRAMClockChangeMargin) { v->MinActiveDRAMClockChangeMargin = v->ActiveDRAMClockChangeLatencyMargin[k]; - if (BlendingAndTiming[k] == k) { + if (v->BlendingAndTiming[k] == k) { PlaneWithMinActiveDRAMClockChangeMargin = k; } else { - for (j = 0; j < NumberOfActivePlanes; ++j) { - if (BlendingAndTiming[k] == j) { + for (j = 0; j < v->NumberOfActivePlanes; ++j) { + if (v->BlendingAndTiming[k] == j) { PlaneWithMinActiveDRAMClockChangeMargin = j; } } @@ -5764,11 +5620,11 @@ static void CalculateWatermarksAndDRAMSpeedChangeSupport( } } - *MinActiveDRAMClockChangeLatencySupported = v->MinActiveDRAMClockChangeMargin + DRAMClockChangeLatency; + v->MinActiveDRAMClockChangeLatencySupported = v->MinActiveDRAMClockChangeMargin + v->DRAMClockChangeLatency ; SecondMinActiveDRAMClockChangeMarginOneDisplayInVBLank = 999999; - for (k = 0; k < NumberOfActivePlanes; ++k) { - if (!((k == PlaneWithMinActiveDRAMClockChangeMargin) && (BlendingAndTiming[k] == k)) && !(BlendingAndTiming[k] == PlaneWithMinActiveDRAMClockChangeMargin) + for (k = 0; k < v->NumberOfActivePlanes; ++k) { + if (!((k == PlaneWithMinActiveDRAMClockChangeMargin) && (v->BlendingAndTiming[k] == k)) && !(v->BlendingAndTiming[k] == PlaneWithMinActiveDRAMClockChangeMargin) && v->ActiveDRAMClockChangeLatencyMargin[k] < SecondMinActiveDRAMClockChangeMarginOneDisplayInVBLank) { SecondMinActiveDRAMClockChangeMarginOneDisplayInVBLank = v->ActiveDRAMClockChangeLatencyMargin[k]; } @@ -5776,25 +5632,25 @@ static void CalculateWatermarksAndDRAMSpeedChangeSupport( v->TotalNumberOfActiveOTG = 0; - for (k = 0; k < NumberOfActivePlanes; ++k) { - if (BlendingAndTiming[k] == k) { + for (k = 0; k < v->NumberOfActivePlanes; ++k) { + if (v->BlendingAndTiming[k] == k) { v->TotalNumberOfActiveOTG = v->TotalNumberOfActiveOTG + 1; } } if (v->MinActiveDRAMClockChangeMargin > 0 && PrefetchMode == 0) { *DRAMClockChangeSupport = dm_dram_clock_change_vactive; - } else if ((SynchronizedVBlank == true || v->TotalNumberOfActiveOTG == 1 + } else if ((v->SynchronizedVBlank == true || v->TotalNumberOfActiveOTG == 1 || SecondMinActiveDRAMClockChangeMarginOneDisplayInVBLank > 0) && PrefetchMode == 0) { *DRAMClockChangeSupport = dm_dram_clock_change_vblank; } else { *DRAMClockChangeSupport = dm_dram_clock_change_unsupported; } - *StutterExitWatermark = SRExitTime + ExtraLatency + 10 / DCFCLKDeepSleep; - *StutterEnterPlusExitWatermark = (SREnterPlusExitTime + ExtraLatency + 10 / DCFCLKDeepSleep); - *Z8StutterExitWatermark = SRExitZ8Time + ExtraLatency + 10 / DCFCLKDeepSleep; - *Z8StutterEnterPlusExitWatermark = SREnterPlusExitZ8Time + ExtraLatency + 10 / DCFCLKDeepSleep; + *StutterExitWatermark = v->SRExitTime + ExtraLatency + 10 / DCFCLKDeepSleep; + *StutterEnterPlusExitWatermark = (v->SREnterPlusExitTime + ExtraLatency + 10 / DCFCLKDeepSleep); + *Z8StutterExitWatermark = v->SRExitZ8Time + ExtraLatency + 10 / DCFCLKDeepSleep; + *Z8StutterEnterPlusExitWatermark = v->SREnterPlusExitZ8Time + ExtraLatency + 10 / DCFCLKDeepSleep; #ifdef __DML_VBA_DEBUG__ dml_print("DML::%s: StutterExitWatermark = %f\n", __func__, *StutterExitWatermark); From patchwork Tue Aug 30 20:34:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Chancellor X-Patchwork-Id: 12959856 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EBDEFECAAA1 for ; Tue, 30 Aug 2022 20:35:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 42E3510E140; Tue, 30 Aug 2022 20:35:14 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id ABABB10E010; Tue, 30 Aug 2022 20:34:45 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 67D38B81DBF; Tue, 30 Aug 2022 20:34:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48B57C43470; Tue, 30 Aug 2022 20:34:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661891683; bh=CUtBjjdshpPXxjJcJz+3IMXYzCXn/0q/pIedPVkG+hM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZVBxag8peYCWX904DWFiHv1MDJlVTwosJuoadzEKa5jTuVnYKt6KhK54koL42j6/c l2tRtpgH5qEKFSJ768DJ431k3uS2jF+k8Y9JleW6BnTSRXtEXcTm1dR7Yo/7ERC/Fy AljedMEhuB6BtrIStVVN6mVFoohdZvD9CHlEr7SjY1c2HskhZ9jKF9xeTfCJvy9kOq YycSfQCEQUDQlEmLzQDsw7NDFg8PEhPqR4/6q2tkNt6wbVVW9O8ZxoLwjg6M3m4dy0 NmoPyShDfDPF2CQp23R4mow2V/PSfglLG6JY22kcoSirf+9K/zfeCwjb2I9Ar9qc4N KYsoWqbNv73Eg== From: Nathan Chancellor To: Harry Wentland , Leo Li , Rodrigo Siqueira , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" Subject: [PATCH 4/5] drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule() Date: Tue, 30 Aug 2022 13:34:08 -0700 Message-Id: <20220830203409.3491379-5-nathan@kernel.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220830203409.3491379-1-nathan@kernel.org> References: <20220830203409.3491379-1-nathan@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Rix , llvm@lists.linux.dev, Nick Desaulniers , patches@lists.linux.dev, dri-devel@lists.freedesktop.org, Nathan Chancellor , amd-gfx@lists.freedesktop.org, Sudip Mukherjee Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Most of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer. This reduces the total amount of stack space that dml31_ModeSupportAndSystemConfigurationFull() uses by 112 bytes with LLVM 16 (1976 -> 1864), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame size (2216) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Link: https://github.com/ClangBuiltLinux/linux/issues/1681 Reported-by: "Sudip Mukherjee (Codethink)" Signed-off-by: Nathan Chancellor --- .../dc/dml/dcn31/display_mode_vba_31.c | 172 +++++------------- 1 file changed, 47 insertions(+), 125 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c index 21c74ee1deec..60ee936e6436 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c @@ -251,33 +251,13 @@ static void CalculateRowBandwidth( static void CalculateFlipSchedule( struct display_mode_lib *mode_lib, + unsigned int k, double HostVMInefficiencyFactor, double UrgentExtraLatency, double UrgentLatency, - unsigned int GPUVMMaxPageTableLevels, - bool HostVMEnable, - unsigned int HostVMMaxNonCachedPageTableLevels, - bool GPUVMEnable, - double HostVMMinPageSize, double PDEAndMetaPTEBytesPerFrame, double MetaRowBytes, - double DPTEBytesPerRow, - double BandwidthAvailableForImmediateFlip, - unsigned int TotImmediateFlipBytes, - enum source_format_class SourcePixelFormat, - double LineTime, - double VRatio, - double VRatioChroma, - double Tno_bw, - bool DCCEnable, - unsigned int dpte_row_height, - unsigned int meta_row_height, - unsigned int dpte_row_height_chroma, - unsigned int meta_row_height_chroma, - double *DestinationLinesToRequestVMInImmediateFlip, - double *DestinationLinesToRequestRowInImmediateFlip, - double *final_flip_bw, - bool *ImmediateFlipSupportedForPipe); + double DPTEBytesPerRow); static double CalculateWriteBackDelay( enum source_format_class WritebackPixelFormat, double WritebackHRatio, @@ -2868,33 +2848,13 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman for (k = 0; k < v->NumberOfActivePlanes; ++k) { CalculateFlipSchedule( mode_lib, + k, HostVMInefficiencyFactor, v->UrgentExtraLatency, v->UrgentLatency, - v->GPUVMMaxPageTableLevels, - v->HostVMEnable, - v->HostVMMaxNonCachedPageTableLevels, - v->GPUVMEnable, - v->HostVMMinPageSize, v->PDEAndMetaPTEBytesFrame[k], v->MetaRowByte[k], - v->PixelPTEBytesPerRow[k], - v->BandwidthAvailableForImmediateFlip, - v->TotImmediateFlipBytes, - v->SourcePixelFormat[k], - v->HTotal[k] / v->PixelClock[k], - v->VRatio[k], - v->VRatioChroma[k], - v->Tno_bw[k], - v->DCCEnable[k], - v->dpte_row_height[k], - v->meta_row_height[k], - v->dpte_row_height_chroma[k], - v->meta_row_height_chroma[k], - &v->DestinationLinesToRequestVMInImmediateFlip[k], - &v->DestinationLinesToRequestRowInImmediateFlip[k], - &v->final_flip_bw[k], - &v->ImmediateFlipSupportedForPipe[k]); + v->PixelPTEBytesPerRow[k]); } v->total_dcn_read_bw_with_flip = 0.0; @@ -3526,61 +3486,43 @@ static void CalculateRowBandwidth( static void CalculateFlipSchedule( struct display_mode_lib *mode_lib, + unsigned int k, double HostVMInefficiencyFactor, double UrgentExtraLatency, double UrgentLatency, - unsigned int GPUVMMaxPageTableLevels, - bool HostVMEnable, - unsigned int HostVMMaxNonCachedPageTableLevels, - bool GPUVMEnable, - double HostVMMinPageSize, double PDEAndMetaPTEBytesPerFrame, double MetaRowBytes, - double DPTEBytesPerRow, - double BandwidthAvailableForImmediateFlip, - unsigned int TotImmediateFlipBytes, - enum source_format_class SourcePixelFormat, - double LineTime, - double VRatio, - double VRatioChroma, - double Tno_bw, - bool DCCEnable, - unsigned int dpte_row_height, - unsigned int meta_row_height, - unsigned int dpte_row_height_chroma, - unsigned int meta_row_height_chroma, - double *DestinationLinesToRequestVMInImmediateFlip, - double *DestinationLinesToRequestRowInImmediateFlip, - double *final_flip_bw, - bool *ImmediateFlipSupportedForPipe) + double DPTEBytesPerRow) { + struct vba_vars_st *v = &mode_lib->vba; double min_row_time = 0.0; unsigned int HostVMDynamicLevelsTrips; double TimeForFetchingMetaPTEImmediateFlip; double TimeForFetchingRowInVBlankImmediateFlip; double ImmediateFlipBW; + double LineTime = v->HTotal[k] / v->PixelClock[k]; - if (GPUVMEnable == true && HostVMEnable == true) { - HostVMDynamicLevelsTrips = HostVMMaxNonCachedPageTableLevels; + if (v->GPUVMEnable == true && v->HostVMEnable == true) { + HostVMDynamicLevelsTrips = v->HostVMMaxNonCachedPageTableLevels; } else { HostVMDynamicLevelsTrips = 0; } - if (GPUVMEnable == true || DCCEnable == true) { - ImmediateFlipBW = (PDEAndMetaPTEBytesPerFrame + MetaRowBytes + DPTEBytesPerRow) * BandwidthAvailableForImmediateFlip / TotImmediateFlipBytes; + if (v->GPUVMEnable == true || v->DCCEnable[k] == true) { + ImmediateFlipBW = (PDEAndMetaPTEBytesPerFrame + MetaRowBytes + DPTEBytesPerRow) * v->BandwidthAvailableForImmediateFlip / v->TotImmediateFlipBytes; } - if (GPUVMEnable == true) { + if (v->GPUVMEnable == true) { TimeForFetchingMetaPTEImmediateFlip = dml_max3( - Tno_bw + PDEAndMetaPTEBytesPerFrame * HostVMInefficiencyFactor / ImmediateFlipBW, - UrgentExtraLatency + UrgentLatency * (GPUVMMaxPageTableLevels * (HostVMDynamicLevelsTrips + 1) - 1), + v->Tno_bw[k] + PDEAndMetaPTEBytesPerFrame * HostVMInefficiencyFactor / ImmediateFlipBW, + UrgentExtraLatency + UrgentLatency * (v->GPUVMMaxPageTableLevels * (HostVMDynamicLevelsTrips + 1) - 1), LineTime / 4.0); } else { TimeForFetchingMetaPTEImmediateFlip = 0; } - *DestinationLinesToRequestVMInImmediateFlip = dml_ceil(4.0 * (TimeForFetchingMetaPTEImmediateFlip / LineTime), 1) / 4.0; - if ((GPUVMEnable == true || DCCEnable == true)) { + v->DestinationLinesToRequestVMInImmediateFlip[k] = dml_ceil(4.0 * (TimeForFetchingMetaPTEImmediateFlip / LineTime), 1) / 4.0; + if ((v->GPUVMEnable == true || v->DCCEnable[k] == true)) { TimeForFetchingRowInVBlankImmediateFlip = dml_max3( (MetaRowBytes + DPTEBytesPerRow * HostVMInefficiencyFactor) / ImmediateFlipBW, UrgentLatency * (HostVMDynamicLevelsTrips + 1), @@ -3589,54 +3531,54 @@ static void CalculateFlipSchedule( TimeForFetchingRowInVBlankImmediateFlip = 0; } - *DestinationLinesToRequestRowInImmediateFlip = dml_ceil(4.0 * (TimeForFetchingRowInVBlankImmediateFlip / LineTime), 1) / 4.0; + v->DestinationLinesToRequestRowInImmediateFlip[k] = dml_ceil(4.0 * (TimeForFetchingRowInVBlankImmediateFlip / LineTime), 1) / 4.0; - if (GPUVMEnable == true) { - *final_flip_bw = dml_max( - PDEAndMetaPTEBytesPerFrame * HostVMInefficiencyFactor / (*DestinationLinesToRequestVMInImmediateFlip * LineTime), - (MetaRowBytes + DPTEBytesPerRow * HostVMInefficiencyFactor) / (*DestinationLinesToRequestRowInImmediateFlip * LineTime)); - } else if ((GPUVMEnable == true || DCCEnable == true)) { - *final_flip_bw = (MetaRowBytes + DPTEBytesPerRow * HostVMInefficiencyFactor) / (*DestinationLinesToRequestRowInImmediateFlip * LineTime); + if (v->GPUVMEnable == true) { + v->final_flip_bw[k] = dml_max( + PDEAndMetaPTEBytesPerFrame * HostVMInefficiencyFactor / (v->DestinationLinesToRequestVMInImmediateFlip[k] * LineTime), + (MetaRowBytes + DPTEBytesPerRow * HostVMInefficiencyFactor) / (v->DestinationLinesToRequestRowInImmediateFlip[k] * LineTime)); + } else if ((v->GPUVMEnable == true || v->DCCEnable[k] == true)) { + v->final_flip_bw[k] = (MetaRowBytes + DPTEBytesPerRow * HostVMInefficiencyFactor) / (v->DestinationLinesToRequestRowInImmediateFlip[k] * LineTime); } else { - *final_flip_bw = 0; + v->final_flip_bw[k] = 0; } - if (SourcePixelFormat == dm_420_8 || SourcePixelFormat == dm_420_10 || SourcePixelFormat == dm_rgbe_alpha) { - if (GPUVMEnable == true && DCCEnable != true) { - min_row_time = dml_min(dpte_row_height * LineTime / VRatio, dpte_row_height_chroma * LineTime / VRatioChroma); - } else if (GPUVMEnable != true && DCCEnable == true) { - min_row_time = dml_min(meta_row_height * LineTime / VRatio, meta_row_height_chroma * LineTime / VRatioChroma); + if (v->SourcePixelFormat[k] == dm_420_8 || v->SourcePixelFormat[k] == dm_420_10 || v->SourcePixelFormat[k] == dm_rgbe_alpha) { + if (v->GPUVMEnable == true && v->DCCEnable[k] != true) { + min_row_time = dml_min(v->dpte_row_height[k] * LineTime / v->VRatio[k], v->dpte_row_height_chroma[k] * LineTime / v->VRatioChroma[k]); + } else if (v->GPUVMEnable != true && v->DCCEnable[k] == true) { + min_row_time = dml_min(v->meta_row_height[k] * LineTime / v->VRatio[k], v->meta_row_height_chroma[k] * LineTime / v->VRatioChroma[k]); } else { min_row_time = dml_min4( - dpte_row_height * LineTime / VRatio, - meta_row_height * LineTime / VRatio, - dpte_row_height_chroma * LineTime / VRatioChroma, - meta_row_height_chroma * LineTime / VRatioChroma); + v->dpte_row_height[k] * LineTime / v->VRatio[k], + v->meta_row_height[k] * LineTime / v->VRatio[k], + v->dpte_row_height_chroma[k] * LineTime / v->VRatioChroma[k], + v->meta_row_height_chroma[k] * LineTime / v->VRatioChroma[k]); } } else { - if (GPUVMEnable == true && DCCEnable != true) { - min_row_time = dpte_row_height * LineTime / VRatio; - } else if (GPUVMEnable != true && DCCEnable == true) { - min_row_time = meta_row_height * LineTime / VRatio; + if (v->GPUVMEnable == true && v->DCCEnable[k] != true) { + min_row_time = v->dpte_row_height[k] * LineTime / v->VRatio[k]; + } else if (v->GPUVMEnable != true && v->DCCEnable[k] == true) { + min_row_time = v->meta_row_height[k] * LineTime / v->VRatio[k]; } else { - min_row_time = dml_min(dpte_row_height * LineTime / VRatio, meta_row_height * LineTime / VRatio); + min_row_time = dml_min(v->dpte_row_height[k] * LineTime / v->VRatio[k], v->meta_row_height[k] * LineTime / v->VRatio[k]); } } - if (*DestinationLinesToRequestVMInImmediateFlip >= 32 || *DestinationLinesToRequestRowInImmediateFlip >= 16 + if (v->DestinationLinesToRequestVMInImmediateFlip[k] >= 32 || v->DestinationLinesToRequestRowInImmediateFlip[k] >= 16 || TimeForFetchingMetaPTEImmediateFlip + 2 * TimeForFetchingRowInVBlankImmediateFlip > min_row_time) { - *ImmediateFlipSupportedForPipe = false; + v->ImmediateFlipSupportedForPipe[k] = false; } else { - *ImmediateFlipSupportedForPipe = true; + v->ImmediateFlipSupportedForPipe[k] = true; } #ifdef __DML_VBA_DEBUG__ - dml_print("DML::%s: DestinationLinesToRequestVMInImmediateFlip = %f\n", __func__, *DestinationLinesToRequestVMInImmediateFlip); - dml_print("DML::%s: DestinationLinesToRequestRowInImmediateFlip = %f\n", __func__, *DestinationLinesToRequestRowInImmediateFlip); + dml_print("DML::%s: DestinationLinesToRequestVMInImmediateFlip = %f\n", __func__, v->DestinationLinesToRequestVMInImmediateFlip[k]); + dml_print("DML::%s: DestinationLinesToRequestRowInImmediateFlip = %f\n", __func__, v->DestinationLinesToRequestRowInImmediateFlip[k]); dml_print("DML::%s: TimeForFetchingMetaPTEImmediateFlip = %f\n", __func__, TimeForFetchingMetaPTEImmediateFlip); dml_print("DML::%s: TimeForFetchingRowInVBlankImmediateFlip = %f\n", __func__, TimeForFetchingRowInVBlankImmediateFlip); dml_print("DML::%s: min_row_time = %f\n", __func__, min_row_time); - dml_print("DML::%s: ImmediateFlipSupportedForPipe = %d\n", __func__, *ImmediateFlipSupportedForPipe); + dml_print("DML::%s: ImmediateFlipSupportedForPipe = %d\n", __func__, v->ImmediateFlipSupportedForPipe[k]); #endif } @@ -5228,33 +5170,13 @@ void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l for (k = 0; k < v->NumberOfActivePlanes; k++) { CalculateFlipSchedule( mode_lib, + k, HostVMInefficiencyFactor, v->ExtraLatency, v->UrgLatency[i], - v->GPUVMMaxPageTableLevels, - v->HostVMEnable, - v->HostVMMaxNonCachedPageTableLevels, - v->GPUVMEnable, - v->HostVMMinPageSize, v->PDEAndMetaPTEBytesPerFrame[i][j][k], v->MetaRowBytes[i][j][k], - v->DPTEBytesPerRow[i][j][k], - v->BandwidthAvailableForImmediateFlip, - v->TotImmediateFlipBytes, - v->SourcePixelFormat[k], - v->HTotal[k] / v->PixelClock[k], - v->VRatio[k], - v->VRatioChroma[k], - v->Tno_bw[k], - v->DCCEnable[k], - v->dpte_row_height[k], - v->meta_row_height[k], - v->dpte_row_height_chroma[k], - v->meta_row_height_chroma[k], - &v->DestinationLinesToRequestVMInImmediateFlip[k], - &v->DestinationLinesToRequestRowInImmediateFlip[k], - &v->final_flip_bw[k], - &v->ImmediateFlipSupportedForPipe[k]); + v->DPTEBytesPerRow[i][j][k]); } v->total_dcn_read_bw_with_flip = 0.0; for (k = 0; k < v->NumberOfActivePlanes; k++) { From patchwork Tue Aug 30 20:34:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nathan Chancellor X-Patchwork-Id: 12959855 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7734AECAAA1 for ; Tue, 30 Aug 2022 20:35:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C530F10E13C; Tue, 30 Aug 2022 20:35:13 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6002410E12F; Tue, 30 Aug 2022 20:34:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F2A1DB81D9E; Tue, 30 Aug 2022 20:34:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD6B3C433B5; Tue, 30 Aug 2022 20:34:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1661891684; bh=WVBuKJ212UE+zzf0H8C8ieZNG0AG50VfiOb6I0YS3mY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=l857E0wjwijunpEOX5+FnvlEt3Pa+7McmDTdjsacbFfKqnrocXE9XzJlOZNDeTI/M YlFLCasAqWoq8VoRElFMHmADOCu8AjDnN79mCzgcZMPBJRSOiGBOROdFn14Rnx6KSi 6tBufBfgBsDP4GLJgSX6DMUx3RScq7rZnbycOpCr0ppVOIFQ0uhbGTt8WmBFiEOhqn /zi0m8qgOVqawTk4MJR2nfscKoN+RR7KNKhGn4o0J20IQtgOXwZByTrunv+abK2eM/ Ncb6Y4TgRDRulyJZd7YYK/CKrB0Nw3zdXl9f8OrTrL1zLPLzCNwu5oJ07369vHrxty kNTwI+DczR/+g== From: Nathan Chancellor To: Harry Wentland , Leo Li , Rodrigo Siqueira , Alex Deucher , =?utf-8?q?Christian_K=C3=B6nig?= , "Pan, Xinhui" Subject: [PATCH 5/5] drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage Date: Tue, 30 Aug 2022 13:34:09 -0700 Message-Id: <20220830203409.3491379-6-nathan@kernel.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220830203409.3491379-1-nathan@kernel.org> References: <20220830203409.3491379-1-nathan@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tom Rix , llvm@lists.linux.dev, Nick Desaulniers , patches@lists.linux.dev, dri-devel@lists.freedesktop.org, Nathan Chancellor , amd-gfx@lists.freedesktop.org, Sudip Mukherjee Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This function consumes a lot of stack space and it blows up the size of dml30_ModeSupportAndSystemConfigurationFull() with clang: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3542:6: error: stack frame size (2200) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Commit a0f7e7f759cf ("drm/amd/display: fix i386 frame size warning") aimed to address this for i386 but it did not help x86_64. To reduce the amount of stack space that dml30_ModeSupportAndSystemConfigurationFull() uses, mark UseMinimumDCFCLK() as noinline, using the _for_stack variant for documentation. While this will increase the total amount of stack usage between the two functions (1632 and 1304 bytes respectively), it will make sure both stay below the limit of 2048 bytes for these files. The aforementioned change does help reduce UseMinimumDCFCLK()'s stack usage so it should not be reverted in favor of this change. Link: https://github.com/ClangBuiltLinux/linux/issues/1681 Reported-by: "Sudip Mukherjee (Codethink)" Signed-off-by: Nathan Chancellor --- drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c index b7fa003ffe06..6991a68061ef 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c @@ -6497,7 +6497,7 @@ static double CalculateUrgentLatency( return ret; } -static void UseMinimumDCFCLK( +static noinline_for_stack void UseMinimumDCFCLK( struct display_mode_lib *mode_lib, struct vba_vars_st *v, int MaxPrefetchMode,