drm/msm/a6xx: Fix excessive stack usage

Message ID	20241027-stack-size-fix-v1-1-764e2e3566cb@quicinc.com (mailing list archive)
State	New
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Akhil P Oommen <quic_akhilpo@quicinc.com> Date: Sun, 27 Oct 2024 23:35:47 +0530 Subject: [PATCH] drm/msm/a6xx: Fix excessive stack usage MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-ID: <20241027-stack-size-fix-v1-1-764e2e3566cb@quicinc.com> To: Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>, "Konrad Dybcio" <konradybcio@kernel.org>, Abhinav Kumar <quic_abhinavk@quicinc.com>, Dmitry Baryshkov <dmitry.baryshkov@linaro.org>, Marijn Suijten <marijn.suijten@somainline.org>, David Airlie <airlied@gmail.com>, "Simona Vetter" <simona@ffwll.ch>, Nathan Chancellor <nathan@kernel.org>, "Nick Desaulniers" <ndesaulniers@google.com>, Bill Wendling <morbo@google.com>, Justin Stitt <justinstitt@google.com> CC: <linux-arm-msm@vger.kernel.org>, <dri-devel@lists.freedesktop.org>, <freedreno@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>, <llvm@lists.linux.dev>, Arnd Bergmann <arnd@kernel.org>, Akhil P Oommen <quic_akhilpo@quicinc.com> Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/msm/a6xx: Fix excessive stack usage \| expand drm/msm/a6xx: Fix excessive stack usage

Message ID

20241027-stack-size-fix-v1-1-764e2e3566cb@quicinc.com (mailing list archive)

State

New

Headers

From: Akhil P Oommen <quic_akhilpo@quicinc.com>
Date: Sun, 27 Oct 2024 23:35:47 +0530
Subject: [PATCH] drm/msm/a6xx: Fix excessive stack usage
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-ID: <20241027-stack-size-fix-v1-1-764e2e3566cb@quicinc.com>
To: Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>, "Konrad
 Dybcio" <konradybcio@kernel.org>,
 Abhinav Kumar <quic_abhinavk@quicinc.com>,
 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>, Marijn Suijten
 <marijn.suijten@somainline.org>, David Airlie <airlied@gmail.com>, "Simona
 Vetter" <simona@ffwll.ch>, Nathan Chancellor <nathan@kernel.org>, "Nick
 Desaulniers" <ndesaulniers@google.com>, Bill Wendling <morbo@google.com>,
 Justin Stitt <justinstitt@google.com>
CC: <linux-arm-msm@vger.kernel.org>, <dri-devel@lists.freedesktop.org>,
 <freedreno@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
 <llvm@lists.linux.dev>, Arnd Bergmann <arnd@kernel.org>, Akhil P Oommen
 <quic_akhilpo@quicinc.com>
Precedence: list
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

drm/msm/a6xx: Fix excessive stack usage | expand

Commit Message

Akhil P Oommen Oct. 27, 2024, 6:05 p.m. UTC

Clang-19 and above sometimes end up with multiple copies of the large
a6xx_hfi_msg_bw_table structure on the stack. The problem is that
a6xx_hfi_send_bw_table() calls a number of device specific functions to
fill the structure, but these create another copy of the structure on
the stack which gets copied to the first.

If the functions get inlined, that busts the warning limit:

drivers/gpu/drm/msm/adreno/a6xx_hfi.c:631:12: error: stack frame size (1032) exceeds limit (1024) in 'a6xx_hfi_send_bw_table' [-Werror,-Wframe-larger-than]

Fix this by kmalloc-ating struct a6xx_hfi_msg_bw_table instead of using
the stack. Also, use this opportunity to skip re-initializing this table
to optimize gpu wake up latency.

Cc: Arnd Bergmann <arnd@kernel.org>

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 34 ++++++++++++++++++++++------------
 2 files changed, 23 insertions(+), 12 deletions(-)


---
base-commit: 74c374648ed08efb2ef339656f2764c28c046956
change-id: 20241024-stack-size-fix-28af7abd3fab

Best regards,

Comments

Arnd Bergmann Oct. 27, 2024, 6:43 p.m. UTC | #1

On Sun, Oct 27, 2024, at 18:05, Akhil P Oommen wrote:
> Clang-19 and above sometimes end up with multiple copies of the large
> a6xx_hfi_msg_bw_table structure on the stack. The problem is that
> a6xx_hfi_send_bw_table() calls a number of device specific functions to
> fill the structure, but these create another copy of the structure on
> the stack which gets copied to the first.
>
> If the functions get inlined, that busts the warning limit:
>
> drivers/gpu/drm/msm/adreno/a6xx_hfi.c:631:12: error: stack frame size 
> (1032) exceeds limit (1024) in 'a6xx_hfi_send_bw_table' 
> [-Werror,-Wframe-larger-than]
>
> Fix this by kmalloc-ating struct a6xx_hfi_msg_bw_table instead of using
> the stack. Also, use this opportunity to skip re-initializing this table
> to optimize gpu wake up latency.
>
> Cc: Arnd Bergmann <arnd@kernel.org>

Please change this to "Reported-by:"

The patch looks correct to me, just one idea for improvement.

> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> index 94b6c5cab6f4..b4a79f88ccf4 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
> @@ -99,6 +99,7 @@ struct a6xx_gmu {
>  	struct completion pd_gate;
> 
>  	struct qmp *qmp;
> +	struct a6xx_hfi_msg_bw_table *bw_table;
>  };

I think the bw_table is better just embedded
in here rather than referenced as a pointer:

> +	if (gmu->bw_table)
> +		goto send;
> +
> +	msg = devm_kzalloc(gmu->dev, sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;

It looked like it's always allocated here when the device
is up, so you can avoid the extra overhead for keeping
track of the allocation.

      Arnd

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 94b6c5cab6f4..b4a79f88ccf4 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -99,6 +99,7 @@  struct a6xx_gmu {
 	struct completion pd_gate;
 
 	struct qmp *qmp;
+	struct a6xx_hfi_msg_bw_table *bw_table;
 };
 
 static inline u32 gmu_read(struct a6xx_gmu *gmu, u32 offset)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
index cdb3f6e74d3e..55e51c81be1f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
@@ -630,32 +630,42 @@  static void a6xx_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
 
 static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
 {
-	struct a6xx_hfi_msg_bw_table msg = { 0 };
+	struct a6xx_hfi_msg_bw_table *msg;
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
 
+	if (gmu->bw_table)
+		goto send;
+
+	msg = devm_kzalloc(gmu->dev, sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
 	if (adreno_is_a618(adreno_gpu))
-		a618_build_bw_table(&msg);
+		a618_build_bw_table(msg);
 	else if (adreno_is_a619(adreno_gpu))
-		a619_build_bw_table(&msg);
+		a619_build_bw_table(msg);
 	else if (adreno_is_a640_family(adreno_gpu))
-		a640_build_bw_table(&msg);
+		a640_build_bw_table(msg);
 	else if (adreno_is_a650(adreno_gpu))
-		a650_build_bw_table(&msg);
+		a650_build_bw_table(msg);
 	else if (adreno_is_7c3(adreno_gpu))
-		adreno_7c3_build_bw_table(&msg);
+		adreno_7c3_build_bw_table(msg);
 	else if (adreno_is_a660(adreno_gpu))
-		a660_build_bw_table(&msg);
+		a660_build_bw_table(msg);
 	else if (adreno_is_a690(adreno_gpu))
-		a690_build_bw_table(&msg);
+		a690_build_bw_table(msg);
 	else if (adreno_is_a730(adreno_gpu))
-		a730_build_bw_table(&msg);
+		a730_build_bw_table(msg);
 	else if (adreno_is_a740_family(adreno_gpu))
-		a740_build_bw_table(&msg);
+		a740_build_bw_table(msg);
 	else
-		a6xx_build_bw_table(&msg);
+		a6xx_build_bw_table(msg);
+
+	gmu->bw_table = msg;
 
-	return a6xx_hfi_send_msg(gmu, HFI_H2F_MSG_BW_TABLE, &msg, sizeof(msg),
+send:
+	return a6xx_hfi_send_msg(gmu, HFI_H2F_MSG_BW_TABLE, gmu->bw_table, sizeof(*(gmu->bw_table)),
 		NULL, 0);
 }

drm/msm/a6xx: Fix excessive stack usage

Commit Message

Comments

Patch