Message ID: 20241205-topic-sm8x50-gpu-bw-vote-v4-3-9650d15dd435@linaro.org (mailing list archive)
State: New, archived
Series: drm/msm: adreno: add support for DDR bandwidth scaling via GMU
On 12/5/2024 8:31 PM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale the DDR
> bandwidth along with the frequency and power domain level, but for
> now we statically fill the bw_table with values from the
> downstream driver.
>
> Only the first entry is used, which is a disable vote, so we
> currently rely on scaling via the Linux interconnect paths.
>
> Let's dynamically generate the bw_table with the vote values
> previously calculated from the OPPs.
>
> Those entries will then be used by the GMU when passing the
> appropriate bandwidth level while voting for a GPU frequency.
>
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 41 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> index cb8844ed46b29c4569d05eb7a24f7b27e173190f..fc4bfad51de9a3b6617fbbd03471a5851d43ce88 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> @@ -5,7 +5,10 @@
>  #include <linux/circ_buf.h>
>  #include <linux/list.h>
>
> +#include <dt-bindings/interconnect/qcom,icc.h>
> +
>  #include <soc/qcom/cmd-db.h>
> +#include <soc/qcom/tcs.h>
>
>  #include "a6xx_gmu.h"
>  #include "a6xx_gmu.xml.h"
> @@ -259,6 +262,39 @@ static int a6xx_hfi_send_perf_table(struct a6xx_gmu *gmu)
>                        NULL, 0);
>  }
>
> +static void a6xx_generate_bw_table(const struct a6xx_info *info, struct a6xx_gmu *gmu,
> +                                   struct a6xx_hfi_msg_bw_table *msg)
> +{
> +        unsigned int i, j;
> +
> +        msg->ddr_wait_bitmask = QCOM_ICC_TAG_ALWAYS;

Why is this QCOM_ICC_TAG_ALWAYS?

IIRC, this bitmask informs RPMH whether it should wait for the previous
BCM vote to complete. Can we implement the same logic from kgsl to
create this bitmask?

> +
> +        for (i = 0; i < GMU_MAX_BCMS; i++) {
> +                if (!info->bcms[i].name)
> +                        break;
> +                msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);
> +        }
> +        msg->ddr_cmds_num = i;
> +
> +        for (i = 0; i < gmu->nr_gpu_bws; ++i)
> +                for (j = 0; j < msg->ddr_cmds_num; j++)
> +                        msg->ddr_cmds_data[i][j] = gmu->gpu_ib_votes[i][j];
> +        msg->bw_level_num = gmu->nr_gpu_bws;
> +
> +        /*
> +         * These are the CX (CNOC) votes - these are used by the GMU
> +         * The 'CN0' BCM is used on all targets, and votes are basically
> +         * 'off' and 'on' states with first bit to enable the path.
> +         */
> +
> +        msg->cnoc_cmds_num = 1;
> +        msg->cnoc_wait_bitmask = QCOM_ICC_TAG_AMC;

Same here.

Rest looks fine to me.

-Akhil

> +
> +        msg->cnoc_cmds_addrs[0] = cmd_db_read_addr("CN0");
> +        msg->cnoc_cmds_data[0][0] = BCM_TCS_CMD(true, false, 0, 0);
> +        msg->cnoc_cmds_data[1][0] = BCM_TCS_CMD(true, true, 0, BIT(0));
> +}
> +
>  static void a618_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
>  {
>          /* Send a single "off" entry since the 618 GMU doesn't do bus scaling */
> @@ -664,6 +700,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>          struct a6xx_hfi_msg_bw_table *msg;
>          struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>          struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> +        const struct a6xx_info *info = adreno_gpu->info->a6xx;
>
>          if (gmu->bw_table)
>                  goto send;
> @@ -672,7 +709,9 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>          if (!msg)
>                  return -ENOMEM;
>
> -        if (adreno_is_a618(adreno_gpu))
> +        if (info->bcms && gmu->nr_gpu_bws > 1)
> +                a6xx_generate_bw_table(info, gmu, msg);
> +        else if (adreno_is_a618(adreno_gpu))
>                  a618_build_bw_table(msg);
>          else if (adreno_is_a619(adreno_gpu))
>                  a619_build_bw_table(msg);
>
On 09/12/2024 13:11, Akhil P Oommen wrote:
> On 12/5/2024 8:31 PM, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale the DDR
>> bandwidth along with the frequency and power domain level, but for
>> now we statically fill the bw_table with values from the
>> downstream driver.
>>
>> Only the first entry is used, which is a disable vote, so we
>> currently rely on scaling via the Linux interconnect paths.
>>
>> Let's dynamically generate the bw_table with the vote values
>> previously calculated from the OPPs.
>>
>> Those entries will then be used by the GMU when passing the
>> appropriate bandwidth level while voting for a GPU frequency.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 41 ++++++++++++++++++++++++++++++++++-
>>  1 file changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
>> index cb8844ed46b29c4569d05eb7a24f7b27e173190f..fc4bfad51de9a3b6617fbbd03471a5851d43ce88 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
>> @@ -5,7 +5,10 @@
>>  #include <linux/circ_buf.h>
>>  #include <linux/list.h>
>>
>> +#include <dt-bindings/interconnect/qcom,icc.h>
>> +
>>  #include <soc/qcom/cmd-db.h>
>> +#include <soc/qcom/tcs.h>
>>
>>  #include "a6xx_gmu.h"
>>  #include "a6xx_gmu.xml.h"
>> @@ -259,6 +262,39 @@ static int a6xx_hfi_send_perf_table(struct a6xx_gmu *gmu)
>>                        NULL, 0);
>>  }
>>
>> +static void a6xx_generate_bw_table(const struct a6xx_info *info, struct a6xx_gmu *gmu,
>> +                                   struct a6xx_hfi_msg_bw_table *msg)
>> +{
>> +        unsigned int i, j;
>> +
>> +        msg->ddr_wait_bitmask = QCOM_ICC_TAG_ALWAYS;
>
> Why is this QCOM_ICC_TAG_ALWAYS?
>
> IIRC, this bitmask informs RPMH whether it should wait for the previous
> BCM vote to complete. Can we implement the same logic from kgsl to
> create this bitmask?

Ack, let me check.

>
>
>> +
>> +        for (i = 0; i < GMU_MAX_BCMS; i++) {
>> +                if (!info->bcms[i].name)
>> +                        break;
>> +                msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);
>> +        }
>> +        msg->ddr_cmds_num = i;
>> +
>> +        for (i = 0; i < gmu->nr_gpu_bws; ++i)
>> +                for (j = 0; j < msg->ddr_cmds_num; j++)
>> +                        msg->ddr_cmds_data[i][j] = gmu->gpu_ib_votes[i][j];
>> +        msg->bw_level_num = gmu->nr_gpu_bws;
>> +
>> +        /*
>> +         * These are the CX (CNOC) votes - these are used by the GMU
>> +         * The 'CN0' BCM is used on all targets, and votes are basically
>> +         * 'off' and 'on' states with first bit to enable the path.
>> +         */
>> +
>> +        msg->cnoc_cmds_num = 1;
>> +        msg->cnoc_wait_bitmask = QCOM_ICC_TAG_AMC;
>
> Same here.
>
> Rest looks fine to me.

Thanks,
Neil

>
> -Akhil
>
>> +
>> +        msg->cnoc_cmds_addrs[0] = cmd_db_read_addr("CN0");
>> +        msg->cnoc_cmds_data[0][0] = BCM_TCS_CMD(true, false, 0, 0);
>> +        msg->cnoc_cmds_data[1][0] = BCM_TCS_CMD(true, true, 0, BIT(0));
>> +}
>> +
>>  static void a618_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
>>  {
>>          /* Send a single "off" entry since the 618 GMU doesn't do bus scaling */
>> @@ -664,6 +700,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>>          struct a6xx_hfi_msg_bw_table *msg;
>>          struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
>>          struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> +        const struct a6xx_info *info = adreno_gpu->info->a6xx;
>>
>>          if (gmu->bw_table)
>>                  goto send;
>> @@ -672,7 +709,9 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
>>          if (!msg)
>>                  return -ENOMEM;
>>
>> -        if (adreno_is_a618(adreno_gpu))
>> +        if (info->bcms && gmu->nr_gpu_bws > 1)
>> +                a6xx_generate_bw_table(info, gmu, msg);
>> +        else if (adreno_is_a618(adreno_gpu))
>>                  a618_build_bw_table(msg);
>>          else if (adreno_is_a619(adreno_gpu))
>>                  a619_build_bw_table(msg);
>>
>
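[Editorial note: for context on the two values questioned above, QCOM_ICC_TAG_ALWAYS and
QCOM_ICC_TAG_AMC are interconnect bucket tags rather than purpose-built wait bitmasks. A
minimal sketch of their definitions, assuming the layout of
include/dt-bindings/interconnect/qcom,icc.h (values reproduced from memory, verify against
your tree):

    /* Sketch of include/dt-bindings/interconnect/qcom,icc.h -- assumed values */
    #define QCOM_ICC_TAG_AMC        (1 << 0)    /* active-mode command set */
    #define QCOM_ICC_TAG_WAKE       (1 << 1)    /* wake set */
    #define QCOM_ICC_TAG_SLEEP      (1 << 2)    /* sleep set */
    #define QCOM_ICC_TAG_ALWAYS     (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE | \
                                     QCOM_ICC_TAG_SLEEP)    /* i.e. 0x7 */

So the patch effectively programs 0x7 and 0x1 into the wait bitmasks; the review question
is whether those bit patterns happen to match the GMU's expectation that bit N marks
command N as one RPMh must complete before proceeding.]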
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
index cb8844ed46b29c4569d05eb7a24f7b27e173190f..fc4bfad51de9a3b6617fbbd03471a5851d43ce88 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
@@ -5,7 +5,10 @@
 #include <linux/circ_buf.h>
 #include <linux/list.h>
 
+#include <dt-bindings/interconnect/qcom,icc.h>
+
 #include <soc/qcom/cmd-db.h>
+#include <soc/qcom/tcs.h>
 
 #include "a6xx_gmu.h"
 #include "a6xx_gmu.xml.h"
@@ -259,6 +262,39 @@ static int a6xx_hfi_send_perf_table(struct a6xx_gmu *gmu)
                       NULL, 0);
 }
 
+static void a6xx_generate_bw_table(const struct a6xx_info *info, struct a6xx_gmu *gmu,
+                                   struct a6xx_hfi_msg_bw_table *msg)
+{
+        unsigned int i, j;
+
+        msg->ddr_wait_bitmask = QCOM_ICC_TAG_ALWAYS;
+
+        for (i = 0; i < GMU_MAX_BCMS; i++) {
+                if (!info->bcms[i].name)
+                        break;
+                msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcms[i].name);
+        }
+        msg->ddr_cmds_num = i;
+
+        for (i = 0; i < gmu->nr_gpu_bws; ++i)
+                for (j = 0; j < msg->ddr_cmds_num; j++)
+                        msg->ddr_cmds_data[i][j] = gmu->gpu_ib_votes[i][j];
+        msg->bw_level_num = gmu->nr_gpu_bws;
+
+        /*
+         * These are the CX (CNOC) votes - these are used by the GMU
+         * The 'CN0' BCM is used on all targets, and votes are basically
+         * 'off' and 'on' states with first bit to enable the path.
+         */
+
+        msg->cnoc_cmds_num = 1;
+        msg->cnoc_wait_bitmask = QCOM_ICC_TAG_AMC;
+
+        msg->cnoc_cmds_addrs[0] = cmd_db_read_addr("CN0");
+        msg->cnoc_cmds_data[0][0] = BCM_TCS_CMD(true, false, 0, 0);
+        msg->cnoc_cmds_data[1][0] = BCM_TCS_CMD(true, true, 0, BIT(0));
+}
+
 static void a618_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
 {
         /* Send a single "off" entry since the 618 GMU doesn't do bus scaling */
@@ -664,6 +700,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
         struct a6xx_hfi_msg_bw_table *msg;
         struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
         struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+        const struct a6xx_info *info = adreno_gpu->info->a6xx;
 
         if (gmu->bw_table)
                 goto send;
@@ -672,7 +709,9 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
         if (!msg)
                 return -ENOMEM;
 
-        if (adreno_is_a618(adreno_gpu))
+        if (info->bcms && gmu->nr_gpu_bws > 1)
+                a6xx_generate_bw_table(info, gmu, msg);
+        else if (adreno_is_a618(adreno_gpu))
                 a618_build_bw_table(msg);
         else if (adreno_is_a619(adreno_gpu))
                 a619_build_bw_table(msg);
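[Editorial note: the two cnoc_cmds_data entries in the hunk above encode the 'off' and
'on' CN0 votes as raw TCS command words. A rough sketch of what the BCM_TCS_CMD() macro
from soc/qcom/tcs.h packs, with bit positions quoted from memory and therefore
illustrative only:

    /*
     * One 32-bit BCM TCS command word, approximately:
     *   bit 30       commit - complete this batch before starting the next
     *   bit 29       valid  - the command carries a real vote
     *   bits 14..27  vote_x - first bandwidth vote field
     *   bits  0..13  vote_y - second bandwidth vote field
     */
    u32 cn0_off = BCM_TCS_CMD(true, false, 0, 0);     /* commit, no vote: path disabled */
    u32 cn0_on  = BCM_TCS_CMD(true, true, 0, BIT(0)); /* commit + vote_y bit 0: path enabled */

The DDR rows, by contrast, are copied verbatim from gmu->gpu_ib_votes[], which the
previous patch in the series derives from the OPP table.]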
The Adreno GPU Management Unit (GMU) can also scale the DDR
bandwidth along with the frequency and power domain level, but for
now we statically fill the bw_table with values from the
downstream driver.

Only the first entry is used, which is a disable vote, so we
currently rely on scaling via the Linux interconnect paths.

Let's dynamically generate the bw_table with the vote values
previously calculated from the OPPs.

Those entries will then be used by the GMU when passing the
appropriate bandwidth level while voting for a GPU frequency.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 41 ++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)