diff mbox series

[v4,4/8] soc: mediatek: mtk-cmdq: Add pa_base parsing for unsupported subsys ID hardware

Message ID 20250218054405.2017918-5-jason-jh.lin@mediatek.com (mailing list archive)
State New
Headers show
Series Add GCE support for MT8196 | expand

Commit Message

Jason-JH Lin (林睿祥) Feb. 18, 2025, 5:41 a.m. UTC
When GCE executes instructions, the corresponding hardware register
can be found through the subsys ID. For hardware that does not support
subsys ID, its subsys ID will be set to invalid value and its physical
address needs to be used to generate GCE instructions.

This commit adds a pa_base parsing flow to the cmdq_client_reg structure
for these unsupported subsys ID hardware.

Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
---
 drivers/soc/mediatek/mtk-cmdq-helper.c | 18 ++++++++++++++++--
 include/linux/soc/mediatek/mtk-cmdq.h  |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

Comments

AngeloGioacchino Del Regno March 4, 2025, 9:35 a.m. UTC | #1
Il 18/02/25 06:41, Jason-JH Lin ha scritto:
> When GCE executes instructions, the corresponding hardware register
> can be found through the subsys ID. For hardware that does not support
> subsys ID, its subsys ID will be set to invalid value and its physical
> address needs to be used to generate GCE instructions.
> 
> This commit adds a pa_base parsing flow to the cmdq_client_reg structure
> for these unsupported subsys ID hardware.
> 

Does this work only for the MMINFRA located GCEs, or does this work also for
the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?

In order to actually review and decide, I do need to know :-)

Thanks,
Angelo
Jason-JH Lin (林睿祥) March 5, 2025, 9:46 a.m. UTC | #2
On Tue, 2025-03-04 at 10:35 +0100, AngeloGioacchino Del Regno wrote:
> 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Il 18/02/25 06:41, Jason-JH Lin ha scritto:
> > When GCE executes instructions, the corresponding hardware register
> > can be found through the subsys ID. For hardware that does not
> > support
> > subsys ID, its subsys ID will be set to invalid value and its
> > physical
> > address needs to be used to generate GCE instructions.
> > 
> > This commit adds a pa_base parsing flow to the cmdq_client_reg
> > structure
> > for these unsupported subsys ID hardware.
> > 
> 
> Does this work only for the MMINFRA located GCEs, or does this work
> also for
> the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?
> 
> In order to actually review and decide, I do need to know :-)
> 

Yes, it's for the SoCs without subsys ID, it's not related to the
MMINFRA.

This can also work on MT8173/83/92/95 // MT6795/6893/etc.
You can remove the `mediatek,gce-client-reg` properties in their dtsi
and cherry-pick this series to verify it. :-)

Regards,
Jason-JH Lin

> Thanks,
> Angelo
> 
>
AngeloGioacchino Del Regno March 5, 2025, 11:03 a.m. UTC | #3
Il 05/03/25 10:46, Jason-JH Lin (林睿祥) ha scritto:
> On Tue, 2025-03-04 at 10:35 +0100, AngeloGioacchino Del Regno wrote:
>>
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>>
>>
>> Il 18/02/25 06:41, Jason-JH Lin ha scritto:
>>> When GCE executes instructions, the corresponding hardware register
>>> can be found through the subsys ID. For hardware that does not
>>> support
>>> subsys ID, its subsys ID will be set to invalid value and its
>>> physical
>>> address needs to be used to generate GCE instructions.
>>>
>>> This commit adds a pa_base parsing flow to the cmdq_client_reg
>>> structure
>>> for these unsupported subsys ID hardware.
>>>
>>
>> Does this work only for the MMINFRA located GCEs, or does this work
>> also for
>> the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?
>>
>> In order to actually review and decide, I do need to know :-)
>>
> 
> Yes, it's for the SoCs without subsys ID, it's not related to the
> MMINFRA.
> 
> This can also work on MT8173/83/92/95 // MT6795/6893/etc.
> You can remove the `mediatek,gce-client-reg` properties in their dtsi
> and cherry-pick this series to verify it. :-)
> 

This is curious - and that brings more questions to the table (for curiosity
more than anything else at this point).

Since this is a way to make use of the CMDQ for address ranges that are not tied
to any subsys id (hence no gce-client-reg and just physical address parsing for
generating instructions), do you know what are the performance implications of
using this, instead of subsys IDs on SoCs that do support them?

Being clear: if we were to migrate a SoC like MT8195 to using this globally
instead of using subsys ids, would the performance be degraded?
And if yes, do you know by how much?

What you're proposing almost looks like being too good to be true - and makes
me wonder, at this point, why the subsys id was used in the first place :-)

Cheers!
Angelo

> Regards,
> Jason-JH Lin
> 
>> Thanks,
>> Angelo
>>
>>
>
Jason-JH Lin (林睿祥) March 5, 2025, 3:58 p.m. UTC | #4
On Wed, 2025-03-05 at 12:03 +0100, AngeloGioacchino Del Regno wrote:
> 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Il 05/03/25 10:46, Jason-JH Lin (林睿祥) ha scritto:
> > On Tue, 2025-03-04 at 10:35 +0100, AngeloGioacchino Del Regno
> > wrote:
> > > 
> > > External email : Please do not click links or open attachments
> > > until
> > > you have verified the sender or the content.
> > > 
> > > 
> > > Il 18/02/25 06:41, Jason-JH Lin ha scritto:
> > > > When GCE executes instructions, the corresponding hardware
> > > > register
> > > > can be found through the subsys ID. For hardware that does not
> > > > support
> > > > subsys ID, its subsys ID will be set to invalid value and its
> > > > physical
> > > > address needs to be used to generate GCE instructions.
> > > > 
> > > > This commit adds a pa_base parsing flow to the cmdq_client_reg
> > > > structure
> > > > for these unsupported subsys ID hardware.
> > > > 
> > > 
> > > Does this work only for the MMINFRA located GCEs, or does this
> > > work
> > > also for
> > > the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?
> > > 
> > > In order to actually review and decide, I do need to know :-)
> > > 
> > 
> > Yes, it's for the SoCs without subsys ID, it's not related to the
> > MMINFRA.
> > 
> > This can also work on MT8173/83/92/95 // MT6795/6893/etc.
> > You can remove the `mediatek,gce-client-reg` properties in their
> > dtsi
> > and cherry-pick this series to verify it. :-)
> > 
> 
> This is curious - and that brings more questions to the table (for
> curiosity
> more than anything else at this point).
> 
> Since this is a way to make use of the CMDQ for address ranges that
> are not tied
> to any subsys id (hence no gce-client-reg and just physical address
> parsing for
> generating instructions), do you know what are the performance
> implications of
> using this, instead of subsys IDs on SoCs that do support them?
> 

The main advantage of using subsys ID is to reduce the number of
instruction.
Without subsy ID, you will need one more `ASSIGN` instruction to assign
the high bytes of the physical address.

E,g. In mt8195-gce.h: #define SUBSYS_1c00XXXX 3

If you want GCE to write the value 0x0000000f to 0x1c00_002c.

With subsys ID, you can use only one instruction to achieve it:
1. WRITE value: 0x000000f to subsys: 0x3 + offset: 0x0002c
- OP code: WRTIE = 0x90
- subsys ID: 0x1c00XXXX = 0x03
- offset: 0x002c
- value: 0x0000000f

Without subsys ID, you will need 2 instructions to achieve it:
1. ASSIGN address high bytes: 0x1c00 to GCE temp register: SPR0
- OP code: LOGIC = 0xa0
- arg_type: register, value, value = (0x8)
- sub OP: ASSIGN = 0x0
- register index to store the assign value: SPR0 = 0x0
- value to assign: 0x1c00
2. WRITE value: 0x0000000f to temp register: SPR0 + offset:0x002c
- OP code: WRITE = 0x90
- sub OP(temp register index): SPR0 = 0x0
- offset for temp register: 0x002c
- value: 0x0000000f

> Being clear: if we were to migrate a SoC like MT8195 to using this
> globally
> instead of using subsys ids, would the performance be degraded?
> And if yes, do you know by how much?
> 

E,g. If the inst number with subsys ID is N.
1. If CMDQ is implement like this, then inst number will be (N * 2):
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c 
assign SPR0 = 0x1c00
write B to SPR0 + offset: 0x3c 
assign SPR0 = 0x1c00
write C to SPR0 + offset: 0x4c 
...

2. If CMDQ is implement like this, the inst number will be (N + 1 * n):
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c 
write B to SPR0 + offset: 0x3c
write C to SPR0 + offset: 0x4c
...

When the same cmd buffer changes the base address for n times:
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c
assign SPR0 = 0x1c01
write B to SPR0 + offset: 0x2c
assign SPR0 = 0x1c02
write C to SPR0 + offset: 0x2c
assign SPR0 = 0x1c00
write D to SPR0 + offset: 0x3c
...

So you can imagine the performance will increase, but maybe not too
much if we use it in the right way...
Except the old SoC that didn't support SPR and CPR. The reason will be
addressed in the next paragraph.

> What you're proposing almost looks like being too good to be true -
> and makes
> me wonder, at this point, why the subsys id was used in the first
> place :-)
> 

That's because of the old GCE version in the old SoC only support GPR,
it didn't support SPR and CPR.

GPR:
All 32 GCE threads share the same GPR0~GPR15, GPR will be affected by
other GCE threads if they use it at the same time.

SPR:
Each GCE thread has 4 SPR, SPR won't be affected by another GCE thread.

CPR:
All 32 GCE threads share the same CPR, there are over 1000 CPR can be
used. It need to be managed properly to avoid the resource conflicting.

Due to the GPR resource restriction in the old GCE version, the usage
of subsys ID can avoid GPR conflicting issues when multiple GCE threads
are using GPR to physical assign high bytes all the time.


I have simplified some complicate instruction rules, so the description
above may not be 100% matched to the CMDQ helper driver code.
But I think the main concept is correct.
Hope these explanation can help well :-)

Regards,
Jason-JH Lin

> Cheers!
> Angelo
> 
> > Regards,
> > Jason-JH Lin
> > 
> > > Thanks,
> > > Angelo
> > > 
> > > 
> > 
> 
> 
>
AngeloGioacchino Del Regno March 5, 2025, 5:40 p.m. UTC | #5
Il 05/03/25 16:58, Jason-JH Lin (林睿祥) ha scritto:
> On Wed, 2025-03-05 at 12:03 +0100, AngeloGioacchino Del Regno wrote:
>>
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>>
>>
>> Il 05/03/25 10:46, Jason-JH Lin (林睿祥) ha scritto:
>>> On Tue, 2025-03-04 at 10:35 +0100, AngeloGioacchino Del Regno
>>> wrote:
>>>>
>>>> External email : Please do not click links or open attachments
>>>> until
>>>> you have verified the sender or the content.
>>>>
>>>>
>>>> Il 18/02/25 06:41, Jason-JH Lin ha scritto:
>>>>> When GCE executes instructions, the corresponding hardware
>>>>> register
>>>>> can be found through the subsys ID. For hardware that does not
>>>>> support
>>>>> subsys ID, its subsys ID will be set to invalid value and its
>>>>> physical
>>>>> address needs to be used to generate GCE instructions.
>>>>>
>>>>> This commit adds a pa_base parsing flow to the cmdq_client_reg
>>>>> structure
>>>>> for these unsupported subsys ID hardware.
>>>>>
>>>>
>>>> Does this work only for the MMINFRA located GCEs, or does this
>>>> work
>>>> also for
>>>> the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?
>>>>
>>>> In order to actually review and decide, I do need to know :-)
>>>>
>>>
>>> Yes, it's for the SoCs without subsys ID, it's not related to the
>>> MMINFRA.
>>>
>>> This can also work on MT8173/83/92/95 // MT6795/6893/etc.
>>> You can remove the `mediatek,gce-client-reg` properties in their
>>> dtsi
>>> and cherry-pick this series to verify it. :-)
>>>
>>
>> This is curious - and that brings more questions to the table (for
>> curiosity
>> more than anything else at this point).
>>
>> Since this is a way to make use of the CMDQ for address ranges that
>> are not tied
>> to any subsys id (hence no gce-client-reg and just physical address
>> parsing for
>> generating instructions), do you know what are the performance
>> implications of
>> using this, instead of subsys IDs on SoCs that do support them?
>>
> 
> The main advantage of using subsys ID is to reduce the number of
> instruction.
> Without subsy ID, you will need one more `ASSIGN` instruction to assign
> the high bytes of the physical address.
> 
> E,g. In mt8195-gce.h: #define SUBSYS_1c00XXXX 3
> 
> If you want GCE to write the value 0x0000000f to 0x1c00_002c.
> 
> With subsys ID, you can use only one instruction to achieve it:
> 1. WRITE value: 0x000000f to subsys: 0x3 + offset: 0x0002c
> - OP code: WRTIE = 0x90
> - subsys ID: 0x1c00XXXX = 0x03
> - offset: 0x002c
> - value: 0x0000000f
> 
> Without subsys ID, you will need 2 instructions to achieve it:
> 1. ASSIGN address high bytes: 0x1c00 to GCE temp register: SPR0
> - OP code: LOGIC = 0xa0
> - arg_type: register, value, value = (0x8)
> - sub OP: ASSIGN = 0x0
> - register index to store the assign value: SPR0 = 0x0
> - value to assign: 0x1c00
> 2. WRITE value: 0x0000000f to temp register: SPR0 + offset:0x002c
> - OP code: WRITE = 0x90
> - sub OP(temp register index): SPR0 = 0x0
> - offset for temp register: 0x002c
> - value: 0x0000000f
> 
>> Being clear: if we were to migrate a SoC like MT8195 to using this
>> globally
>> instead of using subsys ids, would the performance be degraded?
>> And if yes, do you know by how much?
>>
> 
> E,g. If the inst number with subsys ID is N.
> 1. If CMDQ is implement like this, then inst number will be (N * 2):
> assign SPR0 = 0x1c00
> write A to SPR0 + offset: 0x2c
> assign SPR0 = 0x1c00
> write B to SPR0 + offset: 0x3c
> assign SPR0 = 0x1c00
> write C to SPR0 + offset: 0x4c
> ...
> 
> 2. If CMDQ is implement like this, the inst number will be (N + 1 * n):
> assign SPR0 = 0x1c00
> write A to SPR0 + offset: 0x2c
> write B to SPR0 + offset: 0x3c
> write C to SPR0 + offset: 0x4c
> ...
> 
> When the same cmd buffer changes the base address for n times:
> assign SPR0 = 0x1c00
> write A to SPR0 + offset: 0x2c
> assign SPR0 = 0x1c01
> write B to SPR0 + offset: 0x2c
> assign SPR0 = 0x1c02
> write C to SPR0 + offset: 0x2c
> assign SPR0 = 0x1c00
> write D to SPR0 + offset: 0x3c
> ...
> 
> So you can imagine the performance will increase, but maybe not too
> much if we use it in the right way...
> Except the old SoC that didn't support SPR and CPR. The reason will be
> addressed in the next paragraph.
> 
>> What you're proposing almost looks like being too good to be true -
>> and makes
>> me wonder, at this point, why the subsys id was used in the first
>> place :-)
>>
> 
> That's because of the old GCE version in the old SoC only support GPR,
> it didn't support SPR and CPR.
> 
> GPR:
> All 32 GCE threads share the same GPR0~GPR15, GPR will be affected by
> other GCE threads if they use it at the same time.
> 
> SPR:
> Each GCE thread has 4 SPR, SPR won't be affected by another GCE thread.
> 
> CPR:
> All 32 GCE threads share the same CPR, there are over 1000 CPR can be
> used. It need to be managed properly to avoid the resource conflicting.
> 
> Due to the GPR resource restriction in the old GCE version, the usage
> of subsys ID can avoid GPR conflicting issues when multiple GCE threads
> are using GPR to physical assign high bytes all the time.
> 
> 
> I have simplified some complicate instruction rules, so the description
> above may not be 100% matched to the CMDQ helper driver code.
> But I think the main concept is correct.
> Hope these explanation can help well :-)
> 

Jason, that's simply wonderful. Thanks a lot for this precious description.

Yes, this has clarified even more than I was asking for, and besides,
yeah I know that there are some rules to follow, some of which I know,
some of which I imagine - and describing all of that would need lots
and lots of text - again, I know, and no worries about that! :-D

Thanks again!
Angelo

> Regards,
> Jason-JH Lin
> 
>> Cheers!
>> Angelo
>>
>>> Regards,
>>> Jason-JH Lin
>>>
>>>> Thanks,
>>>> Angelo
>>>>
>>>>
>>>
>>
>>
>>
>
diff mbox series

Patch

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 455221e8de24..aa9853100d78 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -8,6 +8,7 @@ 
 #include <linux/module.h>
 #include <linux/mailbox_controller.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/soc/mediatek/mtk-cmdq.h>
 
 #define CMDQ_WRITE_ENABLE_MASK	BIT(0)
@@ -60,20 +61,30 @@  int cmdq_dev_get_client_reg(struct device *dev,
 			    struct cmdq_client_reg *client_reg, int idx)
 {
 	struct of_phandle_args spec;
+	struct resource res;
 	int err;
 
 	if (!client_reg)
 		return -ENOENT;
 
+	if (of_address_to_resource(dev->of_node, 0, &res) != 0) {
+		dev_err(dev, "Missing reg in %s node\n", dev->of_node->full_name);
+		return -EINVAL;
+	}
+	client_reg->pa_base = res.start;
+
 	err = of_parse_phandle_with_fixed_args(dev->of_node,
 					       "mediatek,gce-client-reg",
 					       3, idx, &spec);
 	if (err < 0) {
-		dev_warn(dev,
+		dev_dbg(dev,
 			"error %d can't parse gce-client-reg property (%d)",
 			err, idx);
 
-		return err;
+		/* make subsys invalid */
+		client_reg->subsys = CMDQ_SUBSYS_INVALID;
+
+		return 0;
 	}
 
 	client_reg->subsys = (u8)spec.args[0];
@@ -130,6 +141,9 @@  int cmdq_pkt_create(struct cmdq_client *client, struct cmdq_pkt *pkt, size_t siz
 
 	pkt->buf_size = size;
 
+	/* need to use pkt->cl->chan later to call mbox APIs when generating instruction */
+	pkt->cl = (void *)client;
+
 	dev = client->chan->mbox->dev;
 	dma_addr = dma_map_single(dev, pkt->va_base, pkt->buf_size,
 				  DMA_TO_DEVICE);
diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h
index 0c3906e8ad19..3578cc9200e9 100644
--- a/include/linux/soc/mediatek/mtk-cmdq.h
+++ b/include/linux/soc/mediatek/mtk-cmdq.h
@@ -23,6 +23,8 @@ 
 #define CMDQ_THR_SPR_IDX2	(2)
 #define CMDQ_THR_SPR_IDX3	(3)
 
+#define CMDQ_SUBSYS_INVALID	(U8_MAX)
+
 struct cmdq_pkt;
 
 enum cmdq_logic_op {
@@ -52,6 +54,7 @@  struct cmdq_operand {
 
 struct cmdq_client_reg {
 	u8 subsys;
+	phys_addr_t pa_base;
 	u16 offset;
 	u16 size;
 };