diff mbox series

clk: mediatek: Disable ACP to fix 3D on MT8192

Message ID 20220110181330.3224-1-alyssa.rosenzweig@collabora.com (mailing list archive)
State New, archived
Headers show
Series clk: mediatek: Disable ACP to fix 3D on MT8192 | expand

Commit Message

Alyssa Rosenzweig Jan. 10, 2022, 6:13 p.m. UTC
Set a mysterious chicken bit in the MT8192 clock driver (!) to get the
Mali GPU on MT8192 to work. This workaround is from the downstream Mali
driver shipped in ChromeOS. The change there is unsuitable for mainline
but good as a reference for the hardware behaviour:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5

That links to an internal Google issue tracker which I assume has more
information on the bug. I would appreciate if someone from Google or
MediaTek could explain what this change actually does and why it's
necessary on MT8192.

At any rate, this register logically belongs to the MT8192 "infra" clock
device, so it makes sense to set it there too. This avoids adding any
platform-specific hacks to the 3D driver, either mainline (Panfrost) or
legacy (kbase).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Nick Fan <Nick.Fan@mediatek.com>
Cc: Nicolas Boichat <drinkcat@chromium.org>
---
 drivers/clk/mediatek/clk-mt8192.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Stephen Boyd Jan. 14, 2022, 2:08 a.m. UTC | #1
Quoting Alyssa Rosenzweig (2022-01-10 10:13:30)
> @@ -1238,6 +1265,10 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev)
>         if (r)
>                 return r;
>  
> +       r = clk_mt8192_infra_disable_mfg2acp(pdev);
> +       if (r)
> +               return r;

If this fails it leaves a bunch of allocations around that never get
freed. It's a problem before this patch so it's mostly just making me
grumpy that proper cleanup isn't already here. Can this be the first
thing done in probe instead?
Robin Murphy Jan. 14, 2022, 1:23 p.m. UTC | #2
On 2022-01-10 18:13, Alyssa Rosenzweig wrote:
> Set a mysterious chicken bit in the MT8192 clock driver (!) to get the
> Mali GPU on MT8192 to work. This workaround is from the downstream Mali
> driver shipped in ChromeOS. The change there is unsuitable for mainline
> but good as a reference for the hardware behaviour:
> 
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5
> 
> That links to an internal Google issue tracker which I assume has more
> information on the bug. I would appreciate if someone from Google or
> MediaTek could explain what this change actually does and why it's
> necessary on MT8192.
> 
> At any rate, this register logically belongs to the MT8192 "infra" clock
> device, so it makes sense to set it there too. This avoids adding any
> platform-specific hacks to the 3D driver, either mainline (Panfrost) or
> legacy (kbase).

Does this really have anything to do with clocks? My (uninformed) 
impression is that the infracfg blocks are general amalgamations of 
configuration registers that simply happen to contain clock and reset 
controls among other functionality. In particular, "ACP" usually refers 
to the Accelerator Coherency Port of a CPU cluster or DSU, and given the 
stated symptom of the issue affected by it, my first guess would be that 
this bit might indeed control routing of GPU traffic either to the ACP 
or the (presumably non-coherent) main interconnect.

If that is the case, I think this would logically belong as a 
SoC-specific quirk in panfrost, where we'd need to retrieve the syscon 
regmap for ourselves (see around line 800 of drivers/iommu/mtk_iommu.c 
for a similar example).

Cheers,
Robin.

> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> Cc: Nick Fan <Nick.Fan@mediatek.com>
> Cc: Nicolas Boichat <drinkcat@chromium.org>
> ---
>   drivers/clk/mediatek/clk-mt8192.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/clk/mediatek/clk-mt8192.c b/drivers/clk/mediatek/clk-mt8192.c
> index cbc7c6dbe0f4..e3673494d08d 100644
> --- a/drivers/clk/mediatek/clk-mt8192.c
> +++ b/drivers/clk/mediatek/clk-mt8192.c
> @@ -1179,6 +1179,10 @@ static const struct mtk_pll_data plls[] = {
>   
>   static struct clk_onecell_data *top_clk_data;
>   
> +/* Control registers in the infra block used to set a chicken bit */
> +#define INFRA_CTRL 0x290
> +#define INFRA_CTRL_DISABLE_MFG2ACP BIT(9)
> +
>   static void clk_mt8192_top_init_early(struct device_node *node)
>   {
>   	int i;
> @@ -1224,6 +1228,29 @@ static int clk_mt8192_top_probe(struct platform_device *pdev)
>   	return of_clk_add_provider(node, of_clk_src_onecell_get, top_clk_data);
>   }
>   
> +/*
> + * Disable ACP on the infra clock. Setting this quirk is required for 3D to
> + * work correctly. Without this quirk, any work queued to the Mali GPU faults,
> + * for example raising a Data Invalid Fault. This suggests the GPU is failing
> + * to read back the contents of shared CPU/GPU memory correctly, perhaps due to
> + * a MT8192 platform integration issue breaking memory or caches.
> + *
> + * Relevant downstream change:
> + * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5
> + */
> +static int clk_mt8192_infra_disable_mfg2acp(struct platform_device *pdev)
> +{
> +	void __iomem *base = devm_platform_ioremap_resource(pdev, 0);
> +	void __iomem *infra_ctrl = base + INFRA_CTRL;
> +
> +	if (IS_ERR(base))
> +		return PTR_ERR(base);
> +
> +	writel(readl(infra_ctrl) | INFRA_CTRL_DISABLE_MFG2ACP, infra_ctrl);
> +
> +	return 0;
> +}
> +
>   static int clk_mt8192_infra_probe(struct platform_device *pdev)
>   {
>   	struct clk_onecell_data *clk_data;
> @@ -1238,6 +1265,10 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev)
>   	if (r)
>   		return r;
>   
> +	r = clk_mt8192_infra_disable_mfg2acp(pdev);
> +	if (r)
> +		return r;
> +
>   	return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data);
>   }
>
Alyssa Rosenzweig Jan. 14, 2022, 1:47 p.m. UTC | #3
> > That links to an internal Google issue tracker which I assume has more
> > information on the bug. I would appreciate if someone from Google or
> > MediaTek could explain what this change actually does and why it's
> > necessary on MT8192.
> > 
> > At any rate, this register logically belongs to the MT8192 "infra" clock
> > device, so it makes sense to set it there too. This avoids adding any
> > platform-specific hacks to the 3D driver, either mainline (Panfrost) or
> > legacy (kbase).
> 
> Does this really have anything to do with clocks?

I have no idea. MediaTek, Google, please explain.

> In particular, "ACP" usually refers to the Accelerator Coherency Port
> of a CPU cluster or DSU, and given the stated symptom of the issue
> affected by it, my first guess would be that this bit might indeed
> control routing of GPU traffic either to the ACP or the (presumably
> non-coherent) main interconnect.

I'd easily believe that.

> If that is the case, I think this would logically belong as a SoC-specific
> quirk in panfrost, where we'd need to retrieve the syscon regmap for
> ourselves (see around line 800 of drivers/iommu/mtk_iommu.c for a similar
> example).

Alright. Doing this in panfrost seems ugly but if that's the right place
for it, that's the right place for it.
Stephen Boyd Jan. 14, 2022, 10:55 p.m. UTC | #4
Quoting Robin Murphy (2022-01-14 05:23:51)
> On 2022-01-10 18:13, Alyssa Rosenzweig wrote:
> > Set a mysterious chicken bit in the MT8192 clock driver (!) to get the
> > Mali GPU on MT8192 to work. This workaround is from the downstream Mali
> > driver shipped in ChromeOS. The change there is unsuitable for mainline
> > but good as a reference for the hardware behaviour:
> > 
> > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5
> > 
> > That links to an internal Google issue tracker which I assume has more
> > information on the bug. I would appreciate if someone from Google or
> > MediaTek could explain what this change actually does and why it's
> > necessary on MT8192.
> > 
> > At any rate, this register logically belongs to the MT8192 "infra" clock
> > device, so it makes sense to set it there too. This avoids adding any
> > platform-specific hacks to the 3D driver, either mainline (Panfrost) or
> > legacy (kbase).
> 
> Does this really have anything to do with clocks? My (uninformed) 
> impression is that the infracfg blocks are general amalgamations of 
> configuration registers that simply happen to contain clock and reset 
> controls among other functionality. In particular, "ACP" usually refers 
> to the Accelerator Coherency Port of a CPU cluster or DSU, and given the 
> stated symptom of the issue affected by it, my first guess would be that 
> this bit might indeed control routing of GPU traffic either to the ACP 
> or the (presumably non-coherent) main interconnect.
> 
> If that is the case, I think this would logically belong as a 
> SoC-specific quirk in panfrost, where we'd need to retrieve the syscon 
> regmap for ourselves (see around line 800 of drivers/iommu/mtk_iommu.c 
> for a similar example).

What's the benefit of putting this into the GPU driver? Is it going to
be runtime managed? It seems most logically related to the SoC or
interconnect, for which there isn't any device driver but there could
be.
Chen-Yu Tsai Jan. 18, 2022, 7:19 a.m. UTC | #5
Hi,

On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote:
>
> > > That links to an internal Google issue tracker which I assume has more
> > > information on the bug. I would appreciate if someone from Google or
> > > MediaTek could explain what this change actually does and why it's
> > > necessary on MT8192.
> > >
> > > At any rate, this register logically belongs to the MT8192 "infra" clock
> > > device, so it makes sense to set it there too. This avoids adding any
> > > platform-specific hacks to the 3D driver, either mainline (Panfrost) or
> > > legacy (kbase).
> >
> > Does this really have anything to do with clocks?
>
> I have no idea. MediaTek, Google, please explain.
>
> > In particular, "ACP" usually refers to the Accelerator Coherency Port
> > of a CPU cluster or DSU, and given the stated symptom of the issue
> > affected by it, my first guess would be that this bit might indeed
> > control routing of GPU traffic either to the ACP or the (presumably
> > non-coherent) main interconnect.
>
> I'd easily believe that.

As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port.
And the bit in infracfg toggles whether ACP is used or not.

Explanation from MediaTek in verbatim:

-------------------------------------------------------------------------
The ACP path on MT8192 is just for experimental only.
We are not intended to enable ACP by design.

But due to an unexpected operation, it was accidently opened by default.
So we need a patch to disable the ACP for MT8192.
-------------------------------------------------------------------------


Regards
ChenYu

> > If that is the case, I think this would logically belong as a SoC-specific
> > quirk in panfrost, where we'd need to retrieve the syscon regmap for
> > ourselves (see around line 800 of drivers/iommu/mtk_iommu.c for a similar
> > example).
>
> Alright. Doing this in panfrost seems ugly but if that's the right place
> for it, that's the right place for it.
Robin Murphy Jan. 18, 2022, 3:01 p.m. UTC | #6
On 2022-01-18 07:19, Chen-Yu Tsai wrote:
> Hi,
> 
> On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote:
>>
>>>> That links to an internal Google issue tracker which I assume has more
>>>> information on the bug. I would appreciate if someone from Google or
>>>> MediaTek could explain what this change actually does and why it's
>>>> necessary on MT8192.
>>>>
>>>> At any rate, this register logically belongs to the MT8192 "infra" clock
>>>> device, so it makes sense to set it there too. This avoids adding any
>>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or
>>>> legacy (kbase).
>>>
>>> Does this really have anything to do with clocks?
>>
>> I have no idea. MediaTek, Google, please explain.
>>
>>> In particular, "ACP" usually refers to the Accelerator Coherency Port
>>> of a CPU cluster or DSU, and given the stated symptom of the issue
>>> affected by it, my first guess would be that this bit might indeed
>>> control routing of GPU traffic either to the ACP or the (presumably
>>> non-coherent) main interconnect.
>>
>> I'd easily believe that.
> 
> As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port.
> And the bit in infracfg toggles whether ACP is used or not.
> 
> Explanation from MediaTek in verbatim:
> 
> -------------------------------------------------------------------------
> The ACP path on MT8192 is just for experimental only.
> We are not intended to enable ACP by design.
> 
> But due to an unexpected operation, it was accidently opened by default.
> So we need a patch to disable the ACP for MT8192.
> -------------------------------------------------------------------------

Aha! That's great, thanks ChenYu!

Stephen, my thinking here is that if this feature controls the GPU 
interconnect, and only matters when the GPU is going to be used (as 
strongly implied by the downstream implementation), then the GPU driver 
is the only interested party and may as well take responsibility if 
there's no better alternative.

I'd agree that if there was already a "base" infracfg driver doing 
general system-wide set-and-forget configuration then it would equally 
well fit in there, but that doesn't seem to be the case. Short of trying 
to abuse the bp_infracfg data in the mtk-pm-domains driver (which 
doesn't seem like a particularly pleasant idea), the code to poke a bit 
into a syscon regmap is going to be pretty much the same wherever we add 
it. There's already a bit of a pattern for MTK drivers to look up and 
poke their own infracfg bits directly as needed, so between that and the 
downstream implementation for this particular bit, leaving it to 
Panfrost seems like the least surprising option.

Cheers,
Robin.
Stephen Boyd Jan. 19, 2022, 2:18 a.m. UTC | #7
Quoting Robin Murphy (2022-01-18 07:01:46)
> On 2022-01-18 07:19, Chen-Yu Tsai wrote:
> > Hi,
> > 
> > On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote:
> >>
> >>>> That links to an internal Google issue tracker which I assume has more
> >>>> information on the bug. I would appreciate if someone from Google or
> >>>> MediaTek could explain what this change actually does and why it's
> >>>> necessary on MT8192.
> >>>>
> >>>> At any rate, this register logically belongs to the MT8192 "infra" clock
> >>>> device, so it makes sense to set it there too. This avoids adding any
> >>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or
> >>>> legacy (kbase).
> >>>
> >>> Does this really have anything to do with clocks?
> >>
> >> I have no idea. MediaTek, Google, please explain.
> >>
> >>> In particular, "ACP" usually refers to the Accelerator Coherency Port
> >>> of a CPU cluster or DSU, and given the stated symptom of the issue
> >>> affected by it, my first guess would be that this bit might indeed
> >>> control routing of GPU traffic either to the ACP or the (presumably
> >>> non-coherent) main interconnect.
> >>
> >> I'd easily believe that.
> > 
> > As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port.
> > And the bit in infracfg toggles whether ACP is used or not.
> > 
> > Explanation from MediaTek in verbatim:
> > 
> > -------------------------------------------------------------------------
> > The ACP path on MT8192 is just for experimental only.
> > We are not intended to enable ACP by design.
> > 
> > But due to an unexpected operation, it was accidently opened by default.
> > So we need a patch to disable the ACP for MT8192.
> > -------------------------------------------------------------------------
> 
> Aha! That's great, thanks ChenYu!
> 
> Stephen, my thinking here is that if this feature controls the GPU 
> interconnect, and only matters when the GPU is going to be used (as 
> strongly implied by the downstream implementation), then the GPU driver 
> is the only interested party and may as well take responsibility if 
> there's no better alternative.
> 
> I'd agree that if there was already a "base" infracfg driver doing 
> general system-wide set-and-forget configuration then it would equally 
> well fit in there, but that doesn't seem to be the case.

Wouldn't this first set-and-forget configuration fit that bill? We can't
have a "base" driver because why?

> Short of trying 
> to abuse the bp_infracfg data in the mtk-pm-domains driver (which 
> doesn't seem like a particularly pleasant idea), the code to poke a bit 
> into a syscon regmap is going to be pretty much the same wherever we add 
> it. There's already a bit of a pattern for MTK drivers to look up and 
> poke their own infracfg bits directly as needed, so between that and the 
> downstream implementation for this particular bit, leaving it to 
> Panfrost seems like the least surprising option.
> 

I'd prefer we leave the SoC glue out of device drivers for subsystems
that really don't want to or need to know about the SoC level details.
The GPU driver wants to live life drawing triangles! :) It doesn't want
to know that the ACP path didn't work out on some SoC it got plopped
down into. And of course GPU is the only interested party, because the
SoC glue for the GPU is all messed up so GPU can't operate properly
without this bit toggled. I wonder where the fix would end up if this
port was shared by more than one driver. Probably back here in the
closest thing there is to the SoC driver.

It's not as simple as poking bits in some SoC glue IO space
unconditionally either. The GPU driver will need to know which SoC is
being used and then only poke the bits if the affected SoC is in use. Or
we'll have some DT binding update to poke the bit if some syscon
property is present in the DT node. Either way, it's a set-and-forget
thing, so the GPU driver will now have some set-and-forget logic for one
SoC out of many that it supports; do it once at boot, grab a regmap,
parse some more stuff to make sure it's needed, poke the bit, release
the regmap, finally start drawing.

Of course, I won't oppose the mess being moved somewhere outside of the
subsystem I maintain ;-) I was mainly curious to understand why the
regmap path is proposed.
Robin Murphy Jan. 20, 2022, 2:22 p.m. UTC | #8
On 2022-01-19 02:18, Stephen Boyd wrote:
> Quoting Robin Murphy (2022-01-18 07:01:46)
>> On 2022-01-18 07:19, Chen-Yu Tsai wrote:
>>> Hi,
>>>
>>> On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote:
>>>>
>>>>>> That links to an internal Google issue tracker which I assume has more
>>>>>> information on the bug. I would appreciate if someone from Google or
>>>>>> MediaTek could explain what this change actually does and why it's
>>>>>> necessary on MT8192.
>>>>>>
>>>>>> At any rate, this register logically belongs to the MT8192 "infra" clock
>>>>>> device, so it makes sense to set it there too. This avoids adding any
>>>>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or
>>>>>> legacy (kbase).
>>>>>
>>>>> Does this really have anything to do with clocks?
>>>>
>>>> I have no idea. MediaTek, Google, please explain.
>>>>
>>>>> In particular, "ACP" usually refers to the Accelerator Coherency Port
>>>>> of a CPU cluster or DSU, and given the stated symptom of the issue
>>>>> affected by it, my first guess would be that this bit might indeed
>>>>> control routing of GPU traffic either to the ACP or the (presumably
>>>>> non-coherent) main interconnect.
>>>>
>>>> I'd easily believe that.
>>>
>>> As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port.
>>> And the bit in infracfg toggles whether ACP is used or not.
>>>
>>> Explanation from MediaTek in verbatim:
>>>
>>> -------------------------------------------------------------------------
>>> The ACP path on MT8192 is just for experimental only.
>>> We are not intended to enable ACP by design.
>>>
>>> But due to an unexpected operation, it was accidently opened by default.
>>> So we need a patch to disable the ACP for MT8192.
>>> -------------------------------------------------------------------------
>>
>> Aha! That's great, thanks ChenYu!
>>
>> Stephen, my thinking here is that if this feature controls the GPU
>> interconnect, and only matters when the GPU is going to be used (as
>> strongly implied by the downstream implementation), then the GPU driver
>> is the only interested party and may as well take responsibility if
>> there's no better alternative.
>>
>> I'd agree that if there was already a "base" infracfg driver doing
>> general system-wide set-and-forget configuration then it would equally
>> well fit in there, but that doesn't seem to be the case.
> 
> Wouldn't this first set-and-forget configuration fit that bill? We can't
> have a "base" driver because why?

Sure, everything has a starting point somewhere, it just means more work 
for someone to have to do. I'm not that person - I'm just here as a 
curious reviewer asking questions to help refine the abstraction - so I 
chose to lean towards the pragmatic side here given what I know about 
how much Alyssa enjoys kernel development ;)

>> Short of trying
>> to abuse the bp_infracfg data in the mtk-pm-domains driver (which
>> doesn't seem like a particularly pleasant idea), the code to poke a bit
>> into a syscon regmap is going to be pretty much the same wherever we add
>> it. There's already a bit of a pattern for MTK drivers to look up and
>> poke their own infracfg bits directly as needed, so between that and the
>> downstream implementation for this particular bit, leaving it to
>> Panfrost seems like the least surprising option.
>>
> 
> I'd prefer we leave the SoC glue out of device drivers for subsystems
> that really don't want to or need to know about the SoC level details.
> The GPU driver wants to live life drawing triangles! :) It doesn't want
> to know that the ACP path didn't work out on some SoC it got plopped
> down into. And of course GPU is the only interested party, because the
> SoC glue for the GPU is all messed up so GPU can't operate properly
> without this bit toggled. I wonder where the fix would end up if this
> port was shared by more than one driver. Probably back here in the
> closest thing there is to the SoC driver.

As I hoped to imply, I agree that that's a perfectly valid line of 
reasoning too. However it does gloss over certain other considerations 
like managing dependencies between the drivers such that it's not too 
cryptic for a user to configure a kernel that actually works as 
expected, and the GPU driver has a guarantee that the configuration 
really has been done by the point that it wants to start DMA, for instance.

> It's not as simple as poking bits in some SoC glue IO space
> unconditionally either. The GPU driver will need to know which SoC is
> being used and then only poke the bits if the affected SoC is in use. Or
> we'll have some DT binding update to poke the bit if some syscon
> property is present in the DT node. Either way, it's a set-and-forget
> thing, so the GPU driver will now have some set-and-forget logic for one
> SoC out of many that it supports; do it once at boot, grab a regmap,
> parse some more stuff to make sure it's needed, poke the bit, release
> the regmap, finally start drawing.

In this case we do happen to have this handy function called 
panfrost_probe() which already deals with one-off startup stuff :P

We also already have SoC-specific GPU compatibles because even without 
experimental interconnect easter eggs, people integrate these IPs in 
fairly involved ways and there's a fair degree of variety. However 
unless we want to be super-strict it's also not too hard to simply 
assume that if we can find a "mediatek,mt8192-infracfg" syscon then we 
set the MT8192 magic bit within it, and if we can't then we don't.

> Of course, I won't oppose the mess being moved somewhere outside of the
> subsystem I maintain ;-) I was mainly curious to understand why the
> regmap path is proposed.

Well, regmap because it's a syscon, so whoever's accessing it that 
should be via its existing regmap rather than going behind its back. To 
be fair, there is a nascent infracfg "driver" already (even if it's just 
two helper functions), so adding some new infrastructure in there is a 
clear possibility - the functionally-similar Rockchip GRF already has 
something comparable, for example - it's just somewhat more code and 
more work thinking through the additional reasoning, compared to piling 
SoC-specific GPU-related stuff into the place that already knows about 
SoC-specific GPU stuff. As things stand, if someone *is* prepared to 
take that on then it's fine by me!

FWIW, I have no desire to look more closely at the downstream driver, 
but I did notice in the context of the linked patch that there appeared 
to be some power-management-looking stuff as well as this magic bit, so 
if it's possible that that might be something we care about in future 
and mean we end up needing to poke syscons from Panfrost anyway, it 
might want factoring in to the decision.

Cheers,
Robin.
Alyssa Rosenzweig Jan. 20, 2022, 2:27 p.m. UTC | #9
> We also already have SoC-specific GPU compatibles because even without
> experimental interconnect easter eggs, people integrate these IPs in fairly
> involved ways and there's a fair degree of variety. However unless we want
> to be super-strict it's also not too hard to simply assume that if we can
> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit
> within it, and if we can't then we don't.

We need a MT8192-specific compatible for the GPU anyway due to "unique"
power management requirements, this is why the MT8183 before it has a
specific GPU compatible. So I'm not worried about the compatible.
AngeloGioacchino Del Regno Feb. 15, 2022, 10:44 a.m. UTC | #10
Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto:
>> We also already have SoC-specific GPU compatibles because even without
>> experimental interconnect easter eggs, people integrate these IPs in fairly
>> involved ways and there's a fair degree of variety. However unless we want
>> to be super-strict it's also not too hard to simply assume that if we can
>> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit
>> within it, and if we can't then we don't.
> 
> We need a MT8192-specific compatible for the GPU anyway due to "unique"
> power management requirements, this is why the MT8183 before it has a
> specific GPU compatible. So I'm not worried about the compatible.
> 

Thing is, as it was explained, this is about a unwanted SoC misconfiguration,
hence this is very specific to one SoC, which *happens to* integrate a Mali GPU.

I agree with Stephen's reasoning - also in my opinion, the panfrost driver should
be dedicated to managing the Mali GPUs and *not* the SoCs on which it is present,
so disabling the Accelerator Coherency Port for MFG should be performed inside of
files that are dealing with the specific SoC that requires this configuration (or,
if you want, quirk).

Simply put, though, as you already perfectly know, there is no driver that is
dedicated to exclusively manage the "extra" INFRA bits, so here's what I've been
thinking for a while; my logical reasoning:
- Doing it in the IOMMU driver may seem at a first glance to make some sense,
   but then, does this really have anything to do with the IOMMU? I don't think so;
- Performing the disablement in mtk-pm-domains is very shady... there's nothing
   that screams "power" in that;
- This doesn't scream "clocks" either, I understand that;
- As far as I understand this specific thing won't happen anymore (or at least,
   not in MediaTek land, but I also don't expect to see this on other SoCs).

Getting back to MediaTek-land, only MT8192 is (and will ever be) affected, from
what I understand... and there is one driver that is very specific to, targets
only, and would probe only on MT8192 - which also happens to manage the very same
iospace that we also want to poke at to disable this bit......

... clk-mt8192!

For this reason, I think that (unfortunately, to some extent) the most transparent
and cleanest approach is the one that Alyssa took, which is to perform this
set-and-forget operation there - also keeping in mind that panfrost obviously has
no way to finish probing (hence, no way to actually use the GPU device) *before*
the driver that provides clocks to it also probes and registers .. the clocks.

My personal view on this would imply that Alyssa sends a v2 of this commit that
includes MediaTek's explanation on why the ACP has to be disabled (as much as
nicely expanding the ACP acronym as a documentation effort).

I'm sorry for this wall of text (and for boring you with it! :P), but I felt like
giving an extensive personal opinion on this would've been nice.

Thank you all!
Angelo
Robin Murphy Feb. 15, 2022, 3:21 p.m. UTC | #11
On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote:
> Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto:
>>> We also already have SoC-specific GPU compatibles because even without
>>> experimental interconnect easter eggs, people integrate these IPs in 
>>> fairly
>>> involved ways and there's a fair degree of variety. However unless we 
>>> want
>>> to be super-strict it's also not too hard to simply assume that if we 
>>> can
>>> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic 
>>> bit
>>> within it, and if we can't then we don't.
>>
>> We need a MT8192-specific compatible for the GPU anyway due to "unique"
>> power management requirements, this is why the MT8183 before it has a
>> specific GPU compatible. So I'm not worried about the compatible.
>>
> 
> Thing is, as it was explained, this is about a unwanted SoC 
> misconfiguration,
> hence this is very specific to one SoC, which *happens to* integrate a 
> Mali GPU.
> 
> I agree with Stephen's reasoning - also in my opinion, the panfrost 
> driver should
> be dedicated to managing the Mali GPUs and *not* the SoCs on which it is 
> present,
> so disabling the Accelerator Coherency Port for MFG should be performed 
> inside of
> files that are dealing with the specific SoC that requires this 
> configuration (or,
> if you want, quirk).
> 
> Simply put, though, as you already perfectly know, there is no driver 
> that is
> dedicated to exclusively manage the "extra" INFRA bits, so here's what 
> I've been
> thinking for a while; my logical reasoning:
> - Doing it in the IOMMU driver may seem at a first glance to make some 
> sense,
>    but then, does this really have anything to do with the IOMMU? I 
> don't think so;
> - Performing the disablement in mtk-pm-domains is very shady... there's 
> nothing
>    that screams "power" in that;
> - This doesn't scream "clocks" either, I understand that;
> - As far as I understand this specific thing won't happen anymore (or at 
> least,
>    not in MediaTek land, but I also don't expect to see this on other 
> SoCs).
> 
> Getting back to MediaTek-land, only MT8192 is (and will ever be) 
> affected, from
> what I understand... and there is one driver that is very specific to, 
> targets
> only, and would probe only on MT8192 - which also happens to manage the 
> very same
> iospace that we also want to poke at to disable this bit......
> 
> ... clk-mt8192!

The only trouble with that argument is that it falls apart under the
slightest scrutiny ;)

The relevant parts of the power domain, IOMMU and GPU drivers are every
bit as SoC-specific as the clock driver, and it is the generic syscon
driver which owns and manages the actual MMIO space.

All in all this has now convinced me that it *is* worth the slight extra
effort to put random infracfg stuff in the random infracfg stuff place.
And by "slight" I mean it turns out I've spent longer writing this prose
than bashing out the illustrative diff below, which is the last opinion
I shall have on this matter. Feel free to take it and fix it, or do 
anything else if you prefer :)

Cheers,
Robin.

----->8-----
diff --git a/drivers/soc/mediatek/mtk-infracfg.c 
b/drivers/soc/mediatek/mtk-infracfg.c
index 0590b68e0d78..af22d0f3d547 100644
--- a/drivers/soc/mediatek/mtk-infracfg.c
+++ b/drivers/soc/mediatek/mtk-infracfg.c
@@ -72,3 +72,20 @@ int mtk_infracfg_clear_bus_protection(struct regmap 
*infracfg, u32 mask,

  	return ret;
  }
+
+static int __init mtk_infracfg_init(void)
+{
+	struct regmap *infracfg;
+
+	/*
+	 * MT8192 has an experimental path to route GPU traffic to the DSU's
+	 * Accelerator Coherency Port, which is inadvertently enabled by
+	 * default. It turns out not to work very well, so disable it.
+	 */
+	infracfg = syscon_regmap_lookup_by_compatible("mediatek,mt8192-infracfg");
+	if (infracfg)
+		regmap_set_bits(infracfg, MT8192_INFRA_CTRL,
+				MT8192_INFRA_CTRL_DISABLE_MFG2ACP);
+	return 0;
+}
+postcore_initcall(mtk_infracfg_init);
diff --git a/include/linux/soc/mediatek/infracfg.h 
b/include/linux/soc/mediatek/infracfg.h
index 4615a228da51..94f0f338ce49 100644
--- a/include/linux/soc/mediatek/infracfg.h
+++ b/include/linux/soc/mediatek/infracfg.h
@@ -147,6 +147,9 @@
  #define INFRA_TOPAXI_PROTECTEN_SET		0x0260
  #define INFRA_TOPAXI_PROTECTEN_CLR		0x0264

+#define MT8192_INFRA_CTRL			0x290
+#define MT8192_INFRA_CTRL_DISABLE_MFG2ACP	BIT(9)
+
  #define REG_INFRA_MISC				0xf00
  #define F_DDR_4GB_SUPPORT_EN			BIT(13)
AngeloGioacchino Del Regno Feb. 15, 2022, 4:12 p.m. UTC | #12
Il 15/02/22 16:21, Robin Murphy ha scritto:
> On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote:
>> Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto:
>>>> We also already have SoC-specific GPU compatibles because even without
>>>> experimental interconnect easter eggs, people integrate these IPs in fairly
>>>> involved ways and there's a fair degree of variety. However unless we want
>>>> to be super-strict it's also not too hard to simply assume that if we can
>>>> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit
>>>> within it, and if we can't then we don't.
>>>
>>> We need a MT8192-specific compatible for the GPU anyway due to "unique"
>>> power management requirements, this is why the MT8183 before it has a
>>> specific GPU compatible. So I'm not worried about the compatible.
>>>
>>
>> Thing is, as it was explained, this is about a unwanted SoC misconfiguration,
>> hence this is very specific to one SoC, which *happens to* integrate a Mali GPU.
>>
>> I agree with Stephen's reasoning - also in my opinion, the panfrost driver should
>> be dedicated to managing the Mali GPUs and *not* the SoCs on which it is present,
>> so disabling the Accelerator Coherency Port for MFG should be performed inside of
>> files that are dealing with the specific SoC that requires this configuration (or,
>> if you want, quirk).
>>
>> Simply put, though, as you already perfectly know, there is no driver that is
>> dedicated to exclusively manage the "extra" INFRA bits, so here's what I've been
>> thinking for a while; my logical reasoning:
>> - Doing it in the IOMMU driver may seem at a first glance to make some sense,
>>    but then, does this really have anything to do with the IOMMU? I don't think so;
>> - Performing the disablement in mtk-pm-domains is very shady... there's nothing
>>    that screams "power" in that;
>> - This doesn't scream "clocks" either, I understand that;
>> - As far as I understand this specific thing won't happen anymore (or at least,
>>    not in MediaTek land, but I also don't expect to see this on other SoCs).
>>
>> Getting back to MediaTek-land, only MT8192 is (and will ever be) affected, from
>> what I understand... and there is one driver that is very specific to, targets
>> only, and would probe only on MT8192 - which also happens to manage the very same
>> iospace that we also want to poke at to disable this bit......
>>
>> ... clk-mt8192!
> 
> The only trouble with that argument is that it falls apart under the
> slightest scrutiny ;)
> 
> The relevant parts of the power domain, IOMMU and GPU drivers are every
> bit as SoC-specific as the clock driver, and it is the generic syscon
> driver which owns and manages the actual MMIO space.
> 
> All in all this has now convinced me that it *is* worth the slight extra
> effort to put random infracfg stuff in the random infracfg stuff place.
> And by "slight" I mean it turns out I've spent longer writing this prose
> than bashing out the illustrative diff below, which is the last opinion
> I shall have on this matter. Feel free to take it and fix it, or do anything else 
> if you prefer :)
> 

Hello Robin,

I am a bit ashamed to admit that I haven't fully considered the option of putting
the random infracfg stuff in the random infracfg stuff place.

This solution looks like being the most sensible thing that we can actually do to
patch up this bit.

Thanks a lot for the advice!!

Cheers,
Angelo

> Cheers,
> Robin.
> 
> ----->8-----
> diff --git a/drivers/soc/mediatek/mtk-infracfg.c b/drivers/soc/mediatek/mtk-infracfg.c
> index 0590b68e0d78..af22d0f3d547 100644
> --- a/drivers/soc/mediatek/mtk-infracfg.c
> +++ b/drivers/soc/mediatek/mtk-infracfg.c
> @@ -72,3 +72,20 @@ int mtk_infracfg_clear_bus_protection(struct regmap *infracfg, 
> u32 mask,
> 
>       return ret;
>   }
> +
> +static int __init mtk_infracfg_init(void)
> +{
> +    struct regmap *infracfg;
> +
> +    /*
> +     * MT8192 has an experimental path to route GPU traffic to the DSU's
> +     * Accelerator Coherency Port, which is inadvertently enabled by
> +     * default. It turns out not to work very well, so disable it.
> +     */
> +    infracfg = syscon_regmap_lookup_by_compatible("mediatek,mt8192-infracfg");
> +    if (infracfg)
> +        regmap_set_bits(infracfg, MT8192_INFRA_CTRL,
> +                MT8192_INFRA_CTRL_DISABLE_MFG2ACP);
> +    return 0;
> +}
> +postcore_initcall(mtk_infracfg_init);
> diff --git a/include/linux/soc/mediatek/infracfg.h 
> b/include/linux/soc/mediatek/infracfg.h
> index 4615a228da51..94f0f338ce49 100644
> --- a/include/linux/soc/mediatek/infracfg.h
> +++ b/include/linux/soc/mediatek/infracfg.h
> @@ -147,6 +147,9 @@
>   #define INFRA_TOPAXI_PROTECTEN_SET        0x0260
>   #define INFRA_TOPAXI_PROTECTEN_CLR        0x0264
> 
> +#define MT8192_INFRA_CTRL            0x290
> +#define MT8192_INFRA_CTRL_DISABLE_MFG2ACP    BIT(9)
> +
>   #define REG_INFRA_MISC                0xf00
>   #define F_DDR_4GB_SUPPORT_EN            BIT(13)
>
Stephen Boyd Feb. 17, 2022, 9:40 p.m. UTC | #13
Quoting Robin Murphy (2022-02-15 07:21:52)
> On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote:
> 
> All in all this has now convinced me that it *is* worth the slight extra
> effort to put random infracfg stuff in the random infracfg stuff place.
> And by "slight" I mean it turns out I've spent longer writing this prose
> than bashing out the illustrative diff below, which is the last opinion
> I shall have on this matter. Feel free to take it and fix it, or do 
> anything else if you prefer :)
> 
> Cheers,
> Robin.
> 

Acked-by: Stephen Boyd <sboyd@kernel.org>
diff mbox series

Patch

diff --git a/drivers/clk/mediatek/clk-mt8192.c b/drivers/clk/mediatek/clk-mt8192.c
index cbc7c6dbe0f4..e3673494d08d 100644
--- a/drivers/clk/mediatek/clk-mt8192.c
+++ b/drivers/clk/mediatek/clk-mt8192.c
@@ -1179,6 +1179,10 @@  static const struct mtk_pll_data plls[] = {
 
 static struct clk_onecell_data *top_clk_data;
 
+/* Control registers in the infra block used to set a chicken bit */
+#define INFRA_CTRL 0x290
+#define INFRA_CTRL_DISABLE_MFG2ACP BIT(9)
+
 static void clk_mt8192_top_init_early(struct device_node *node)
 {
 	int i;
@@ -1224,6 +1228,29 @@  static int clk_mt8192_top_probe(struct platform_device *pdev)
 	return of_clk_add_provider(node, of_clk_src_onecell_get, top_clk_data);
 }
 
+/*
+ * Disable ACP on the infra clock. Setting this quirk is required for 3D to
+ * work correctly. Without this quirk, any work queued to the Mali GPU faults,
+ * for example raising a Data Invalid Fault. This suggests the GPU is failing
+ * to read back the contents of shared CPU/GPU memory correctly, perhaps due to
+ * a MT8192 platform integration issue breaking memory or caches.
+ *
+ * Relevant downstream change:
+ * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5
+ */
+static int clk_mt8192_infra_disable_mfg2acp(struct platform_device *pdev)
+{
+	void __iomem *base = devm_platform_ioremap_resource(pdev, 0);
+	void __iomem *infra_ctrl = base + INFRA_CTRL;
+
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	writel(readl(infra_ctrl) | INFRA_CTRL_DISABLE_MFG2ACP, infra_ctrl);
+
+	return 0;
+}
+
 static int clk_mt8192_infra_probe(struct platform_device *pdev)
 {
 	struct clk_onecell_data *clk_data;
@@ -1238,6 +1265,10 @@  static int clk_mt8192_infra_probe(struct platform_device *pdev)
 	if (r)
 		return r;
 
+	r = clk_mt8192_infra_disable_mfg2acp(pdev);
+	if (r)
+		return r;
+
 	return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data);
 }