Message ID | 20220110181330.3224-1-alyssa.rosenzweig@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clk: mediatek: Disable ACP to fix 3D on MT8192 | expand |
Quoting Alyssa Rosenzweig (2022-01-10 10:13:30) > @@ -1238,6 +1265,10 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev) > if (r) > return r; > > + r = clk_mt8192_infra_disable_mfg2acp(pdev); > + if (r) > + return r; If this fails it leaves a bunch of allocations around that never get freed. It's a problem before this patch so it's mostly just making me grumpy that proper cleanup isn't already here. Can this be the first thing done in probe instead?
On 2022-01-10 18:13, Alyssa Rosenzweig wrote: > Set a mysterious chicken bit in the MT8192 clock driver (!) to get the > Mali GPU on MT8192 to work. This workaround is from the downstream Mali > driver shipped in ChromeOS. The change there is unsuitable for mainline > but good as a reference for the hardware behaviour: > > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 > > That links to an internal Google issue tracker which I assume has more > information on the bug. I would appreciate if someone from Google or > MediaTek could explain what this change actually does and why it's > necessary on MT8192. > > At any rate, this register logically belongs to the MT8192 "infra" clock > device, so it makes sense to set it there too. This avoids adding any > platform-specific hacks to the 3D driver, either mainline (Panfrost) or > legacy (kbase). Does this really have anything to do with clocks? My (uninformed) impression is that the infracfg blocks are general amalgamations of configuration registers that simply happen to contain clock and reset controls among other functionality. In particular, "ACP" usually refers to the Accelerator Coherency Port of a CPU cluster or DSU, and given the stated symptom of the issue affected by it, my first guess would be that this bit might indeed control routing of GPU traffic either to the ACP or the (presumably non-coherent) main interconnect. If that is the case, I think this would logically belong as a SoC-specific quirk in panfrost, where we'd need to retrieve the syscon regmap for ourselves (see around line 800 of drivers/iommu/mtk_iommu.c for a similar example). Cheers, Robin. > Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> > Cc: Nick Fan <Nick.Fan@mediatek.com> > Cc: Nicolas Boichat <drinkcat@chromium.org> > --- > drivers/clk/mediatek/clk-mt8192.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) > > diff --git a/drivers/clk/mediatek/clk-mt8192.c b/drivers/clk/mediatek/clk-mt8192.c > index cbc7c6dbe0f4..e3673494d08d 100644 > --- a/drivers/clk/mediatek/clk-mt8192.c > +++ b/drivers/clk/mediatek/clk-mt8192.c > @@ -1179,6 +1179,10 @@ static const struct mtk_pll_data plls[] = { > > static struct clk_onecell_data *top_clk_data; > > +/* Control registers in the infra block used to set a chicken bit */ > +#define INFRA_CTRL 0x290 > +#define INFRA_CTRL_DISABLE_MFG2ACP BIT(9) > + > static void clk_mt8192_top_init_early(struct device_node *node) > { > int i; > @@ -1224,6 +1228,29 @@ static int clk_mt8192_top_probe(struct platform_device *pdev) > return of_clk_add_provider(node, of_clk_src_onecell_get, top_clk_data); > } > > +/* > + * Disable ACP on the infra clock. Setting this quirk is required for 3D to > + * work correctly. Without this quirk, any work queued to the Mali GPU faults, > + * for example raising a Data Invalid Fault. This suggests the GPU is failing > + * to read back the contents of shared CPU/GPU memory correctly, perhaps due to > + * a MT8192 platform integration issue breaking memory or caches. > + * > + * Relevant downstream change: > + * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 > + */ > +static int clk_mt8192_infra_disable_mfg2acp(struct platform_device *pdev) > +{ > + void __iomem *base = devm_platform_ioremap_resource(pdev, 0); > + void __iomem *infra_ctrl = base + INFRA_CTRL; > + > + if (IS_ERR(base)) > + return PTR_ERR(base); > + > + writel(readl(infra_ctrl) | INFRA_CTRL_DISABLE_MFG2ACP, infra_ctrl); > + > + return 0; > +} > + > static int clk_mt8192_infra_probe(struct platform_device *pdev) > { > struct clk_onecell_data *clk_data; > @@ -1238,6 +1265,10 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev) > if (r) > return r; > > + r = clk_mt8192_infra_disable_mfg2acp(pdev); > + if (r) > + return r; > + > return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); > } >
> > That links to an internal Google issue tracker which I assume has more > > information on the bug. I would appreciate if someone from Google or > > MediaTek could explain what this change actually does and why it's > > necessary on MT8192. > > > > At any rate, this register logically belongs to the MT8192 "infra" clock > > device, so it makes sense to set it there too. This avoids adding any > > platform-specific hacks to the 3D driver, either mainline (Panfrost) or > > legacy (kbase). > > Does this really have anything to do with clocks? I have no idea. MediaTek, Google, please explain. > In particular, "ACP" usually refers to the Accelerator Coherency Port > of a CPU cluster or DSU, and given the stated symptom of the issue > affected by it, my first guess would be that this bit might indeed > control routing of GPU traffic either to the ACP or the (presumably > non-coherent) main interconnect. I'd easily believe that. > If that is the case, I think this would logically belong as a SoC-specific > quirk in panfrost, where we'd need to retrieve the syscon regmap for > ourselves (see around line 800 of drivers/iommu/mtk_iommu.c for a similar > example). Alright. Doing this in panfrost seems ugly but if that's the right place for it, that's the right place for it.
Quoting Robin Murphy (2022-01-14 05:23:51) > On 2022-01-10 18:13, Alyssa Rosenzweig wrote: > > Set a mysterious chicken bit in the MT8192 clock driver (!) to get the > > Mali GPU on MT8192 to work. This workaround is from the downstream Mali > > driver shipped in ChromeOS. The change there is unsuitable for mainline > > but good as a reference for the hardware behaviour: > > > > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 > > > > That links to an internal Google issue tracker which I assume has more > > information on the bug. I would appreciate if someone from Google or > > MediaTek could explain what this change actually does and why it's > > necessary on MT8192. > > > > At any rate, this register logically belongs to the MT8192 "infra" clock > > device, so it makes sense to set it there too. This avoids adding any > > platform-specific hacks to the 3D driver, either mainline (Panfrost) or > > legacy (kbase). > > Does this really have anything to do with clocks? My (uninformed) > impression is that the infracfg blocks are general amalgamations of > configuration registers that simply happen to contain clock and reset > controls among other functionality. In particular, "ACP" usually refers > to the Accelerator Coherency Port of a CPU cluster or DSU, and given the > stated symptom of the issue affected by it, my first guess would be that > this bit might indeed control routing of GPU traffic either to the ACP > or the (presumably non-coherent) main interconnect. > > If that is the case, I think this would logically belong as a > SoC-specific quirk in panfrost, where we'd need to retrieve the syscon > regmap for ourselves (see around line 800 of drivers/iommu/mtk_iommu.c > for a similar example). What's the benefit of putting this into the GPU driver? Is it going to be runtime managed? It seems most logically related to the SoC or interconnect, for which there isn't any device driver but there could be.
Hi, On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote: > > > > That links to an internal Google issue tracker which I assume has more > > > information on the bug. I would appreciate if someone from Google or > > > MediaTek could explain what this change actually does and why it's > > > necessary on MT8192. > > > > > > At any rate, this register logically belongs to the MT8192 "infra" clock > > > device, so it makes sense to set it there too. This avoids adding any > > > platform-specific hacks to the 3D driver, either mainline (Panfrost) or > > > legacy (kbase). > > > > Does this really have anything to do with clocks? > > I have no idea. MediaTek, Google, please explain. > > > In particular, "ACP" usually refers to the Accelerator Coherency Port > > of a CPU cluster or DSU, and given the stated symptom of the issue > > affected by it, my first guess would be that this bit might indeed > > control routing of GPU traffic either to the ACP or the (presumably > > non-coherent) main interconnect. > > I'd easily believe that. As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port. And the bit in infracfg toggles whether ACP is used or not. Explanation from MediaTek in verbatim: ------------------------------------------------------------------------- The ACP path on MT8192 is just for experimental only. We are not intended to enable ACP by design. But due to an unexpected operation, it was accidently opened by default. So we need a patch to disable the ACP for MT8192. ------------------------------------------------------------------------- Regards ChenYu > > If that is the case, I think this would logically belong as a SoC-specific > > quirk in panfrost, where we'd need to retrieve the syscon regmap for > > ourselves (see around line 800 of drivers/iommu/mtk_iommu.c for a similar > > example). > > Alright. Doing this in panfrost seems ugly but if that's the right place > for it, that's the right place for it.
On 2022-01-18 07:19, Chen-Yu Tsai wrote: > Hi, > > On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote: >> >>>> That links to an internal Google issue tracker which I assume has more >>>> information on the bug. I would appreciate if someone from Google or >>>> MediaTek could explain what this change actually does and why it's >>>> necessary on MT8192. >>>> >>>> At any rate, this register logically belongs to the MT8192 "infra" clock >>>> device, so it makes sense to set it there too. This avoids adding any >>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or >>>> legacy (kbase). >>> >>> Does this really have anything to do with clocks? >> >> I have no idea. MediaTek, Google, please explain. >> >>> In particular, "ACP" usually refers to the Accelerator Coherency Port >>> of a CPU cluster or DSU, and given the stated symptom of the issue >>> affected by it, my first guess would be that this bit might indeed >>> control routing of GPU traffic either to the ACP or the (presumably >>> non-coherent) main interconnect. >> >> I'd easily believe that. > > As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port. > And the bit in infracfg toggles whether ACP is used or not. > > Explanation from MediaTek in verbatim: > > ------------------------------------------------------------------------- > The ACP path on MT8192 is just for experimental only. > We are not intended to enable ACP by design. > > But due to an unexpected operation, it was accidently opened by default. > So we need a patch to disable the ACP for MT8192. > ------------------------------------------------------------------------- Aha! That's great, thanks ChenYu! Stephen, my thinking here is that if this feature controls the GPU interconnect, and only matters when the GPU is going to be used (as strongly implied by the downstream implementation), then the GPU driver is the only interested party and may as well take responsibility if there's no better alternative. I'd agree that if there was already a "base" infracfg driver doing general system-wide set-and-forget configuration then it would equally well fit in there, but that doesn't seem to be the case. Short of trying to abuse the bp_infracfg data in the mtk-pm-domains driver (which doesn't seem like a particularly pleasant idea), the code to poke a bit into a syscon regmap is going to be pretty much the same wherever we add it. There's already a bit of a pattern for MTK drivers to look up and poke their own infracfg bits directly as needed, so between that and the downstream implementation for this particular bit, leaving it to Panfrost seems like the least surprising option. Cheers, Robin.
Quoting Robin Murphy (2022-01-18 07:01:46) > On 2022-01-18 07:19, Chen-Yu Tsai wrote: > > Hi, > > > > On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote: > >> > >>>> That links to an internal Google issue tracker which I assume has more > >>>> information on the bug. I would appreciate if someone from Google or > >>>> MediaTek could explain what this change actually does and why it's > >>>> necessary on MT8192. > >>>> > >>>> At any rate, this register logically belongs to the MT8192 "infra" clock > >>>> device, so it makes sense to set it there too. This avoids adding any > >>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or > >>>> legacy (kbase). > >>> > >>> Does this really have anything to do with clocks? > >> > >> I have no idea. MediaTek, Google, please explain. > >> > >>> In particular, "ACP" usually refers to the Accelerator Coherency Port > >>> of a CPU cluster or DSU, and given the stated symptom of the issue > >>> affected by it, my first guess would be that this bit might indeed > >>> control routing of GPU traffic either to the ACP or the (presumably > >>> non-coherent) main interconnect. > >> > >> I'd easily believe that. > > > > As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port. > > And the bit in infracfg toggles whether ACP is used or not. > > > > Explanation from MediaTek in verbatim: > > > > ------------------------------------------------------------------------- > > The ACP path on MT8192 is just for experimental only. > > We are not intended to enable ACP by design. > > > > But due to an unexpected operation, it was accidently opened by default. > > So we need a patch to disable the ACP for MT8192. > > ------------------------------------------------------------------------- > > Aha! That's great, thanks ChenYu! > > Stephen, my thinking here is that if this feature controls the GPU > interconnect, and only matters when the GPU is going to be used (as > strongly implied by the downstream implementation), then the GPU driver > is the only interested party and may as well take responsibility if > there's no better alternative. > > I'd agree that if there was already a "base" infracfg driver doing > general system-wide set-and-forget configuration then it would equally > well fit in there, but that doesn't seem to be the case. Wouldn't this first set-and-forget configuration fit that bill? We can't have a "base" driver because why? > Short of trying > to abuse the bp_infracfg data in the mtk-pm-domains driver (which > doesn't seem like a particularly pleasant idea), the code to poke a bit > into a syscon regmap is going to be pretty much the same wherever we add > it. There's already a bit of a pattern for MTK drivers to look up and > poke their own infracfg bits directly as needed, so between that and the > downstream implementation for this particular bit, leaving it to > Panfrost seems like the least surprising option. > I'd prefer we leave the SoC glue out of device drivers for subsystems that really don't want to or need to know about the SoC level details. The GPU driver wants to live life drawing triangles! :) It doesn't want to know that the ACP path didn't work out on some SoC it got plopped down into. And of course GPU is the only interested party, because the SoC glue for the GPU is all messed up so GPU can't operate properly without this bit toggled. I wonder where the fix would end up if this port was shared by more than one driver. Probably back here in the closest thing there is to the SoC driver. It's not as simple as poking bits in some SoC glue IO space unconditionally either. The GPU driver will need to know which SoC is being used and then only poke the bits if the affected SoC is in use. Or we'll have some DT binding update to poke the bit if some syscon property is present in the DT node. Either way, it's a set-and-forget thing, so the GPU driver will now have some set-and-forget logic for one SoC out of many that it supports; do it once at boot, grab a regmap, parse some more stuff to make sure it's needed, poke the bit, release the regmap, finally start drawing. Of course, I won't oppose the mess being moved somewhere outside of the subsystem I maintain ;-) I was mainly curious to understand why the regmap path is proposed.
On 2022-01-19 02:18, Stephen Boyd wrote: > Quoting Robin Murphy (2022-01-18 07:01:46) >> On 2022-01-18 07:19, Chen-Yu Tsai wrote: >>> Hi, >>> >>> On Fri, Jan 14, 2022 at 9:47 PM Alyssa Rosenzweig <alyssa@collabora.com> wrote: >>>> >>>>>> That links to an internal Google issue tracker which I assume has more >>>>>> information on the bug. I would appreciate if someone from Google or >>>>>> MediaTek could explain what this change actually does and why it's >>>>>> necessary on MT8192. >>>>>> >>>>>> At any rate, this register logically belongs to the MT8192 "infra" clock >>>>>> device, so it makes sense to set it there too. This avoids adding any >>>>>> platform-specific hacks to the 3D driver, either mainline (Panfrost) or >>>>>> legacy (kbase). >>>>> >>>>> Does this really have anything to do with clocks? >>>> >>>> I have no idea. MediaTek, Google, please explain. >>>> >>>>> In particular, "ACP" usually refers to the Accelerator Coherency Port >>>>> of a CPU cluster or DSU, and given the stated symptom of the issue >>>>> affected by it, my first guess would be that this bit might indeed >>>>> control routing of GPU traffic either to the ACP or the (presumably >>>>> non-coherent) main interconnect. >>>> >>>> I'd easily believe that. >>> >>> As Robin guessed, "ACP" here does refer to the Accelerator Coherency Port. >>> And the bit in infracfg toggles whether ACP is used or not. >>> >>> Explanation from MediaTek in verbatim: >>> >>> ------------------------------------------------------------------------- >>> The ACP path on MT8192 is just for experimental only. >>> We are not intended to enable ACP by design. >>> >>> But due to an unexpected operation, it was accidently opened by default. >>> So we need a patch to disable the ACP for MT8192. >>> ------------------------------------------------------------------------- >> >> Aha! That's great, thanks ChenYu! >> >> Stephen, my thinking here is that if this feature controls the GPU >> interconnect, and only matters when the GPU is going to be used (as >> strongly implied by the downstream implementation), then the GPU driver >> is the only interested party and may as well take responsibility if >> there's no better alternative. >> >> I'd agree that if there was already a "base" infracfg driver doing >> general system-wide set-and-forget configuration then it would equally >> well fit in there, but that doesn't seem to be the case. > > Wouldn't this first set-and-forget configuration fit that bill? We can't > have a "base" driver because why? Sure, everything has a starting point somewhere, it just means more work for someone to have to do. I'm not that person - I'm just here as a curious reviewer asking questions to help refine the abstraction - so I chose to lean towards the pragmatic side here given what I know about how much Alyssa enjoys kernel development ;) >> Short of trying >> to abuse the bp_infracfg data in the mtk-pm-domains driver (which >> doesn't seem like a particularly pleasant idea), the code to poke a bit >> into a syscon regmap is going to be pretty much the same wherever we add >> it. There's already a bit of a pattern for MTK drivers to look up and >> poke their own infracfg bits directly as needed, so between that and the >> downstream implementation for this particular bit, leaving it to >> Panfrost seems like the least surprising option. >> > > I'd prefer we leave the SoC glue out of device drivers for subsystems > that really don't want to or need to know about the SoC level details. > The GPU driver wants to live life drawing triangles! :) It doesn't want > to know that the ACP path didn't work out on some SoC it got plopped > down into. And of course GPU is the only interested party, because the > SoC glue for the GPU is all messed up so GPU can't operate properly > without this bit toggled. I wonder where the fix would end up if this > port was shared by more than one driver. Probably back here in the > closest thing there is to the SoC driver. As I hoped to imply, I agree that that's a perfectly valid line of reasoning too. However it does gloss over certain other considerations like managing dependencies between the drivers such that it's not too cryptic for a user to configure a kernel that actually works as expected, and the GPU driver has a guarantee that the configuration really has been done by the point that it wants to start DMA, for instance. > It's not as simple as poking bits in some SoC glue IO space > unconditionally either. The GPU driver will need to know which SoC is > being used and then only poke the bits if the affected SoC is in use. Or > we'll have some DT binding update to poke the bit if some syscon > property is present in the DT node. Either way, it's a set-and-forget > thing, so the GPU driver will now have some set-and-forget logic for one > SoC out of many that it supports; do it once at boot, grab a regmap, > parse some more stuff to make sure it's needed, poke the bit, release > the regmap, finally start drawing. In this case we do happen to have this handy function called panfrost_probe() which already deals with one-off startup stuff :P We also already have SoC-specific GPU compatibles because even without experimental interconnect easter eggs, people integrate these IPs in fairly involved ways and there's a fair degree of variety. However unless we want to be super-strict it's also not too hard to simply assume that if we can find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit within it, and if we can't then we don't. > Of course, I won't oppose the mess being moved somewhere outside of the > subsystem I maintain ;-) I was mainly curious to understand why the > regmap path is proposed. Well, regmap because it's a syscon, so whoever's accessing it that should be via its existing regmap rather than going behind its back. To be fair, there is a nascent infracfg "driver" already (even if it's just two helper functions), so adding some new infrastructure in there is a clear possibility - the functionally-similar Rockchip GRF already has something comparable, for example - it's just somewhat more code and more work thinking through the additional reasoning, compared to piling SoC-specific GPU-related stuff into the place that already knows about SoC-specific GPU stuff. As things stand, if someone *is* prepared to take that on then it's fine by me! FWIW, I have no desire to look more closely at the downstream driver, but I did notice in the context of the linked patch that there appeared to be some power-management-looking stuff as well as this magic bit, so if it's possible that that might be something we care about in future and mean we end up needing to poke syscons from Panfrost anyway, it might want factoring in to the decision. Cheers, Robin.
> We also already have SoC-specific GPU compatibles because even without > experimental interconnect easter eggs, people integrate these IPs in fairly > involved ways and there's a fair degree of variety. However unless we want > to be super-strict it's also not too hard to simply assume that if we can > find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit > within it, and if we can't then we don't. We need a MT8192-specific compatible for the GPU anyway due to "unique" power management requirements, this is why the MT8183 before it has a specific GPU compatible. So I'm not worried about the compatible.
Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto: >> We also already have SoC-specific GPU compatibles because even without >> experimental interconnect easter eggs, people integrate these IPs in fairly >> involved ways and there's a fair degree of variety. However unless we want >> to be super-strict it's also not too hard to simply assume that if we can >> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit >> within it, and if we can't then we don't. > > We need a MT8192-specific compatible for the GPU anyway due to "unique" > power management requirements, this is why the MT8183 before it has a > specific GPU compatible. So I'm not worried about the compatible. > Thing is, as it was explained, this is about a unwanted SoC misconfiguration, hence this is very specific to one SoC, which *happens to* integrate a Mali GPU. I agree with Stephen's reasoning - also in my opinion, the panfrost driver should be dedicated to managing the Mali GPUs and *not* the SoCs on which it is present, so disabling the Accelerator Coherency Port for MFG should be performed inside of files that are dealing with the specific SoC that requires this configuration (or, if you want, quirk). Simply put, though, as you already perfectly know, there is no driver that is dedicated to exclusively manage the "extra" INFRA bits, so here's what I've been thinking for a while; my logical reasoning: - Doing it in the IOMMU driver may seem at a first glance to make some sense, but then, does this really have anything to do with the IOMMU? I don't think so; - Performing the disablement in mtk-pm-domains is very shady... there's nothing that screams "power" in that; - This doesn't scream "clocks" either, I understand that; - As far as I understand this specific thing won't happen anymore (or at least, not in MediaTek land, but I also don't expect to see this on other SoCs). Getting back to MediaTek-land, only MT8192 is (and will ever be) affected, from what I understand... and there is one driver that is very specific to, targets only, and would probe only on MT8192 - which also happens to manage the very same iospace that we also want to poke at to disable this bit...... ... clk-mt8192! For this reason, I think that (unfortunately, to some extent) the most transparent and cleanest approach is the one that Alyssa took, which is to perform this set-and-forget operation there - also keeping in mind that panfrost obviously has no way to finish probing (hence, no way to actually use the GPU device) *before* the driver that provides clocks to it also probes and registers .. the clocks. My personal view on this would imply that Alyssa sends a v2 of this commit that includes MediaTek's explanation on why the ACP has to be disabled (as much as nicely expanding the ACP acronym as a documentation effort). I'm sorry for this wall of text (and for boring you with it! :P), but I felt like giving an extensive personal opinion on this would've been nice. Thank you all! Angelo
On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote: > Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto: >>> We also already have SoC-specific GPU compatibles because even without >>> experimental interconnect easter eggs, people integrate these IPs in >>> fairly >>> involved ways and there's a fair degree of variety. However unless we >>> want >>> to be super-strict it's also not too hard to simply assume that if we >>> can >>> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic >>> bit >>> within it, and if we can't then we don't. >> >> We need a MT8192-specific compatible for the GPU anyway due to "unique" >> power management requirements, this is why the MT8183 before it has a >> specific GPU compatible. So I'm not worried about the compatible. >> > > Thing is, as it was explained, this is about a unwanted SoC > misconfiguration, > hence this is very specific to one SoC, which *happens to* integrate a > Mali GPU. > > I agree with Stephen's reasoning - also in my opinion, the panfrost > driver should > be dedicated to managing the Mali GPUs and *not* the SoCs on which it is > present, > so disabling the Accelerator Coherency Port for MFG should be performed > inside of > files that are dealing with the specific SoC that requires this > configuration (or, > if you want, quirk). > > Simply put, though, as you already perfectly know, there is no driver > that is > dedicated to exclusively manage the "extra" INFRA bits, so here's what > I've been > thinking for a while; my logical reasoning: > - Doing it in the IOMMU driver may seem at a first glance to make some > sense, > but then, does this really have anything to do with the IOMMU? I > don't think so; > - Performing the disablement in mtk-pm-domains is very shady... there's > nothing > that screams "power" in that; > - This doesn't scream "clocks" either, I understand that; > - As far as I understand this specific thing won't happen anymore (or at > least, > not in MediaTek land, but I also don't expect to see this on other > SoCs). > > Getting back to MediaTek-land, only MT8192 is (and will ever be) > affected, from > what I understand... and there is one driver that is very specific to, > targets > only, and would probe only on MT8192 - which also happens to manage the > very same > iospace that we also want to poke at to disable this bit...... > > ... clk-mt8192! The only trouble with that argument is that it falls apart under the slightest scrutiny ;) The relevant parts of the power domain, IOMMU and GPU drivers are every bit as SoC-specific as the clock driver, and it is the generic syscon driver which owns and manages the actual MMIO space. All in all this has now convinced me that it *is* worth the slight extra effort to put random infracfg stuff in the random infracfg stuff place. And by "slight" I mean it turns out I've spent longer writing this prose than bashing out the illustrative diff below, which is the last opinion I shall have on this matter. Feel free to take it and fix it, or do anything else if you prefer :) Cheers, Robin. ----->8----- diff --git a/drivers/soc/mediatek/mtk-infracfg.c b/drivers/soc/mediatek/mtk-infracfg.c index 0590b68e0d78..af22d0f3d547 100644 --- a/drivers/soc/mediatek/mtk-infracfg.c +++ b/drivers/soc/mediatek/mtk-infracfg.c @@ -72,3 +72,20 @@ int mtk_infracfg_clear_bus_protection(struct regmap *infracfg, u32 mask, return ret; } + +static int __init mtk_infracfg_init(void) +{ + struct regmap *infracfg; + + /* + * MT8192 has an experimental path to route GPU traffic to the DSU's + * Accelerator Coherency Port, which is inadvertently enabled by + * default. It turns out not to work very well, so disable it. + */ + infracfg = syscon_regmap_lookup_by_compatible("mediatek,mt8192-infracfg"); + if (infracfg) + regmap_set_bits(infracfg, MT8192_INFRA_CTRL, + MT8192_INFRA_CTRL_DISABLE_MFG2ACP); + return 0; +} +postcore_initcall(mtk_infracfg_init); diff --git a/include/linux/soc/mediatek/infracfg.h b/include/linux/soc/mediatek/infracfg.h index 4615a228da51..94f0f338ce49 100644 --- a/include/linux/soc/mediatek/infracfg.h +++ b/include/linux/soc/mediatek/infracfg.h @@ -147,6 +147,9 @@ #define INFRA_TOPAXI_PROTECTEN_SET 0x0260 #define INFRA_TOPAXI_PROTECTEN_CLR 0x0264 +#define MT8192_INFRA_CTRL 0x290 +#define MT8192_INFRA_CTRL_DISABLE_MFG2ACP BIT(9) + #define REG_INFRA_MISC 0xf00 #define F_DDR_4GB_SUPPORT_EN BIT(13)
Il 15/02/22 16:21, Robin Murphy ha scritto: > On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote: >> Il 20/01/22 15:27, Alyssa Rosenzweig ha scritto: >>>> We also already have SoC-specific GPU compatibles because even without >>>> experimental interconnect easter eggs, people integrate these IPs in fairly >>>> involved ways and there's a fair degree of variety. However unless we want >>>> to be super-strict it's also not too hard to simply assume that if we can >>>> find a "mediatek,mt8192-infracfg" syscon then we set the MT8192 magic bit >>>> within it, and if we can't then we don't. >>> >>> We need a MT8192-specific compatible for the GPU anyway due to "unique" >>> power management requirements, this is why the MT8183 before it has a >>> specific GPU compatible. So I'm not worried about the compatible. >>> >> >> Thing is, as it was explained, this is about a unwanted SoC misconfiguration, >> hence this is very specific to one SoC, which *happens to* integrate a Mali GPU. >> >> I agree with Stephen's reasoning - also in my opinion, the panfrost driver should >> be dedicated to managing the Mali GPUs and *not* the SoCs on which it is present, >> so disabling the Accelerator Coherency Port for MFG should be performed inside of >> files that are dealing with the specific SoC that requires this configuration (or, >> if you want, quirk). >> >> Simply put, though, as you already perfectly know, there is no driver that is >> dedicated to exclusively manage the "extra" INFRA bits, so here's what I've been >> thinking for a while; my logical reasoning: >> - Doing it in the IOMMU driver may seem at a first glance to make some sense, >> but then, does this really have anything to do with the IOMMU? I don't think so; >> - Performing the disablement in mtk-pm-domains is very shady... there's nothing >> that screams "power" in that; >> - This doesn't scream "clocks" either, I understand that; >> - As far as I understand this specific thing won't happen anymore (or at least, >> not in MediaTek land, but I also don't expect to see this on other SoCs). >> >> Getting back to MediaTek-land, only MT8192 is (and will ever be) affected, from >> what I understand... and there is one driver that is very specific to, targets >> only, and would probe only on MT8192 - which also happens to manage the very same >> iospace that we also want to poke at to disable this bit...... >> >> ... clk-mt8192! > > The only trouble with that argument is that it falls apart under the > slightest scrutiny ;) > > The relevant parts of the power domain, IOMMU and GPU drivers are every > bit as SoC-specific as the clock driver, and it is the generic syscon > driver which owns and manages the actual MMIO space. > > All in all this has now convinced me that it *is* worth the slight extra > effort to put random infracfg stuff in the random infracfg stuff place. > And by "slight" I mean it turns out I've spent longer writing this prose > than bashing out the illustrative diff below, which is the last opinion > I shall have on this matter. Feel free to take it and fix it, or do anything else > if you prefer :) > Hello Robin, I am a bit ashamed to admit that I haven't fully considered the option of putting the random infracfg stuff in the random infracfg stuff place. This solution looks like being the most sensible thing that we can actually do to patch up this bit. Thanks a lot for the advice!! Cheers, Angelo > Cheers, > Robin. > > ----->8----- > diff --git a/drivers/soc/mediatek/mtk-infracfg.c b/drivers/soc/mediatek/mtk-infracfg.c > index 0590b68e0d78..af22d0f3d547 100644 > --- a/drivers/soc/mediatek/mtk-infracfg.c > +++ b/drivers/soc/mediatek/mtk-infracfg.c > @@ -72,3 +72,20 @@ int mtk_infracfg_clear_bus_protection(struct regmap *infracfg, > u32 mask, > > return ret; > } > + > +static int __init mtk_infracfg_init(void) > +{ > + struct regmap *infracfg; > + > + /* > + * MT8192 has an experimental path to route GPU traffic to the DSU's > + * Accelerator Coherency Port, which is inadvertently enabled by > + * default. It turns out not to work very well, so disable it. > + */ > + infracfg = syscon_regmap_lookup_by_compatible("mediatek,mt8192-infracfg"); > + if (infracfg) > + regmap_set_bits(infracfg, MT8192_INFRA_CTRL, > + MT8192_INFRA_CTRL_DISABLE_MFG2ACP); > + return 0; > +} > +postcore_initcall(mtk_infracfg_init); > diff --git a/include/linux/soc/mediatek/infracfg.h > b/include/linux/soc/mediatek/infracfg.h > index 4615a228da51..94f0f338ce49 100644 > --- a/include/linux/soc/mediatek/infracfg.h > +++ b/include/linux/soc/mediatek/infracfg.h > @@ -147,6 +147,9 @@ > #define INFRA_TOPAXI_PROTECTEN_SET 0x0260 > #define INFRA_TOPAXI_PROTECTEN_CLR 0x0264 > > +#define MT8192_INFRA_CTRL 0x290 > +#define MT8192_INFRA_CTRL_DISABLE_MFG2ACP BIT(9) > + > #define REG_INFRA_MISC 0xf00 > #define F_DDR_4GB_SUPPORT_EN BIT(13) >
Quoting Robin Murphy (2022-02-15 07:21:52) > On 2022-02-15 10:44, AngeloGioacchino Del Regno wrote: > > All in all this has now convinced me that it *is* worth the slight extra > effort to put random infracfg stuff in the random infracfg stuff place. > And by "slight" I mean it turns out I've spent longer writing this prose > than bashing out the illustrative diff below, which is the last opinion > I shall have on this matter. Feel free to take it and fix it, or do > anything else if you prefer :) > > Cheers, > Robin. > Acked-by: Stephen Boyd <sboyd@kernel.org>
diff --git a/drivers/clk/mediatek/clk-mt8192.c b/drivers/clk/mediatek/clk-mt8192.c index cbc7c6dbe0f4..e3673494d08d 100644 --- a/drivers/clk/mediatek/clk-mt8192.c +++ b/drivers/clk/mediatek/clk-mt8192.c @@ -1179,6 +1179,10 @@ static const struct mtk_pll_data plls[] = { static struct clk_onecell_data *top_clk_data; +/* Control registers in the infra block used to set a chicken bit */ +#define INFRA_CTRL 0x290 +#define INFRA_CTRL_DISABLE_MFG2ACP BIT(9) + static void clk_mt8192_top_init_early(struct device_node *node) { int i; @@ -1224,6 +1228,29 @@ static int clk_mt8192_top_probe(struct platform_device *pdev) return of_clk_add_provider(node, of_clk_src_onecell_get, top_clk_data); } +/* + * Disable ACP on the infra clock. Setting this quirk is required for 3D to + * work correctly. Without this quirk, any work queued to the Mali GPU faults, + * for example raising a Data Invalid Fault. This suggests the GPU is failing + * to read back the contents of shared CPU/GPU memory correctly, perhaps due to + * a MT8192 platform integration issue breaking memory or caches. + * + * Relevant downstream change: + * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 + */ +static int clk_mt8192_infra_disable_mfg2acp(struct platform_device *pdev) +{ + void __iomem *base = devm_platform_ioremap_resource(pdev, 0); + void __iomem *infra_ctrl = base + INFRA_CTRL; + + if (IS_ERR(base)) + return PTR_ERR(base); + + writel(readl(infra_ctrl) | INFRA_CTRL_DISABLE_MFG2ACP, infra_ctrl); + + return 0; +} + static int clk_mt8192_infra_probe(struct platform_device *pdev) { struct clk_onecell_data *clk_data; @@ -1238,6 +1265,10 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev) if (r) return r; + r = clk_mt8192_infra_disable_mfg2acp(pdev); + if (r) + return r; + return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); }
Set a mysterious chicken bit in the MT8192 clock driver (!) to get the Mali GPU on MT8192 to work. This workaround is from the downstream Mali driver shipped in ChromeOS. The change there is unsuitable for mainline but good as a reference for the hardware behaviour: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 That links to an internal Google issue tracker which I assume has more information on the bug. I would appreciate if someone from Google or MediaTek could explain what this change actually does and why it's necessary on MT8192. At any rate, this register logically belongs to the MT8192 "infra" clock device, so it makes sense to set it there too. This avoids adding any platform-specific hacks to the 3D driver, either mainline (Panfrost) or legacy (kbase). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Nick Fan <Nick.Fan@mediatek.com> Cc: Nicolas Boichat <drinkcat@chromium.org> --- drivers/clk/mediatek/clk-mt8192.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)