[v2,00/18] i.MX8MM GPC improvements and BLK_CTRL driver

Message ID 20210721204703.1424034-1-l.stach@pengutronix.de (mailing list archive)

Message

Lucas Stach July 21, 2021, 8:46 p.m. UTC
Hi all,

second revision of the GPC improvements and BLK_CTRL driver to make use
of all the power-domains on the i.MX8MM. I'm not going to repeat the full
blurb from the v1 cover letter here, but if you are not familiar with
i.MX8MM power domains, it may be worth a read.

This 2nd revision fixes the DT bindings to be valid yaml, some small
failure path issues and most importantly the interaction with system
suspend/resume. With the previous version some of the power domains
would not come up correctly after a suspend/resume cycle.

Updated testing git trees here, disclaimer still applies:
https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing

Regards,
Lucas

Frieder Schrempf (1):
  arm64: dts: imx8mm: Add GPU nodes for 2D and 3D core

Lucas Stach (15):
  Revert "soc: imx: gpcv2: move reset assert after requesting domain
    power up"
  soc: imx: gpcv2: add lockdep annotation
  soc: imx: gpcv2: add domain option to keep domain clocks enabled
  soc: imx: gpcv2: keep i.MX8M* bus clocks enabled
  soc: imx: gpcv2: support system suspend/resume
  dt-bindings: soc: add binding for i.MX8MM VPU blk-ctrl
  dt-bindings: power: imx8mm: add defines for VPU blk-ctrl domains
  soc: imx: add i.MX8M blk-ctrl driver
  dt-bindings: soc: add binding for i.MX8MM DISP blk-ctrl
  dt-bindings: power: imx8mm: add defines for DISP blk-ctrl domains
  soc: imx: imx8m-blk-ctrl: add DISP blk-ctrl
  arm64: dts: imx8mm: add GPC node
  arm64: dts: imx8mm: put USB controllers into power-domains
  arm64: dts: imx8mm: add VPU blk-ctrl
  arm64: dts: imx8mm: add DISP blk-ctrl

Marek Vasut (2):
  soc: imx: gpcv2: Turn domain->pgc into bitfield
  soc: imx: gpcv2: Set both GPC_PGC_nCTRL(GPU_2D|GPU_3D) for MX8MM GPU
    domain

 .../soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml     |  94 ++++
 .../soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml      |  76 +++
 arch/arm64/boot/dts/freescale/imx8mm.dtsi     | 180 ++++++
 drivers/soc/imx/Makefile                      |   1 +
 drivers/soc/imx/gpcv2.c                       | 130 +++--
 drivers/soc/imx/imx8m-blk-ctrl.c              | 525 ++++++++++++++++++
 include/dt-bindings/power/imx8mm-power.h      |   9 +
 7 files changed, 974 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml
 create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml
 create mode 100644 drivers/soc/imx/imx8m-blk-ctrl.c

Comments

Peng Fan (OSS) Aug. 5, 2021, 9:35 a.m. UTC | #1
> Subject: [PATCH v2 00/18] i.MX8MM GPC improvements and BLK_CTRL driver
> 
> Hi all,
> 
> second revision of the GPC improvements and BLK_CTRL driver to make use
> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> blurb from the v1 cover letter here, but if you are not familiar with i.MX8MM
> power domains, it may be worth a read.
> 
> This 2nd revision fixes the DT bindings to be valid yaml, some small failure
> path issues and most importantly the interaction with system
> suspend/resume. With the previous version some of the power domains
> would not come up correctly after a suspend/resume cycle.

Thanks for the work. I gave it a test; boot and suspend/resume both work with the display.

Tested-by: Peng Fan <peng.fan@nxp.com>

> 
> Updated testing git trees here, disclaimer still applies:
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> 
> Regards,
> Lucas
> 
> Frieder Schrempf (1):
>   arm64: dts: imx8mm: Add GPU nodes for 2D and 3D core
> 
> Lucas Stach (15):
>   Revert "soc: imx: gpcv2: move reset assert after requesting domain
>     power up"
>   soc: imx: gpcv2: add lockdep annotation
>   soc: imx: gpcv2: add domain option to keep domain clocks enabled
>   soc: imx: gpcv2: keep i.MX8M* bus clocks enabled
>   soc: imx: gpcv2: support system suspend/resume
>   dt-bindings: soc: add binding for i.MX8MM VPU blk-ctrl
>   dt-bindings: power: imx8mm: add defines for VPU blk-ctrl domains
>   soc: imx: add i.MX8M blk-ctrl driver
>   dt-bindings: soc: add binding for i.MX8MM DISP blk-ctrl
>   dt-bindings: power: imx8mm: add defines for DISP blk-ctrl domains
>   soc: imx: imx8m-blk-ctrl: add DISP blk-ctrl
>   arm64: dts: imx8mm: add GPC node
>   arm64: dts: imx8mm: put USB controllers into power-domains
>   arm64: dts: imx8mm: add VPU blk-ctrl
>   arm64: dts: imx8mm: add DISP blk-ctrl
> 
> Marek Vasut (2):
>   soc: imx: gpcv2: Turn domain->pgc into bitfield
>   soc: imx: gpcv2: Set both GPC_PGC_nCTRL(GPU_2D|GPU_3D) for MX8MM GPU
>     domain
> 
>  .../soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml     |  94 ++++
>  .../soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml      |  76 +++
>  arch/arm64/boot/dts/freescale/imx8mm.dtsi     | 180 ++++++
>  drivers/soc/imx/Makefile                      |   1 +
>  drivers/soc/imx/gpcv2.c                       | 130 +++--
>  drivers/soc/imx/imx8m-blk-ctrl.c              | 525 ++++++++++++++++++
>  include/dt-bindings/power/imx8mm-power.h      |   9 +
>  7 files changed, 974 insertions(+), 41 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml
>  create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml
>  create mode 100644 drivers/soc/imx/imx8m-blk-ctrl.c
> 
> --
> 2.30.2
Frieder Schrempf Aug. 5, 2021, 10:18 a.m. UTC | #2
On 21.07.21 22:46, Lucas Stach wrote:
> Hi all,
> 
> second revision of the GPC improvements and BLK_CTRL driver to make use
> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> blurb from the v1 cover letter here, but if you are not familiar with
> i.MX8MM power domains, it may be worth a read.
> 
> This 2nd revision fixes the DT bindings to be valid yaml, some small
> failure path issues and most importantly the interaction with system
> suspend/resume. With the previous version some of the power domains
> would not come up correctly after a suspend/resume cycle.
> 
> Updated testing git trees here, disclaimer still applies:
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing

I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!

I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
Frieder Schrempf Aug. 5, 2021, 6:56 p.m. UTC | #3
On 05.08.21 12:18, Frieder Schrempf wrote:
> On 21.07.21 22:46, Lucas Stach wrote:
>> Hi all,
>>
>> second revision of the GPC improvements and BLK_CTRL driver to make use
>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>> blurb from the v1 cover letter here, but if you are not familiar with
>> i.MX8MM power domains, it may be worth a read.
>>
>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>> failure path issues and most importantly the interaction with system
>> suspend/resume. With the previous version some of the power domains
>> would not come up correctly after a suspend/resume cycle.
>>
>> Updated testing git trees here, disclaimer still applies:
>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> 
> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
> 
> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
> 

Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.

Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.

If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.

I would also appreciate it if someone else could try to reproduce this problem on their side. I use this simple script for testing:

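#!/bin/sh

# Keep the GPU busy in the background while cycling.
glmark2-es2-drm &

while true
do
    # Arm the RTC to wake the system in 10 seconds, then suspend to RAM.
    echo +10 > /sys/class/rtc/rtc0/wakealarm
    echo mem > /sys/power/state
    sleep 5
done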
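
A possible variant of the loop above, as a minimal sketch: it counts cycles and stops on the first failed suspend, so the console log shows how many cycles completed before a failure or lockup. The function names and the WAKEALARM, PM_STATE and SLEEP_SECS variables are illustrative assumptions, not part of the original script, and the glmark2-es2-drm background load is left out.

```shell
#!/bin/sh
# Sketch only: counted suspend/resume loop, stopping on the first failure.
# Paths default to the ones used in the thread; the variable and function
# names are made up for this example.

WAKEALARM="${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}"
PM_STATE="${PM_STATE:-/sys/power/state}"
SLEEP_SECS="${SLEEP_SECS:-5}"

# One cycle: arm the RTC to wake the board in $1 seconds, then suspend to RAM.
# The write to PM_STATE returns after resume and fails if suspend is aborted.
suspend_cycle() {
    echo "+$1" > "$WAKEALARM" || return 1
    echo mem > "$PM_STATE"
}

# Run up to $1 cycles and print a counter before and after each suspend.
run_cycles() {
    max="$1"
    i=1
    while [ "$i" -le "$max" ]; do
        echo "cycle $i: suspending"
        if ! suspend_cycle 10; then
            echo "cycle $i: suspend failed"
            return 1
        fi
        echo "cycle $i: resumed"
        sleep "$SLEEP_SECS"
        i=$((i + 1))
    done
}

# Only start the loop when a cycle count is given on the command line.
if [ "$#" -gt 0 ]; then
    run_cycles "$1"
fi
```

Saved as, say, suspend-loop.sh, it would be started with "./suspend-loop.sh 1000"; without an argument it only defines the functions.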
Lucas Stach Aug. 9, 2021, 11:01 a.m. UTC | #4
Hi Frieder,

Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
> On 05.08.21 12:18, Frieder Schrempf wrote:
> > On 21.07.21 22:46, Lucas Stach wrote:
> > > Hi all,
> > > 
> > > second revision of the GPC improvements and BLK_CTRL driver to make use
> > > of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> > > blurb from the v1 cover letter here, but if you are not familiar with
> > > i.MX8MM power domains, it may be worth a read.
> > > 
> > > This 2nd revision fixes the DT bindings to be valid yaml, some small
> > > failure path issues and most importantly the interaction with system
> > > suspend/resume. With the previous version some of the power domains
> > > would not come up correctly after a suspend/resume cycle.
> > > 
> > > Updated testing git trees here, disclaimer still applies:
> > > https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> > > https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> > 
> > I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
> > 
> > I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
> > 
> 
> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
> 
> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
> 
> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
> 
> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> 
> #!/bin/sh
> 
> glmark2-es2-drm &
> 
> while true;
> do
>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>     echo mem > /sys/power/state
>     sleep 5
> done;

Hm, that's unfortunate.

I'm back from a two week vacation, but it looks like I won't have much
time available to look into this issue soon. It would be very helpful
if you could try to pinpoint the hang a bit more. If you can reproduce
the hang with no_console_suspend you might be able to extract a bit
more info on the stage in which the hang happens (suspend, resume, TF-A,
etc.). If the hang is in the kernel you might be able to add some prints
to the suspend/resume paths to track down the exact point of the hang.

I'm happy to look into the issue once it's better known where to look,
but I fear that I won't have time to do the above investigation myself
short term. Frieder, is this something you could help with over the
next few days?

Regards,
Lucas
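
One way to prepare a board for the investigation suggested above, as a hedged sketch rather than anything posted in the thread: check that no_console_suspend is on the kernel command line and turn on the kernel's generic PM debug output. The sysfs files /sys/power/pm_debug_messages and /sys/power/pm_print_times are only present when the kernel is built with CONFIG_PM_SLEEP_DEBUG; the helper names are made up for this example.

```shell
#!/bin/sh
# Sketch only: check the kernel command line and enable extra PM logging.
# Helper names are illustrative.

# Succeeds if parameter $2 appears as a whole word in command line string $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Write $2 to sysfs file $1 if it exists and is writable, else say so.
set_if_present() {
    if [ -w "$1" ]; then
        echo "$2" > "$1"
    else
        echo "skipping $1 (not available)"
    fi
}

prepare_pm_debug() {
    if ! has_param "$(cat /proc/cmdline)" no_console_suspend; then
        echo "warning: no_console_suspend is not on the kernel command line"
    fi
    # Both files depend on CONFIG_PM_SLEEP_DEBUG.
    set_if_present /sys/power/pm_debug_messages 1
    set_if_present /sys/power/pm_print_times 1
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "apply" ]; then
    prepare_pm_debug
fi
```

With the console kept alive across suspend, the last message printed before a hang gives a first hint about the stage in which it occurs.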
Frieder Schrempf Aug. 9, 2021, 11:50 a.m. UTC | #5
On 09.08.21 13:01, Lucas Stach wrote:
> Hi Frieder,
> 
> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>> Hi all,
>>>>
>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>> i.MX8MM power domains, it may be worth a read.
>>>>
>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>> failure path issues and most importantly the interaction with system
>>>> suspend/resume. With the previous version some of the power domains
>>>> would not come up correctly after a suspend/resume cycle.
>>>>
>>>> Updated testing git trees here, disclaimer still applies:
>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>
>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>
>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>
>>
>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>
>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>
>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>
>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>
>> #!/bin/sh
>>
>> glmark2-es2-drm &
>>
>> while true;
>> do
>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>     echo mem > /sys/power/state
>>     sleep 5
>> done;
> 
> Hm, that's unfortunate.
> 
> I'm back from a two week vacation, but it looks like I won't have much
> time available to look into this issue soon. It would be very helpful
> if you could try to pinpoint the hang a bit more.  If you can reproduce
> the hang with no_console_suspend you might be able to extract a bit
> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> If the hang is in the kernel you might be able to add some prints to
> the suspend/resume paths to be able to track down the exact point of
> the hang.
> 
> I'm happy to look into the issue once it's better known where to look,
> but I fear that I won't have time to do the above investigation myself
> short term. Frieder, is this something you could help with over the
> next few days?

I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.

@Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?

On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
Adam Ford Aug. 9, 2021, 6:51 p.m. UTC | #6
On Mon, Aug 9, 2021 at 6:50 AM Frieder Schrempf
<frieder.schrempf@kontron.de> wrote:
>
> On 09.08.21 13:01, Lucas Stach wrote:
> > Hi Frieder,
> >
> > Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
> >> On 05.08.21 12:18, Frieder Schrempf wrote:
> >>> On 21.07.21 22:46, Lucas Stach wrote:
> >>>> Hi all,
> >>>>
> >>>> second revision of the GPC improvements and BLK_CTRL driver to make use
> >>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> >>>> blurb from the v1 cover letter here, but if you are not familiar with
> >>>> i.MX8MM power domains, it may be worth a read.
> >>>>
> >>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
> >>>> failure path issues and most importantly the interaction with system
> >>>> suspend/resume. With the previous version some of the power domains
> >>>> would not come up correctly after a suspend/resume cycle.
> >>>>
> >>>> Updated testing git trees here, disclaimer still applies:
> >>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> >>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> >>>
> >>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
> >>>
> >>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
> >>>
> >>
> >> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
> >>
> >> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
> >>
> >> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
> >>
> >> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> >>
> >> #!/bin/sh
> >>
> >> glmark2-es2-drm &
> >>
> >> while true;
> >> do
> >>     echo +10 > /sys/class/rtc/rtc0/wakealarm
> >>     echo mem > /sys/power/state
> >>     sleep 5
> >> done;
> >
> > Hm, that's unfortunate.
> >
> > I'm back from a two week vacation, but it looks like I won't have much
> > time available to look into this issue soon. It would be very helpful
> > if you could try to pinpoint the hang a bit more.  If you can reproduce
> > the hang with no_console_suspend you might be able to extract a bit
> > more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> > If the hang is in the kernel you might be able to add some prints to
> > the suspend/resume paths to be able to track down the exact point of
> > the hang.
> >
> > I'm happy to look into the issue once it's better known where to look,
> > but I fear that I won't have time to do the above investigation myself
> > short term. Frieder, is this something you could help with over the
> > next few days?
>
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>
> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?

Right now I am on medical leave due to a broken wrist, and I won't be
able to help until it heals.

sorry

adam
>
> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
Tim Harvey Aug. 30, 2021, 10:06 p.m. UTC | #7
On Mon, Aug 9, 2021 at 4:01 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>
> Hi Frieder,
>
> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
> > On 05.08.21 12:18, Frieder Schrempf wrote:
> > > On 21.07.21 22:46, Lucas Stach wrote:
> > > > Hi all,
> > > >
> > > > second revision of the GPC improvements and BLK_CTRL driver to make use
> > > > of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> > > > blurb from the v1 cover letter here, but if you are not familiar with
> > > > i.MX8MM power domains, it may be worth a read.
> > > >
> > > > This 2nd revision fixes the DT bindings to be valid yaml, some small
> > > > failure path issues and most importantly the interaction with system
> > > > suspend/resume. With the previous version some of the power domains
> > > > would not come up correctly after a suspend/resume cycle.
> > > >
> > > > Updated testing git trees here, disclaimer still applies:
> > > > https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> > > > https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> > >
> > > I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
> > >
> > > I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
> > >
> >
> > Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
> >
> > Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
> >
> > If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
> >
> > And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> >
> > #!/bin/sh
> >
> > glmark2-es2-drm &
> >
> > while true;
> > do
> >     echo +10 > /sys/class/rtc/rtc0/wakealarm
> >     echo mem > /sys/power/state
> >     sleep 5
> > done;
>
> Hm, that's unfortunate.
>
> I'm back from a two week vacation, but it looks like I won't have much
> time available to look into this issue soon. It would be very helpful
> if you could try to pinpoint the hang a bit more.  If you can reproduce
> the hang with no_console_suspend you might be able to extract a bit
> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> If the hang is in the kernel you might be able to add some prints to
> the suspend/resume paths to be able to track down the exact point of
> the hang.
>
> I'm happy to look into the issue once it's better known where to look,
> but I fear that I won't have time to do the above investigation myself
> short term. Frieder, is this something you could help with over the
> next few days?
>

Lucas / Frieder,

Can you update us on where you are at with this patch series? I fear
we are going to go through another kernel release without i.MX8MM
blk-ctrl support and all the things that depend on it, such as
USB/PCI/DSI/CSI/GPU/VPU. If there is some specific testing you need,
please let me know what I can do to help. I have a variety of i.MX8MM
hardware but not a lot of time or knowledge with regard to
troubleshooting suspend/resume issues.

Are the issues found a regression?

Best regards,

Tim
Frieder Schrempf Sept. 1, 2021, 10:03 a.m. UTC | #8
Hi Lucas,

On 09.08.21 13:50, Frieder Schrempf wrote:
> On 09.08.21 13:01, Lucas Stach wrote:
>> Hi Frieder,
>>
>> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>> Hi all,
>>>>>
>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>
>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>> failure path issues and most importantly the interaction with system
>>>>> suspend/resume. With the previous version some of the power domains
>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>
>>>>> Updated testing git trees here, disclaimer still applies:
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>
>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>
>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>
>>>
>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>
>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>
>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>
>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>
>>> #!/bin/sh
>>>
>>> glmark2-es2-drm &
>>>
>>> while true;
>>> do
>>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>     echo mem > /sys/power/state
>>>     sleep 5
>>> done;
>>
>> Hm, that's unfortunate.
>>
>> I'm back from a two week vacation, but it looks like I won't have much
>> time available to look into this issue soon. It would be very helpful
>> if you could try to pinpoint the hang a bit more.  If you can reproduce
>> the hang with no_console_suspend you might be able to extract a bit
>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>> If the hang is in the kernel you might be able to add some prints to
>> the suspend/resume paths to be able to track down the exact point of
>> the hang.
>>
>> I'm happy to look into the issue once it's better known where to look,
>> but I fear that I won't have time to do the above investigation myself
>> short term. Frieder, is this something you could help with over the
>> next few days?
> 
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
> 
> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?
> 
> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.

I ran a few more suspend/resume cycles and watched the log. For the first
2.5 hours nothing noteworthy happened, except that glmark2 crashed again
at some point.

Then suddenly the following lines were printed while suspending:

  imx-pgc imx-pgc-domain.6: failed to command PGC
  PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
  imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
  PM: Some devices failed to suspend, or early wake event detected

After that, suspend continues to fail with the following on each attempt:

  PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
  imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
  PM: Some devices failed to suspend, or early wake event detected

So far I didn't run into a lockup again with this test, but I will
continue trying to reproduce it and retrieve more information.

Best regards
Frieder
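
For reading the log lines above: on Linux, -110 is -ETIMEDOUT and -22 is -EINVAL. The following is a small sketch, not something posted in the thread, that pulls the "failed to suspend" lines out of a saved console log and prints the device with the symbolic error name. The function names are hypothetical and only the two codes seen in this report are mapped.

```shell
#!/bin/sh
# Sketch only: summarize "failed to suspend" lines from a kernel log.

# Map the errno values seen in this report to their symbolic names.
errno_name() {
    case "$1" in
        22)  echo EINVAL ;;
        110) echo ETIMEDOUT ;;
        *)   echo "errno $1" ;;
    esac
}

# Read kernel log text on stdin and print "<device>: <error name>" for each
# "PM: failed to suspend" line. The device is the word right before ": PM:".
summarize_suspend_errors() {
    sed -n 's/.* \([^ ]*\): PM: failed to suspend: error -\([0-9][0-9]*\).*/\1 \2/p' |
    while read -r dev code; do
        echo "$dev: $(errno_name "$code")"
    done
}
```

It could be fed from a capture file, for example "summarize_suspend_errors < console.log", where console.log is whatever file the serial console output was saved to.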
Frieder Schrempf Sept. 1, 2021, 10:30 a.m. UTC | #9
Hi Tim,

On 31.08.21 00:06, Tim Harvey wrote:
> On Mon, Aug 9, 2021 at 4:01 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>>
>> Hi Frieder,
>>
>> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>> Hi all,
>>>>>
>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>
>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>> failure path issues and most importantly the interaction with system
>>>>> suspend/resume. With the previous version some of the power domains
>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>
>>>>> Updated testing git trees here, disclaimer still applies:
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>
>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>
>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>
>>>
>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>
>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>
>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>
>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>
>>> #!/bin/sh
>>>
>>> glmark2-es2-drm &
>>>
>>> while true;
>>> do
>>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>     echo mem > /sys/power/state
>>>     sleep 5
>>> done;
>>
>> Hm, that's unfortunate.
>>
>> I'm back from a two week vacation, but it looks like I won't have much
>> time available to look into this issue soon. It would be very helpful
>> if you could try to pinpoint the hang a bit more.  If you can reproduce
>> the hang with no_console_suspend you might be able to extract a bit
>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>> If the hang is in the kernel you might be able to add some prints to
>> the suspend/resume paths to be able to track down the exact point of
>> the hang.
>>
>> I'm happy to look into the issue once it's better known where to look,
>> but I fear that I won't have time to do the above investigation myself
>> short term. Frieder, is this something you could help with over the
>> next few days?
>>
> 
> Lucas / Frieder,
> 
> Can you update us on where you are at with this patch series? I fear
> we are going to go through another kernel release without IMX8MM
> blk-ctl support and all the things that depend on it such as
> USB/PCI/DSI/CSI/GPU/VPU. If there is some specific testing you need
> please let me know what I can do to help. I have a variety of IMX8MM
> hardware but not a lot of time or knowledge with regards to
> troubleshooting suspend/resume issues.

I'm trying to help as best I can, but unfortunately my time is very
limited and I haven't made much progress in investigating the issue(s) so
far.

If you could do some testing on your side, that would be much
appreciated. It would be good if you could set up a recent kernel with
Lucas' patchset applied and do some suspend/resume cycle testing as
described above. Use 'no_console_suspend' in the cmdline and look for
any error messages in the log or lockups of the device.
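
Before starting a long run it may be worth confirming that the parameter actually made it onto the command line. A small sketch, assuming the usual /proc/cmdline interface; the helper name is hypothetical:

```shell
#!/bin/sh
# Return 0 if the given kernel command line file (default: /proc/cmdline)
# contains the no_console_suspend parameter as a separate word.
has_no_console_suspend() {
    cmdline_file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'no_console_suspend'
}

# Usage on the target:
#   has_no_console_suspend || echo "warning: console will be suspended"
```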

You probably also need some users for the PD or BLK-CTRL, like GPU, DSI,
USB, etc. (that's what I currently have enabled). You can find the tree
I'm currently using here:
https://github.com/fschrempf/linux/tree/next-ktn-pd-blk-ctl-lucas.

> Are the issues found a regression?

Regression compared to what? To the v1 patches? I don't think so.

We haven't had any stable solution for BLK-CTRL support so far, and what
we have has probably not been tested extensively yet. So I guess it's not
really unexpected that there are still issues, but it's very frustrating
that after all the effort, there may still be something in the HW
that doesn't behave as expected.

Best regards,
Frieder
Frieder Schrempf Sept. 1, 2021, 12:16 p.m. UTC | #10
On 01.09.21 12:03, Frieder Schrempf wrote:
> Hi Lucas,
> 
> On 09.08.21 13:50, Frieder Schrempf wrote:
>> On 09.08.21 13:01, Lucas Stach wrote:
>>> Hi Frieder,
>>>
>>> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
>>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>>
>>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>>> failure path issues and most importantly the interaction with system
>>>>>> suspend/resume. With the previous version some of the power domains
>>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>>
>>>>>> Updated testing git trees here, disclaimer still applies:
>>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>>
>>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>>
>>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>>
>>>>
>>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>>
>>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>>
>>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>>
>>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>>
>>>> #!/bin/sh
>>>>
>>>> glmark2-es2-drm &
>>>>
>>>> while true;
>>>> do
>>>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>>     echo mem > /sys/power/state
>>>>     sleep 5
>>>> done;
>>>
>>> Hm, that's unfortunate.
>>>
>>> I'm back from a two week vacation, but it looks like I won't have much
>>> time available to look into this issue soon. It would be very helpful
>>> if you could try to pinpoint the hang a bit more.  If you can reproduce
>>> the hang with no_console_suspend you might be able to extract a bit
>>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>>> If the hang is in the kernel you might be able to add some prints to
>>> the suspend/resume paths to be able to track down the exact point of
>>> the hang.
>>>
>>> I'm happy to look into the issue once it's better known where to look,
>>> but I fear that I won't have time to do the above investigation myself
>>> short term. Frieder, is this something you could help with over the
>>> next few days?
>>
>> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>>
>> @Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?
>>
>> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
> 
> I ran a few more suspend/resume cycles and watched the log. The first
> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
> at some point.

Facepalm! Of course glmark2 didn't crash, it just doesn't loop endlessly
as I expected it to do, which totally makes sense for a benchmark. Using
--run-forever should do the trick.
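
A sketch of the test loop with that change applied could look like the following. The environment overrides (WAKEALARM, POWER_STATE, PAUSE), the cycle counter and the function name are additions for illustration and are not part of the original script; the defaults keep the original paths and timing:

```shell
#!/bin/sh
# Run a fixed number of suspend/resume cycles.
# Paths and pause can be overridden from the environment; the defaults
# match the original test script.
suspend_cycles() {
    count="$1"
    wakealarm="${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}"
    power_state="${POWER_STATE:-/sys/power/state}"
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        echo "cycle $i"
        # Program the RTC to wake the system 10 seconds from now.
        echo +10 > "$wakealarm"
        # Writing "mem" blocks until resume; a failed suspend makes the
        # write fail, which is reported but does not stop the loop.
        echo mem > "$power_state" || echo "suspend failed in cycle $i"
        sleep "${PAUSE:-5}"
    done
}

# Usage on the target:
#   glmark2-es2-drm --run-forever &
#   suspend_cycles 1000
```

Numbering the cycles makes it easier to correlate a later lockup or error message with how long the test had been running.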
Lucas Stach Sept. 2, 2021, 10:25 a.m. UTC | #11
Hi Frieder,

Am Mittwoch, dem 01.09.2021 um 12:03 +0200 schrieb Frieder Schrempf:
[...]
> > > 
> > > > 
> > > > And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> > > > 
> > > > #!/bin/sh
> > > > 
> > > > glmark2-es2-drm &
> > > > 
> > > > while true;
> > > > do
> > > >     echo +10 > /sys/class/rtc/rtc0/wakealarm
> > > >     echo mem > /sys/power/state
> > > >     sleep 5
> > > > done;
> > > 
> > > Hm, that's unfortunate.
> > > 
> > > I'm back from a two week vacation, but it looks like I won't have much
> > > time available to look into this issue soon. It would be very helpful
> > > if you could try to pinpoint the hang a bit more.  If you can reproduce
> > > the hang with no_console_suspend you might be able to extract a bit
> > > more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> > > If the hang is in the kernel you might be able to add some prints to
> > > the suspend/resume paths to be able to track down the exact point of
> > > the hang.
> > > 
> > > I'm happy to look into the issue once it's better known where to look,
> > > but I fear that I won't have time to do the above investigation myself
> > > short term. Frieder, is this something you could help with over the
> > > next few days?
> > 
> > I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
> > 
> > @Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?
> > 
> > On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
> 
> I ran a few more suspend/resume cycles and watched the log. The first
> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
> at some point.
> 
> Then suddenly the following lines were printed while suspending:
> 
>   imx-pgc imx-pgc-domain.6: failed to command PGC
>   PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
>   imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
>   PM: Some devices failed to suspend, or early wake event detected
> 
> After that, suspending keeps failing with the following on each try:
> 
>   PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
>   imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
>   PM: Some devices failed to suspend, or early wake event detected
> 
> So far I didn't run into a lockup again with this test, but I will
> continue trying to reproduce it and retrieve more information.

If you run into this "failed to command PGC" state again, I would be
very interested in the GPC state there. You should be able to dump the
full register state from the GPC regmap in debugfs.
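
Something along these lines should work as a rough sketch, assuming debugfs is mounted at /sys/kernel/debug and the regmap core exposes a "registers" file per device. The exact entry name for the GPC is not given here, so the helper (a hypothetical name) simply matches on a pattern:

```shell
#!/bin/sh
# Copy the "registers" file of every regmap debugfs entry whose name
# contains the given pattern into an output directory.
# Returns 0 if at least one dump was written, 1 otherwise.
dump_regmap() {
    pattern="$1"
    outdir="$2"
    root="${3:-/sys/kernel/debug/regmap}"
    mkdir -p "$outdir" || return 1
    found=1
    for dir in "$root"/*"$pattern"*; do
        # Skip non-matching globs and entries without a readable dump.
        [ -r "$dir/registers" ] || continue
        cat "$dir/registers" > "$outdir/$(basename "$dir").txt" && found=0
    done
    return "$found"
}

# Usage on the target (as root, right after the error shows up):
#   dump_regmap gpc /tmp/gpc-dump
```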

Regards,
Lucas
Frieder Schrempf Sept. 6, 2021, 7:49 a.m. UTC | #12
On 02.09.21 12:25, Lucas Stach wrote:
> Hi Frieder,
> 
> Am Mittwoch, dem 01.09.2021 um 12:03 +0200 schrieb Frieder Schrempf:
> [...]
>>>>
>>>>>
>>>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>>>
>>>>> #!/bin/sh
>>>>>
>>>>> glmark2-es2-drm &
>>>>>
>>>>> while true;
>>>>> do
>>>>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>>>     echo mem > /sys/power/state
>>>>>     sleep 5
>>>>> done;
>>>>
>>>> Hm, that's unfortunate.
>>>>
>>>> I'm back from a two week vacation, but it looks like I won't have much
>>>> time available to look into this issue soon. It would be very helpful
>>>> if you could try to pinpoint the hang a bit more.  If you can reproduce
>>>> the hang with no_console_suspend you might be able to extract a bit
>>>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>>>> If the hang is in the kernel you might be able to add some prints to
>>>> the suspend/resume paths to be able to track down the exact point of
>>>> the hang.
>>>>
>>>> I'm happy to look into the issue once it's better known where to look,
>>>> but I fear that I won't have time to do the above investigation myself
>>>> short term. Frieder, is this something you could help with over the
>>>> next few days?
>>>
>>> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>>>
>>> @Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?
>>>
>>> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
>>
>> I ran a few more suspend/resume cycles and watched the log. The first
>> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
>> at some point.
>>
>> Then suddenly the following lines were printed while suspending:
>>
>>   imx-pgc imx-pgc-domain.6: failed to command PGC
>>   PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
>>   imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
>>   PM: Some devices failed to suspend, or early wake event detected
>>
>> After that, suspending keeps failing with the following on each try:
>>
>>   PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
>>   imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
>>   PM: Some devices failed to suspend, or early wake event detected
>>
>> So far I didn't run into a lockup again with this test, but I will
>> continue trying to reproduce it and retrieve more information.
> 
> If you run into this "failed to command PGC" state again, I would be
> very interested in the GPC state there. You should be able to dump the
> full register state from the GPC regmap in debugfs.

I have been trying to reproduce this with the same setup for several
days now, but I haven't run into this error again so far. It seems to be
something that occurs only very rarely.

I also got only a single lockup with this board in roughly 40 h of
testing in total. On the other hand I have a different board (same
design) that shows the lockups much more often.

I hope I can provide more data soon, but I can't promise anything.