ACPI: EC: Set ec_no_wakeup for Lenovo Go S

Message ID	20250331204442.1727618-1-superm1@kernel.org (mailing list archive)
State	Superseded, archived
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37E2818C011 for <linux-acpi@vger.kernel.org>; Mon, 31 Mar 2025 20:44:50 +0000 (UTC) From: Mario Limonciello <superm1@kernel.org> To: mario.limonciello@amd.com, rafael@kernel.org Cc: Xino JS1 Ni <nijs1@lenovo.com>, Antheas Kapenekakis <lkml@antheas.dev>, linux-acpi@vger.kernel.org Subject: [PATCH] ACPI: EC: Set ec_no_wakeup for Lenovo Go S Date: Mon, 31 Mar 2025 15:44:00 -0500 Message-ID: <20250331204442.1727618-1-superm1@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	ACPI: EC: Set ec_no_wakeup for Lenovo Go S \| expand ACPI: EC: Set ec_no_wakeup for Lenovo Go S

Mario Limonciello March 31, 2025, 8:44 p.m. UTC

From: Mario Limonciello <mario.limonciello@amd.com>

When AC adapter is unplugged or plugged in EC wakes from
HW sleep but APU doesn't enter back into HW sleep.

The reason this hapens is that when APU exits HW sleep the power
rails the EC controls will power up the TCON.  The TCON has a
GPIO that will be toggled during this time.  The GPIO is not marked
as a wakeup source however GPIO controller still has an unserviced
interrupt and it will block entering HW sleep again. Clearing the
GPIO doesn't help, the TCON raises it again until it's been initialized
by i2c-hid.

Fixing this would require TCON F/W changes and it's already broken
in the wild on production hardware.

To avoid triggering this issue add a quirk to avoid letting EC wake
up system at all.  The power button still works properly on this system.

Cc: Xino JS1 Ni <nijs1@lenovo.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Antheas Kapenekakis March 31, 2025, 8:51 p.m. UTC | #1

On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
>
> From: Mario Limonciello <mario.limonciello@amd.com>
>
> When AC adapter is unplugged or plugged in EC wakes from
> HW sleep but APU doesn't enter back into HW sleep.
>
> The reason this hapens is that when APU exits HW sleep the power
> rails the EC controls will power up the TCON.  The TCON has a
> GPIO that will be toggled during this time.  The GPIO is not marked
> as a wakeup source however GPIO controller still has an unserviced
> interrupt and it will block entering HW sleep again. Clearing the
> GPIO doesn't help, the TCON raises it again until it's been initialized
> by i2c-hid.
>
> Fixing this would require TCON F/W changes and it's already broken
> in the wild on production hardware.
>
> To avoid triggering this issue add a quirk to avoid letting EC wake
> up system at all.  The power button still works properly on this system.

Hi Mario,
I reported this issue to you early in January, did all the debugging
for it, found the cause, made this patch, tested it, and finally
deployed it as well. Then sent it to Xino.

Then you pushed back for perfectly valid reasons, and we had a
multi-week long back and forth trying to find the proper cause for
this.

So from my side I do not get why I am just a reported-by here. This is
my patch. We also had a discussion about this out of band.

Antheas

> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index 8db09d81918fb..3c5f34892734e 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>                         DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>                 },
>         },
> +       /*
> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> +        */
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> +               }
> +       },
>         { },
>  };
>
> --
> 2.43.0
>

Antheas Kapenekakis March 31, 2025, 8:53 p.m. UTC | #2

On Mon, 31 Mar 2025 at 22:51, Antheas Kapenekakis <lkml@antheas.dev> wrote:
>
> On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
> >
> > From: Mario Limonciello <mario.limonciello@amd.com>
> >
> > When AC adapter is unplugged or plugged in EC wakes from
> > HW sleep but APU doesn't enter back into HW sleep.
> >
> > The reason this hapens is that when APU exits HW sleep the power
> > rails the EC controls will power up the TCON.  The TCON has a
> > GPIO that will be toggled during this time.  The GPIO is not marked
> > as a wakeup source however GPIO controller still has an unserviced
> > interrupt and it will block entering HW sleep again. Clearing the
> > GPIO doesn't help, the TCON raises it again until it's been initialized
> > by i2c-hid.
> >
> > Fixing this would require TCON F/W changes and it's already broken
> > in the wild on production hardware.
> >
> > To avoid triggering this issue add a quirk to avoid letting EC wake
> > up system at all.  The power button still works properly on this system.
>
> Hi Mario,
> I reported this issue to you early in January, did all the debugging
> for it, found the cause, made this patch, tested it, and finally
> deployed it as well. Then sent it to Xino.
>
> Then you pushed back for perfectly valid reasons, and we had a
> multi-week long back and forth trying to find the proper cause for
> this.
>
> So from my side I do not get why I am just a reported-by here. This is
> my patch. We also had a discussion about this out of band.
>
> Antheas

It is interesting you ended up finding the cause. Which makes
attributing this a bit murkier.

Antheas
>
> > Cc: Xino JS1 Ni <nijs1@lenovo.com>
> > Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > ---
> >  drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> > index 8db09d81918fb..3c5f34892734e 100644
> > --- a/drivers/acpi/ec.c
> > +++ b/drivers/acpi/ec.c
> > @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
> >                         DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
> >                 },
> >         },
> > +       /*
> > +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> > +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> > +        */
> > +       {
> > +               .matches = {
> > +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> > +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> > +               }
> > +       },
> > +       {
> > +               .matches = {
> > +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> > +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> > +               }
> > +       },
> > +       {
> > +               .matches = {
> > +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> > +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> > +               }
> > +       },
> > +       {
> > +               .matches = {
> > +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> > +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> > +               }
> > +       },
> >         { },
> >  };
> >
> > --
> > 2.43.0
> >

Mario Limonciello March 31, 2025, 8:55 p.m. UTC | #3

On 3/31/2025 3:53 PM, Antheas Kapenekakis wrote:
> On Mon, 31 Mar 2025 at 22:51, Antheas Kapenekakis <lkml@antheas.dev> wrote:
>>
>> On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
>>>
>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>
>>> When AC adapter is unplugged or plugged in EC wakes from
>>> HW sleep but APU doesn't enter back into HW sleep.
>>>
>>> The reason this hapens is that when APU exits HW sleep the power
>>> rails the EC controls will power up the TCON.  The TCON has a
>>> GPIO that will be toggled during this time.  The GPIO is not marked
>>> as a wakeup source however GPIO controller still has an unserviced
>>> interrupt and it will block entering HW sleep again. Clearing the
>>> GPIO doesn't help, the TCON raises it again until it's been initialized
>>> by i2c-hid.
>>>
>>> Fixing this would require TCON F/W changes and it's already broken
>>> in the wild on production hardware.
>>>
>>> To avoid triggering this issue add a quirk to avoid letting EC wake
>>> up system at all.  The power button still works properly on this system.
>>
>> Hi Mario,
>> I reported this issue to you early in January, did all the debugging
>> for it, found the cause, made this patch, tested it, and finally
>> deployed it as well. Then sent it to Xino.
>>
>> Then you pushed back for perfectly valid reasons, and we had a
>> multi-week long back and forth trying to find the proper cause for
>> this.
>>
>> So from my side I do not get why I am just a reported-by here. This is
>> my patch. We also had a discussion about this out of band.
>>
>> Antheas
> 
> It is interesting you ended up finding the cause. Which makes
> attributing this a bit murkier.
> 
> Antheas

Hi Antheas,

FWIW - you and I separately created very similar patches.

There has been more debugging that is not public (as you can see from 
the content of this commit message).

What tag would you like in this case?

Thanks,

>>
>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>>>   1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
>>> index 8db09d81918fb..3c5f34892734e 100644
>>> --- a/drivers/acpi/ec.c
>>> +++ b/drivers/acpi/ec.c
>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>>>                          DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>>>                  },
>>>          },
>>> +       /*
>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>> +        */
>>> +       {
>>> +               .matches = {
>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
>>> +               }
>>> +       },
>>> +       {
>>> +               .matches = {
>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
>>> +               }
>>> +       },
>>> +       {
>>> +               .matches = {
>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
>>> +               }
>>> +       },
>>> +       {
>>> +               .matches = {
>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
>>> +               }
>>> +       },
>>>          { },
>>>   };
>>>
>>> --
>>> 2.43.0
>>>

Antheas Kapenekakis March 31, 2025, 9:16 p.m. UTC | #4

On Mon, 31 Mar 2025 at 22:55, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 3/31/2025 3:53 PM, Antheas Kapenekakis wrote:
> > On Mon, 31 Mar 2025 at 22:51, Antheas Kapenekakis <lkml@antheas.dev> wrote:
> >>
> >> On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
> >>>
> >>> From: Mario Limonciello <mario.limonciello@amd.com>
> >>>
> >>> When AC adapter is unplugged or plugged in EC wakes from
> >>> HW sleep but APU doesn't enter back into HW sleep.
> >>>
> >>> The reason this hapens is that when APU exits HW sleep the power
> >>> rails the EC controls will power up the TCON.  The TCON has a
> >>> GPIO that will be toggled during this time.  The GPIO is not marked
> >>> as a wakeup source however GPIO controller still has an unserviced
> >>> interrupt and it will block entering HW sleep again. Clearing the
> >>> GPIO doesn't help, the TCON raises it again until it's been initialized
> >>> by i2c-hid.
> >>>
> >>> Fixing this would require TCON F/W changes and it's already broken
> >>> in the wild on production hardware.
> >>>
> >>> To avoid triggering this issue add a quirk to avoid letting EC wake
> >>> up system at all.  The power button still works properly on this system.
> >>
> >> Hi Mario,
> >> I reported this issue to you early in January, did all the debugging
> >> for it, found the cause, made this patch, tested it, and finally
> >> deployed it as well. Then sent it to Xino.
> >>
> >> Then you pushed back for perfectly valid reasons, and we had a
> >> multi-week long back and forth trying to find the proper cause for
> >> this.
> >>
> >> So from my side I do not get why I am just a reported-by here. This is
> >> my patch. We also had a discussion about this out of band.
> >>
> >> Antheas
> >
> > It is interesting you ended up finding the cause. Which makes
> > attributing this a bit murkier.
> >
> > Antheas
>
> Hi Antheas,
>
> FWIW - you and I separately created very similar patches.

There is only one way to write a no_ec_wakeup patch after I reported
to you that it fixes it, here [1] mine is the same, even the DMI order
is the same.

> There has been more debugging that is not public (as you can see from
> the content of this commit message).
>
> What tag would you like in this case?

I think co-developed-by since you also worked very hard on fixing
this. I do not know if b4 picks up co-developed tags.

Not that I am content with it, which you might have noticed with my
absence in the amd/drm issue tracker.

So, was it the touchscreen after all? Did you verify this by tweaking
its firmware?

Antheas

[1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63


> Thanks,
>
> >>
> >>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> >>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>> ---
> >>>   drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
> >>>   1 file changed, 28 insertions(+)
> >>>
> >>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> >>> index 8db09d81918fb..3c5f34892734e 100644
> >>> --- a/drivers/acpi/ec.c
> >>> +++ b/drivers/acpi/ec.c
> >>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
> >>>                          DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
> >>>                  },
> >>>          },
> >>> +       /*
> >>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> >>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>> +        */
> >>> +       {
> >>> +               .matches = {
> >>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> >>> +               }
> >>> +       },
> >>> +       {
> >>> +               .matches = {
> >>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> >>> +               }
> >>> +       },
> >>> +       {
> >>> +               .matches = {
> >>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> >>> +               }
> >>> +       },
> >>> +       {
> >>> +               .matches = {
> >>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> >>> +               }
> >>> +       },
> >>>          { },
> >>>   };
> >>>
> >>> --
> >>> 2.43.0
> >>>
>

Mario Limonciello April 1, 2025, 2:40 a.m. UTC | #5

On 3/31/2025 4:16 PM, Antheas Kapenekakis wrote:
> On Mon, 31 Mar 2025 at 22:55, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 3/31/2025 3:53 PM, Antheas Kapenekakis wrote:
>>> On Mon, 31 Mar 2025 at 22:51, Antheas Kapenekakis <lkml@antheas.dev> wrote:
>>>>
>>>> On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
>>>>>
>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>
>>>>> When AC adapter is unplugged or plugged in EC wakes from
>>>>> HW sleep but APU doesn't enter back into HW sleep.
>>>>>
>>>>> The reason this hapens is that when APU exits HW sleep the power
>>>>> rails the EC controls will power up the TCON.  The TCON has a
>>>>> GPIO that will be toggled during this time.  The GPIO is not marked
>>>>> as a wakeup source however GPIO controller still has an unserviced
>>>>> interrupt and it will block entering HW sleep again. Clearing the
>>>>> GPIO doesn't help, the TCON raises it again until it's been initialized
>>>>> by i2c-hid.
>>>>>
>>>>> Fixing this would require TCON F/W changes and it's already broken
>>>>> in the wild on production hardware.
>>>>>
>>>>> To avoid triggering this issue add a quirk to avoid letting EC wake
>>>>> up system at all.  The power button still works properly on this system.
>>>>
>>>> Hi Mario,
>>>> I reported this issue to you early in January, did all the debugging
>>>> for it, found the cause, made this patch, tested it, and finally
>>>> deployed it as well. Then sent it to Xino.
>>>>
>>>> Then you pushed back for perfectly valid reasons, and we had a
>>>> multi-week long back and forth trying to find the proper cause for
>>>> this.
>>>>
>>>> So from my side I do not get why I am just a reported-by here. This is
>>>> my patch. We also had a discussion about this out of band.
>>>>
>>>> Antheas
>>>
>>> It is interesting you ended up finding the cause. Which makes
>>> attributing this a bit murkier.
>>>
>>> Antheas
>>
>> Hi Antheas,
>>
>> FWIW - you and I separately created very similar patches.
> 
> There is only one way to write a no_ec_wakeup patch after I reported
> to you that it fixes it, here [1] mine is the same, even the DMI order
> is the same.

You mean the same order as my patch for GoS that applied to acp-6x?

commit b9a8ea185f3f8 ("ASoC: acp: Support microphone from Lenovo Go S")

> 
>> There has been more debugging that is not public (as you can see from
>> the content of this commit message).
>>
>> What tag would you like in this case?
> 
> I think co-developed-by since you also worked very hard on fixing
> this. I do not know if b4 picks up co-developed tags.
> 

Here are tags for linking to your patch development to be picked up.

Link: 
https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>

> Not that I am content with it, which you might have noticed with my
> absence in the amd/drm issue tracker.
> 
> So, was it the touchscreen after all? Did you verify this by tweaking
> its firmware?

Yes it's the touchscreen causing this issue.  It was confirmed by a 
hardware rework.

> 
> Antheas
> 
> [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> 
> 
>> Thanks,
>>
>>>>
>>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
>>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>> ---
>>>>>    drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>>>>>    1 file changed, 28 insertions(+)
>>>>>
>>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
>>>>> index 8db09d81918fb..3c5f34892734e 100644
>>>>> --- a/drivers/acpi/ec.c
>>>>> +++ b/drivers/acpi/ec.c
>>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>>>>>                           DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>>>>>                   },
>>>>>           },
>>>>> +       /*
>>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
>>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>> +        */
>>>>> +       {
>>>>> +               .matches = {
>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
>>>>> +               }
>>>>> +       },
>>>>> +       {
>>>>> +               .matches = {
>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
>>>>> +               }
>>>>> +       },
>>>>> +       {
>>>>> +               .matches = {
>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
>>>>> +               }
>>>>> +       },
>>>>> +       {
>>>>> +               .matches = {
>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
>>>>> +               }
>>>>> +       },
>>>>>           { },
>>>>>    };
>>>>>
>>>>> --
>>>>> 2.43.0
>>>>>
>>

Antheas Kapenekakis April 1, 2025, 9:01 a.m. UTC | #6

On Tue, 1 Apr 2025 at 04:40, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 3/31/2025 4:16 PM, Antheas Kapenekakis wrote:
> > On Mon, 31 Mar 2025 at 22:55, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 3/31/2025 3:53 PM, Antheas Kapenekakis wrote:
> >>> On Mon, 31 Mar 2025 at 22:51, Antheas Kapenekakis <lkml@antheas.dev> wrote:
> >>>>
> >>>> On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>
> >>>>> From: Mario Limonciello <mario.limonciello@amd.com>
> >>>>>
> >>>>> When AC adapter is unplugged or plugged in EC wakes from
> >>>>> HW sleep but APU doesn't enter back into HW sleep.
> >>>>>
> >>>>> The reason this hapens is that when APU exits HW sleep the power
> >>>>> rails the EC controls will power up the TCON.  The TCON has a
> >>>>> GPIO that will be toggled during this time.  The GPIO is not marked
> >>>>> as a wakeup source however GPIO controller still has an unserviced
> >>>>> interrupt and it will block entering HW sleep again. Clearing the
> >>>>> GPIO doesn't help, the TCON raises it again until it's been initialized
> >>>>> by i2c-hid.
> >>>>>
> >>>>> Fixing this would require TCON F/W changes and it's already broken
> >>>>> in the wild on production hardware.
> >>>>>
> >>>>> To avoid triggering this issue add a quirk to avoid letting EC wake
> >>>>> up system at all.  The power button still works properly on this system.
> >>>>
> >>>> Hi Mario,
> >>>> I reported this issue to you early in January, did all the debugging
> >>>> for it, found the cause, made this patch, tested it, and finally
> >>>> deployed it as well. Then sent it to Xino.
> >>>>
> >>>> Then you pushed back for perfectly valid reasons, and we had a
> >>>> multi-week long back and forth trying to find the proper cause for
> >>>> this.
> >>>>
> >>>> So from my side I do not get why I am just a reported-by here. This is
> >>>> my patch. We also had a discussion about this out of band.
> >>>>
> >>>> Antheas
> >>>
> >>> It is interesting you ended up finding the cause. Which makes
> >>> attributing this a bit murkier.
> >>>
> >>> Antheas
> >>
> >> Hi Antheas,
> >>
> >> FWIW - you and I separately created very similar patches.
> >
> > There is only one way to write a no_ec_wakeup patch after I reported
> > to you that it fixes it, here [1] mine is the same, even the DMI order
> > is the same.
>
> You mean the same order as my patch for GoS that applied to acp-6x?
>
> commit b9a8ea185f3f8 ("ASoC: acp: Support microphone from Lenovo Go S")

Yes, which was submitted, accepted, and prior work at the time, so you
do not get to be credited for it as part of this patch as well.

> >
> >> There has been more debugging that is not public (as you can see from
> >> the content of this commit message).
> >>
> >> What tag would you like in this case?
> >
> > I think co-developed-by since you also worked very hard on fixing
> > this. I do not know if b4 picks up co-developed tags.
> >
>
> Here are tags for linking to your patch development to be picked up.
>
> Link:
> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>

Anyways, you got what my problem here was. You nacked and bikeshed
this patch for 2 months, and that was after I did all the background
research, testing and deployed it [1], so you could find the real
cause, which I let you do as a _professional courtesy_. Then, out of a
sudden you are the primary author on a patch I authored and you nacked
and started testing after it was done [2].

I guess a nicer way of saying this is that you make it hard to
collaborate on kernel development. When I bring up issues to you, do
the background research, bisect, and general grunt work for them, you
do a minor cleanup which is easy for you as a kernel developer, then
strip the credit for them and I have to hunt you down to get some of
it back. This is not a productive environment, I cannot work like
this.

I think this is the 6th or 8th time this happened but this time it is
particularly egregious, because you had me spend 20 hours debugging
offshoots after my patch was already done in random directions trying
to find a real cause, only to see me get dropped to a normal reported
by, and that is after I told you off very harshly because of [2].
Otherwise the reported-by might have been missing too.

In any case, there is no point in rehashing this over and over.
Authorship in this series is mostly fine now, so it can go through.

And to avoid having this conversation again, there is another Legion
Go S [3] patch you nacked and froze the testing for, so you could go
on the manhunt for the real cause of this one. But it will probably be
needed and you will find that as you get TDP controls going. So if you
want me to prepare that in a timely manner, because that one actually
needs rewriting to be posted, now is the time to say so.

Antheas

[1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
[2] https://gitlab.com/evlaV/linux-integration/-/commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
[3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760

> > Not that I am content with it, which you might have noticed with my
> > absence in the amd/drm issue tracker.
> >
> > So, was it the touchscreen after all? Did you verify this by tweaking
> > its firmware?
>
> Yes it's the touchscreen causing this issue.  It was confirmed by a
> hardware rework.
>
> >
> > Antheas
> >
> > [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >
> >
> >> Thanks,
> >>
> >>>>
> >>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> >>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>>>> ---
> >>>>>    drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
> >>>>>    1 file changed, 28 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> >>>>> index 8db09d81918fb..3c5f34892734e 100644
> >>>>> --- a/drivers/acpi/ec.c
> >>>>> +++ b/drivers/acpi/ec.c
> >>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
> >>>>>                           DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
> >>>>>                   },
> >>>>>           },
> >>>>> +       /*
> >>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> >>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>> +        */
> >>>>> +       {
> >>>>> +               .matches = {
> >>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> >>>>> +               }
> >>>>> +       },
> >>>>> +       {
> >>>>> +               .matches = {
> >>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> >>>>> +               }
> >>>>> +       },
> >>>>> +       {
> >>>>> +               .matches = {
> >>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> >>>>> +               }
> >>>>> +       },
> >>>>> +       {
> >>>>> +               .matches = {
> >>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> >>>>> +               }
> >>>>> +       },
> >>>>>           { },
> >>>>>    };
> >>>>>
> >>>>> --
> >>>>> 2.43.0
> >>>>>
> >>
>

Mario Limonciello April 1, 2025, 12:29 p.m. UTC | #7

>> Here are tags for linking to your patch development to be picked up.
>>
>> Link:
>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> 

I don't believe that b4 will pick these up, so I will send out a v2 with 
them and mark this patch as superceded in patchwork so that Rafael 
doesn't have to pull everything out of this thread manually.

> 
> And to avoid having this conversation again, there is another Legion
> Go S [3] patch you nacked and froze the testing for, so you could go
> on the manhunt for the real cause of this one. But it will probably be
> needed and you will find that as you get TDP controls going. So if you
> want me to prepare that in a timely manner, because that one actually
> needs rewriting to be posted, now is the time to say so.

Can you please propose what you have in mind on the mailing lists to 
discuss?  It's relatively expensive (in the unit of tech debt) to add 
quirk infrastructure and so we need to make sure it is the right solution.

Derek is working on CPU coefficient tuning in a completely separate 
driver.  If there are issues with that, I would generally prefer the 
fixes to be in that driver.

> 
> Antheas
> 
> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> [2] https://gitlab.com/evlaV/linux-integration/-/commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
> 
>>> Not that I am content with it, which you might have noticed with my
>>> absence in the amd/drm issue tracker.
>>>
>>> So, was it the touchscreen after all? Did you verify this by tweaking
>>> its firmware?
>>
>> Yes it's the touchscreen causing this issue.  It was confirmed by a
>> hardware rework.
>>
>>>
>>> Antheas
>>>
>>> [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>
>>>
>>>> Thanks,
>>>>
>>>>>>
>>>>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
>>>>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>> ---
>>>>>>>     drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 28 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
>>>>>>> index 8db09d81918fb..3c5f34892734e 100644
>>>>>>> --- a/drivers/acpi/ec.c
>>>>>>> +++ b/drivers/acpi/ec.c
>>>>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>>>>>>>                            DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>>>>>>>                    },
>>>>>>>            },
>>>>>>> +       /*
>>>>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
>>>>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>>>> +        */
>>>>>>> +       {
>>>>>>> +               .matches = {
>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
>>>>>>> +               }
>>>>>>> +       },
>>>>>>> +       {
>>>>>>> +               .matches = {
>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
>>>>>>> +               }
>>>>>>> +       },
>>>>>>> +       {
>>>>>>> +               .matches = {
>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
>>>>>>> +               }
>>>>>>> +       },
>>>>>>> +       {
>>>>>>> +               .matches = {
>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
>>>>>>> +               }
>>>>>>> +       },
>>>>>>>            { },
>>>>>>>     };
>>>>>>>
>>>>>>> --
>>>>>>> 2.43.0
>>>>>>>
>>>>
>>

Antheas Kapenekakis April 1, 2025, 12:45 p.m. UTC | #8

On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
>
> >> Here are tags for linking to your patch development to be picked up.
> >>
> >> Link:
> >> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >
>
> I don't believe that b4 will pick these up, so I will send out a v2 with
> them and mark this patch as superceded in patchwork so that Rafael
> doesn't have to pull everything out of this thread manually.
>
> >
> > And to avoid having this conversation again, there is another Legion
> > Go S [3] patch you nacked and froze the testing for, so you could go
> > on the manhunt for the real cause of this one. But it will probably be
> > needed and you will find that as you get TDP controls going. So if you
> > want me to prepare that in a timely manner, because that one actually
> > needs rewriting to be posted, now is the time to say so.
>
> Can you please propose what you have in mind on the mailing lists to
> discuss?  It's relatively expensive (in the unit of tech debt) to add
> quirk infrastructure and so we need to make sure it is the right solution.
>
> Derek is working on CPU coefficient tuning in a completely separate
> driver.  If there are issues with that, I would generally prefer the
> fixes to be in that driver.

CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
try to make sure the quirk can be potentially added there, or in any
driver*.

The idea is to rewrite the patch series to just add a simple delay
field on the s2idle quirk struct. Then the biggest delay wins and gets
placed in ->begin. We have been using that series for ~6 months now,
and it turns out that having a delay system for every call is quite
pointless. But there are also situations where you might have a device
such as the Z13 Folio which looks like a USB device but listens to
s2idle notifications through ACPI, so the hid subsystem might need to
be able to inject a small delay there.

But rewriting the series will take 1-2 weeks, so I need a heads up now
if you need it for the Go S launch.

Specifically for the Z13 folio, since I brought that up, it seems like
all Aura devices including the Ally need a 300ms delay to fade their
backlights after sleep entry but before D3, but my testing has been
mixed here because KDE plays with the backlight while i test the
hid-asus series.

*for general device stability such as in the Go S, I'd have a slight
preference for a non-platform quirk though.

Antheas

> >
> > Antheas
> >
> > [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> > [2] https://gitlab.com/evlaV/linux-integration/-/commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> > [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
> >
> >>> Not that I am content with it, which you might have noticed with my
> >>> absence in the amd/drm issue tracker.
> >>>
> >>> So, was it the touchscreen after all? Did you verify this by tweaking
> >>> its firmware?
> >>
> >> Yes it's the touchscreen causing this issue.  It was confirmed by a
> >> hardware rework.
> >>
> >>>
> >>> Antheas
> >>>
> >>> [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>
> >>>
> >>>> Thanks,
> >>>>
> >>>>>>
> >>>>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> >>>>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>>>>>> ---
> >>>>>>>     drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
> >>>>>>>     1 file changed, 28 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> >>>>>>> index 8db09d81918fb..3c5f34892734e 100644
> >>>>>>> --- a/drivers/acpi/ec.c
> >>>>>>> +++ b/drivers/acpi/ec.c
> >>>>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
> >>>>>>>                            DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
> >>>>>>>                    },
> >>>>>>>            },
> >>>>>>> +       /*
> >>>>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> >>>>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>>>> +        */
> >>>>>>> +       {
> >>>>>>> +               .matches = {
> >>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> >>>>>>> +               }
> >>>>>>> +       },
> >>>>>>> +       {
> >>>>>>> +               .matches = {
> >>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> >>>>>>> +               }
> >>>>>>> +       },
> >>>>>>> +       {
> >>>>>>> +               .matches = {
> >>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> >>>>>>> +               }
> >>>>>>> +       },
> >>>>>>> +       {
> >>>>>>> +               .matches = {
> >>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> >>>>>>> +               }
> >>>>>>> +       },
> >>>>>>>            { },
> >>>>>>>     };
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.43.0
> >>>>>>>
> >>>>
> >>
>

Antheas Kapenekakis April 1, 2025, 12:47 p.m. UTC | #9

On Mon, 31 Mar 2025 at 22:44, Mario Limonciello <superm1@kernel.org> wrote:
>
> From: Mario Limonciello <mario.limonciello@amd.com>
>
> When AC adapter is unplugged or plugged in EC wakes from
> HW sleep but APU doesn't enter back into HW sleep.
>
> The reason this hapens is that when APU exits HW sleep the power

Speaking of a V2, typo /\

> rails the EC controls will power up the TCON.  The TCON has a
> GPIO that will be toggled during this time.  The GPIO is not marked
> as a wakeup source however GPIO controller still has an unserviced
> interrupt and it will block entering HW sleep again. Clearing the
> GPIO doesn't help, the TCON raises it again until it's been initialized
> by i2c-hid.

And missing commas/weird grammar above

A

> Fixing this would require TCON F/W changes and it's already broken
> in the wild on production hardware.
>
> To avoid triggering this issue add a quirk to avoid letting EC wake
> up system at all.  The power button still works properly on this system.
>
> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index 8db09d81918fb..3c5f34892734e 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>                         DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>                 },
>         },
> +       /*
> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> +        */
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> +               }
> +       },
> +       {
> +               .matches = {
> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> +               }
> +       },
>         { },
>  };
>
> --
> 2.43.0
>

Mario Limonciello April 1, 2025, 2:09 p.m. UTC | #10

On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
>>
>>>> Here are tags for linking to your patch development to be picked up.
>>>>
>>>> Link:
>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>
>>
>> I don't believe that b4 will pick these up, so I will send out a v2 with
>> them and mark this patch as superceded in patchwork so that Rafael
>> doesn't have to pull everything out of this thread manually.

FTR I don't have permission on patchwork for linux-acpi.

I sent out v2 though.

>>
>>>
>>> And to avoid having this conversation again, there is another Legion
>>> Go S [3] patch you nacked and froze the testing for, so you could go
>>> on the manhunt for the real cause of this one. But it will probably be
>>> needed and you will find that as you get TDP controls going. So if you
>>> want me to prepare that in a timely manner, because that one actually
>>> needs rewriting to be posted, now is the time to say so.
>>
>> Can you please propose what you have in mind on the mailing lists to
>> discuss?  It's relatively expensive (in the unit of tech debt) to add
>> quirk infrastructure and so we need to make sure it is the right solution.
>>
>> Derek is working on CPU coefficient tuning in a completely separate
>> driver.  If there are issues with that, I would generally prefer the
>> fixes to be in that driver.
> 
> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> try to make sure the quirk can be potentially added there, or in any
> driver*.

Yes things like fPPT, sPPT, STAPM, STT limits.

> 
> The idea is to rewrite the patch series to just add a simple delay
> field on the s2idle quirk struct. Then the biggest delay wins and gets
> placed in ->begin. We have been using that series for ~6 months now,
> and it turns out that having a delay system for every call is quite
> pointless. But there are also situations where you might have a device
> such as the Z13 Folio which looks like a USB device but listens to
> s2idle notifications through ACPI, so the hid subsystem might need to
> be able to inject a small delay there.

So the "general" problem with injecting delays is they are typically not 
scalable as they're usually empirically measured and there is no 
handshake with the firmware.

Say for example the EC has some hardcoded value of 200ms to wait for 
something.  IIRC the Linux timer infrastructure can be off by ~13%.  So 
if you put 175ms it might work sometimes.  You get some reports of this, 
so you extend it to 200ms.  Great it works 100% of the time because the 
old hardcoded value in the EC was 200ms.

Now say a new EC firmware comes out that for $REASONS changes it to 
250ms.  Your old empirically measured value stops working, spend a bunch 
of cycles debugging it, measure the new one.  You change it to 250ms, 
but people with the old one have a problem now because the timing changed.

So now you have to add infrastructure to say what version of the 
firmware gets what delay.

Then you find out there is another SKU of that model which needs a 
different delay, so your complexity has ballooned.

What if all these "delays" were FW timeouts from failing to service an 
interrupt?  Or what if they were a flow problem like the device expected 
you to issue a STOP command before a RESET command?

So we need to be /incredibly careful/ with delays and 100% they are the 
right answer to a problem.

> 
> But rewriting the series will take 1-2 weeks, so I need a heads up now
> if you need it for the Go S launch.
> 
> Specifically for the Z13 folio, since I brought that up, it seems like
> all Aura devices including the Ally need a 300ms delay to fade their
> backlights after sleep entry but before D3, but my testing has been
> mixed here because KDE plays with the backlight while i test the
> hid-asus series.
> 
> *for general device stability such as in the Go S, I'd have a slight
> preference for a non-platform quirk though.
> 
> Antheas
> 
>>>
>>> Antheas
>>>
>>> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
>>> [2] https://gitlab.com/evlaV/linux-integration/-/commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
>>> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
>>>
>>>>> Not that I am content with it, which you might have noticed with my
>>>>> absence in the amd/drm issue tracker.
>>>>>
>>>>> So, was it the touchscreen after all? Did you verify this by tweaking
>>>>> its firmware?
>>>>
>>>> Yes it's the touchscreen causing this issue.  It was confirmed by a
>>>> hardware rework.
>>>>
>>>>>
>>>>> Antheas
>>>>>
>>>>> [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>>
>>>>>>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
>>>>>>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
>>>>>>>>>      1 file changed, 28 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
>>>>>>>>> index 8db09d81918fb..3c5f34892734e 100644
>>>>>>>>> --- a/drivers/acpi/ec.c
>>>>>>>>> +++ b/drivers/acpi/ec.c
>>>>>>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
>>>>>>>>>                             DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
>>>>>>>>>                     },
>>>>>>>>>             },
>>>>>>>>> +       /*
>>>>>>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
>>>>>>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
>>>>>>>>> +        */
>>>>>>>>> +       {
>>>>>>>>> +               .matches = {
>>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
>>>>>>>>> +               }
>>>>>>>>> +       },
>>>>>>>>> +       {
>>>>>>>>> +               .matches = {
>>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
>>>>>>>>> +               }
>>>>>>>>> +       },
>>>>>>>>> +       {
>>>>>>>>> +               .matches = {
>>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
>>>>>>>>> +               }
>>>>>>>>> +       },
>>>>>>>>> +       {
>>>>>>>>> +               .matches = {
>>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
>>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
>>>>>>>>> +               }
>>>>>>>>> +       },
>>>>>>>>>             { },
>>>>>>>>>      };
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.43.0
>>>>>>>>>
>>>>>>
>>>>
>>

Xino Ni April 1, 2025, 2:16 p.m. UTC | #11

On 4/1/2025 5:01 PM, Antheas Kapenekakis wrote:
> Anyways, you got what my problem here was. You nacked and bikeshed
> this patch for 2 months, and that was after I did all the background
> research, testing and deployed it [1], so you could find the real
> cause, which I let you do as a _professional courtesy_. Then, out of a
> sudden you are the primary author on a patch I authored and you nacked
> and started testing after it was done [2].
> 
> I guess a nicer way of saying this is that you make it hard to
> collaborate on kernel development. When I bring up issues to you, do
> the background research, bisect, and general grunt work for them, you
> do a minor cleanup which is easy for you as a kernel developer, then
> strip the credit for them and I have to hunt you down to get some of
> it back. This is not a productive environment, I cannot work like
> this.
> 
> I think this is the 6th or 8th time this happened but this time it is
> particularly egregious, because you had me spend 20 hours debugging
> offshoots after my patch was already done in random directions trying
> to find a real cause, only to see me get dropped to a normal reported
> by, and that is after I told you off very harshly because of [2].
> Otherwise the reported-by might have been missing too.
> 
> In any case, there is no point in rehashing this over and over.
> Authorship in this series is mostly fine now, so it can go through.
> 
> And to avoid having this conversation again, there is another Legion
> Go S [3] patch you nacked and froze the testing for, so you could go
> on the manhunt for the real cause of this one. But it will probably be
> needed and you will find that as you get TDP controls going. So if you
> want me to prepare that in a timely manner, because that one actually
> needs rewriting to be posted, now is the time to say so.
> 
> Antheas
> 
> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> [2] https://gitlab.com/evlaV/linux-integration/-/
> commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760

Dear Antheas,

Thanks for all the amazing works and advice you have contributed to
Legion Go S not only in the kernel but also the HHD, I'm honored to have
you help on the product.

I'd say it's unfair to blame Mario for this patch submission because
it's actually my request to him and it's my failure on ODM management
that cannot fix these FW/EC/BIOS wake up issues, we have to submit the
patch to upstream as next kernel cycle begins.

Further more, I'm very grateful to the specific issue you have spend a
lot of personal time on it, you gave a very important guidance in issue
debugging and help us clarify the root cause.

As the project is the first time I work with the community, definitely I
have a long way to learn how to work together and manage the
development, so please forgive me on it.

B/R
Xino

Antheas Kapenekakis April 1, 2025, 2:46 p.m. UTC | #12

On Tue, 1 Apr 2025 at 16:17, Xino Ni <phnjs211@gmail.com> wrote:
>
>
>
> On 4/1/2025 5:01 PM, Antheas Kapenekakis wrote:
> > Anyways, you got what my problem here was. You nacked and bikeshed
> > this patch for 2 months, and that was after I did all the background
> > research, testing and deployed it [1], so you could find the real
> > cause, which I let you do as a _professional courtesy_. Then, out of a
> > sudden you are the primary author on a patch I authored and you nacked
> > and started testing after it was done [2].
> >
> > I guess a nicer way of saying this is that you make it hard to
> > collaborate on kernel development. When I bring up issues to you, do
> > the background research, bisect, and general grunt work for them, you
> > do a minor cleanup which is easy for you as a kernel developer, then
> > strip the credit for them and I have to hunt you down to get some of
> > it back. This is not a productive environment, I cannot work like
> > this.
> >
> > I think this is the 6th or 8th time this happened but this time it is
> > particularly egregious, because you had me spend 20 hours debugging
> > offshoots after my patch was already done in random directions trying
> > to find a real cause, only to see me get dropped to a normal reported
> > by, and that is after I told you off very harshly because of [2].
> > Otherwise the reported-by might have been missing too.
> >
> > In any case, there is no point in rehashing this over and over.
> > Authorship in this series is mostly fine now, so it can go through.
> >
> > And to avoid having this conversation again, there is another Legion
> > Go S [3] patch you nacked and froze the testing for, so you could go
> > on the manhunt for the real cause of this one. But it will probably be
> > needed and you will find that as you get TDP controls going. So if you
> > want me to prepare that in a timely manner, because that one actually
> > needs rewriting to be posted, now is the time to say so.
> >
> > Antheas
> >
> > [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> > [2] https://gitlab.com/evlaV/linux-integration/-/
> > commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> > [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
>
> Dear Antheas,
>
> Thanks for all the amazing works and advice you have contributed to
> Legion Go S not only in the kernel but also the HHD, I'm honored to have
> you help on the product.
>
> I'd say it's unfair to blame Mario for this patch submission because
> it's actually my request to him and it's my failure on ODM management
> that cannot fix these FW/EC/BIOS wake up issues, we have to submit the
> patch to upstream as next kernel cycle begins.
>
> Further more, I'm very grateful to the specific issue you have spend a
> lot of personal time on it, you gave a very important guidance in issue
> debugging and help us clarify the root cause.
>
> As the project is the first time I work with the community, definitely I
> have a long way to learn how to work together and manage the
> development, so please forgive me on it.

Hi Xino,
no problems from my side. We tried our best to have it work but
sometimes life has it some other way. There is also no problem with
submitting the patch now, a month before or a month after. The
attribution is how it should be now, and that was the only problem, so
it's ok.

Antheas

> B/R
> Xino

Xino Ni April 1, 2025, 2:58 p.m. UTC | #13

> Hi Xino,
> no problems from my side. We tried our best to have it work but
> sometimes life has it some other way. There is also no problem with
> submitting the patch now, a month before or a month after. The
> attribution is how it should be now, and that was the only problem, so
> it's ok.

Thanks again.

I'd like to share more progress in the debug:

After the BIOS implement standard _BTP method, the sleep-to-dead issue
is not happen again when battery is fully charged, and the fake sleep
heat problem is fixed also. But we observed OS crash to reboot directly
when wake up, same as you observed.

So you are right, the issue can be divided into three parts:
1. _BTP make the OS wake up time to time
2. Touch Panel signal gating device sleep again
3. something from EC/BIOS crashed the OS when wake up

We failed to resolve below two issues, and thanks to the ec_no_wakeup
quirk, we never encounter below two issues after introducing it.

Xino

On 4/1/2025 10:46 PM, Antheas Kapenekakis wrote:
> On Tue, 1 Apr 2025 at 16:17, Xino Ni <phnjs211@gmail.com> wrote:
>>
>>
>>
>> On 4/1/2025 5:01 PM, Antheas Kapenekakis wrote:
>>> Anyways, you got what my problem here was. You nacked and bikeshed
>>> this patch for 2 months, and that was after I did all the background
>>> research, testing and deployed it [1], so you could find the real
>>> cause, which I let you do as a _professional courtesy_. Then, out of a
>>> sudden you are the primary author on a patch I authored and you nacked
>>> and started testing after it was done [2].
>>>
>>> I guess a nicer way of saying this is that you make it hard to
>>> collaborate on kernel development. When I bring up issues to you, do
>>> the background research, bisect, and general grunt work for them, you
>>> do a minor cleanup which is easy for you as a kernel developer, then
>>> strip the credit for them and I have to hunt you down to get some of
>>> it back. This is not a productive environment, I cannot work like
>>> this.
>>>
>>> I think this is the 6th or 8th time this happened but this time it is
>>> particularly egregious, because you had me spend 20 hours debugging
>>> offshoots after my patch was already done in random directions trying
>>> to find a real cause, only to see me get dropped to a normal reported
>>> by, and that is after I told you off very harshly because of [2].
>>> Otherwise the reported-by might have been missing too.
>>>
>>> In any case, there is no point in rehashing this over and over.
>>> Authorship in this series is mostly fine now, so it can go through.
>>>
>>> And to avoid having this conversation again, there is another Legion
>>> Go S [3] patch you nacked and froze the testing for, so you could go
>>> on the manhunt for the real cause of this one. But it will probably be
>>> needed and you will find that as you get TDP controls going. So if you
>>> want me to prepare that in a timely manner, because that one actually
>>> needs rewriting to be posted, now is the time to say so.
>>>
>>> Antheas
>>>
>>> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
>>> [2] https://gitlab.com/evlaV/linux-integration/-/
>>> commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
>>> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
>>
>> Dear Antheas,
>>
>> Thanks for all the amazing works and advice you have contributed to
>> Legion Go S not only in the kernel but also the HHD, I'm honored to have
>> you help on the product.
>>
>> I'd say it's unfair to blame Mario for this patch submission because
>> it's actually my request to him and it's my failure on ODM management
>> that cannot fix these FW/EC/BIOS wake up issues, we have to submit the
>> patch to upstream as next kernel cycle begins.
>>
>> Further more, I'm very grateful to the specific issue you have spend a
>> lot of personal time on it, you gave a very important guidance in issue
>> debugging and help us clarify the root cause.
>>
>> As the project is the first time I work with the community, definitely I
>> have a long way to learn how to work together and manage the
>> development, so please forgive me on it.
> 
> Hi Xino,
> no problems from my side. We tried our best to have it work but
> sometimes life has it some other way. There is also no problem with
> submitting the patch now, a month before or a month after. The
> attribution is how it should be now, and that was the only problem, so
> it's ok.
> 
> Antheas
> 
>> B/R
>> Xino
>

Antheas Kapenekakis April 1, 2025, 3:03 p.m. UTC | #14

On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >>>> Here are tags for linking to your patch development to be picked up.
> >>>>
> >>>> Link:
> >>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>
> >>
> >> I don't believe that b4 will pick these up, so I will send out a v2 with
> >> them and mark this patch as superceded in patchwork so that Rafael
> >> doesn't have to pull everything out of this thread manually.
>
> FTR I don't have permission on patchwork for linux-acpi.
>
> I sent out v2 though.
>
> >>
> >>>
> >>> And to avoid having this conversation again, there is another Legion
> >>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>> on the manhunt for the real cause of this one. But it will probably be
> >>> needed and you will find that as you get TDP controls going. So if you
> >>> want me to prepare that in a timely manner, because that one actually
> >>> needs rewriting to be posted, now is the time to say so.
> >>
> >> Can you please propose what you have in mind on the mailing lists to
> >> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >> quirk infrastructure and so we need to make sure it is the right solution.
> >>
> >> Derek is working on CPU coefficient tuning in a completely separate
> >> driver.  If there are issues with that, I would generally prefer the
> >> fixes to be in that driver.
> >
> > CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> > try to make sure the quirk can be potentially added there, or in any
> > driver*.
>
> Yes things like fPPT, sPPT, STAPM, STT limits.
>
> >
> > The idea is to rewrite the patch series to just add a simple delay
> > field on the s2idle quirk struct. Then the biggest delay wins and gets
> > placed in ->begin. We have been using that series for ~6 months now,
> > and it turns out that having a delay system for every call is quite
> > pointless. But there are also situations where you might have a device
> > such as the Z13 Folio which looks like a USB device but listens to
> > s2idle notifications through ACPI, so the hid subsystem might need to
> > be able to inject a small delay there.
>
> So the "general" problem with injecting delays is they are typically not
> scalable as they're usually empirically measured and there is no
> handshake with the firmware.
>
> Say for example the EC has some hardcoded value of 200ms to wait for
> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> if you put 175ms it might work sometimes.  You get some reports of this,
> so you extend it to 200ms.  Great it works 100% of the time because the
> old hardcoded value in the EC was 200ms.
>
> Now say a new EC firmware comes out that for $REASONS changes it to
> 250ms.  Your old empirically measured value stops working, spend a bunch
> of cycles debugging it, measure the new one.  You change it to 250ms,
> but people with the old one have a problem now because the timing changed.
>
> So now you have to add infrastructure to say what version of the
> firmware gets what delay.
>
> Then you find out there is another SKU of that model which needs a
> different delay, so your complexity has ballooned.
>
> What if all these "delays" were FW timeouts from failing to service an
> interrupt?  Or what if they were a flow problem like the device expected
> you to issue a STOP command before a RESET command?
>
> So we need to be /incredibly careful/ with delays and 100% they are the
> right answer to a problem.

I do get your points. In this case though we sideskirt through a lot
of the points because of where the delay is placed.

If the instrumentation is in-place, this delay happens before sleep
after the screen of the device has turned off (due to early DPMS), the
keyboard backlight has turned off (DIsplay off call), and the suspend
light pulses (Sleep Entry). So it does not affect device behavior and
you can be quite liberal. The user has left the device alone.

If the device needs e.g., 250ms you will not put 250ms, you will put
500ms. Still unsure, you bump it to 750ms. Also, even if the
manufacturer comes up with a new firmware that fixes this issue, you
can keep the delay for the life of the product, because keeping it
does not affect device behavior, and writing kernel patches takes time.

This is how I think about it, at least. A universal delay might be
needed eventually. But for now, limiting the scope to some devices and
seeing how that goes should be enough.

Antheas


Antheas

> >
> > But rewriting the series will take 1-2 weeks, so I need a heads up now
> > if you need it for the Go S launch.
> >
> > Specifically for the Z13 folio, since I brought that up, it seems like
> > all Aura devices including the Ally need a 300ms delay to fade their
> > backlights after sleep entry but before D3, but my testing has been
> > mixed here because KDE plays with the backlight while i test the
> > hid-asus series.
> >
> > *for general device stability such as in the Go S, I'd have a slight
> > preference for a non-platform quirk though.
> >
> > Antheas
> >
> >>>
> >>> Antheas
> >>>
> >>> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> >>> [2] https://gitlab.com/evlaV/linux-integration/-/commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> >>> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
> >>>
> >>>>> Not that I am content with it, which you might have noticed with my
> >>>>> absence in the amd/drm issue tracker.
> >>>>>
> >>>>> So, was it the touchscreen after all? Did you verify this by tweaking
> >>>>> its firmware?
> >>>>
> >>>> Yes it's the touchscreen causing this issue.  It was confirmed by a
> >>>> hardware rework.
> >>>>
> >>>>>
> >>>>> Antheas
> >>>>>
> >>>>> [1] https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>>>
> >>>>>>>>> Cc: Xino JS1 Ni <nijs1@lenovo.com>
> >>>>>>>>> Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>>>>>>>> ---
> >>>>>>>>>      drivers/acpi/ec.c | 28 ++++++++++++++++++++++++++++
> >>>>>>>>>      1 file changed, 28 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> >>>>>>>>> index 8db09d81918fb..3c5f34892734e 100644
> >>>>>>>>> --- a/drivers/acpi/ec.c
> >>>>>>>>> +++ b/drivers/acpi/ec.c
> >>>>>>>>> @@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
> >>>>>>>>>                             DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
> >>>>>>>>>                     },
> >>>>>>>>>             },
> >>>>>>>>> +       /*
> >>>>>>>>> +        * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
> >>>>>>>>> +        * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
> >>>>>>>>> +        */
> >>>>>>>>> +       {
> >>>>>>>>> +               .matches = {
> >>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
> >>>>>>>>> +               }
> >>>>>>>>> +       },
> >>>>>>>>> +       {
> >>>>>>>>> +               .matches = {
> >>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
> >>>>>>>>> +               }
> >>>>>>>>> +       },
> >>>>>>>>> +       {
> >>>>>>>>> +               .matches = {
> >>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
> >>>>>>>>> +               }
> >>>>>>>>> +       },
> >>>>>>>>> +       {
> >>>>>>>>> +               .matches = {
> >>>>>>>>> +                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
> >>>>>>>>> +                       DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
> >>>>>>>>> +               }
> >>>>>>>>> +       },
> >>>>>>>>>             { },
> >>>>>>>>>      };
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> 2.43.0
> >>>>>>>>>
> >>>>>>
> >>>>
> >>
>

Antheas Kapenekakis April 1, 2025, 3:05 p.m. UTC | #15

On Tue, 1 Apr 2025 at 16:58, Xino Ni <phnjs211@gmail.com> wrote:
>
> > Hi Xino,
> > no problems from my side. We tried our best to have it work but
> > sometimes life has it some other way. There is also no problem with
> > submitting the patch now, a month before or a month after. The
> > attribution is how it should be now, and that was the only problem, so
> > it's ok.
>
> Thanks again.
>
> I'd like to share more progress in the debug:
>
> After the BIOS implement standard _BTP method, the sleep-to-dead issue
> is not happen again when battery is fully charged, and the fake sleep
> heat problem is fixed also. But we observed OS crash to reboot directly
> when wake up, same as you observed.
>
> So you are right, the issue can be divided into three parts:
> 1. _BTP make the OS wake up time to time
> 2. Touch Panel signal gating device sleep again
> 3. something from EC/BIOS crashed the OS when wake up

I still got N3 on mine afterwards on performance mode. Only on
performance mode, or 30+W TDP, and only while in a game. So a lot
less.

But it has been 2 months. Does performance mode work correctly now?
Even when in-game?

If yes, then an additional patch is not needed.

Antheas

> We failed to resolve below two issues, and thanks to the ec_no_wakeup
> quirk, we never encounter below two issues after introducing it.
>
> Xino
>
> On 4/1/2025 10:46 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 16:17, Xino Ni <phnjs211@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/1/2025 5:01 PM, Antheas Kapenekakis wrote:
> >>> Anyways, you got what my problem here was. You nacked and bikeshed
> >>> this patch for 2 months, and that was after I did all the background
> >>> research, testing and deployed it [1], so you could find the real
> >>> cause, which I let you do as a _professional courtesy_. Then, out of a
> >>> sudden you are the primary author on a patch I authored and you nacked
> >>> and started testing after it was done [2].
> >>>
> >>> I guess a nicer way of saying this is that you make it hard to
> >>> collaborate on kernel development. When I bring up issues to you, do
> >>> the background research, bisect, and general grunt work for them, you
> >>> do a minor cleanup which is easy for you as a kernel developer, then
> >>> strip the credit for them and I have to hunt you down to get some of
> >>> it back. This is not a productive environment, I cannot work like
> >>> this.
> >>>
> >>> I think this is the 6th or 8th time this happened but this time it is
> >>> particularly egregious, because you had me spend 20 hours debugging
> >>> offshoots after my patch was already done in random directions trying
> >>> to find a real cause, only to see me get dropped to a normal reported
> >>> by, and that is after I told you off very harshly because of [2].
> >>> Otherwise the reported-by might have been missing too.
> >>>
> >>> In any case, there is no point in rehashing this over and over.
> >>> Authorship in this series is mostly fine now, so it can go through.
> >>>
> >>> And to avoid having this conversation again, there is another Legion
> >>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>> on the manhunt for the real cause of this one. But it will probably be
> >>> needed and you will find that as you get TDP controls going. So if you
> >>> want me to prepare that in a timely manner, because that one actually
> >>> needs rewriting to be posted, now is the time to say so.
> >>>
> >>> Antheas
> >>>
> >>> [1] https://github.com/bazzite-org/kernel-bazzite/releases/tag/6.12.12-201
> >>> [2] https://gitlab.com/evlaV/linux-integration/-/
> >>> commit/6c5a3a96be9b061f07bf9a1bcc33156c932ddf67
> >>> [3] https://gitlab.freedesktop.org/drm/amd/-/issues/3929#note_2764760
> >>
> >> Dear Antheas,
> >>
> >> Thanks for all the amazing works and advice you have contributed to
> >> Legion Go S not only in the kernel but also the HHD, I'm honored to have
> >> you help on the product.
> >>
> >> I'd say it's unfair to blame Mario for this patch submission because
> >> it's actually my request to him and it's my failure on ODM management
> >> that cannot fix these FW/EC/BIOS wake up issues, we have to submit the
> >> patch to upstream as next kernel cycle begins.
> >>
> >> Further more, I'm very grateful to the specific issue you have spend a
> >> lot of personal time on it, you gave a very important guidance in issue
> >> debugging and help us clarify the root cause.
> >>
> >> As the project is the first time I work with the community, definitely I
> >> have a long way to learn how to work together and manage the
> >> development, so please forgive me on it.
> >
> > Hi Xino,
> > no problems from my side. We tried our best to have it work but
> > sometimes life has it some other way. There is also no problem with
> > submitting the patch now, a month before or a month after. The
> > attribution is how it should be now, and that was the only problem, so
> > it's ok.
> >
> > Antheas
> >
> >> B/R
> >> Xino
> >
>

Mario Limonciello April 1, 2025, 3:24 p.m. UTC | #16

On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>>>> Here are tags for linking to your patch development to be picked up.
>>>>>>
>>>>>> Link:
>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>
>>>>
>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
>>>> them and mark this patch as superceded in patchwork so that Rafael
>>>> doesn't have to pull everything out of this thread manually.
>>
>> FTR I don't have permission on patchwork for linux-acpi.
>>
>> I sent out v2 though.
>>
>>>>
>>>>>
>>>>> And to avoid having this conversation again, there is another Legion
>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
>>>>> on the manhunt for the real cause of this one. But it will probably be
>>>>> needed and you will find that as you get TDP controls going. So if you
>>>>> want me to prepare that in a timely manner, because that one actually
>>>>> needs rewriting to be posted, now is the time to say so.
>>>>
>>>> Can you please propose what you have in mind on the mailing lists to
>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
>>>> quirk infrastructure and so we need to make sure it is the right solution.
>>>>
>>>> Derek is working on CPU coefficient tuning in a completely separate
>>>> driver.  If there are issues with that, I would generally prefer the
>>>> fixes to be in that driver.
>>>
>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
>>> try to make sure the quirk can be potentially added there, or in any
>>> driver*.
>>
>> Yes things like fPPT, sPPT, STAPM, STT limits.
>>
>>>
>>> The idea is to rewrite the patch series to just add a simple delay
>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
>>> placed in ->begin. We have been using that series for ~6 months now,
>>> and it turns out that having a delay system for every call is quite
>>> pointless. But there are also situations where you might have a device
>>> such as the Z13 Folio which looks like a USB device but listens to
>>> s2idle notifications through ACPI, so the hid subsystem might need to
>>> be able to inject a small delay there.
>>
>> So the "general" problem with injecting delays is they are typically not
>> scalable as they're usually empirically measured and there is no
>> handshake with the firmware.
>>
>> Say for example the EC has some hardcoded value of 200ms to wait for
>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
>> if you put 175ms it might work sometimes.  You get some reports of this,
>> so you extend it to 200ms.  Great it works 100% of the time because the
>> old hardcoded value in the EC was 200ms.
>>
>> Now say a new EC firmware comes out that for $REASONS changes it to
>> 250ms.  Your old empirically measured value stops working, spend a bunch
>> of cycles debugging it, measure the new one.  You change it to 250ms,
>> but people with the old one have a problem now because the timing changed.
>>
>> So now you have to add infrastructure to say what version of the
>> firmware gets what delay.
>>
>> Then you find out there is another SKU of that model which needs a
>> different delay, so your complexity has ballooned.
>>
>> What if all these "delays" were FW timeouts from failing to service an
>> interrupt?  Or what if they were a flow problem like the device expected
>> you to issue a STOP command before a RESET command?
>>
>> So we need to be /incredibly careful/ with delays and 100% they are the
>> right answer to a problem.
> 
> I do get your points. In this case though we sideskirt through a lot
> of the points because of where the delay is placed.
> 
> If the instrumentation is in-place, this delay happens before sleep
> after the screen of the device has turned off (due to early DPMS), the
> keyboard backlight has turned off (DIsplay off call), and the suspend
> light pulses (Sleep Entry). So it does not affect device behavior and
> you can be quite liberal. The user has left the device alone.
> 
> If the device needs e.g., 250ms you will not put 250ms, you will put
> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> manufacturer comes up with a new firmware that fixes this issue, you
> can keep the delay for the life of the product, because keeping it
> does not affect device behavior, and writing kernel patches takes time.
> 
> This is how I think about it, at least. A universal delay might be
> needed eventually. But for now, limiting the scope to some devices and
> seeing how that goes should be enough.
> 
> Antheas

My take is that "universal" delays are never popular.  IE hardware that 
"previously" worked perfectly is now slower.  So if there /must/ be a 
delay it should be as narrow as possible and justified.

Let me give you an example of another case I'm *actively considering* a 
delay.

I have an OEM's system that if you enter and exit s0i3 too quickly you 
can trigger the over voltage protection (OVP) feature of the VR module.
When OVP is tripped the system is forced off immediately. This *only 
happens* on the VR module in that vendor's systems. "Normal" Linux 
userspace suspend/resume can't trip it.  But connecting a dock "does" 
trip it.

If you look on a scope you can see SLP_S3# pin is toggling faster than 
spec says it should.  Naïvely you would say well the easy solution is to 
add a delay somewhere so that SLP_S3# stays in spec.  I have a patch 
that does just that.

diff --git a/drivers/platform/x86/amd/pmc/pmc.c 
b/drivers/platform/x86/amd/pmc/pmc.c
index e6124498b195f..97387ddb281e1 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
         struct smu_metrics table;
         int rc;

-       /* CZN: Ensure that future s0i3 entry attempts at least 10ms 
passed */
-       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, 
&table) &&
-           table.s0i3_last_entry_status)
-               usleep_range(10000, 20000);
+       if (!get_metrics_table(pdev, &table) && 
table.s0i3_last_entry_status) {
+               switch (pdev->cpu_id) {
+               /* CZN: Ensure that future s0i3 entry attempts at least 
10ms passed */
+               case AMD_CPU_ID_CZN:
+                       usleep_range(10000, 20000);
+                       break;
+               /* PHX/HPT: Ensure enough time to avoid VR OVP */
+               case AMD_CPU_ID_PS:
+                       msleep(2500);
+                       break;
+               default:
+                       break;
+               }
+       }

This stops all the failures, but it also has an impact that any time the 
EC SCI is raised for any reason (like plug in power adapter) the system 
will take 2.5s to go back into s0i3.

Digging further - the intended behavior by the EC and BIOS was to wake 
the system when the dock is connected.

That is the reason this happens is because the EC SCI is raised when the 
dock is connected, but the Notify() the EC sent wasn't received by any 
driver.  I've got a patch I'll be sending out soon that adds support for 
the correct driver to wake up on this event.

This prevents the case of the OVP and now we don't *need* to penalize 
everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I 
find out there are other ways to trip the problem I still have that 
option though.

Antheas Kapenekakis April 1, 2025, 6:39 p.m. UTC | #17

On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>
> >>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>
> >>>>>> Link:
> >>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>
> >>>>
> >>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>> them and mark this patch as superceded in patchwork so that Rafael
> >>>> doesn't have to pull everything out of this thread manually.
> >>
> >> FTR I don't have permission on patchwork for linux-acpi.
> >>
> >> I sent out v2 though.
> >>
> >>>>
> >>>>>
> >>>>> And to avoid having this conversation again, there is another Legion
> >>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>> want me to prepare that in a timely manner, because that one actually
> >>>>> needs rewriting to be posted, now is the time to say so.
> >>>>
> >>>> Can you please propose what you have in mind on the mailing lists to
> >>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>
> >>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>> driver.  If there are issues with that, I would generally prefer the
> >>>> fixes to be in that driver.
> >>>
> >>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>> try to make sure the quirk can be potentially added there, or in any
> >>> driver*.
> >>
> >> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>
> >>>
> >>> The idea is to rewrite the patch series to just add a simple delay
> >>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>> placed in ->begin. We have been using that series for ~6 months now,
> >>> and it turns out that having a delay system for every call is quite
> >>> pointless. But there are also situations where you might have a device
> >>> such as the Z13 Folio which looks like a USB device but listens to
> >>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>> be able to inject a small delay there.
> >>
> >> So the "general" problem with injecting delays is they are typically not
> >> scalable as they're usually empirically measured and there is no
> >> handshake with the firmware.
> >>
> >> Say for example the EC has some hardcoded value of 200ms to wait for
> >> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> >> if you put 175ms it might work sometimes.  You get some reports of this,
> >> so you extend it to 200ms.  Great it works 100% of the time because the
> >> old hardcoded value in the EC was 200ms.
> >>
> >> Now say a new EC firmware comes out that for $REASONS changes it to
> >> 250ms.  Your old empirically measured value stops working, spend a bunch
> >> of cycles debugging it, measure the new one.  You change it to 250ms,
> >> but people with the old one have a problem now because the timing changed.
> >>
> >> So now you have to add infrastructure to say what version of the
> >> firmware gets what delay.
> >>
> >> Then you find out there is another SKU of that model which needs a
> >> different delay, so your complexity has ballooned.
> >>
> >> What if all these "delays" were FW timeouts from failing to service an
> >> interrupt?  Or what if they were a flow problem like the device expected
> >> you to issue a STOP command before a RESET command?
> >>
> >> So we need to be /incredibly careful/ with delays and 100% they are the
> >> right answer to a problem.
> >
> > I do get your points. In this case though we sideskirt through a lot
> > of the points because of where the delay is placed.
> >
> > If the instrumentation is in-place, this delay happens before sleep
> > after the screen of the device has turned off (due to early DPMS), the
> > keyboard backlight has turned off (DIsplay off call), and the suspend
> > light pulses (Sleep Entry). So it does not affect device behavior and
> > you can be quite liberal. The user has left the device alone.
> >
> > If the device needs e.g., 250ms you will not put 250ms, you will put
> > 500ms. Still unsure, you bump it to 750ms. Also, even if the
> > manufacturer comes up with a new firmware that fixes this issue, you
> > can keep the delay for the life of the product, because keeping it
> > does not affect device behavior, and writing kernel patches takes time.
> >
> > This is how I think about it, at least. A universal delay might be
> > needed eventually. But for now, limiting the scope to some devices and
> > seeing how that goes should be enough.
> >
> > Antheas
>
> My take is that "universal" delays are never popular.  IE hardware that
> "previously" worked perfectly is now slower.  So if there /must/ be a
> delay it should be as narrow as possible and justified.
>
> Let me give you an example of another case I'm *actively considering* a
> delay.
>
> I have an OEM's system that if you enter and exit s0i3 too quickly you
> can trigger the over voltage protection (OVP) feature of the VR module.
> When OVP is tripped the system is forced off immediately. This *only
> happens* on the VR module in that vendor's systems. "Normal" Linux
> userspace suspend/resume can't trip it.  But connecting a dock "does"
> trip it.
>
> If you look on a scope you can see SLP_S3# pin is toggling faster than
> spec says it should.  Naïvely you would say well the easy solution is to
> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
> that does just that.
>
> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
> b/drivers/platform/x86/amd/pmc/pmc.c
> index e6124498b195f..97387ddb281e1 100644
> --- a/drivers/platform/x86/amd/pmc/pmc.c
> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
>          struct smu_metrics table;
>          int rc;
>
> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
> passed */
> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
> &table) &&
> -           table.s0i3_last_entry_status)
> -               usleep_range(10000, 20000);
> +       if (!get_metrics_table(pdev, &table) &&
> table.s0i3_last_entry_status) {
> +               switch (pdev->cpu_id) {
> +               /* CZN: Ensure that future s0i3 entry attempts at least
> 10ms passed */
> +               case AMD_CPU_ID_CZN:
> +                       usleep_range(10000, 20000);
> +                       break;
> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> +               case AMD_CPU_ID_PS:
> +                       msleep(2500);
> +                       break;
> +               default:
> +                       break;
> +               }
> +       }
>
> This stops all the failures, but it also has an impact that any time the
> EC SCI is raised for any reason (like plug in power adapter) the system
> will take 2.5s to go back into s0i3.
>
> Digging further - the intended behavior by the EC and BIOS was to wake
> the system when the dock is connected.
>
> That is the reason this happens is because the EC SCI is raised when the
> dock is connected, but the Notify() the EC sent wasn't received by any
> driver.  I've got a patch I'll be sending out soon that adds support for
> the correct driver to wake up on this event.
>
> This prevents the case of the OVP and now we don't *need* to penalize
> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
> find out there are other ways to trip the problem I still have that
> option though.

So you are talking about missing the AC/DC burst feature of Windows
here right? I do agree with you that yeah for most devices it is not
necessary.

But Microsoft guarantees 5 seconds. We already have the original Ally
unit which gets stuck in prochot due to this so it would be nice to
fix. For the Ally X I am unsure what Asus did but it stays awake for a
nice three seconds after you plug/unplug the charger so it has no
issues.

So if devices keep getting issues like we will have to eat it and do
AC/DC bursts with all of them.

And it is the same with entering s0i3 too fast. Some devices just need
a tiny amount of time to do whatever it is their manufacturer
programmed them to do after the Modern Standby notifications. For
handhelds, it is to turn off the controllers because XInput. Asus put
the fade animation so that takes 300ms and if you do it earlier the
controller gets cut before it saves its state and starts to do weird
RGB stuff. Other manufacturers typically do not malfunction but they
still use the notification.

Only MSI does not, but that controller is quirky before/after sleep
and they released a firmware update today saying something about
controller S3/S4 improvements so they probably do that too now, I need
to check.

For the Go S, it sets itself to 5W after sleep entry and turns off the
fan. A little delay went a long way in fixing the hang there, which I
suspect is due to aggressive tuning. But I do not know if you guys get
that. We did when we did the initial testing for it and carry the
delay now so I cannot tell you either way. So you should max out the
TDP, run stress -c 16, and make the device sleep 100-200 times to make
sure that is not an issue.

I do have a plan for trying to rework AC/DC bursts, but first the
s2idle ordering needs to be fixed and I need to rewrite the series for
that. The series we have for that works _fine_ so it is not a priority
to rework but it is not upstreamable in its current state so if you
need that (for the Go S) I need to know now.

For ACDC my idea would be after the reordering is done to have a quirk
that makes the kernel resume, fire the sleep exit notification, loop
for 5 (maybe 3?) seconds inside the device suspend section prior to
userspace resume, and then as long as a wakeup did not arrive restart
the suspend sequence to sleep again. I would also combine that with
the little s2idle wakeup device you made, so that userspace can enable
wakeups for that if it wants to do resume on dock connection. But that
has a lot of moving parts, including moving the DPMS action to happen
even earlier than your patch does and making sure display on/off does
not fire so that the keyboard backlight does not do weird stuff.

Antheas

Mario Limonciello April 1, 2025, 8:54 p.m. UTC | #18

On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
>>>>>>
>>>>>>>> Here are tags for linking to your patch development to be picked up.
>>>>>>>>
>>>>>>>> Link:
>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>>
>>>>>>
>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
>>>>>> them and mark this patch as superceded in patchwork so that Rafael
>>>>>> doesn't have to pull everything out of this thread manually.
>>>>
>>>> FTR I don't have permission on patchwork for linux-acpi.
>>>>
>>>> I sent out v2 though.
>>>>
>>>>>>
>>>>>>>
>>>>>>> And to avoid having this conversation again, there is another Legion
>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
>>>>>>> on the manhunt for the real cause of this one. But it will probably be
>>>>>>> needed and you will find that as you get TDP controls going. So if you
>>>>>>> want me to prepare that in a timely manner, because that one actually
>>>>>>> needs rewriting to be posted, now is the time to say so.
>>>>>>
>>>>>> Can you please propose what you have in mind on the mailing lists to
>>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
>>>>>> quirk infrastructure and so we need to make sure it is the right solution.
>>>>>>
>>>>>> Derek is working on CPU coefficient tuning in a completely separate
>>>>>> driver.  If there are issues with that, I would generally prefer the
>>>>>> fixes to be in that driver.
>>>>>
>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
>>>>> try to make sure the quirk can be potentially added there, or in any
>>>>> driver*.
>>>>
>>>> Yes things like fPPT, sPPT, STAPM, STT limits.
>>>>
>>>>>
>>>>> The idea is to rewrite the patch series to just add a simple delay
>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
>>>>> placed in ->begin. We have been using that series for ~6 months now,
>>>>> and it turns out that having a delay system for every call is quite
>>>>> pointless. But there are also situations where you might have a device
>>>>> such as the Z13 Folio which looks like a USB device but listens to
>>>>> s2idle notifications through ACPI, so the hid subsystem might need to
>>>>> be able to inject a small delay there.
>>>>
>>>> So the "general" problem with injecting delays is they are typically not
>>>> scalable as they're usually empirically measured and there is no
>>>> handshake with the firmware.
>>>>
>>>> Say for example the EC has some hardcoded value of 200ms to wait for
>>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
>>>> if you put 175ms it might work sometimes.  You get some reports of this,
>>>> so you extend it to 200ms.  Great it works 100% of the time because the
>>>> old hardcoded value in the EC was 200ms.
>>>>
>>>> Now say a new EC firmware comes out that for $REASONS changes it to
>>>> 250ms.  Your old empirically measured value stops working, spend a bunch
>>>> of cycles debugging it, measure the new one.  You change it to 250ms,
>>>> but people with the old one have a problem now because the timing changed.
>>>>
>>>> So now you have to add infrastructure to say what version of the
>>>> firmware gets what delay.
>>>>
>>>> Then you find out there is another SKU of that model which needs a
>>>> different delay, so your complexity has ballooned.
>>>>
>>>> What if all these "delays" were FW timeouts from failing to service an
>>>> interrupt?  Or what if they were a flow problem like the device expected
>>>> you to issue a STOP command before a RESET command?
>>>>
>>>> So we need to be /incredibly careful/ with delays and 100% they are the
>>>> right answer to a problem.
>>>
>>> I do get your points. In this case though we sideskirt through a lot
>>> of the points because of where the delay is placed.
>>>
>>> If the instrumentation is in-place, this delay happens before sleep
>>> after the screen of the device has turned off (due to early DPMS), the
>>> keyboard backlight has turned off (DIsplay off call), and the suspend
>>> light pulses (Sleep Entry). So it does not affect device behavior and
>>> you can be quite liberal. The user has left the device alone.
>>>
>>> If the device needs e.g., 250ms you will not put 250ms, you will put
>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
>>> manufacturer comes up with a new firmware that fixes this issue, you
>>> can keep the delay for the life of the product, because keeping it
>>> does not affect device behavior, and writing kernel patches takes time.
>>>
>>> This is how I think about it, at least. A universal delay might be
>>> needed eventually. But for now, limiting the scope to some devices and
>>> seeing how that goes should be enough.
>>>
>>> Antheas
>>
>> My take is that "universal" delays are never popular.  IE hardware that
>> "previously" worked perfectly is now slower.  So if there /must/ be a
>> delay it should be as narrow as possible and justified.
>>
>> Let me give you an example of another case I'm *actively considering* a
>> delay.
>>
>> I have an OEM's system that if you enter and exit s0i3 too quickly you
>> can trigger the over voltage protection (OVP) feature of the VR module.
>> When OVP is tripped the system is forced off immediately. This *only
>> happens* on the VR module in that vendor's systems. "Normal" Linux
>> userspace suspend/resume can't trip it.  But connecting a dock "does"
>> trip it.
>>
>> If you look on a scope you can see SLP_S3# pin is toggling faster than
>> spec says it should.  Naïvely you would say well the easy solution is to
>> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
>> that does just that.
>>
>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
>> b/drivers/platform/x86/amd/pmc/pmc.c
>> index e6124498b195f..97387ddb281e1 100644
>> --- a/drivers/platform/x86/amd/pmc/pmc.c
>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
>>           struct smu_metrics table;
>>           int rc;
>>
>> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
>> passed */
>> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
>> &table) &&
>> -           table.s0i3_last_entry_status)
>> -               usleep_range(10000, 20000);
>> +       if (!get_metrics_table(pdev, &table) &&
>> table.s0i3_last_entry_status) {
>> +               switch (pdev->cpu_id) {
>> +               /* CZN: Ensure that future s0i3 entry attempts at least
>> 10ms passed */
>> +               case AMD_CPU_ID_CZN:
>> +                       usleep_range(10000, 20000);
>> +                       break;
>> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
>> +               case AMD_CPU_ID_PS:
>> +                       msleep(2500);
>> +                       break;
>> +               default:
>> +                       break;
>> +               }
>> +       }
>>
>> This stops all the failures, but it also has an impact that any time the
>> EC SCI is raised for any reason (like plug in power adapter) the system
>> will take 2.5s to go back into s0i3.
>>
>> Digging further - the intended behavior by the EC and BIOS was to wake
>> the system when the dock is connected.
>>
>> That is the reason this happens is because the EC SCI is raised when the
>> dock is connected, but the Notify() the EC sent wasn't received by any
>> driver.  I've got a patch I'll be sending out soon that adds support for
>> the correct driver to wake up on this event.
>>
>> This prevents the case of the OVP and now we don't *need* to penalize
>> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
>> find out there are other ways to trip the problem I still have that
>> option though.
> 
> So you are talking about missing the AC/DC burst feature of Windows
> here right? I do agree with you that yeah for most devices it is not
> necessary.

No; I wasn't talking about that, my point was that timing delays are a 
tempting to solution to a problem, but they're very often papering over 
something else and a hint to dive deeper.

> 
> But Microsoft guarantees 5 seconds. We already have the original Ally
> unit which gets stuck in prochot due to this so it would be nice to
> fix. For the Ally X I am unsure what Asus did but it stays awake for a
> nice three seconds after you plug/unplug the charger so it has no
> issues.
> 
> So if devices keep getting issues like we will have to eat it and do
> AC/DC bursts with all of them.
> 
> And it is the same with entering s0i3 too fast. Some devices just need
> a tiny amount of time to do whatever it is their manufacturer
> programmed them to do after the Modern Standby notifications. For
> handhelds, it is to turn off the controllers because XInput. Asus put
> the fade animation so that takes 300ms and if you do it earlier the
> controller gets cut before it saves its state and starts to do weird
> RGB stuff. Other manufacturers typically do not malfunction but they
> still use the notification.
> 
> Only MSI does not, but that controller is quirky before/after sleep
> and they released a firmware update today saying something about
> controller S3/S4 improvements so they probably do that too now, I need
> to check.
> 
> For the Go S, it sets itself to 5W after sleep entry and turns off the
> fan. A little delay went a long way in fixing the hang there, which I
> suspect is due to aggressive tuning. But I do not know if you guys get
> that. We did when we did the initial testing for it and carry the
> delay now so I cannot tell you either way. So you should max out the
> TDP, run stress -c 16, and make the device sleep 100-200 times to make
> sure that is not an issue.
> 
> I do have a plan for trying to rework AC/DC bursts, but first the
> s2idle ordering needs to be fixed and I need to rewrite the series for
> that. The series we have for that works _fine_ so it is not a priority
> to rework but it is not upstreamable in its current state so if you
> need that (for the Go S) I need to know now.
> 
> For ACDC my idea would be after the reordering is done to have a quirk
> that makes the kernel resume, fire the sleep exit notification, loop
> for 5 (maybe 3?) seconds inside the device suspend section prior to
> userspace resume, and then as long as a wakeup did not arrive restart
> the suspend sequence to sleep again. I would also combine that with
> the little s2idle wakeup device you made, so that userspace can enable
> wakeups for that if it wants to do resume on dock connection. But that
> has a lot of moving parts, including moving the DPMS action to happen
> even earlier than your patch does and making sure display on/off does
> not fire so that the keyboard backlight does not do weird stuff.
> 
> Antheas

I think a good start for what you're talking about would be to rebase 
your series that reworked s2idle flow on 6.15 code (maybe it's a clean 
rebase, idk) and then if/when all of us on LKML are happy with it we can 
layer other concepts on top of that.

Antheas Kapenekakis April 1, 2025, 10:06 p.m. UTC | #19

On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>
> >>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>>
> >>>>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>>>
> >>>>>>>> Link:
> >>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>
> >>>>>>
> >>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>>>> them and mark this patch as superceded in patchwork so that Rafael
> >>>>>> doesn't have to pull everything out of this thread manually.
> >>>>
> >>>> FTR I don't have permission on patchwork for linux-acpi.
> >>>>
> >>>> I sent out v2 though.
> >>>>
> >>>>>>
> >>>>>>>
> >>>>>>> And to avoid having this conversation again, there is another Legion
> >>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>>>> want me to prepare that in a timely manner, because that one actually
> >>>>>>> needs rewriting to be posted, now is the time to say so.
> >>>>>>
> >>>>>> Can you please propose what you have in mind on the mailing lists to
> >>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >>>>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>>>
> >>>>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>>>> driver.  If there are issues with that, I would generally prefer the
> >>>>>> fixes to be in that driver.
> >>>>>
> >>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>>>> try to make sure the quirk can be potentially added there, or in any
> >>>>> driver*.
> >>>>
> >>>> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>>>
> >>>>>
> >>>>> The idea is to rewrite the patch series to just add a simple delay
> >>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>>>> placed in ->begin. We have been using that series for ~6 months now,
> >>>>> and it turns out that having a delay system for every call is quite
> >>>>> pointless. But there are also situations where you might have a device
> >>>>> such as the Z13 Folio which looks like a USB device but listens to
> >>>>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>>>> be able to inject a small delay there.
> >>>>
> >>>> So the "general" problem with injecting delays is they are typically not
> >>>> scalable as they're usually empirically measured and there is no
> >>>> handshake with the firmware.
> >>>>
> >>>> Say for example the EC has some hardcoded value of 200ms to wait for
> >>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> >>>> if you put 175ms it might work sometimes.  You get some reports of this,
> >>>> so you extend it to 200ms.  Great it works 100% of the time because the
> >>>> old hardcoded value in the EC was 200ms.
> >>>>
> >>>> Now say a new EC firmware comes out that for $REASONS changes it to
> >>>> 250ms.  Your old empirically measured value stops working, spend a bunch
> >>>> of cycles debugging it, measure the new one.  You change it to 250ms,
> >>>> but people with the old one have a problem now because the timing changed.
> >>>>
> >>>> So now you have to add infrastructure to say what version of the
> >>>> firmware gets what delay.
> >>>>
> >>>> Then you find out there is another SKU of that model which needs a
> >>>> different delay, so your complexity has ballooned.
> >>>>
> >>>> What if all these "delays" were FW timeouts from failing to service an
> >>>> interrupt?  Or what if they were a flow problem like the device expected
> >>>> you to issue a STOP command before a RESET command?
> >>>>
> >>>> So we need to be /incredibly careful/ with delays and 100% they are the
> >>>> right answer to a problem.
> >>>
> >>> I do get your points. In this case though we sideskirt through a lot
> >>> of the points because of where the delay is placed.
> >>>
> >>> If the instrumentation is in-place, this delay happens before sleep
> >>> after the screen of the device has turned off (due to early DPMS), the
> >>> keyboard backlight has turned off (DIsplay off call), and the suspend
> >>> light pulses (Sleep Entry). So it does not affect device behavior and
> >>> you can be quite liberal. The user has left the device alone.
> >>>
> >>> If the device needs e.g., 250ms you will not put 250ms, you will put
> >>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> >>> manufacturer comes up with a new firmware that fixes this issue, you
> >>> can keep the delay for the life of the product, because keeping it
> >>> does not affect device behavior, and writing kernel patches takes time.
> >>>
> >>> This is how I think about it, at least. A universal delay might be
> >>> needed eventually. But for now, limiting the scope to some devices and
> >>> seeing how that goes should be enough.
> >>>
> >>> Antheas
> >>
> >> My take is that "universal" delays are never popular.  IE hardware that
> >> "previously" worked perfectly is now slower.  So if there /must/ be a
> >> delay it should be as narrow as possible and justified.
> >>
> >> Let me give you an example of another case I'm *actively considering* a
> >> delay.
> >>
> >> I have an OEM's system that if you enter and exit s0i3 too quickly you
> >> can trigger the over voltage protection (OVP) feature of the VR module.
> >> When OVP is tripped the system is forced off immediately. This *only
> >> happens* on the VR module in that vendor's systems. "Normal" Linux
> >> userspace suspend/resume can't trip it.  But connecting a dock "does"
> >> trip it.
> >>
> >> If you look on a scope you can see SLP_S3# pin is toggling faster than
> >> spec says it should.  Naïvely you would say well the easy solution is to
> >> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
> >> that does just that.
> >>
> >> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
> >> b/drivers/platform/x86/amd/pmc/pmc.c
> >> index e6124498b195f..97387ddb281e1 100644
> >> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
> >>           struct smu_metrics table;
> >>           int rc;
> >>
> >> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
> >> passed */
> >> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
> >> &table) &&
> >> -           table.s0i3_last_entry_status)
> >> -               usleep_range(10000, 20000);
> >> +       if (!get_metrics_table(pdev, &table) &&
> >> table.s0i3_last_entry_status) {
> >> +               switch (pdev->cpu_id) {
> >> +               /* CZN: Ensure that future s0i3 entry attempts at least
> >> 10ms passed */
> >> +               case AMD_CPU_ID_CZN:
> >> +                       usleep_range(10000, 20000);
> >> +                       break;
> >> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> >> +               case AMD_CPU_ID_PS:
> >> +                       msleep(2500);
> >> +                       break;
> >> +               default:
> >> +                       break;
> >> +               }
> >> +       }
> >>
> >> This stops all the failures, but it also has an impact that any time the
> >> EC SCI is raised for any reason (like plug in power adapter) the system
> >> will take 2.5s to go back into s0i3.
> >>
> >> Digging further - the intended behavior by the EC and BIOS was to wake
> >> the system when the dock is connected.
> >>
> >> That is the reason this happens is because the EC SCI is raised when the
> >> dock is connected, but the Notify() the EC sent wasn't received by any
> >> driver.  I've got a patch I'll be sending out soon that adds support for
> >> the correct driver to wake up on this event.
> >>
> >> This prevents the case of the OVP and now we don't *need* to penalize
> >> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
> >> find out there are other ways to trip the problem I still have that
> >> option though.
> >
> > So you are talking about missing the AC/DC burst feature of Windows
> > here right? I do agree with you that yeah for most devices it is not
> > necessary.
>
> No; I wasn't talking about that, my point was that timing delays are a
> tempting to solution to a problem, but they're very often papering over
> something else and a hint to dive deeper.

What I gleaned from what you said is that X manufacturer has a problem
due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
delay.

The intended behavior of AC/DC bursts is to fully wake up the kernel
for 5 seconds, and then sleep again. In windows, if a power supply is
connected, userspace wakes up too, and then the Windows power manager
sleeps the system again if there is no user activity for 5 seconds.
However, this should not affect device drivers, so we may consider it
optional on the Linux side until DEs get support for it and enable it
themselves I would say.

So in effect, AC/DC bursts are Windows' solution to problems like the
one you faced.

I am not saying penalize everyone. If I do make a patch for AC/DC it
will be device specific. But after a point, if random devices start
getting issues and the quirk list starts to grow, it might become
inevitable to force it for all of them.

I do get what you are saying with delays though. We had to merge one
of the initial SOF delay patch variants for the Steam Deck which
prevents audio crashing on resume, and that was definitely a bandage.

> >
> > But Microsoft guarantees 5 seconds. We already have the original Ally
> > unit which gets stuck in prochot due to this so it would be nice to
> > fix. For the Ally X I am unsure what Asus did but it stays awake for a
> > nice three seconds after you plug/unplug the charger so it has no
> > issues.
> >
> > So if devices keep getting issues like we will have to eat it and do
> > AC/DC bursts with all of them.
> >
> > And it is the same with entering s0i3 too fast. Some devices just need
> > a tiny amount of time to do whatever it is their manufacturer
> > programmed them to do after the Modern Standby notifications. For
> > handhelds, it is to turn off the controllers because XInput. Asus put
> > the fade animation so that takes 300ms and if you do it earlier the
> > controller gets cut before it saves its state and starts to do weird
> > RGB stuff. Other manufacturers typically do not malfunction but they
> > still use the notification.
> >
> > Only MSI does not, but that controller is quirky before/after sleep
> > and they released a firmware update today saying something about
> > controller S3/S4 improvements so they probably do that too now, I need
> > to check.
> >
> > For the Go S, it sets itself to 5W after sleep entry and turns off the
> > fan. A little delay went a long way in fixing the hang there, which I
> > suspect is due to aggressive tuning. But I do not know if you guys get
> > that. We did when we did the initial testing for it and carry the
> > delay now so I cannot tell you either way. So you should max out the
> > TDP, run stress -c 16, and make the device sleep 100-200 times to make
> > sure that is not an issue.
> >
> > I do have a plan for trying to rework AC/DC bursts, but first the
> > s2idle ordering needs to be fixed and I need to rewrite the series for
> > that. The series we have for that works _fine_ so it is not a priority
> > to rework but it is not upstreamable in its current state so if you
> > need that (for the Go S) I need to know now.
> >
> > For ACDC my idea would be after the reordering is done to have a quirk
> > that makes the kernel resume, fire the sleep exit notification, loop
> > for 5 (maybe 3?) seconds inside the device suspend section prior to
> > userspace resume, and then as long as a wakeup did not arrive restart
> > the suspend sequence to sleep again. I would also combine that with
> > the little s2idle wakeup device you made, so that userspace can enable
> > wakeups for that if it wants to do resume on dock connection. But that
> > has a lot of moving parts, including moving the DPMS action to happen
> > even earlier than your patch does and making sure display on/off does
> > not fire so that the keyboard backlight does not do weird stuff.
> >
> > Antheas
>
> I think a good start for what you're talking about would be to rebase
> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> rebase, idk) and then if/when all of us on LKML are happy with it we can
> layer other concepts on top of that.

Yeah, I will try to do that. However, I have around 30 submitted
patches in the air right now, and we are about to add another 5-6 to
the list for the Claw. So it will probably be after a bunch of those
merge. For the interest of sustainability, if nothing else. So let's
put a dot on this and pick up the discussion again mid 6.16 in a month
or so.

Unless you need this series for the Go S, in which case I can try to
re-order stuff around. So, one of you should use the red light TDP
mode with an artificial load (or actual, such as a game) and see if
sleep works properly. Do that on battery.

I would do at least 100 suspends with this, as most users do 50-70
suspends per reboot. I think I did around 300 to validate the Go S
quirk.

Antheas

Mario Limonciello April 2, 2025, 7:19 p.m. UTC | #20

On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote:
> On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
>>> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
>>>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
>>>>>>
>>>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
>>>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
>>>>>>>>
>>>>>>>>>> Here are tags for linking to your patch development to be picked up.
>>>>>>>>>>
>>>>>>>>>> Link:
>>>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
>>>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
>>>>>>>> them and mark this patch as superceded in patchwork so that Rafael
>>>>>>>> doesn't have to pull everything out of this thread manually.
>>>>>>
>>>>>> FTR I don't have permission on patchwork for linux-acpi.
>>>>>>
>>>>>> I sent out v2 though.
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> And to avoid having this conversation again, there is another Legion
>>>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
>>>>>>>>> on the manhunt for the real cause of this one. But it will probably be
>>>>>>>>> needed and you will find that as you get TDP controls going. So if you
>>>>>>>>> want me to prepare that in a timely manner, because that one actually
>>>>>>>>> needs rewriting to be posted, now is the time to say so.
>>>>>>>>
>>>>>>>> Can you please propose what you have in mind on the mailing lists to
>>>>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
>>>>>>>> quirk infrastructure and so we need to make sure it is the right solution.
>>>>>>>>
>>>>>>>> Derek is working on CPU coefficient tuning in a completely separate
>>>>>>>> driver.  If there are issues with that, I would generally prefer the
>>>>>>>> fixes to be in that driver.
>>>>>>>
>>>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
>>>>>>> try to make sure the quirk can be potentially added there, or in any
>>>>>>> driver*.
>>>>>>
>>>>>> Yes things like fPPT, sPPT, STAPM, STT limits.
>>>>>>
>>>>>>>
>>>>>>> The idea is to rewrite the patch series to just add a simple delay
>>>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
>>>>>>> placed in ->begin. We have been using that series for ~6 months now,
>>>>>>> and it turns out that having a delay system for every call is quite
>>>>>>> pointless. But there are also situations where you might have a device
>>>>>>> such as the Z13 Folio which looks like a USB device but listens to
>>>>>>> s2idle notifications through ACPI, so the hid subsystem might need to
>>>>>>> be able to inject a small delay there.
>>>>>>
>>>>>> So the "general" problem with injecting delays is they are typically not
>>>>>> scalable as they're usually empirically measured and there is no
>>>>>> handshake with the firmware.
>>>>>>
>>>>>> Say for example the EC has some hardcoded value of 200ms to wait for
>>>>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
>>>>>> if you put 175ms it might work sometimes.  You get some reports of this,
>>>>>> so you extend it to 200ms.  Great it works 100% of the time because the
>>>>>> old hardcoded value in the EC was 200ms.
>>>>>>
>>>>>> Now say a new EC firmware comes out that for $REASONS changes it to
>>>>>> 250ms.  Your old empirically measured value stops working, spend a bunch
>>>>>> of cycles debugging it, measure the new one.  You change it to 250ms,
>>>>>> but people with the old one have a problem now because the timing changed.
>>>>>>
>>>>>> So now you have to add infrastructure to say what version of the
>>>>>> firmware gets what delay.
>>>>>>
>>>>>> Then you find out there is another SKU of that model which needs a
>>>>>> different delay, so your complexity has ballooned.
>>>>>>
>>>>>> What if all these "delays" were FW timeouts from failing to service an
>>>>>> interrupt?  Or what if they were a flow problem like the device expected
>>>>>> you to issue a STOP command before a RESET command?
>>>>>>
>>>>>> So we need to be /incredibly careful/ with delays and 100% they are the
>>>>>> right answer to a problem.
>>>>>
>>>>> I do get your points. In this case though we sideskirt through a lot
>>>>> of the points because of where the delay is placed.
>>>>>
>>>>> If the instrumentation is in-place, this delay happens before sleep
>>>>> after the screen of the device has turned off (due to early DPMS), the
>>>>> keyboard backlight has turned off (DIsplay off call), and the suspend
>>>>> light pulses (Sleep Entry). So it does not affect device behavior and
>>>>> you can be quite liberal. The user has left the device alone.
>>>>>
>>>>> If the device needs e.g., 250ms you will not put 250ms, you will put
>>>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
>>>>> manufacturer comes up with a new firmware that fixes this issue, you
>>>>> can keep the delay for the life of the product, because keeping it
>>>>> does not affect device behavior, and writing kernel patches takes time.
>>>>>
>>>>> This is how I think about it, at least. A universal delay might be
>>>>> needed eventually. But for now, limiting the scope to some devices and
>>>>> seeing how that goes should be enough.
>>>>>
>>>>> Antheas
>>>>
>>>> My take is that "universal" delays are never popular.  IE hardware that
>>>> "previously" worked perfectly is now slower.  So if there /must/ be a
>>>> delay it should be as narrow as possible and justified.
>>>>
>>>> Let me give you an example of another case I'm *actively considering* a
>>>> delay.
>>>>
>>>> I have an OEM's system that if you enter and exit s0i3 too quickly you
>>>> can trigger the over voltage protection (OVP) feature of the VR module.
>>>> When OVP is tripped the system is forced off immediately. This *only
>>>> happens* on the VR module in that vendor's systems. "Normal" Linux
>>>> userspace suspend/resume can't trip it.  But connecting a dock "does"
>>>> trip it.
>>>>
>>>> If you look on a scope you can see SLP_S3# pin is toggling faster than
>>>> spec says it should.  Naïvely you would say well the easy solution is to
>>>> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
>>>> that does just that.
>>>>
>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
>>>> b/drivers/platform/x86/amd/pmc/pmc.c
>>>> index e6124498b195f..97387ddb281e1 100644
>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
>>>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
>>>>            struct smu_metrics table;
>>>>            int rc;
>>>>
>>>> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
>>>> passed */
>>>> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
>>>> &table) &&
>>>> -           table.s0i3_last_entry_status)
>>>> -               usleep_range(10000, 20000);
>>>> +       if (!get_metrics_table(pdev, &table) &&
>>>> table.s0i3_last_entry_status) {
>>>> +               switch (pdev->cpu_id) {
>>>> +               /* CZN: Ensure that future s0i3 entry attempts at least
>>>> 10ms passed */
>>>> +               case AMD_CPU_ID_CZN:
>>>> +                       usleep_range(10000, 20000);
>>>> +                       break;
>>>> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
>>>> +               case AMD_CPU_ID_PS:
>>>> +                       msleep(2500);
>>>> +                       break;
>>>> +               default:
>>>> +                       break;
>>>> +               }
>>>> +       }
>>>>
>>>> This stops all the failures, but it also has an impact that any time the
>>>> EC SCI is raised for any reason (like plug in power adapter) the system
>>>> will take 2.5s to go back into s0i3.
>>>>
>>>> Digging further - the intended behavior by the EC and BIOS was to wake
>>>> the system when the dock is connected.
>>>>
>>>> That is the reason this happens is because the EC SCI is raised when the
>>>> dock is connected, but the Notify() the EC sent wasn't received by any
>>>> driver.  I've got a patch I'll be sending out soon that adds support for
>>>> the correct driver to wake up on this event.
>>>>
>>>> This prevents the case of the OVP and now we don't *need* to penalize
>>>> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
>>>> find out there are other ways to trip the problem I still have that
>>>> option though.
>>>
>>> So you are talking about missing the AC/DC burst feature of Windows
>>> here right? I do agree with you that yeah for most devices it is not
>>> necessary.
>>
>> No; I wasn't talking about that, my point was that timing delays are a
>> tempting to solution to a problem, but they're very often papering over
>> something else and a hint to dive deeper.
> 
> What I gleaned from what you said is that X manufacturer has a problem
> due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
> delay.
> 
> The intended behavior of AC/DC bursts is to fully wake up the kernel
> for 5 seconds, and then sleep again. In windows, if a power supply is
> connected, userspace wakes up too, and then the Windows power manager
> sleeps the system again if there is no user activity for 5 seconds.
> However, this should not affect device drivers, so we may consider it
> optional on the Linux side until DEs get support for it and enable it
> themselves I would say.
> 
> So in effect, AC/DC bursts are Windows' solution to problems like the
> one you faced.
> 
> I am not saying penalize everyone. If I do make a patch for AC/DC it
> will be device specific. But after a point, if random devices start
> getting issues and the quirk list starts to grow, it might become
> inevitable to force it for all of them.
> 
> I do get what you are saying with delays though. We had to merge one
> of the initial SOF delay patch variants for the Steam Deck which
> prevents audio crashing on resume, and that was definitely a bandage.
> 

Maybe I'm failing at my search-engine-foo, could you point me at some 
docs about this AC/DC burst stuff?

[
And FTR unfortunately it's seeming that my proposal for the alternate 
wake source has negative side effects to other machines, so I'm going 
back to exploring a timing based quirk tied to SMBIOS data again too :( ]

>>>
>>> But Microsoft guarantees 5 seconds. We already have the original Ally
>>> unit which gets stuck in prochot due to this so it would be nice to
>>> fix. For the Ally X I am unsure what Asus did but it stays awake for a
>>> nice three seconds after you plug/unplug the charger so it has no
>>> issues.
>>>
>>> So if devices keep getting issues like we will have to eat it and do
>>> AC/DC bursts with all of them.
>>>
>>> And it is the same with entering s0i3 too fast. Some devices just need
>>> a tiny amount of time to do whatever it is their manufacturer
>>> programmed them to do after the Modern Standby notifications. For
>>> handhelds, it is to turn off the controllers because XInput. Asus put
>>> the fade animation so that takes 300ms and if you do it earlier the
>>> controller gets cut before it saves its state and starts to do weird
>>> RGB stuff. Other manufacturers typically do not malfunction but they
>>> still use the notification.
>>>
>>> Only MSI does not, but that controller is quirky before/after sleep
>>> and they released a firmware update today saying something about
>>> controller S3/S4 improvements so they probably do that too now, I need
>>> to check.
>>>
>>> For the Go S, it sets itself to 5W after sleep entry and turns off the
>>> fan. A little delay went a long way in fixing the hang there, which I
>>> suspect is due to aggressive tuning. But I do not know if you guys get
>>> that. We did when we did the initial testing for it and carry the
>>> delay now so I cannot tell you either way. So you should max out the
>>> TDP, run stress -c 16, and make the device sleep 100-200 times to make
>>> sure that is not an issue.
>>>
>>> I do have a plan for trying to rework AC/DC bursts, but first the
>>> s2idle ordering needs to be fixed and I need to rewrite the series for
>>> that. The series we have for that works _fine_ so it is not a priority
>>> to rework but it is not upstreamable in its current state so if you
>>> need that (for the Go S) I need to know now.
>>>
>>> For ACDC my idea would be after the reordering is done to have a quirk
>>> that makes the kernel resume, fire the sleep exit notification, loop
>>> for 5 (maybe 3?) seconds inside the device suspend section prior to
>>> userspace resume, and then as long as a wakeup did not arrive restart
>>> the suspend sequence to sleep again. I would also combine that with
>>> the little s2idle wakeup device you made, so that userspace can enable
>>> wakeups for that if it wants to do resume on dock connection. But that
>>> has a lot of moving parts, including moving the DPMS action to happen
>>> even earlier than your patch does and making sure display on/off does
>>> not fire so that the keyboard backlight does not do weird stuff.
>>>
>>> Antheas
>>
>> I think a good start for what you're talking about would be to rebase
>> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
>> rebase, idk) and then if/when all of us on LKML are happy with it we can
>> layer other concepts on top of that.
> 
> Yeah, I will try to do that. However, I have around 30 submitted
> patches in the air right now, and we are about to add another 5-6 to
> the list for the Claw. So it will probably be after a bunch of those
> merge. For the interest of sustainability, if nothing else. So let's
> put a dot on this and pick up the discussion again mid 6.16 in a month
> or so.
> 
> Unless you need this series for the Go S, in which case I can try to
> re-order stuff around. So, one of you should use the red light TDP
> mode with an artificial load (or actual, such as a game) and see if
> sleep works properly. Do that on battery.

Take your time and get to it when you get to it.  I just want to make 
sure we build a clean foundation for big changes like you have in mind.

> 
> I would do at least 100 suspends with this, as most users do 50-70
> suspends per reboot. I think I did around 300 to validate the Go S
> quirk.
> 
> Antheas

Antheas Kapenekakis April 2, 2025, 8:37 p.m. UTC | #21

On Wed, 2 Apr 2025 at 21:19, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>
> >>>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> >>>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>>
> >>>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>>>>>
> >>>>>>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>>>>>
> >>>>>>>>>> Link:
> >>>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>>>>>> them and mark this patch as superceded in patchwork so that Rafael
> >>>>>>>> doesn't have to pull everything out of this thread manually.
> >>>>>>
> >>>>>> FTR I don't have permission on patchwork for linux-acpi.
> >>>>>>
> >>>>>> I sent out v2 though.
> >>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And to avoid having this conversation again, there is another Legion
> >>>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>>>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>>>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>>>>>> want me to prepare that in a timely manner, because that one actually
> >>>>>>>>> needs rewriting to be posted, now is the time to say so.
> >>>>>>>>
> >>>>>>>> Can you please propose what you have in mind on the mailing lists to
> >>>>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >>>>>>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>>>>>
> >>>>>>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>>>>>> driver.  If there are issues with that, I would generally prefer the
> >>>>>>>> fixes to be in that driver.
> >>>>>>>
> >>>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>>>>>> try to make sure the quirk can be potentially added there, or in any
> >>>>>>> driver*.
> >>>>>>
> >>>>>> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>>>>>
> >>>>>>>
> >>>>>>> The idea is to rewrite the patch series to just add a simple delay
> >>>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>>>>>> placed in ->begin. We have been using that series for ~6 months now,
> >>>>>>> and it turns out that having a delay system for every call is quite
> >>>>>>> pointless. But there are also situations where you might have a device
> >>>>>>> such as the Z13 Folio which looks like a USB device but listens to
> >>>>>>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>>>>>> be able to inject a small delay there.
> >>>>>>
> >>>>>> So the "general" problem with injecting delays is they are typically not
> >>>>>> scalable as they're usually empirically measured and there is no
> >>>>>> handshake with the firmware.
> >>>>>>
> >>>>>> Say for example the EC has some hardcoded value of 200ms to wait for
> >>>>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> >>>>>> if you put 175ms it might work sometimes.  You get some reports of this,
> >>>>>> so you extend it to 200ms.  Great it works 100% of the time because the
> >>>>>> old hardcoded value in the EC was 200ms.
> >>>>>>
> >>>>>> Now say a new EC firmware comes out that for $REASONS changes it to
> >>>>>> 250ms.  Your old empirically measured value stops working, spend a bunch
> >>>>>> of cycles debugging it, measure the new one.  You change it to 250ms,
> >>>>>> but people with the old one have a problem now because the timing changed.
> >>>>>>
> >>>>>> So now you have to add infrastructure to say what version of the
> >>>>>> firmware gets what delay.
> >>>>>>
> >>>>>> Then you find out there is another SKU of that model which needs a
> >>>>>> different delay, so your complexity has ballooned.
> >>>>>>
> >>>>>> What if all these "delays" were FW timeouts from failing to service an
> >>>>>> interrupt?  Or what if they were a flow problem like the device expected
> >>>>>> you to issue a STOP command before a RESET command?
> >>>>>>
> >>>>>> So we need to be /incredibly careful/ with delays and 100% they are the
> >>>>>> right answer to a problem.
> >>>>>
> >>>>> I do get your points. In this case though we sideskirt through a lot
> >>>>> of the points because of where the delay is placed.
> >>>>>
> >>>>> If the instrumentation is in-place, this delay happens before sleep
> >>>>> after the screen of the device has turned off (due to early DPMS), the
> >>>>> keyboard backlight has turned off (DIsplay off call), and the suspend
> >>>>> light pulses (Sleep Entry). So it does not affect device behavior and
> >>>>> you can be quite liberal. The user has left the device alone.
> >>>>>
> >>>>> If the device needs e.g., 250ms you will not put 250ms, you will put
> >>>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> >>>>> manufacturer comes up with a new firmware that fixes this issue, you
> >>>>> can keep the delay for the life of the product, because keeping it
> >>>>> does not affect device behavior, and writing kernel patches takes time.
> >>>>>
> >>>>> This is how I think about it, at least. A universal delay might be
> >>>>> needed eventually. But for now, limiting the scope to some devices and
> >>>>> seeing how that goes should be enough.
> >>>>>
> >>>>> Antheas
> >>>>
> >>>> My take is that "universal" delays are never popular.  IE hardware that
> >>>> "previously" worked perfectly is now slower.  So if there /must/ be a
> >>>> delay it should be as narrow as possible and justified.
> >>>>
> >>>> Let me give you an example of another case I'm *actively considering* a
> >>>> delay.
> >>>>
> >>>> I have an OEM's system that if you enter and exit s0i3 too quickly you
> >>>> can trigger the over voltage protection (OVP) feature of the VR module.
> >>>> When OVP is tripped the system is forced off immediately. This *only
> >>>> happens* on the VR module in that vendor's systems. "Normal" Linux
> >>>> userspace suspend/resume can't trip it.  But connecting a dock "does"
> >>>> trip it.
> >>>>
> >>>> If you look on a scope you can see SLP_S3# pin is toggling faster than
> >>>> spec says it should.  Naïvely you would say well the easy solution is to
> >>>> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
> >>>> that does just that.
> >>>>
> >>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> index e6124498b195f..97387ddb281e1 100644
> >>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
> >>>>            struct smu_metrics table;
> >>>>            int rc;
> >>>>
> >>>> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
> >>>> passed */
> >>>> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
> >>>> &table) &&
> >>>> -           table.s0i3_last_entry_status)
> >>>> -               usleep_range(10000, 20000);
> >>>> +       if (!get_metrics_table(pdev, &table) &&
> >>>> table.s0i3_last_entry_status) {
> >>>> +               switch (pdev->cpu_id) {
> >>>> +               /* CZN: Ensure that future s0i3 entry attempts at least
> >>>> 10ms passed */
> >>>> +               case AMD_CPU_ID_CZN:
> >>>> +                       usleep_range(10000, 20000);
> >>>> +                       break;
> >>>> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> >>>> +               case AMD_CPU_ID_PS:
> >>>> +                       msleep(2500);
> >>>> +                       break;
> >>>> +               default:
> >>>> +                       break;
> >>>> +               }
> >>>> +       }
> >>>>
> >>>> This stops all the failures, but it also has an impact that any time the
> >>>> EC SCI is raised for any reason (like plug in power adapter) the system
> >>>> will take 2.5s to go back into s0i3.
> >>>>
> >>>> Digging further - the intended behavior by the EC and BIOS was to wake
> >>>> the system when the dock is connected.
> >>>>
> >>>> That is the reason this happens is because the EC SCI is raised when the
> >>>> dock is connected, but the Notify() the EC sent wasn't received by any
> >>>> driver.  I've got a patch I'll be sending out soon that adds support for
> >>>> the correct driver to wake up on this event.
> >>>>
> >>>> This prevents the case of the OVP and now we don't *need* to penalize
> >>>> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
> >>>> find out there are other ways to trip the problem I still have that
> >>>> option though.
> >>>
> >>> So you are talking about missing the AC/DC burst feature of Windows
> >>> here right? I do agree with you that yeah for most devices it is not
> >>> necessary.
> >>
> >> No; I wasn't talking about that, my point was that timing delays are a
> >> tempting to solution to a problem, but they're very often papering over
> >> something else and a hint to dive deeper.
> >
> > What I gleaned from what you said is that X manufacturer has a problem
> > due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
> > delay.
> >
> > The intended behavior of AC/DC bursts is to fully wake up the kernel
> > for 5 seconds, and then sleep again. In windows, if a power supply is
> > connected, userspace wakes up too, and then the Windows power manager
> > sleeps the system again if there is no user activity for 5 seconds.
> > However, this should not affect device drivers, so we may consider it
> > optional on the Linux side until DEs get support for it and enable it
> > themselves I would say.
> >
> > So in effect, AC/DC bursts are Windows' solution to problems like the
> > one you faced.
> >
> > I am not saying penalize everyone. If I do make a patch for AC/DC it
> > will be device specific. But after a point, if random devices start
> > getting issues and the quirk list starts to grow, it might become
> > inevitable to force it for all of them.
> >
> > I do get what you are saying with delays though. We had to merge one
> > of the initial SOF delay patch variants for the Steam Deck which
> > prevents audio crashing on resume, and that was definitely a bandage.
> >
>
> Maybe I'm failing at my search-engine-foo, could you point me at some
> docs about this AC/DC burst stuff?

AC/DC Burst/AC/DC Burst Suppresed are the events in Sleep Study
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy

You can see those when running a sleep study and unplugging a
connector. I think suppressed is unplugging

Then here is the description for plugging in a charger
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-1

> The Windows power manager will turn on the display when the battery subsystem has indicated
> AC power has been connected. The GPIO interrupt for power source changes must cause the
> ACPI _PSR method under the power supply device to be executed. The power subsystem must
> wake the SoC any time the power source changes, including when the system is attached or
> removed from a dock that has a battery or AC power source. After AC power is connected,
> the display will remain on for five seconds, unless there is input to the system during this five-second window.

And here for unplugging:
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-2

> The GPIO interrupt for power source changes must cause the ACPI _PSR method under
> the power supply device to be executed. The power subsystem must wake the SoC any time
> the power source changes, including when the system is attached or removed from a dock
> that has a battery or AC power source.

I guess from the description it is not clear that the device stays on
for 5 seconds when unplugging, but from empirical testing I want to
say it does. It has been a while. I left 3 devices like an hour ago on
Windows and none of them managed to sleep, so I cannot verify this at
the moment though.


Antheas

> [
> And FTR unfortunately it's seeming that my proposal for the alternate
> wake source has negative side effects to other machines, so I'm going
> back to exploring a timing based quirk tied to SMBIOS data again too :( ]
>
> >>>
> >>> But Microsoft guarantees 5 seconds. We already have the original Ally
> >>> unit which gets stuck in prochot due to this so it would be nice to
> >>> fix. For the Ally X I am unsure what Asus did but it stays awake for a
> >>> nice three seconds after you plug/unplug the charger so it has no
> >>> issues.
> >>>
> >>> So if devices keep getting issues like we will have to eat it and do
> >>> AC/DC bursts with all of them.
> >>>
> >>> And it is the same with entering s0i3 too fast. Some devices just need
> >>> a tiny amount of time to do whatever it is their manufacturer
> >>> programmed them to do after the Modern Standby notifications. For
> >>> handhelds, it is to turn off the controllers because XInput. Asus put
> >>> the fade animation so that takes 300ms and if you do it earlier the
> >>> controller gets cut before it saves its state and starts to do weird
> >>> RGB stuff. Other manufacturers typically do not malfunction but they
> >>> still use the notification.
> >>>
> >>> Only MSI does not, but that controller is quirky before/after sleep
> >>> and they released a firmware update today saying something about
> >>> controller S3/S4 improvements so they probably do that too now, I need
> >>> to check.
> >>>
> >>> For the Go S, it sets itself to 5W after sleep entry and turns off the
> >>> fan. A little delay went a long way in fixing the hang there, which I
> >>> suspect is due to aggressive tuning. But I do not know if you guys get
> >>> that. We did when we did the initial testing for it and carry the
> >>> delay now so I cannot tell you either way. So you should max out the
> >>> TDP, run stress -c 16, and make the device sleep 100-200 times to make
> >>> sure that is not an issue.
> >>>
> >>> I do have a plan for trying to rework AC/DC bursts, but first the
> >>> s2idle ordering needs to be fixed and I need to rewrite the series for
> >>> that. The series we have for that works _fine_ so it is not a priority
> >>> to rework but it is not upstreamable in its current state so if you
> >>> need that (for the Go S) I need to know now.
> >>>
> >>> For ACDC my idea would be after the reordering is done to have a quirk
> >>> that makes the kernel resume, fire the sleep exit notification, loop
> >>> for 5 (maybe 3?) seconds inside the device suspend section prior to
> >>> userspace resume, and then as long as a wakeup did not arrive restart
> >>> the suspend sequence to sleep again. I would also combine that with
> >>> the little s2idle wakeup device you made, so that userspace can enable
> >>> wakeups for that if it wants to do resume on dock connection. But that
> >>> has a lot of moving parts, including moving the DPMS action to happen
> >>> even earlier than your patch does and making sure display on/off does
> >>> not fire so that the keyboard backlight does not do weird stuff.
> >>>
> >>> Antheas
> >>
> >> I think a good start for what you're talking about would be to rebase
> >> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> >> rebase, idk) and then if/when all of us on LKML are happy with it we can
> >> layer other concepts on top of that.
> >
> > Yeah, I will try to do that. However, I have around 30 submitted
> > patches in the air right now, and we are about to add another 5-6 to
> > the list for the Claw. So it will probably be after a bunch of those
> > merge. For the interest of sustainability, if nothing else. So let's
> > put a dot on this and pick up the discussion again mid 6.16 in a month
> > or so.
> >
> > Unless you need this series for the Go S, in which case I can try to
> > re-order stuff around. So, one of you should use the red light TDP
> > mode with an artificial load (or actual, such as a game) and see if
> > sleep works properly. Do that on battery.
>
> Take your time and get to it when you get to it.  I just want to make
> sure we build a clean foundation for big changes like you have in mind.
>
> >
> > I would do at least 100 suspends with this, as most users do 50-70
> > suspends per reboot. I think I did around 300 to validate the Go S
> > quirk.
> >
> > Antheas
>

ACPI: EC: Set ec_no_wakeup for Lenovo Go S

Commit Message

Comments

Patch