Message ID | f634a05a-e3a9-93ab-4b87-d41f5ee083a5@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | Some SSDT tables are not loading with kernel >= 5.12 |
Hi,

On 6/3/21 7:26 PM, Hans de Goede wrote:
> Hi Rafael,
>
> I've been helping some users with trying to get to the bottom of some
> new ACPI errors with kernel 5.12, see:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=213023
>
> After looking at dmesg output; and after editing the dmesg output
> a bit so that I could do diff -u on it, the following stands out:
>
> --- dmesg_5.10.38-1-lts 2021-06-03 16:29:41.372922210 +0200
> +++ dmesg_linux-5.12.5-arch1-1 2021-06-03 16:30:01.013031634 +0200
> @@ -92,7 +92,7 @@
>  ACPI: IRQ9 used by override.
>  Using ACPI (MADT) for SMP configuration information
>  ACPI: HPET id: 0x8086a201 base: 0xfed00000
> -ACPI: Core revision 20200925
> +ACPI: Core revision 20210105
>  PM: Registering ACPI NVS region [mem 0x7156c000-0x7156cfff] (4096 bytes)
>  PM: Registering ACPI NVS region [mem 0x8a88f000-0x8af7efff] (7274496 bytes)
>  ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> @@ -113,10 +113,6 @@
>  ACPI: Dynamic OEM Table Load:
>  ACPI: SSDT 0xFFFF... 0003FF (v02 PmRef Cpu0Cst 00003001 INTL 20160527)
>  ACPI: Dynamic OEM Table Load:
> -ACPI: SSDT 0xFFFF... 0000BA (v02 PmRef Cpu0Hwp 00003000 INTL 20160527)
> -ACPI: Dynamic OEM Table Load:
> -ACPI: SSDT 0xFFFF... 000628 (v02 PmRef HwpLvt 00003000 INTL 20160527)
> -ACPI: Dynamic OEM Table Load:
>  ACPI: SSDT 0xFFFF... 000D14 (v02 PmRef ApIst 00003000 INTL 20160527)
>  ACPI: Dynamic OEM Table Load:
>  ACPI: SSDT 0xFFFF... 000317 (v02 PmRef ApHwp 00003000 INTL 20160527)
>
> Note how for some reason the kernel is no longer loading the Cpu0Hwp and
> HwpLvt SSDT-s ?
>
> Do you have any ideas what might be causing this ?

Good news, a very similar bug is being tracked here:

https://bugzilla.redhat.com/show_bug.cgi?id=1963717

And one of the reporters there has done a git bisect and has found the commit
which is causing the problem for them:

"""
git-bisect points to 719e1f561afbe020ed175825a9bd25ed62ed1697 :
"ACPI: Execute platform _OSC also with query bit clear".

Tested 5.12.9 kernel with the commit reverted, and confirmed that the error
messages are gone. (I had to revert
5a6a2c0f0f43676df27632d657a3f18b151a7ef8 for dependency too.)

It also brings back the /sys/devices/system/cpu/cpu0/acpi_cppc which is absent
in the stable 5.12.x

Hope this helps
"""

I've asked the reporters of:

https://bugzilla.kernel.org/show_bug.cgi?id=213023

To also try reverting 719e1f561afbe020ed175825a9bd25ed62ed1697 and see
if that helps (I expect it will, I believe the 2 bugs are the same issue).

Either way we need to do something about this to fix this for the
reporter of https://bugzilla.redhat.com/show_bug.cgi?id=1963717

Any ideas on how to fix this?

Regards,

Hans

Hi,

On 6/7/21 11:43 AM, Hans de Goede wrote:
> Hi,
>
> Good news, a very similar bug is being tracked here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1963717
>
> And one of the reporters there has done a git bisect and has found the commit which is causing the problem for them:
>
> """
> git-bisect points to 719e1f561afbe020ed175825a9bd25ed62ed1697 :
> "ACPI: Execute platform _OSC also with query bit clear".
>
> Tested 5.12.9 kernel with the commit reverted, and confirmed that the error
> messages are gone. (I had to revert
> 5a6a2c0f0f43676df27632d657a3f18b151a7ef8 for dependency too.)
>
> It also brings back the /sys/devices/system/cpu/cpu0/acpi_cppc which is absent
> in the stable 5.12.x
>
> Hope this helps
> """

I've taken a quick look at commit 719e1f561afb ("ACPI: Execute platform
_OSC also with query bit clear") and I think I may have found the problem.

I've attached a patch which I think may fix this (and I've asked the
reporters of the bugs to test this).

Regards,

Hans

On Mon, Jun 7, 2021 at 12:05 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 6/7/21 11:43 AM, Hans de Goede wrote:
> > Hi,
> >
> > Good news, a very similar bug is being tracked here:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1963717
> >
> > And one of the reporters there has done a git bisect and has found the commit which is causing the problem for them:
> >
> > """
> > git-bisect points to 719e1f561afbe020ed175825a9bd25ed62ed1697 :
> > "ACPI: Execute platform _OSC also with query bit clear".
> >
> > Tested 5.12.9 kernel with the commit reverted, and confirmed that the error
> > messages are gone. (I had to revert
> > 5a6a2c0f0f43676df27632d657a3f18b151a7ef8 for dependency too.)
> >
> > It also brings back the /sys/devices/system/cpu/cpu0/acpi_cppc which is absent
> > in the stable 5.12.x
> >
> > Hope this helps
> > """
>
> I've taken a quick look at commit 719e1f561afb ("ACPI: Execute platform _OSC also with query bit clear") and I think I may have found the problem.
>
> I've attached a patch which I think may fix this (and I've asked the reporters of the bugs to test this).

Thank you, the patch looks reasonable to me.

It looks like commit 719e1f561afb went a bit too far.

Hi,

Tried now on ADL-P and TGL systems and the _OSC still works properly.

Thanks Hans for fixing!

Feel free to add my Tested-by.

On Mon, Jun 07, 2021 at 01:01:59PM +0000, Mario Limonciello wrote:
> Mika,
>
> Can you have a try and make sure this modification still works properly
> on the series in the hardware we originally did it for?

Hi,

On 6/7/21 6:08 PM, Mika Westerberg wrote:
> Hi,
>
> Tried now on ADL-P and TGL systems and the _OSC still works properly.
>
> Thanks Hans for fixing!
>
> Feel free to add my Tested-by.

Thank you for testing, unfortunately so far from the comments here:

https://bugzilla.kernel.org/show_bug.cgi?id=213023

it seems that my patch does not help resolve the issues caused
by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
bit clear"), whereas reverting that commit does resolve them :|

Does anyone have any other ideas how to fix this ?

Regards,

Hans

Hi,

On 6/7/21 9:18 PM, Hans de Goede wrote:
> Hi,
>
> On 6/7/21 6:08 PM, Mika Westerberg wrote:
>> Hi,
>>
>> Tried now on ADL-P and TGL systems and the _OSC still works properly.
>>
>> Thanks Hans for fixing!
>>
>> Feel free to add my Tested-by.
>
> Thank you for testing, unfortunately so far from the comments here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=213023
>
> it seems that my patch does not help resolve the issues caused
> by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
> bit clear"), whereas reverting that commit does resolve them :|
>
> Does anyone have any other ideas how to fix this ?

The reporter who has done the bisect has commented out the new/second
_OSC call and that fixes things for them. So I've written a new fix
(attached), note just as before this is untested ATM.

Mika, if you can test this one (it replaces the previous one)
on machines with native USB4 support to check those don't regress then
that would be great.

I've asked the various reporters from the 2 bugzilla's for this to also
test this new patch. I'll let you know how that goes.

Regards,

Hans

Hi,

On Tue, Jun 08, 2021 at 11:50:15AM +0200, Hans de Goede wrote:
> Hi,
>
> On 6/7/21 9:18 PM, Hans de Goede wrote:
> > Hi,
> >
> > On 6/7/21 6:08 PM, Mika Westerberg wrote:
> >> Hi,
> >>
> >> Tried now on ADL-P and TGL systems and the _OSC still works properly.
> >>
> >> Thanks Hans for fixing!
> >>
> >> Feel free to add my Tested-by.
> >
> > Thank you for testing, unfortunately so far from the comments here:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=213023
> >
> > it seems that my patch does not help resolve the issues caused
> > by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
> > bit clear"), whereas reverting that commit does resolve them :|
> >
> > Does anyone have any other ideas how to fix this ?
>
> The reporter who has done the bisect has commented out the new/second
> _OSC call and that fixes things for them. So I've written a new fix
> (attached), note just as before this is untested ATM.
>
> Mika, if you can test this one (it replaces the previous one)
> on machines with native USB4 support to check those don't regress then
> that would be great.

I can test it sure, but first let's try to understand what the problem is :)

> I've asked the various reporters from the 2 bugzilla's for this to also
> test this new patch. I'll let you know how that goes.

The _OSC on at least one of the affected platforms looks like this:

    If ((Arg0 == ToUUID ("0811b06e-4a27-44f9-8d60-3cbbc22e7b48") /* Platform-wide Capabilities */))
    {
        If ((Arg1 == One))
        {
            OSCP = CAP0 /* \_SB_._OSC.CAP0 */
            If ((CAP0 & 0x04))
            {
                OSCO = 0x04
                If (((SGMD & 0x0F) != 0x02))
                {
                    If ((RTD3 == Zero))
                    {
                        CAP0 &= 0x3B
                        STS0 |= 0x10
                    }
                }
            }
        }
        Else
        {
            STS0 &= 0xFFFFFF00
            STS0 |= 0x0A
        }
    }
    Else
    {
        STS0 &= 0xFFFFFF00
        STS0 |= 0x06
    }

Probably it is fine to call it several times but the issue is with the mask
that it does:

    CAP0 &= 0x3B

This clears out the upper bits. I think this is actually a BIOS bug as it ends
up clearing OSC_SB_PCLPI_SUPPORT which is probably not intended, and that seems
to cause skipping of the LPI tables or something like that.

The alternative is to pass the original caps to the second _OSC call. I think
this is safe too. While looking at the code, I found a couple of other issues
that should be fixed with the below hack patch.

What do you think about this approach?

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index be7da23fad76..80ff81bb668b 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -290,7 +290,7 @@ static void acpi_bus_osc_negotiate_platform_control(void)
 	struct acpi_osc_context context = {
 		.uuid_str = sb_uuid_str,
 		.rev = 1,
-		.cap.length = 8,
+		.cap.length = sizeof(capbuf),
 		.cap.pointer = capbuf,
 	};
 	acpi_handle handle;
@@ -330,32 +330,21 @@ static void acpi_bus_osc_negotiate_platform_control(void)
 	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
 		return;

-	capbuf_ret = context.ret.pointer;
-	if (context.ret.length <= OSC_SUPPORT_DWORD) {
-		kfree(context.ret.pointer);
-		return;
-	}
+	kfree(context.ret.pointer);

-	/*
-	 * Now run _OSC again with query flag clear and with the caps
-	 * supported by both the OS and the platform.
-	 */
+	/* Now run _OSC again with query flag clear */
 	capbuf[OSC_QUERY_DWORD] = 0;
-	capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
-	kfree(context.ret.pointer);

 	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
 		return;

 	capbuf_ret = context.ret.pointer;
-	if (context.ret.length > OSC_SUPPORT_DWORD) {
-		osc_sb_apei_support_acked =
-			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
-		osc_pc_lpi_support_confirmed =
-			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
-		osc_sb_native_usb4_support_confirmed =
-			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
-	}
+	osc_sb_apei_support_acked =
+		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
+	osc_pc_lpi_support_confirmed =
+		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
+	osc_sb_native_usb4_support_confirmed =
+		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;

 	kfree(context.ret.pointer);
 }

On Tue, Jun 08, 2021 at 02:45:03PM +0300, Mika Westerberg wrote:
> Hi,
>
> On Tue, Jun 08, 2021 at 11:50:15AM +0200, Hans de Goede wrote:
> > Hi,
> >
> > On 6/7/21 9:18 PM, Hans de Goede wrote:
> > > Hi,
> > >
> > > On 6/7/21 6:08 PM, Mika Westerberg wrote:
> > >> Hi,
> > >>
> > >> Tried now on ADL-P and TGL systems and the _OSC still works properly.
> > >>
> > >> Thanks Hans for fixing!
> > >>
> > >> Feel free to add my Tested-by.
> > >
> > > Thank you for testing, unfortunately so far from the comments here:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=213023
> > >
> > > it seems that my patch does not help resolve the issues caused
> > > by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
> > > bit clear"), whereas reverting that commit does resolve them :|
> > >
> > > Does anyone have any other ideas how to fix this ?
> >
> > The reporter who has done the bisect has commented out the new/second
> > _OSC call and that fixes things for them. So I've written a new fix
> > (attached), note just as before this is untested ATM.
> >
> > Mika, if you can test this one (it replaces the previous one)
> > on machines with native USB4 support to check those don't regress then
> > that would be great.
>
> I can test it sure, but first let's try to understand what the problem is :)

FYI, I also tested your patch and it still works on my test system so if we
decide to go with that then feel free to add my Tested-by to the patch too.

Hi,

On 6/8/21 1:44 PM, Mika Westerberg wrote:
> Hi,
>
> On Tue, Jun 08, 2021 at 11:50:15AM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 6/7/21 9:18 PM, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 6/7/21 6:08 PM, Mika Westerberg wrote:
>>>> Hi,
>>>>
>>>> Tried now on ADL-P and TGL systems and the _OSC still works properly.
>>>>
>>>> Thanks Hans for fixing!
>>>>
>>>> Feel free to add my Tested-by.
>>>
>>> Thank you for testing, unfortunately so far from the comments here:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=213023
>>>
>>> it seems that my patch does not help resolve the issues caused
>>> by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
>>> bit clear"), whereas reverting that commit does resolve them :|
>>>
>>> Does anyone have any other ideas how to fix this ?
>>
>> The reporter who has done the bisect has commented out the new/second
>> _OSC call and that fixes things for them. So I've written a new fix
>> (attached), note just as before this is untested ATM.
>>
>> Mika, if you can test this one (it replaces the previous one)
>> on machines with native USB4 support to check those don't regress then
>> that would be great.
>
> I can test it sure, but first let's try to understand what the problem is :)
>
>> I've asked the various reporters from the 2 bugzilla's for this to also
>> test this new patch. I'll let you know how that goes.
>
> The _OSC on at least one of the affected platforms looks like this:
>
>     If ((Arg0 == ToUUID ("0811b06e-4a27-44f9-8d60-3cbbc22e7b48") /* Platform-wide Capabilities */))
>     {
>         If ((Arg1 == One))
>         {
>             OSCP = CAP0 /* \_SB_._OSC.CAP0 */
>             If ((CAP0 & 0x04))
>             {
>                 OSCO = 0x04
>                 If (((SGMD & 0x0F) != 0x02))
>                 {
>                     If ((RTD3 == Zero))
>                     {
>                         CAP0 &= 0x3B
>                         STS0 |= 0x10
>                     }
>                 }
>             }
>         }
>         Else
>         {
>             STS0 &= 0xFFFFFF00
>             STS0 |= 0x0A
>         }
>     }
>     Else
>     {
>         STS0 &= 0xFFFFFF00
>         STS0 |= 0x06
>     }
>
> Probably it is fine to call it several times but the issue is with the mask
> that it does:
>
>     CAP0 &= 0x3B
>
> This clears out the upper bits. I think this is actually a BIOS bug as it ends
> up clearing OSC_SB_PCLPI_SUPPORT which is probably not intended, and that seems
> to cause skipping of the LPI tables or something like that.
>
> The alternative is to pass the original caps to the second _OSC call. I think
> this is safe too. While looking at the code, I found a couple of other issues
> that should be fixed with the below hack patch.
>
> What do you think about this approach?

I think you might be on to something, quoting from the spec:

"""
6.2.11.1.3 Sequence of _OSC calls
The following rules govern sequences of calls to _OSC that are issued to the same host bridge and
occur within the same boot.
• The OS is permitted to evaluate _OSC an arbitrary number of times.
• If the OS declares support of a feature in the Status Field in one call to _OSC, then it must
preserve the set state of that bit (declaring support for that feature) in all subsequent calls.
• If the OS is granted control of a feature in the Control Field in one call to _OSC, then it must
preserve the set state of that bit (requesting that feature) in all subsequent calls.
"""

So the spec is saying that we should indeed keep all the flags which were
set during the first call also set during subsequent calls.

If you can turn this into a proper patch then I can ask the reporters
of the 2 bugs to test that patch.

Regards,

Hans

>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index be7da23fad76..80ff81bb668b 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -290,7 +290,7 @@ static void acpi_bus_osc_negotiate_platform_control(void)
>  	struct acpi_osc_context context = {
>  		.uuid_str = sb_uuid_str,
>  		.rev = 1,
> -		.cap.length = 8,
> +		.cap.length = sizeof(capbuf),
>  		.cap.pointer = capbuf,
>  	};
>  	acpi_handle handle;
> @@ -330,32 +330,21 @@ static void acpi_bus_osc_negotiate_platform_control(void)
>  	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
>  		return;
>
> -	capbuf_ret = context.ret.pointer;
> -	if (context.ret.length <= OSC_SUPPORT_DWORD) {
> -		kfree(context.ret.pointer);
> -		return;
> -	}
> +	kfree(context.ret.pointer);
>
> -	/*
> -	 * Now run _OSC again with query flag clear and with the caps
> -	 * supported by both the OS and the platform.
> -	 */
> +	/* Now run _OSC again with query flag clear */
>  	capbuf[OSC_QUERY_DWORD] = 0;
> -	capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
> -	kfree(context.ret.pointer);
>
>  	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
>  		return;
>
>  	capbuf_ret = context.ret.pointer;
> -	if (context.ret.length > OSC_SUPPORT_DWORD) {
> -		osc_sb_apei_support_acked =
> -			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> -		osc_pc_lpi_support_confirmed =
> -			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> -		osc_sb_native_usb4_support_confirmed =
> -			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> -	}
> +	osc_sb_apei_support_acked =
> +		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> +	osc_pc_lpi_support_confirmed =
> +		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> +	osc_sb_native_usb4_support_confirmed =
> +		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
>
>  	kfree(context.ret.pointer);
>  }
>


Hi Hans,

On Tue, Jun 08, 2021 at 03:24:28PM +0200, Hans de Goede wrote:
> Hi,
>
> On 6/8/21 1:44 PM, Mika Westerberg wrote:
> > Hi,
> >
> > On Tue, Jun 08, 2021 at 11:50:15AM +0200, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 6/7/21 9:18 PM, Hans de Goede wrote:
> >>> Hi,
> >>>
> >>> On 6/7/21 6:08 PM, Mika Westerberg wrote:
> >>>> Hi,
> >>>>
> >>>> Tried now on ADL-P and TGL systems and the _OSC still works properly.
> >>>>
> >>>> Thanks Hans for fixing!
> >>>>
> >>>> Feel free to add my Tested-by.
> >>>
> >>> Thank you for testing, unfortunately so far from the comments here:
> >>>
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=213023
> >>>
> >>> it seems that my patch does not help resolve the issues caused
> >>> by commit 719e1f561afb ("ACPI: Execute platform _OSC also with query
> >>> bit clear"), where as reverting that commit does resolve them :|
> >>>
> >>> Does anyone have any other ideas how to fix this ?
> >>
> >> The reporter who has done the bisect has commented out the new/second
> >> _OSC call and that fixes things for them. So I've written a new fix
> >> (attached), note just as before this is untested ATM.
> >>
> >> Mika, if you can test this one (it replaces the previous one)
> >> on machines with native USB4 support to check those don't regress then
> >> that would be great.
> >
> > I can test it sure, but first let's try to understand what the problem is :)
> >
> >> I've asked the various reporters from the 2 bugzilla's for this to also
> >> test this new patch. I'll let you know how that goes.
> >
> > The _OSC on at least one of the affected platforms look like this:
> >
> > If ((Arg0 == ToUUID ("0811b06e-4a27-44f9-8d60-3cbbc22e7b48") /* Platform-wide Capabilities */))
> > {
> >     If ((Arg1 == One))
> >     {
> >         OSCP = CAP0 /* \_SB_._OSC.CAP0 */
> >         If ((CAP0 & 0x04))
> >         {
> >             OSCO = 0x04
> >             If (((SGMD & 0x0F) != 0x02))
> >             {
> >                 If ((RTD3 == Zero))
> >                 {
> >                     CAP0 &= 0x3B
> >                     STS0 |= 0x10
> >                 }
> >             }
> >         }
> >     }
> >     Else
> >     {
> >         STS0 &= 0xFFFFFF00
> >         STS0 |= 0x0A
> >     }
> > }
> > Else
> > {
> >     STS0 &= 0xFFFFFF00
> >     STS0 |= 0x06
> > }
> >
> > Probably it is fine to call it several times but the issue is with the mask
> > that it does:
> >
> > CAP0 &= 0x3B
> >
> > This clears out the upper bits. I think this is actually a BIOS bug as it ends
> > up clearing OSC_SB_PCLPI_SUPPORT which is probably not intented, and that seems
> > to cause skipping of the LPI tables or something like that.
> >
> > The alternative is to pass the original caps to the second _OSC call. I think
> > this is safe too. While looking at the code, I found a couple of other issues
> > that should be fixed with the below hack patch.
> >
> > What do you think about this approach?
>
> I think you might be on to something, quoting from the spec:
>
> """
> 6.2.11.1.3 Sequence of _OSC calls
> The following rules govern sequences of calls to _OSC that are issued to the same host bridge and
> occur within the same boot.
> • The OS is permitted to evaluate _OSC an arbitrary number of times.
> • If the OS declares support of a feature in the Status Field in one call to _OSC, then it must
> preserve the set state of that bit (declaring support for that feature) in all subsequent calls.
> • If the OS is granted control of a feature in the Control Field in one call to _OSC, then it must
> preserve the set state of that bit (requesting that feature) in all subsequent calls.
> """
>
> So the spec is saying that we should indeed keep all the flags which set during
> the first call also set during subsequent calls.
>
> If you can turn this into a proper patch then I can ask the reporters of
> the 2 bugs to test that patch.

Sure, I will.

First, I think I figured out why this happens. The BIOS loads the HWP tables
dynamically (in ssdt9.dsl) like this:

\_PR.PRxx.GCAP():

    If ((OSYS >= 0x07DF))
    {
        If (((CFGD & 0x00400000) && !(SDTL & 0x40)))
        {
            If ((\_SB.OSCP & 0x40))
            {
                SDTL |= 0x40
                OperationRegion (HWP0, SystemMemory, DerefOf (SSDT [0x0D]), DerefOf (SSDT [0x0E]))
                Load (HWP0, HW0) /* \_PR_.PR00.HW0_ */
                If ((CFGD & 0x00800000))
                {
                    OperationRegion (HWPL, SystemMemory, DerefOf (SSDT [0x13]), DerefOf (SSDT [0x14]))
                    Load (HWPL, HW2) /* \_PR_.PR00.HW2_ */
                }
            }

Note that it checks \_SB.OSCP, which _OSC sets to the value of the "support"
buffer that Linux populates. However, _OSC also clears that particular bit in
the capabilities it returns (when RTD3 is set to 0):

    CAP0 &= 0x3B
    STS0 |= 0x10

Since Linux calls _OSC again with that bit cleared, \_SB.OSCP no longer has
the bit set either, which makes GCAP() skip the Load() operation, resulting in
the errors users have reported.

It looks like the BIOS expects the same set of "support" bits to be set
on each call, or alternatively it only expects _OSC to be run once.

In any case, I will make a proper patch soon, with the above added to the
commit log too.
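
The interaction described above between the `CAP0 &= 0x3B` mask, `\_SB.OSCP` and the `0x40` test in GCAP() can be traced with a small C model. It is a rough sketch that mirrors only the `Arg1 == One` branch of the quoted ASL and ignores STS0 and OSCO; `struct fw_state`, `fw_osc()` and `gcap_would_load_hwp()` are hypothetical names, and the support value `0x04 | 0x40` is picked only because those are the two bits the ASL tests, not because it is what Linux actually passes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the firmware variables named in the ASL. */
struct fw_state {
    uint32_t oscp;  /* \_SB.OSCP: copy of the support dword _OSC was given */
    uint32_t sgmd;  /* SGMD: low nibble != 0x02 allows the masking branch  */
    int rtd3;       /* RTD3: zero allows the masking branch                */
};

/* Models the Arg1 == One branch of the quoted _OSC; returns the new CAP0. */
static uint32_t fw_osc(struct fw_state *fw, uint32_t cap0)
{
    fw->oscp = cap0;                          /* OSCP = CAP0 */
    if ((cap0 & 0x04) && (fw->sgmd & 0x0F) != 0x02 && fw->rtd3 == 0)
        cap0 &= 0x3B;                         /* clears 0x04, 0x40 and above */
    return cap0;
}

/* Models the (\_SB.OSCP & 0x40) test that gates Load() in GCAP(). */
static int gcap_would_load_hwp(const struct fw_state *fw)
{
    return (fw->oscp & 0x40) != 0;
}

/* Traces one _OSC call followed by the two possible second calls. */
static void demo_oscp(void)
{
    struct fw_state fw = { .oscp = 0, .sgmd = 0, .rtd3 = 0 };
    const uint32_t os_support = 0x04 | 0x40;  /* illustrative bits only */

    uint32_t returned = fw_osc(&fw, os_support);
    assert(gcap_would_load_hwp(&fw));         /* one call: bit still seen  */
    assert((returned & 0x40) == 0);           /* but masked in the reply   */

    fw_osc(&fw, returned);                    /* second call, masked caps  */
    assert(!gcap_would_load_hwp(&fw));        /* GCAP() would skip Load()  */

    fw_osc(&fw, os_support);                  /* second call, original caps */
    assert(gcap_would_load_hwp(&fw));         /* GCAP() would Load() again */
}
```

In this model, a second call that passes the masked return value makes the `0x40` test fail, while repeating the original support dword keeps it passing, which is consistent with the reports that dropping the second call avoids the missing tables.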