Message ID | 20230725113938.2277420-1-imammedo@redhat.com (mailing list archive) |
---|---|
Series | acpipcihp: fix kernel crash on 2nd resume |
Igor Mammedov wrote:

> Changelog:
>   * split out the debug patch into a separate one with an extra printk added
>   * fixed the inverted bus->self check (probably the reason why it didn't work before)
>
> 1/3 debug patch
> 2/3 offending patch
> 3/3 potential fix
>
> I added more files to trace; add the following to the kernel command line:
> dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p; file drivers/acpi/bus.c +p" ignore_loglevel
>
> should be applied on top of
>   e8afd0d9fccc PCI: pciehp: Cancel bringup sequence if card is not present
>
> Apply the patches one by one, run the testcase and capture dmesg after each patch.
> One should end up with 3 dmesg logs to analyse:
>   1st - old behaviour - no crash
>   2nd - crash
>   3rd - no crash, hopefully
>
> Igor Mammedov (3):
>   acpiphp: extra debug hack
>   PCI: acpiphp: Reassign resources on bridge if necessary
>   acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge
>
>  drivers/pci/hotplug/acpiphp_glue.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)

Actually, applying patch1 alone already triggers the crash (why???), so I have also added dmesg-6.5-0.txt, which shows a working condition based on git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).

Patch3 did not fix the issue; it seems that the culprit is somewhere else, triggered by the "benign" patch1 :-(

Also a note about the trigger description in patch3: the dmesg trace on the Inspiron laptop is collected after the first wake from suspend to RAM. The subsequent attempt to sleep results in a frozen system.

Thanks, Woody
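For readers reproducing the test setup: the `dyndbg` value quoted in the cover letter is a semicolon-separated list of `file <path> +p` queries, one per traced source file. The sketch below only shows how that string is composed from the file list given in the thread; the helper name `build_dyndbg` and the constant `TRACED_FILES` are hypothetical and not part of any kernel tooling.

```python
# Source files named in the cover letter's dyndbg parameter.
TRACED_FILES = [
    "drivers/pci/access.c",
    "drivers/pci/hotplug/acpiphp_glue.c",
    "drivers/pci/bus.c",
    "drivers/pci/pci.c",
    "drivers/pci/setup-bus.c",
    "drivers/acpi/bus.c",
]


def build_dyndbg(files):
    """Compose the kernel command line fragment (hypothetical helper).

    Each file becomes one 'file <path> +p' query; queries are joined
    with '; ' and ignore_loglevel is appended, as in the thread.
    """
    queries = "; ".join(f"file {path} +p" for path in files)
    return f'dyndbg="{queries}" ignore_loglevel'


if __name__ == "__main__":
    # Print the fragment so it can be pasted into the bootloader entry.
    print(build_dyndbg(TRACED_FILES))
```

Adding or dropping a file from the list changes which sources emit debug messages; the rest of the procedure (apply one patch, run the testcase, capture dmesg) is unchanged.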
On Tue, 25 Jul 2023 09:51:53 -0400 Woody Suwalski <terraluna977@gmail.com> wrote:

> Actually applying patch1 is already creating the crash (why???),

Probably it's due to an extra debug line I've added. I dropped the suspicious one; can you try again and see if it works?

> hence I have added also dmesg-6.5-0.txt which shows a working condition based on
> git e8afd0d9fccc level (acpiphp_glue in kernel 6.4)
>
> Patch3 did not fix the issue, it seems that the culprit is somewhere
> else triggered by "benign" patch1 :-(
>
> Also note about the trigger description in patch3: the dmesg trace on
> Inspiron laptop is collected after the first wake from suspend to ram.
> The consecutive attempt to sleep results in a frozen system.

Thanks for the clarification, I'll correct the commit message once the culprit is found.

> Thanks, Woody
Woody Suwalski wrote:

> Actually applying patch1 is already creating the crash (why???), hence
> I have added also dmesg-6.5-0.txt which shows a working condition
> based on git e8afd0d9fccc level (acpiphp_glue in kernel 6.4)
>
> Patch3 did not fix the issue, it seems that the culprit is somewhere
> else triggered by "benign" patch1 :-(
>
> Also note about the trigger description in patch3: the dmesg trace on
> Inspiron laptop is collected after the first wake from suspend to ram.
> The consecutive attempt to sleep results in a frozen system.
>
> Thanks, Woody

I think that in patch1 there is a problem in your debug statement acpi_handle_debug(...slot_name...) - it is masking the "old" issue. When I commented out that line in hotplug_event(), it worked OK (as expected).

I will redo the testing in ~2 hours...

Woody
Igor Mammedov wrote:

> On Tue, 25 Jul 2023 09:51:53 -0400
> Woody Suwalski <terraluna977@gmail.com> wrote:
>
>> Actually applying patch1 is already creating the crash (why???),
> probably it's due to an extra debug line, I've added.
> I dropped suspicions one, can you try again and see if it works.
>
>> Also note about the trigger description in patch3: the dmesg trace on
>> Inspiron laptop is collected after the first wake from suspend to ram.
>> The consecutive attempt to sleep results in a frozen system.
> Thanks for clarification, I'll correct commit message once culprit
> is found.

Good news. After removing the botched debug statement, which was masking the original issue, the testing went as you predicted, and with patch 3 the system suspends to RAM OK.

Here are the requested 3 dmesg outputs; #2 is for the bad run.

I can retest with a final version of the patch once you have it ready...

Thanks, Woody
On Tue, 25 Jul 2023 11:59:56 -0400 Woody Suwalski <terraluna977@gmail.com> wrote:

> Good news. After removing the botched debug statement which was masking
> the original issue, the testing went as you have predicted, and on patch
> 3 system suspends to RAM OK.

Thanks for the confirmation, I'll post a cleaned-up 3/3 patch today.

> Here are the requested 3 dmesg outputs, #2 is for the bad run.
>
> I can retest with a final version of the patch once you have it ready...
>
> Thanks, Woody