Message ID | 1779906.DOHeuFCiDy@vostro.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
On Wed, 10 Aug 2016, Rafael J. Wysocki wrote:

> The last patch I sent had a problem, because if restore_jump_address really
> overlapped with the identity mapping of the restore kernel, it might share
> PGD or PUD entries with that mapping and that should have been taken into
> account.
>
> Here goes an update. Again, this works on my test machine, but then the
> previous version worked on it too ...

Unfortunately still exactly the same symptoms during resume even with this
one.

Thanks,
On Wed, Aug 10, 2016 at 6:18 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Wed, 10 Aug 2016, Rafael J. Wysocki wrote:
>
>> The last patch I sent had a problem, because if restore_jump_address really
>> overlapped with the identity mapping of the restore kernel, it might share
>> PGD or PUD entries with that mapping and that should have been taken into
>> account.
>>
>> Here goes an update. Again, this works on my test machine, but then the
>> previous version worked on it too ...
>
> Unfortunately still exactly the same symptoms during resume even with this
> one.

What type of machines are you testing it on? What is the memory size?
Processor generation?

>
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs
>
On Wed, 10 Aug 2016, Thomas Garnier wrote:

> What type of machines are you testing it on? What is the memory size?
> Processor generation?

Mine is Lenovo thinkpad x200s; I think Boris has been testing it on x230s,
but not sure whether any of the latest patches didn't actually fix it for
him.

The machine I am seeing the issue on, has 2G RAM, with this e820 map:

BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007c4a0fff] usable
BIOS-e820: [mem 0x000000007c4a1000-0x000000007c4a6fff] reserved
BIOS-e820: [mem 0x000000007c4a7000-0x000000007c5b6fff] usable
BIOS-e820: [mem 0x000000007c5b7000-0x000000007c60efff] reserved
BIOS-e820: [mem 0x000000007c60f000-0x000000007c6c5fff] usable
BIOS-e820: [mem 0x000000007c6c6000-0x000000007c6d0fff] ACPI NVS
BIOS-e820: [mem 0x000000007c6d1000-0x000000007c6d3fff] ACPI data
BIOS-e820: [mem 0x000000007c6d4000-0x000000007c6d7fff] reserved
BIOS-e820: [mem 0x000000007c6d8000-0x000000007c6dbfff] ACPI NVS
BIOS-e820: [mem 0x000000007c6dc000-0x000000007c6defff] reserved
BIOS-e820: [mem 0x000000007c6df000-0x000000007c705fff] ACPI NVS
BIOS-e820: [mem 0x000000007c706000-0x000000007c707fff] ACPI data
BIOS-e820: [mem 0x000000007c708000-0x000000007c90efff] reserved
BIOS-e820: [mem 0x000000007c90f000-0x000000007c99efff] ACPI NVS
BIOS-e820: [mem 0x000000007c99f000-0x000000007c9fefff] ACPI data
BIOS-e820: [mem 0x000000007c9ff000-0x000000007c9fffff] usable
BIOS-e820: [mem 0x000000007cc00000-0x000000007effffff] reserved
BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
BIOS-e820: [mem 0x00000000fed00000-0x00000000fed003ff] reserved
BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff] reserved
BIOS-e820: [mem 0x00000000fed18000-0x00000000fed19fff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed8ffff] reserved
BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
NX (Execute Disable) protection: active
SMBIOS 2.4 present.
DMI: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
e820: remove [mem 0x000a0000-0x000fffff] usable
e820: last_pfn = 0x7ca00 max_arch_pfn = 0x400000000

CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
stepping        : 6
microcode       : 0x60c
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority dtherm ida
bugs            :
bogomips        : 3723.69
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
On Wed, Aug 10, 2016 at 04:59:40PM +0200, Jiri Kosina wrote:
> Mine is Lenovo thinkpad x200s; I think Boris has been testing it on x230s,

It says "X230" here under the screen.

> but not sure whether any of the latest patches didn't actually fix it for
> him.

Haven't tested them yet. I'm waiting for you to test them first since
this is the only machine I have right now and I need it for work.

> The machine I am seeing the issue on, has 2G RAM, with this e820 map:

8G here:

e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff] reserved
BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff] usable
BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff] reserved
BIOS-e820: [mem 0x0000000040005000-0x00000000cec2ffff] usable
BIOS-e820: [mem 0x00000000cec30000-0x00000000dae9efff] reserved
BIOS-e820: [mem 0x00000000dae9f000-0x00000000daf9efff] ACPI NVS
BIOS-e820: [mem 0x00000000daf9f000-0x00000000daffefff] ACPI data
BIOS-e820: [mem 0x00000000dafff000-0x00000000df9fffff] reserved
BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
BIOS-e820: [mem 0x00000000fed08000-0x00000000fed08fff] reserved
BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000021e5fffff] usable
BIOS-e820: [mem 0x000000021e600000-0x000000021e7fffff] reserved
debug: ignoring loglevel setting.
NX (Execute Disable) protection: active
SMBIOS 2.7 present.
DMI: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
e820: remove [mem 0x000a0000-0x000fffff] usable
e820: last_pfn = 0x21e600 max_arch_pfn = 0x400000000

> CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping        : 9
microcode       : 0x1c
cpu MHz         : 1257.421
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
bugs            :
bogomips        : 5786.68
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
On Wed, Aug 10, 2016 at 6:35 PM, Borislav Petkov <bp@suse.de> wrote:
> On Wed, Aug 10, 2016 at 04:59:40PM +0200, Jiri Kosina wrote:
>> Mine is Lenovo thinkpad x200s; I think Boris has been testing it on x230s,
>
> It says "X230" here under the screen.
>
>> but not sure whether any of the latest patches didn't actually fix it for
>> him.
>
> Haven't tested them yet. I'm waiting for you to test them first since
> this is the only machine I have right now and I need it for work.
>
>> The machine I am seeing the issue on, has 2G RAM, with this e820 map:
>
> 8G here:
>
> e820: BIOS-provided physical RAM map:
> BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
> BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
> BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
> BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff] reserved
> BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff] usable
> BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff] reserved
> BIOS-e820: [mem 0x0000000040005000-0x00000000cec2ffff] usable
> BIOS-e820: [mem 0x00000000cec30000-0x00000000dae9efff] reserved
> BIOS-e820: [mem 0x00000000dae9f000-0x00000000daf9efff] ACPI NVS
> BIOS-e820: [mem 0x00000000daf9f000-0x00000000daffefff] ACPI data
> BIOS-e820: [mem 0x00000000dafff000-0x00000000df9fffff] reserved
> BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
> BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
> BIOS-e820: [mem 0x00000000fed08000-0x00000000fed08fff] reserved
> BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
> BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
> BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
> BIOS-e820: [mem 0x0000000100000000-0x000000021e5fffff] usable
> BIOS-e820: [mem 0x000000021e600000-0x000000021e7fffff] reserved
> debug: ignoring loglevel setting.
> NX (Execute Disable) protection: active
> SMBIOS 2.7 present.
> DMI: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
> e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> e820: remove [mem 0x000a0000-0x000fffff] usable
> e820: last_pfn = 0x21e600 max_arch_pfn = 0x400000000

So far, I'm unable to reproduce the problem (with the $subject patch
applied) on two different Intel-base machines with 4 Gig and 8 Gig of
RAM.

One thing that's clearly different on my machines is that they both
have usable memory at the end of the e820 map (and the one where Jiri
can reproduce the problem has reserved memory at the end of it, just
like yours).

Thomas, what about the e820 map on your machine?

I'm not sure why that would matter, though.

Thanks,
Rafael
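The "usable vs. reserved at the end of the e820 map" comparison can be checked mechanically instead of by eye. What follows is only a minimal userspace sketch in C, assuming boot-log lines in the `BIOS-e820: [mem start-end] type` format quoted above. The names `parse_e820_line` and `last_range_is_usable` are hypothetical helpers made up for this illustration; they are not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One parsed "BIOS-e820: [mem start-end] type" line from the boot log. */
struct e820_range {
	unsigned long long start;
	unsigned long long end;
	char type[32];
};

/* Parse one boot-log line; returns false if it is not a BIOS-e820 entry. */
static bool parse_e820_line(const char *line, struct e820_range *r)
{
	const char *p = strstr(line, "BIOS-e820: [mem ");

	if (!p)
		return false;
	/* %llx accepts the leading "0x"; the scanset keeps "ACPI NVS" whole. */
	return sscanf(p, "BIOS-e820: [mem %llx-%llx] %31[^\n]",
		      &r->start, &r->end, r->type) == 3;
}

/*
 * Returns true if the range with the highest end address is "usable".
 * Lines that do not parse are skipped; an empty map yields false.
 */
static bool last_range_is_usable(const char *const lines[], size_t n)
{
	struct e820_range cur, last = { 0 };
	bool found = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!parse_e820_line(lines[i], &cur))
			continue;
		if (!found || cur.end > last.end) {
			last = cur;
			found = true;
		}
	}
	return found && strcmp(last.type, "usable") == 0;
}
```

Fed the two maps posted in this thread, the helper would report "not usable" for both, since each ends in a reserved range. That only restates the observation; it says nothing about whether the layout is the cause.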
On Wed, Aug 10, 2016 at 10:56 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Aug 10, 2016 at 6:35 PM, Borislav Petkov <bp@suse.de> wrote:
>> On Wed, Aug 10, 2016 at 04:59:40PM +0200, Jiri Kosina wrote:
>>> Mine is Lenovo thinkpad x200s; I think Boris has been testing it on x230s,
>>
>> It says "X230" here under the screen.
>>
>>> but not sure whether any of the latest patches didn't actually fix it for
>>> him.
>>
>> Haven't tested them yet. I'm waiting for you to test them first since
>> this is the only machine I have right now and I need it for work.
>>
>>> The machine I am seeing the issue on, has 2G RAM, with this e820 map:
>>
>> 8G here:
>>
>> e820: BIOS-provided physical RAM map:
>> BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
>> BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
>> BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
>> BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
>> BIOS-e820: [mem 0x0000000020000000-0x00000000201fffff] reserved
>> BIOS-e820: [mem 0x0000000020200000-0x0000000040003fff] usable
>> BIOS-e820: [mem 0x0000000040004000-0x0000000040004fff] reserved
>> BIOS-e820: [mem 0x0000000040005000-0x00000000cec2ffff] usable
>> BIOS-e820: [mem 0x00000000cec30000-0x00000000dae9efff] reserved
>> BIOS-e820: [mem 0x00000000dae9f000-0x00000000daf9efff] ACPI NVS
>> BIOS-e820: [mem 0x00000000daf9f000-0x00000000daffefff] ACPI data
>> BIOS-e820: [mem 0x00000000dafff000-0x00000000df9fffff] reserved
>> BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
>> BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
>> BIOS-e820: [mem 0x00000000fed08000-0x00000000fed08fff] reserved
>> BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
>> BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
>> BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>> BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
>> BIOS-e820: [mem 0x0000000100000000-0x000000021e5fffff] usable
>> BIOS-e820: [mem 0x000000021e600000-0x000000021e7fffff] reserved
>> debug: ignoring loglevel setting.
>> NX (Execute Disable) protection: active
>> SMBIOS 2.7 present.
>> DMI: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
>> e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
>> e820: remove [mem 0x000a0000-0x000fffff] usable
>> e820: last_pfn = 0x21e600 max_arch_pfn = 0x400000000
>
> So far, I'm unable to reproduce the problem (with the $subject patch
> applied) on two different Intel-base machines with 4 Gig and 8 Gig of
> RAM.

So I used your .config to generate one for my test machine and with
that I can reproduce.

The hardware configuration doesn't matter, then, the issue is config-related.

Thanks,
Rafael
On Wed, 10 Aug 2016, Rafael J. Wysocki wrote:

> So I used your .config to generate one for my test machine and with
> that I can reproduce.

Was that the config I've sent, or did Boris provide one as well? Which one
are you able to reproduce with please?

> The hardware configuration doesn't matter, then, the issue is config-related.

How big is the diff between the two configs? Could you please share it?

Thanks,
On Wed, Aug 10, 2016 at 11:59 PM, Jiri Kosina <jikos@kernel.org> wrote:
> On Wed, 10 Aug 2016, Rafael J. Wysocki wrote:
>
>> So I used your .config to generate one for my test machine and with
>> that I can reproduce.
>
> Was that the config I've sent, or did Boris provide one as well? Which one
> are you able to reproduce with please?

It's the Boris' one.

Moreover, I have found the options that make the difference: unsetting
CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LOCK_ALLOC (which also will
unset CONFIG_LOCKDEP AFAICS) in it makes hibernation work again with
CONFIG_RANDOMIZE_MEMORY set and with the $subject patch applied.

Unbelievable, but that's what I'm seeing.

Now, that leads to a few questions:

- How does lockdep change the picture so it matters for hibernation?
- Why is hibernation the only piece that's affected?
- Why is RANDOMIZE_MEMORY necessary to make this breakage show up?

Thomas, any ideas?

Thanks,
Rafael
On Wed, Aug 10, 2016 at 5:35 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Aug 10, 2016 at 11:59 PM, Jiri Kosina <jikos@kernel.org> wrote:
>> On Wed, 10 Aug 2016, Rafael J. Wysocki wrote:
>>
>>> So I used your .config to generate one for my test machine and with
>>> that I can reproduce.
>>
>> Was that the config I've sent, or did Boris provide one as well? Which one
>> are you able to reproduce with please?
>
> It's the Boris' one.
>
> Moreover, I have found the options that make the difference: unsetting
> CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LOCK_ALLOC (which also will
> unset CONFIG_LOCKDEP AFAICS) in it makes hibernation work again with
> CONFIG_RANDOMIZE_MEMORY set and with the $subject patch applied.
>
> Unbelievable, but that's what I'm seeing.

Nice find!

>
> Now, that leads to a few questions:
>
> - How does lockdep change the picture so it matters for hibernation?
> - Why is hibernation the only piece that's affected?
> - Why is RANDOMIZE_MEMORY necessary to make this breakage show up?
>
> Thomas, any ideas?

No idea so far. I will investigate though.

We had an unrelated issue with CONFIG_DEBUG_PAGEALLOC on early boot. I
don't think it was related because it was on early boot and with
certain e820 memory layout (and PUD randomization that I disabled on
the previous patch test). The fix is on tip:

http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=fb754f958f8e46202c1efd7f66d5b3db1208117d

>
> Thanks,
> Rafael
Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -38,14 +38,22 @@ unsigned long jump_address_phys;
 unsigned long restore_cr3 __visible;
 
 unsigned long temp_level4_pgt __visible;
+unsigned long jump_level4_pgt __visible;
 
 unsigned long relocated_restore_code __visible;
 
-static int set_up_temporary_text_mapping(pgd_t *pgd)
+static int set_up_temporary_text_mapping(void)
 {
+	unsigned long pgd_idx = pgd_index(restore_jump_address);
+	unsigned long pud_idx = pud_index(restore_jump_address);
+	pgd_t *pgd;
 	pmd_t *pmd;
 	pud_t *pud;
 
+	pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
+	if (!pgd)
+		return -ENOMEM;
+
 	/*
 	 * The new mapping only has to cover the page containing the image
 	 * kernel's entry point (jump_address_phys), because the switch over to
@@ -69,10 +77,32 @@ static int set_up_temporary_text_mapping
 
 	set_pmd(pmd + pmd_index(restore_jump_address),
 		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
-	set_pud(pud + pud_index(restore_jump_address),
+	set_pud(pud + pud_idx,
+		__pud(__pa(pmd) | _KERNPG_TABLE));
+	set_pgd(pgd + pgd_idx, __pgd(__pa(pud) | _KERNPG_TABLE));
+
+	if (pgd_idx != pgd_index(relocated_restore_code)) {
+		pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+		if (!pud)
+			return -ENOMEM;
+
+		set_pgd(pgd + pgd_index(relocated_restore_code),
+			__pgd(__pa(pud) | _KERNPG_TABLE));
+	} else if (pud_idx == pud_index(relocated_restore_code)) {
+		goto set_pmd;
+	}
+
+	pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+	if (!pmd)
+		return -ENOMEM;
+
+	set_pud(pud + pud_index(relocated_restore_code),
 		__pud(__pa(pmd) | _KERNPG_TABLE));
-	set_pgd(pgd + pgd_index(restore_jump_address),
-		__pgd(__pa(pud) | _KERNPG_TABLE));
+
+ set_pmd:
+	set_pmd(pmd + pmd_index(relocated_restore_code),
+		__pmd((__pa(relocated_restore_code) & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
+
+	jump_level4_pgt = __pa(pgd);
 
 	return 0;
 }
@@ -98,11 +128,6 @@ static int set_up_temporary_mappings(voi
 	if (!pgd)
 		return -ENOMEM;
 
-	/* Prepare a temporary mapping for the kernel text */
-	result = set_up_temporary_text_mapping(pgd);
-	if (result)
-		return result;
-
 	/* Set up the direct mapping from scratch */
 	for (i = 0; i < nr_pfn_mapped; i++) {
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
@@ -122,7 +147,10 @@ static int relocate_restore_code(void)
 	pgd_t *pgd;
 	pud_t *pud;
 
-	relocated_restore_code = get_safe_page(GFP_ATOMIC);
+	do
+		relocated_restore_code = get_safe_page(GFP_ATOMIC);
+	while ((relocated_restore_code & PMD_MASK) == (restore_jump_address & PMD_MASK));
+
 	if (!relocated_restore_code)
 		return -ENOMEM;
 
@@ -162,6 +190,11 @@ int swsusp_arch_resume(void)
 	if (error)
 		return error;
 
+	/* Prepare a temporary mapping for the jump to the image kernel */
+	error = set_up_temporary_text_mapping();
+	if (error)
+		return error;
+
 	restore_image();
 	return 0;
 }
Index: linux-pm/arch/x86/power/hibernate_asm_64.S
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
+++ linux-pm/arch/x86/power/hibernate_asm_64.S
@@ -57,6 +57,7 @@ ENTRY(restore_image)
 	/* prepare to jump to the image kernel */
 	movq	restore_jump_address(%rip), %r8
 	movq	restore_cr3(%rip), %r9
+	movq	jump_level4_pgt(%rip), %r10
 
 	/* prepare to switch to temporary page tables */
 	movq	temp_level4_pgt(%rip), %rax
@@ -96,6 +97,15 @@ ENTRY(core_restore_code)
 	jmp	.Lloop
 
 .Ldone:
+	/* switch to jump page tables */
+	movq	%r10, %cr3
+	/* flush TLB */
+	movq	%rbx, %rcx
+	andq	$~(X86_CR4_PGE), %rcx
+	movq	%rcx, %cr4;  # turn off PGE
+	movq	%cr3, %rcx;  # flush TLB
+	movq	%rcx, %cr3;
+	movq	%rbx, %cr4;  # turn PGE back on
 	/* jump to the restore_registers address from the image header */
 	jmpq	*%r8
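
The branches in the new set_up_temporary_text_mapping() depend only on whether restore_jump_address and relocated_restore_code share a PGD slot or a PUD slot. The following is a userspace sketch of that decision, not kernel code. It assumes x86-64 4-level paging with 4 KiB pages (shifts 39/30/21, 512 entries per table) and redefines the index helpers locally. `tables_needed()`, `same_pmd_region()` and the `enum` are names invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging layout; each table has 512 entries. */
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PGDIR_SHIFT	39
#define PTRS_PER_TABLE	512
#define PMD_MASK	(~((UINT64_C(1) << PMD_SHIFT) - 1))

/* Local stand-ins for the kernel's pgd_index()/pud_index()/pmd_index(). */
static unsigned int pgd_index(uint64_t va)
{
	return (va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1);
}

static unsigned int pud_index(uint64_t va)
{
	return (va >> PUD_SHIFT) & (PTRS_PER_TABLE - 1);
}

static unsigned int pmd_index(uint64_t va)
{
	return (va >> PMD_SHIFT) & (PTRS_PER_TABLE - 1);
}

/* Which extra page tables the second mapping needs (hypothetical enum). */
enum extra_tables {
	NEED_PUD_AND_PMD,	/* different PGD slot: new PUD and new PMD */
	NEED_PMD,		/* same PGD slot, different PUD slot: new PMD */
	NEED_NONE,		/* same PGD and PUD slot: reuse the PMD page */
};

/* Mirrors the if / else-if / fall-through structure of the patch. */
static enum extra_tables tables_needed(uint64_t jump_va, uint64_t reloc_va)
{
	if (pgd_index(jump_va) != pgd_index(reloc_va))
		return NEED_PUD_AND_PMD;
	if (pud_index(jump_va) != pud_index(reloc_va))
		return NEED_PMD;
	return NEED_NONE;
}

/* The retry condition in relocate_restore_code(): same 2 MiB region? */
static bool same_pmd_region(uint64_t a, uint64_t b)
{
	return (a & PMD_MASK) == (b & PMD_MASK);
}
```

With these assumptions, an address such as 0xffffffff81000000 lands in PGD slot 511 and PUD slot 510, while 0xffff880012345000 lands in PGD slot 272, so the pair takes the first branch. When both addresses share the PGD and PUD slots, the same PMD page is reused, and the do/while loop in relocate_restore_code() is what keeps the two from also sharing a PMD entry.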