Message ID | 201301081747.17853.trenn@suse.de (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Tue, Jan 8, 2013 at 8:47 AM, Thomas Renninger <trenn@suse.de> wrote: > On Tuesday, January 08, 2013 04:04:56 AM Yinghai Lu wrote: >> On Mon, Jan 7, 2013 at 4:42 PM, Thomas Renninger <trenn@suse.de> wrote: >> > memmap=256M$3584M >> >> may need to change to: >> >> memmap=256M\$\$3584M > The problem is (beside the special char $) that > memmap=exactmap boot param resets all e820 maps every time the > parameter is processed. > And: > /sbin/kexec -p xy --append="..." --initrd yx > seem to magically add (append): > memmap=exactmap memmap=640K@0K memmap=392556K@115328K elfcorehdr=507884K memmap=252K#3099760K > > therefore all memmap= I try to pass are voided out by: > memmap=exactmap > which is always added by kexec after my params. > > I could come around with attached patch and passing: > /sbin/kexec -p xy --append='... memmap=exactmap memmap=256M$3584M' --initrd yx > > Now mmconfig is working in kdump kernel. > This would mean mmconfig is broken by design in kexec? > > Only way to fix this I can think of is to export > mmconfig area through /sys (../kernel/debug/mmconfig?, possibly > already in X$Y format?) in the productive kernel and make kexec add it > like the other memmap= params automatically. > > I'll attach the output in a separate mail. > > Thomas > > --- > > x86 e820: Do not reset e820 map twice, even if memmap=exactmap is passed as boot param several times > > This is needed to be able to explicitly pass (debug) e820 > modifications through kexec via memmap=. > Otherwise those get voided by kexec appending memmap=exactmap always > after the user defined boot parameters. > > Signed-off-by: Thomas Renninger <trenn@suse.de> > > linux-2.6_t/arch/x86/kernel/e820.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > Index: git/linux-2.6_t/arch/x86/kernel/e820.c > =================================================================== > --- git.orig/linux-2.6_t/arch/x86/kernel/e820.c > +++ git/linux-2.6_t/arch/x86/kernel/e820.c > @@ -845,7 +845,9 @@ static int __init parse_memmap_opt(char > > if (!strncmp(p, "exactmap", 8)) { > #ifdef CONFIG_CRASH_DUMP > - /* > + /* memmap=exactmap passed twice, do not reset tables again */ > + if (saved_max_pfn) > + return 0; /* > * If we are doing a crash dump, we still need to know > * the real mem size before original memory map is > * reset. that exactmap logic still have problem: We need to check exactmap at first, aka need to scan the whole comand line to see if exactmap is there at first and reset e820 tables then handle other memmap opt. Also please update your patch after tip/x86/mm2 I have one patch that process memmap= with "," there. http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f581bb4c35589ac046b0cfc0deb7f369fc85 We could put exactmap scanning in new parse_memmap_opt. Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote: ... > > that exactmap logic still have problem: > We need to check exactmap at first, aka need to scan the whole comand line > to see if exactmap is there at first and reset e820 tables then handle > other memmap opt. > > Also please update your patch after > > tip/x86/mm2 > > I have one patch that process memmap= with "," there. > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f58 > 1bb4c35589ac046b0cfc0deb7f369fc85 > > We could put exactmap scanning in new parse_memmap_opt. I still do not understand why: Kexec (kexec/firmware_memmap.c) is setting up the e820 map from: /sys/firmware/memmap/* and pass it via bootloader structures. And this e820 table gets immediately voided by memmap=exactmap and a new one passed via boot parameters is set up. If I read this correctly, this is what happens? Can kexec simply pass the memory to use via memmap=X@Y Then take the original e820 table, but not the usable entries (those are coming from above memmap=X@Y). That would mean that the kexec kernel takes all the original ACPI, ACPI NVS, reserved, unusable (everthing but usable) entries from the original e820 table and identifies the usable memory from memmap boot param? This would be much smarter than trying to pass the mmconf reserved area and I could imagine other issues will show up if the reserved areas do not match the original ones in the kexec kernel. If this really can be done and memmap=exactmap was only used by kexec, it's logic could be redefined from "drop all e820 entries" to "drop all usable e820 entries" and no further adjustings in kexec/kernel are needed to get mmconf working (and other issues may be avoided before they happen). Beside that ACPI reserved aread is not needed anymore to get passed via memmap=X#Y by kexec. Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 10, 2013 at 04:21:49AM +0100, Thomas Renninger wrote: > On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote: > ... > > > > that exactmap logic still have problem: > > We need to check exactmap at first, aka need to scan the whole comand line > > to see if exactmap is there at first and reset e820 tables then handle > > other memmap opt. > > > > Also please update your patch after > > > > tip/x86/mm2 > > > > I have one patch that process memmap= with "," there. > > > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f58 > > 1bb4c35589ac046b0cfc0deb7f369fc85 > > > > We could put exactmap scanning in new parse_memmap_opt. > > I still do not understand why: > > Kexec (kexec/firmware_memmap.c) is setting up the e820 map from: > /sys/firmware/memmap/* Kexe reads it from /sys/firmware/memmap/* because that's supposed to be e820 map as passed by BIOS. And if user has specified some command line options like mem=X, /sys/firmware/memmap/* is not truncated but one exported using /proc/iomem is truncated accordingly. So if we pass mem=1G in first kernel but not in second kernel, we want second kernel to see whole of the system memory and not just 1G. > and pass it via bootloader structures. That's how bootloader is supposed to pass e820 memory map to kernel and kexec is a bootloader. > And this e820 table gets immediately voided by memmap=exactmap > and a new one passed via boot parameters is set up. > If I read this correctly, this is what happens? This happens only in case of kdump and not kexec. In case of kdump we want second kernel to use only selected memory areas. In fact this is one improvement area. Instead of using memmap= entries in kdump case, we should probably modify the e820 map passed in zero page and get rid of memmap= entries. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > > This happens only in case of kdump and not kexec. In case of kdump > we want second kernel to use only selected memory areas. > > In fact this is one improvement area. Instead of using memmap= entries > in kdump case, we should probably modify the e820 map passed in > zero page and get rid of memmap= entries. then how the kdump kernel get saved_max_pfn? may have problem for read_oldmem with crashed kernel. Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 10, 2013 at 08:53:18AM -0800, Yinghai Lu wrote: > On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > > > > This happens only in case of kdump and not kexec. In case of kdump > > we want second kernel to use only selected memory areas. > > > > In fact this is one improvement area. Instead of using memmap= entries > > in kdump case, we should probably modify the e820 map passed in > > zero page and get rid of memmap= entries. > > then how the kdump kernel get saved_max_pfn? > Oh, I forgot about that. May be we can pass saved_max_pfn on command line instead of passing all memmap= entries. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 10, 2013 at 9:01 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Thu, Jan 10, 2013 at 08:53:18AM -0800, Yinghai Lu wrote: >> On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote: >> > >> > This happens only in case of kdump and not kexec. In case of kdump >> > we want second kernel to use only selected memory areas. >> > >> > In fact this is one improvement area. Instead of using memmap= entries >> > in kdump case, we should probably modify the e820 map passed in >> > zero page and get rid of memmap= entries. >> >> then how the kdump kernel get saved_max_pfn? >> > > Oh, I forgot about that. May be we can pass saved_max_pfn on command line > instead of passing all memmap= entries. then we create another kernel version/kexec tools dependency. kexec tools need to check kernel version, to see if saved_max_pfn= is supported. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 9, 2013 at 7:21 PM, Thomas Renninger <trenn@suse.de> wrote: > I still do not understand why: > > Kexec (kexec/firmware_memmap.c) is setting up the e820 map from: > /sys/firmware/memmap/* > and pass it via bootloader structures. > And this e820 table gets immediately voided by memmap=exactmap > and a new one passed via boot parameters is set up. > If I read this correctly, this is what happens? yes, kdump scripts append those memmap. > > Can kexec simply pass the memory to use via memmap=X@Y > Then take the original e820 table, but not the usable entries (those > are coming from above memmap=X@Y). > That would mean that the kexec kernel takes all the > original ACPI, ACPI NVS, reserved, unusable (everthing but usable) entries > from the original e820 table and identifies the usable memory from > memmap boot param? kdump scripts already do that for acpi regions, need to update it to append that for mmconf. > > This would be much smarter than trying to pass the mmconf reserved > area and I could imagine other issues will show up if the reserved areas > do not match the original ones in the kexec kernel. > > If this really can be done and memmap=exactmap was only used by kexec, > it's logic could be redefined from "drop all e820 entries" to > "drop all usable e820 entries" and no further adjustings in kexec/kernel are > needed to get mmconf working (and other issues may be avoided before > they happen). Beside that ACPI reserved aread is not needed anymore to get > passed via memmap=X#Y by kexec. yes, we have other user for debug like simulating user memmap for some bugs. current problem for exactmap is that we don't scan that at first. attached patch could help that. Thanks Yinghai
Index: git/linux-2.6_t/arch/x86/kernel/e820.c =================================================================== --- git.orig/linux-2.6_t/arch/x86/kernel/e820.c +++ git/linux-2.6_t/arch/x86/kernel/e820.c @@ -845,7 +845,9 @@ static int __init parse_memmap_opt(char if (!strncmp(p, "exactmap", 8)) { #ifdef CONFIG_CRASH_DUMP - /* + /* memmap=exactmap passed twice, do not reset tables again */ + if (saved_max_pfn) + return 0; /* * If we are doing a crash dump, we still need to know * the real mem size before original memory map is * reset.