Message ID: 20240409210254.660888920@goodmis.org (mailing list archive)
Series: pstore/mm/x86: Add wildcard memmap to map pstore consistently
On Tue, 09 Apr 2024 17:02:54 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> memmap=12M*4096:oops ramoops.mem_name=oops
I forgot to mention that this makes it trivial for any machine that doesn't
clear memory on soft-reboot, to enable console ramoops (to have access to
the last boot dmesg without needing serial).
I tested this on a couple of my test boxes and on QEMU, and it works rather
well.
-- Steve
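
For context, a minimal sketch (not part of the posted patches) of how a consumer such as ramoops would resolve a wildcard region by name. The memmap_named() interface and the "oops" label come from this series; the init function, its name, and the printout are purely illustrative:

	/* Illustrative only: resolve the region labeled via memmap=12M*4096:oops */
	#include <linux/init.h>
	#include <linux/mm.h>		/* memmap_named(), added by this series */
	#include <linux/printk.h>

	static int __init oops_region_demo_init(void)
	{
		u64 start, size;

		/* "oops" is the label given on the kernel command line */
		if (!memmap_named("oops", &start, &size))
			return -ENODEV;	/* wildcard region was not reserved */

		/* a driver would now use [start, start + size) as its backing store */
		pr_info("demo: region at 0x%llx, size 0x%llx\n",
			(unsigned long long)start, (unsigned long long)size);
		return 0;
	}
	device_initcall(oops_region_demo_init);

The point of the design is that the driver never hard-codes a physical address; it only knows the label, and the kernel picks (and later reports) the address.
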
On Tue, Apr 09, 2024 at 05:23:58PM -0400, Steven Rostedt wrote:
> On Tue, 09 Apr 2024 17:02:54 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > memmap=12M*4096:oops ramoops.mem_name=oops
>
> I forgot to mention that this makes it trivial for any machine that doesn't
> clear memory on soft-reboot, to enable console ramoops (to have access to
> the last boot dmesg without needing serial).
>
> I tested this on a couple of my test boxes and on QEMU, and it works rather
> well.

I've long wanted a "stable for this machine and kernel" memory region
like this for pstore. It would make testing much easier.

>> I forgot to mention that this makes it trivial for any machine that doesn't
>> clear memory on soft-reboot, to enable console ramoops (to have access to
>> the last boot dmesg without needing serial).
>>
>> I tested this on a couple of my test boxes and on QEMU, and it works rather
>> well.
>
> I've long wanted a "stable for this machine and kernel" memory region
> like this for pstore. It would make testing much easier.

Which systems does this work on? I'd assume that servers (and anything
else with ECC memory) would nuke contents while resetting ECC to clean
state.

-Tony

> On Apr 10, 2024, at 3:55 AM, Luck, Tony <tony.luck@intel.com> wrote:
>
> [...]
>
> Which systems does this work on? I'd assume that servers (and anything
> else with ECC memory) would nuke contents while resetting ECC to clean
> state.

If that were the case universally, then the ramoops pstore backend would
not work either? And yet we have been getting the last kernel's logs via
pstore for many years now, on embedded-ish devices.

From my reading, ECC-enabled DRAM is not present on lots of systems and,
IIRC, pstore ramoops has its own ECC. Or did I miss a recent trend toward
ECC-enabled DRAM?

- Joel

On Tue, 9 Apr 2024 22:25:33 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

> [...]
>
> Which systems does this work on? I'd assume that servers (and anything
> else with ECC memory) would nuke contents while resetting ECC to clean
> state.

Well, I tested it on a couple of Chromebooks, a test box, and a laptop (as
well as QEMU).

I know that ramoops has an ecc option. I'm guessing that would help here
(but I'd have to defer to others to answer that).

-- Steve

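(For reference, the ramoops ECC knob is a module parameter. Assuming the
spelling from the ramoops documentation rather than anything stated in this
thread, combining it with this series would look something like:

	memmap=12M*4096:oops ramoops.mem_name=oops ramoops.ecc=1

where ecc=1 selects a default ECC size and lets the persistent_ram layer
correct bit errors in the stored records.)
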
On Tue, Apr 09, 2024 at 10:25:33PM +0000, Luck, Tony wrote:
> [...]
>
> Which systems does this work on? I'd assume that servers (and anything
> else with ECC memory) would nuke contents while resetting ECC to clean
> state.

Do ECC servers wipe their RAM by default? I know that if you build with
CONFIG_RESET_ATTACK_MITIGATION=y on an EFI system that supports the
MemoryOverwriteRequestControl EFI variable you'll get a RAM wipe...

> Do ECC servers wipe their RAM by default? I know that if you build with
> CONFIG_RESET_ATTACK_MITIGATION=y on an EFI system that supports the
> MemoryOverwriteRequestControl EFI variable you'll get a RAM wipe...

I know that after I've been running RAS tests that inject ECC errors into
thousands of pages, those errors all disappear after a reboot.

I think some BIOSes have options to speed up boot by skipping memory
initialization. I don't know if anyone makes that the default mode.

-Tony

On 09/04/2024 19:25, Luck, Tony wrote:
> [...]
>
> Which systems does this work on? I'd assume that servers (and anything
> else with ECC memory) would nuke contents while resetting ECC to clean
> state.
>
> -Tony

Thanks Steve! Like Kees, I've been wanting a consistent way of mapping
some RAM for pstore for a while, without resorting to platform drivers
like Chromebooks do...

The idea seems very interesting and helpful; I'll test it here. My only
concern / "complaint" is that it's currently only implemented for builtin
ramoops, which is not the default in many distros (like Arch, Ubuntu,
Debian). I read patch 2 (and the discussion), so I think it would be good
to have that builtin helper implemented upfront to allow modular usage of
ramoops.

Now, responding to Tony: the Steam Deck also uses pstore/ram to store
logs, and I've tested on my AMD desktop, and it does work. It seems that
disabling memory retraining in the BIOS (to speed up boot?) is somewhat
common; not sure about servers, though. As Joel mentioned as well, it's
quite common to use pstore/ram in the ARM embedded world.

Cheers,
Guilherme

On Thu, 11 Apr 2024 16:11:55 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> Thanks Steve! Like Kees, I've been wanting a consistent way of mapping
> some RAM for pstore for a while, without resorting to platform drivers
> like Chromebooks do...

Great!

> The idea seems very interesting and helpful; I'll test it here. My only
> concern / "complaint" is that it's currently only implemented for builtin
> ramoops, which is not the default in many distros (like Arch, Ubuntu,
> Debian). I read patch 2 (and the discussion), so I think it would be good
> to have that builtin helper implemented upfront to allow modular usage of
> ramoops.

What I think I could do is to have a check after memory is allocated that
copies the mapping table (into the heap) if it has been filled. The reason
I did it this way is that it was the easiest way to save the label-to-address
mapping before memory is initialized. I use a __initdata array (why waste
memory if it's hardly ever used?).

But after memory is initialized, we can check if the table has content,
and if so allocate a copy, store the entries there, and use that table
instead. That would give modules a way to find the address as well.

-- Steve

On 11/04/2024 16:40, Steven Rostedt wrote:
> [...]
>
> But after memory is initialized, we can check if the table has content,
> and if so allocate a copy, store the entries there, and use that table
> instead. That would give modules a way to find the address as well.
>
> -- Steve

Thanks Steve, seems a good idea. With that, I could test on kdumpst (the
tool used on the Steam Deck), since it relies on modular pstore/ram.

Cheers!

On Fri, 12 Apr 2024 09:17:18 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> Thanks Steve, seems a good idea. With that, I could test on kdumpst (the
> tool used on the Steam Deck), since it relies on modular pstore/ram.

Something like this could work.

-- Steve

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a8831ef30c73..878aee8b2399 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -16,6 +16,7 @@
 #include <linux/firmware-map.h>
 #include <linux/sort.h>
 #include <linux/memory_hotplug.h>
+#include <linux/mm.h>
 
 #include <asm/e820/api.h>
 #include <asm/setup.h>
@@ -64,61 +65,6 @@ struct e820_table *e820_table __refdata = &e820_table_init;
 struct e820_table *e820_table_kexec __refdata = &e820_table_kexec_init;
 struct e820_table *e820_table_firmware __refdata = &e820_table_firmware_init;
 
-/* For wildcard memory requests, have a table to find them later */
-#define E820_MAX_MAPS		8
-#define E820_MAP_NAME_SIZE	16
-struct e820_mmap_map {
-	char	name[E820_MAP_NAME_SIZE];
-	u64	start;
-	u64	size;
-};
-static struct e820_mmap_map e820_mmap_list[E820_MAX_MAPS] __initdata;
-static int e820_mmap_size __initdata;
-
-/* Add wildcard region with a lookup name */
-static int __init e820_add_mmap(u64 start, u64 size, const char *name)
-{
-	struct e820_mmap_map *map;
-
-	if (!name || !name[0] || strlen(name) >= E820_MAP_NAME_SIZE)
-		return -EINVAL;
-
-	if (e820_mmap_size >= E820_MAX_MAPS)
-		return -1;
-
-	map = &e820_mmap_list[e820_mmap_size++];
-	map->start = start;
-	map->size = size;
-	strcpy(map->name, name);
-	return 0;
-}
-
-/**
- * memmap_named - Find a wildcard region with a given name
- * @name: The name that is attached to a wildcard region
- * @start: If found, holds the start address
- * @size: If found, holds the size of the address.
- *
- * Returns: 1 if found or 0 if not found.
- */
-int __init memmap_named(const char *name, u64 *start, u64 *size)
-{
-	struct e820_mmap_map *map;
-	int i;
-
-	for (i = 0; i < e820_mmap_size; i++) {
-		map = &e820_mmap_list[i];
-		if (!map->size)
-			continue;
-		if (strcmp(name, map->name) == 0) {
-			*start = map->start;
-			*size = map->size;
-			return 1;
-		}
-	}
-	return 0;
-}
-
 /* For PCI or other memory-mapped resources */
 unsigned long pci_mem_start = 0xaeedbabe;
 #ifdef CONFIG_PCI
@@ -1024,6 +970,8 @@ static int __init parse_memmap_one(char *p)
 		e820__range_add(start_at, mem_size, E820_TYPE_RESERVED);
 	} else if (*p == '*') {
 		u64 align;
+		int ret;
+
 		/* Followed by alignment and ':' then the name */
 		align = memparse(p+1, &p);
 		start_at = e820__region(mem_size, align);
@@ -1032,9 +980,10 @@ static int __init parse_memmap_one(char *p)
 		if (*p != ':')
 			return -EINVAL;
 		p++;
-		e820_add_mmap(start_at, mem_size, p);
+		ret = memmap_add(start_at, mem_size, p);
 		p += strlen(p);
-		e820__range_add(start_at, mem_size, E820_TYPE_RESERVED);
+		if (!ret)
+			e820__range_add(start_at, mem_size, E820_TYPE_RESERVED);
 	} else if (*p == '!') {
 		start_at = memparse(p+1, &p);
 		e820__range_add(start_at, mem_size, E820_TYPE_PRAM);
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index c200388399fb..22d2e2731dc2 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -919,7 +919,6 @@ static void __init ramoops_register_dummy(void)
 {
 	struct ramoops_platform_data pdata;
 
-#ifndef MODULE
 	/* Only allowed when builtin */
 	if (mem_name) {
 		u64 start;
@@ -930,7 +929,6 @@ static void __init ramoops_register_dummy(void)
 			mem_size = size;
 		}
 	}
-#endif
 
 	/*
 	 * Prepare a dummy platform data structure to carry the module
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cf9b34454c6f..6ce1c6929d1f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4203,5 +4203,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
 }
 
 int memmap_named(const char *name, u64 *start, u64 *size);
+int memmap_add(long start, long size, const char *name);
 
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 7a29f17df7c1..fe054e1bb678 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -120,12 +120,6 @@ static bool vmf_orig_pte_uffd_wp(struct vm_fault *vmf)
 	return pte_marker_uffd_wp(vmf->orig_pte);
 }
 
-int __init __weak memmap_named(const char *name, u64 *start, u64 *size)
-{
-	pr_info("Kernel command line: memmap=nn*align:name not supported on this kernel");
-	/* zero means not found */
-	return 0;
-}
 
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 549e76af8f82..e5b729b83fdc 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -154,6 +154,77 @@ static __init int set_mminit_loglevel(char *str)
 early_param("mminit_loglevel", set_mminit_loglevel);
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
+/* For wildcard memory requests, have a table to find them later */
+#define MAX_MAPS	8
+#define MAP_NAME_SIZE	16
+struct mmap_map {
+	char	name[MAP_NAME_SIZE];
+	long	start;
+	long	size;
+};
+static struct mmap_map early_mmap_list[MAX_MAPS] __initdata;
+static int early_mmap_size __initdata;
+static struct mmap_map *mmap_list;
+
+/* Add wildcard region with a lookup name */
+int memmap_add(long start, long size, const char *name)
+{
+	struct mmap_map *map;
+
+	if (!name || !name[0] || strlen(name) >= MAP_NAME_SIZE)
+		return -EINVAL;
+
+	if (early_mmap_size >= MAX_MAPS)
+		return -1;
+
+	map = &early_mmap_list[early_mmap_size++];
+	map->start = start;
+	map->size = size;
+	strcpy(map->name, name);
+	return 0;
+}
+
+static void __init memmap_copy(void)
+{
+	if (!early_mmap_size)
+		return;
+
+	mmap_list = kcalloc(early_mmap_size + 1, sizeof(*mmap_list), GFP_KERNEL);
+	if (!mmap_list)
+		return;
+
+	for (int i = 0; i < early_mmap_size; i++)
+		mmap_list[i] = early_mmap_list[i];
+}
+
+/**
+ * memmap_named - Find a wildcard region with a given name
+ * @name: The name that is attached to a wildcard region
+ * @start: If found, holds the start address
+ * @size: If found, holds the size of the address.
+ *
+ * Returns: 1 if found or 0 if not found.
+ */
+int memmap_named(const char *name, u64 *start, u64 *size)
+{
+	struct mmap_map *map;
+
+	if (!mmap_list)
+		return 0;
+
+	for (int i = 0; mmap_list[i].name[0]; i++) {
+		map = &mmap_list[i];
+		if (!map->size)
+			continue;
+		if (strcmp(name, map->name) == 0) {
+			*start = map->start;
+			*size = map->size;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 struct kobject *mm_kobj;
 
 #ifdef CONFIG_SMP
@@ -2793,4 +2864,5 @@ void __init mm_core_init(void)
 	pti_init();
 	kmsan_init_runtime();
 	mm_cache_init();
+	memmap_copy();
 }

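(A quick note on usage: with the #ifndef MODULE guard removed and the table
copied out of __initdata by memmap_copy(), a modular ramoops should be able
to resolve the region the same way the builtin one does. Module parameters
given on the kernel command line are applied when the module loads, so the
invocation would presumably stay:

	memmap=12M*4096:oops ramoops.mem_name=oops

even with ramoops built as a module.)
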
On Fri, Apr 12, 2024 at 01:22:43PM -0400, Steven Rostedt wrote:
> On Fri, 12 Apr 2024 09:17:18 -0300
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>
> > Thanks Steve, seems a good idea. With that, I could test on kdumpst (the
> > tool used on the Steam Deck), since it relies on modular pstore/ram.
>
> Something like this could work.
>
> -- Steve
>
> [...]
>
> +static void __init memmap_copy(void)
> +{
> +	if (!early_mmap_size)
> +		return;
> +
> +	mmap_list = kcalloc(early_mmap_size + 1, sizeof(*mmap_list), GFP_KERNEL);

We can keep early_mmap_size after boot and then we don't need to allocate
an extra element in the mmap_list. No strong feeling here, though.

> +	if (!mmap_list)
> +		return;
> +
> +	for (int i = 0; i < early_mmap_size; i++)
> +		mmap_list[i] = early_mmap_list[i];
> +}

With something like this

/*
 * Parse early_reserve_mem=nn:align:name
 */
static int __init early_reserve_mem(char *p)
{
	phys_addr_t start, size, align;
	char *oldp;
	int err;

	if (!p)
		return -EINVAL;

	oldp = p;
	size = memparse(p, &p);
	if (p == oldp)
		return -EINVAL;

	if (*p != ':')
		return -EINVAL;

	align = memparse(p+1, &p);
	if (*p != ':')
		return -EINVAL;

	start = memblock_phys_alloc(size, align);
	if (!start)
		return -ENOMEM;

	p++;
	err = memmap_add(start, size, p);
	if (err) {
		memblock_phys_free(start, size);
		return err;
	}

	p += strlen(p);

	return *p == '\0' ? 0 : -EINVAL;
}
__setup("early_reserve_mem=", early_reserve_mem);

you don't need to touch e820 and it will work the same for all
architectures.

We'd need a better naming, but I couldn't think of something better yet.

> [...]

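(Reading the suggested parser, the equivalent command line would be
something like

	early_reserve_mem=12M:4096:oops ramoops.mem_name=oops

with the alignment as the second ':'-separated field rather than the
memmap-style "*" syntax; the example values are illustrative.)
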
On Wed, 1 May 2024 17:45:49 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> With something like this
>
> /*
>  * Parse early_reserve_mem=nn:align:name
>  */
> static int __init early_reserve_mem(char *p)
> {
> [...]
>
> 	start = memblock_phys_alloc(size, align);

So this will allocate the same physical location for every boot, if booting
the same kernel and having the same physical memory layout?

-- Steve

On Wed, May 01, 2024 at 10:54:55AM -0400, Steven Rostedt wrote:
> On Wed, 1 May 2024 17:45:49 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
>
> > 	start = memblock_phys_alloc(size, align);
>
> So this will allocate the same physical location for every boot, if booting
> the same kernel and having the same physical memory layout?

Up to kaslr that might use that location for the kernel image.
But it's the same as allocating from e820 after kaslr.

And, TBH, I don't have good ideas how to ensure the same physical location
with randomization of the physical address of the kernel image.

On Wed, 1 May 2024 18:30:40 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> > So this will allocate the same physical location for every boot, if booting
> > the same kernel and having the same physical memory layout?
>
> Up to kaslr that might use that location for the kernel image.
> But it's the same as allocating from e820 after kaslr.
>
> And, TBH, I don't have good ideas how to ensure the same physical location
> with randomization of the physical address of the kernel image.

I'll try it out. Looking at arch/x86/boot/compressed/kaslr.c, if I read the
code correctly, it creates up to 100 slots to store the kernel.

The method I used was to make sure that the allocation was always done at
the top address of memory, which I think would in most cases never be
assigned by KASLR. This looks to just grab the next available physical
address, which KASLR can most definitely mess with.

I would still like to get the highest address possible.

-- Steve

On Wed, May 01, 2024 at 12:09:04PM -0400, Steven Rostedt wrote:
> I'll try it out. Looking at arch/x86/boot/compressed/kaslr.c, if I read the
> code correctly, it creates up to 100 slots to store the kernel.
>
> The method I used was to make sure that the allocation was always done at
> the top address of memory, which I think would in most cases never be
> assigned by KASLR.
>
> This looks to just grab the next available physical address, which KASLR
> can most definitely mess with.

On x86, memblock allocates from the top of memory. As this will run later
than e820, the allocation will be lower than one made from e820, but still
close to the top of memory.

> I would still like to get the highest address possible.

On Wed, 1 May 2024 18:30:40 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> > So this will allocate the same physical location for every boot, if booting
> > the same kernel and having the same physical memory layout?
>
> Up to kaslr that might use that location for the kernel image.
> But it's the same as allocating from e820 after kaslr.
>
> And, TBH, I don't have good ideas how to ensure the same physical location
> with randomization of the physical address of the kernel image.

I tried this approach and it unfortunately picks a different physical
location every time :-(

So it is either adding to the e820 tables or we create a new way to
allocate memory at early boot up.

Below is the patch I used.

-- Steve

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b6bdaa18b9e9..74aaf0bcb363 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4204,4 +4204,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
 	return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
 }
 
+int memmap_named(const char *name, unsigned long *start, unsigned long *size);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..3c015395d262 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2243,6 +2243,101 @@ void __init memblock_free_all(void)
 	pages = free_low_memory_core_early();
 	totalram_pages_add(pages);
 }
+
+/* For wildcard memory requests, have a table to find them later */
+#define MEMMAP_MAX_MAPS		8
+#define MEMMAP_NAME_SIZE	16
+struct memmap_map {
+	char		name[MEMMAP_NAME_SIZE];
+	unsigned long	start;
+	unsigned long	size;
+};
+static struct memmap_map memmap_list[MEMMAP_MAX_MAPS] __initdata;
+static int memmap_size __initdata;
+
+/* Add wildcard region with a lookup name */
+static int __init memmap_add(u64 start, u64 size, const char *name)
+{
+	struct memmap_map *map;
+
+	if (!name || !name[0] || strlen(name) >= MEMMAP_NAME_SIZE)
+		return -EINVAL;
+
+	if (memmap_size >= MEMMAP_MAX_MAPS)
+		return -1;
+
+	map = &memmap_list[memmap_size++];
+	map->start = start;
+	map->size = size;
+	strcpy(map->name, name);
+	return 0;
+}
+
+/**
+ * memmap_named - Find a wildcard region with a given name
+ * @name: The name that is attached to a wildcard region
+ * @start: If found, holds the start address
+ * @size: If found, holds the size of the address.
+ *
+ * Returns: 1 if found or 0 if not found.
+ */
+int __init memmap_named(const char *name, unsigned long *start, unsigned long *size)
+{
+	struct memmap_map *map;
+	int i;
+
+	for (i = 0; i < memmap_size; i++) {
+		map = &memmap_list[i];
+		if (!map->size)
+			continue;
+		if (strcmp(name, map->name) == 0) {
+			*start = map->start;
+			*size = map->size;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Parse early_reserve_mem=nn:align:name
+ */
+static int __init early_reserve_mem(char *p)
+{
+	phys_addr_t start, size, align;
+	char *oldp;
+	int err;
+
+	if (!p)
+		return -EINVAL;
+
+	oldp = p;
+	size = memparse(p, &p);
+	if (p == oldp)
+		return -EINVAL;
+
+	if (*p != ':')
+		return -EINVAL;
+
+	align = memparse(p+1, &p);
+	if (*p != ':')
+		return -EINVAL;
+
+	start = memblock_phys_alloc(size, align);
+	if (!start)
+		return -ENOMEM;
+
+	p++;
+	err = memmap_add(start, size, p);
+	if (err) {
+		memblock_phys_free(start, size);
+		return err;
+	}
+
+	p += strlen(p);
+
+	return *p == '\0' ? 0 : -EINVAL;
+}
+__setup("early_reserve_mem=", early_reserve_mem);
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
 static const char * const flagname[] = {

On Thu, 9 May 2024 00:00:23 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I tried this approach and it unfortunately picks a different physical
> location every time :-(
>
> So it is either adding to the e820 tables or we create a new way to
> allocate memory at early boot up.

Hmm, now I'm testing it more and it always seems to end up in the same
location. I'm not sure why it failed the first three times I tried it :-/

-- Steve

On Thu, May 09, 2024 at 01:31:22PM -0400, Steven Rostedt wrote:
> On Thu, 9 May 2024 00:00:23 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I tried this approach and it unfortunately picks a different physical
> > location every time :-(
> >
> > So it is either adding to the e820 tables or we create a new way to
> > allocate memory at early boot up.
>
> Hmm, now I'm testing it more and it always seems to end up in the same
> location. I'm not sure why it failed the first three times I tried it :-/

If the kernel and the command line were the same, they shouldn't have.
e820 translates to memblock the same way every boot, and the early
allocations also do not change from boot to boot.

Could it be that the three runs that failed had some differences in the
kernel parameters?

On Thu, 9 May 2024 23:24:20 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> If the kernel and the command line were the same, they shouldn't have.
> e820 translates to memblock the same way every boot, and the early
> allocations also do not change from boot to boot.
>
> Could it be that the three runs that failed had some differences in the
> kernel parameters?

I wonder if KASLR caused it or not. But when I first added it to another
machine, it failed to get the same address on the second boot, but was
fine after that. Could be just something with my setup.

I'm going to backport this to 5.15 and test it on a Chromebook to see what
happens there (as that's the motivation behind this work).

-- Steve