[v2,1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

Message ID 20240606150316.751642266@goodmis.org (mailing list archive)
State New
Series mm/memblock: Add "reserve_mem" to reserved named memory at boot up

Commit Message

Steven Rostedt June 6, 2024, 3:01 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In order to allow for requesting a memory region that can be used for
things like pstore on multiple machines where the memory layout is not the
same, add a new option to the kernel command line called "reserve_mem".

The format is:  reserve_mem=nn:align:name

Where it will find nn amount of memory at the given alignment of align.
The name field is to allow another subsystem to retrieve where the memory
was found. For example:

  reserve_mem=12M:4096:oops ramoops.mem_name=oops

Where ramoops.mem_name will tell ramoops that memory was reserved for it
via the reserve_mem option and it can find it by calling:

  if (reserve_mem_find_by_name("oops", &start, &size)) {
	// start holds the start address and size holds the size given

This is typically used for systems that do not wipe the RAM, and this
command line will try to reserve the same physical memory on soft reboots.
Note, it is not guaranteed to be the same location. For example, if KASLR
places the kernel at the location of where the RAM reservation was from a
previous boot, the new reservation will be at a different location.  Any
subsystem using this feature must add a way to verify that the contents of
the physical memory is from a previous boot, as there may be cases where
the memory will not be located at the same location.

Not all systems may work either. There could be bit flips if the reboot
goes through the BIOS. Using kexec to reboot the machine is likely to
have better results in such cases.

Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 .../admin-guide/kernel-parameters.txt         | 20 ++++
 include/linux/mm.h                            |  2 +
 mm/memblock.c                                 | 97 +++++++++++++++++++
 3 files changed, 119 insertions(+)
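
The commit message requires each user of a reserved region to verify that its contents really come from a previous boot. A minimal userspace sketch of one way to do that is shown here. The header layout, the magic value and the names region_header, region_init and region_is_from_previous_boot are hypothetical and are not part of this patch; a real user such as ramoops has its own in-memory format and checks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature stored at the start of the reserved region. */
#define REGION_MAGIC 0x52534d45u	/* arbitrary value, for illustration only */

struct region_header {
	uint32_t magic;		/* marks a region initialized by this code */
	uint32_t checksum;	/* additive checksum over the payload */
	uint64_t length;	/* number of payload bytes after the header */
};

static uint32_t payload_checksum(const uint8_t *data, uint64_t len)
{
	uint32_t sum = 0;

	for (uint64_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}

/* Stamp the region so that a later boot can recognize it. */
int region_init(void *region, size_t size, const void *payload, size_t len)
{
	struct region_header hdr;
	uint8_t *body = (uint8_t *)region + sizeof(hdr);

	if (size < sizeof(hdr) || len > size - sizeof(hdr))
		return -1;

	memcpy(body, payload, len);
	hdr.magic = REGION_MAGIC;
	hdr.length = len;
	hdr.checksum = payload_checksum(body, len);
	memcpy(region, &hdr, sizeof(hdr));
	return 0;
}

/* Return 1 only if the magic, length and checksum are all consistent. */
int region_is_from_previous_boot(const void *region, size_t size)
{
	struct region_header hdr;
	const uint8_t *body = (const uint8_t *)region + sizeof(hdr);

	if (size < sizeof(hdr))
		return 0;
	memcpy(&hdr, region, sizeof(hdr));
	if (hdr.magic != REGION_MAGIC)
		return 0;
	if (hdr.length > size - sizeof(hdr))
		return 0;
	return hdr.checksum == payload_checksum(body, hdr.length);
}
```

If the reservation lands at a different address, or the RAM was wiped or corrupted during the reboot, the magic or checksum test fails and the caller should treat the region as empty.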

Comments

Guilherme G. Piccoli June 7, 2024, 7:35 p.m. UTC | #1
Hi Steve, thanks for the patch! Some suggestions/fixes below, inline.


On 06/06/2024 12:01, Steven Rostedt wrote:
> [...]
> +
> +			The format is size:align:label for example, to request
> +			12 megabytes of 4096 alignment for ramoops:
> +
> +			reserver_mem=12M:4096:oops ramoops.mem_name=oops

s/reserver/reserve


> [...]
> + * reserve_mem_find_by_name - Find reserved memory region with a given name
> + * @name: The name that is attached to a reserved memory region
> + * @start: If found, holds the start address
> + * @size: If found, holds the size of the address.
> + *
> + * Returns: 1 if found or 0 if not found.
> + */
> +int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
> +{
> +	struct reserve_mem_table *map;
> +	int i;
> +
> +	for (i = 0; i < reserved_mem_count; i++) {
> +		map = &reserved_mem_table[i];
> +		if (!map->size)
> +			continue;
> +		if (strcmp(name, map->name) == 0) {
> +			*start = map->start;
> +			*size = map->size;
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
An EXPORT_SYMBOL_GPL(reserve_mem_find_by_name) is needed here, or else
ramoops fails to build as a module. At least it worked with this
export in my build of 6.10.0-rc2 =)

Cheers,


Guilherme

Wei Yang June 11, 2024, 2:40 p.m. UTC | #2
On Thu, Jun 06, 2024 at 11:01:44AM -0400, Steven Rostedt wrote:
>From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
>In order to allow for requesting a memory region that can be used for
>things like pstore on multiple machines where the memory layout is not the
>same, add a new option to the kernel command line called "reserve_mem".
>
>The format is:  reserve_mem=nn:align:name
>
>Where it will find nn amount of memory at the given alignment of align.
>The name field is to allow another subsystem to retrieve where the memory
>was found. For example:
>
>  reserve_mem=12M:4096:oops ramoops.mem_name=oops
>
>Where ramoops.mem_name will tell ramoops that memory was reserved for it
>via the reserve_mem option and it can find it by calling:
>
>  if (reserve_mem_find_by_name("oops", &start, &size)) {
>	// start holds the start address and size holds the size given
>
>This is typically used for systems that do not wipe the RAM, and this
>command line will try to reserve the same physical memory on soft reboots.
>Note, it is not guaranteed to be the same location. For example, if KASLR
>places the kernel at the location of where the RAM reservation was from a
>previous boot, the new reservation will be at a different location.  Any
>subsystem using this feature must add a way to verify that the contents of
>the physical memory is from a previous boot, as there may be cases where
>the memory will not be located at the same location.
>
>Not all systems may work either. There could be bit flips if the reboot
>goes through the BIOS. Using kexec to reboot the machine is likely to
>have better results in such cases.
>
>Link: https://lore.kernel.org/all/ZjJVnZUX3NZiGW6q@kernel.org/
>
>Suggested-by: Mike Rapoport <rppt@kernel.org>
>Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>---
> .../admin-guide/kernel-parameters.txt         | 20 ++++
> include/linux/mm.h                            |  2 +
> mm/memblock.c                                 | 97 +++++++++++++++++++
> 3 files changed, 119 insertions(+)
>
>diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>index b600df82669d..4b2f7fb8de66 100644
>--- a/Documentation/admin-guide/kernel-parameters.txt
>+++ b/Documentation/admin-guide/kernel-parameters.txt
>@@ -5710,6 +5710,26 @@
> 			them.  If <base> is less than 0x10000, the region
> 			is assumed to be I/O ports; otherwise it is memory.
> 
>+	reserve_mem=	[RAM]
>+			Format: nn[KNG]:<align>:<label>
>+			Reserve physical memory and label it with a name that
>+			other subsystems can use to access it. This is typically
>+			used for systems that do not wipe the RAM, and this command
>+			line will try to reserve the same physical memory on
>+			soft reboots. Note, it is not guaranteed to be the same
>+			location. For example, if KASLR places the kernel at the
>+			location of where the RAM reservation was from a previous
>+			boot, the new reservation will be at a different location.
>+			Any subsystem using this feature must add a way to verify
>+			that the contents of the physical memory is from a previous
>+			boot, as there may be cases where the memory will not be
>+			located at the same location.
>+
>+			The format is size:align:label for example, to request
>+			12 megabytes of 4096 alignment for ramoops:
>+
>+			reserver_mem=12M:4096:oops ramoops.mem_name=oops
>+
> 	reservetop=	[X86-32,EARLY]
> 			Format: nn[KMG]
> 			Reserves a hole at the top of the kernel virtual
>diff --git a/include/linux/mm.h b/include/linux/mm.h
>index 9849dfda44d4..b4455cc02f2c 100644
>--- a/include/linux/mm.h
>+++ b/include/linux/mm.h
>@@ -4263,4 +4263,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
> void vma_pgtable_walk_begin(struct vm_area_struct *vma);
> void vma_pgtable_walk_end(struct vm_area_struct *vma);
> 
>+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size);
>+
> #endif /* _LINUX_MM_H */
>diff --git a/mm/memblock.c b/mm/memblock.c
>index d09136e040d3..a8bf0ee9e2b4 100644
>--- a/mm/memblock.c
>+++ b/mm/memblock.c
>@@ -2244,6 +2244,103 @@ void __init memblock_free_all(void)
> 	totalram_pages_add(pages);
> }
> 
>+/* Keep a table to reserve named memory */
>+#define RESERVE_MEM_MAX_ENTRIES		8
>+#define RESERVE_MEM_NAME_SIZE		16
                                        ^
Suggest to align with previous line.

>+struct reserve_mem_table {
>+	char			name[RESERVE_MEM_NAME_SIZE];
>+	unsigned long		start;
>+	unsigned long		size;

phys_addr_t looks more precise?

>+};
>+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
>+static int reserved_mem_count;

Seems no matter we use this feature or not, these memory would be occupied?

>+
>+/* Add wildcard region with a lookup name */
>+static int __init reserved_mem_add(unsigned long start, unsigned long size,
>+				   const char *name)
>+{
>+	struct reserve_mem_table *map;
>+
>+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
>+		return -EINVAL;
>+
>+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
>+		return -1;

return ENOSPC? Not good at it, but a raw value maybe not a good practice.

Also, we'd better do this check before allocation.

>+
>+	map = &reserved_mem_table[reserved_mem_count++];
>+	map->start = start;
>+	map->size = size;
>+	strscpy(map->name, name);
>+	return 0;
>+}
>+
>+/**
>+ * reserve_mem_find_by_name - Find reserved memory region with a given name
>+ * @name: The name that is attached to a reserved memory region
>+ * @start: If found, holds the start address
>+ * @size: If found, holds the size of the address.
>+ *
>+ * Returns: 1 if found or 0 if not found.
>+ */
>+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
>+{
>+	struct reserve_mem_table *map;
>+	int i;
>+
>+	for (i = 0; i < reserved_mem_count; i++) {
>+		map = &reserved_mem_table[i];
>+		if (!map->size)
>+			continue;
>+		if (strcmp(name, map->name) == 0) {
>+			*start = map->start;
>+			*size = map->size;
>+			return 1;
>+		}
>+	}
>+	return 0;
>+}
>+
>+/*
>+ * Parse early_reserve_mem=nn:align:name

early_reserve_mem or reserve_mem ?

>+ */
>+static int __init reserve_mem(char *p)
>+{
>+	phys_addr_t start, size, align;
>+	char *oldp;
>+	int err;
>+
>+	if (!p)
>+		return -EINVAL;
>+
>+	oldp = p;
>+	size = memparse(p, &p);
>+	if (p == oldp)
>+		return -EINVAL;
>+
>+	if (*p != ':')
>+		return -EINVAL;
>+
>+	align = memparse(p+1, &p);
>+	if (*p != ':')
>+		return -EINVAL;
>+

Better to check if the name is valid here. 

Make sure command line parameters are valid before doing the allocation.

>+	start = memblock_phys_alloc(size, align);
>+	if (!start)
>+		return -ENOMEM;
>+
>+	p++;
>+	err = reserved_mem_add(start, size, p);
>+	if (err) {
>+		memblock_phys_free(start, size);
>+		return err;
>+	}
>+
>+	p += strlen(p);
>+
>+	return *p == '\0' ? 0: -EINVAL;

We won't free the memory if return -EINVAL?

>+}
>+__setup("reserve_mem=", reserve_mem);
>+
> #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
> static const char * const flagname[] = {
> 	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
>-- 
>2.43.0
>
>

Guenter Roeck June 11, 2024, 2:58 p.m. UTC | #3
On 6/11/24 07:40, Wei Yang wrote:
[ ... ]

>> +/* Keep a table to reserve named memory */
>> +#define RESERVE_MEM_MAX_ENTRIES		8
>> +#define RESERVE_MEM_NAME_SIZE		16
>                                          ^
> Suggest to align with previous line.
> 
It _is_ aligned. It just looks unaligned because of the "+"
at the beginning of the patch.

Guenter

Steven Rostedt June 11, 2024, 3:12 p.m. UTC | #4
On Tue, 11 Jun 2024 14:40:29 +0000
Wei Yang <richard.weiyang@gmail.com> wrote:

Missed this just before sending out v3 :-p

> >diff --git a/mm/memblock.c b/mm/memblock.c
> >index d09136e040d3..a8bf0ee9e2b4 100644
> >--- a/mm/memblock.c
> >+++ b/mm/memblock.c
> >@@ -2244,6 +2244,103 @@ void __init memblock_free_all(void)
> > 	totalram_pages_add(pages);
> > }
> > 
> >+/* Keep a table to reserve named memory */
> >+#define RESERVE_MEM_MAX_ENTRIES		8
> >+#define RESERVE_MEM_NAME_SIZE		16  
>                                         ^
> Suggest to align with previous line.

It is. But because the patch adds a "+", it pushed the "8" out another tab.

> 
> >+struct reserve_mem_table {
> >+	char			name[RESERVE_MEM_NAME_SIZE];
> >+	unsigned long		start;
> >+	unsigned long		size;  
> 
> phys_addr_t looks more precise?

For just the start variable, correct? I'm OK with updating that.

> 
> >+};
> >+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
> >+static int reserved_mem_count;  
> 
> Seems no matter we use this feature or not, these memory would be occupied?

Yes, because allocation may screw it up as well. I could add a CONFIG
around it, so that those that do not want this could configure it out. But
since it's just a total of (16 + 8 + 8) * 8 = 256 bytes, I'm not sure it's
much of a worry to add the complexities to save that much space. As the
code to save it may likely be bigger.

> 
> >+
> >+/* Add wildcard region with a lookup name */
> >+static int __init reserved_mem_add(unsigned long start, unsigned long size,
> >+				   const char *name)
> >+{
> >+	struct reserve_mem_table *map;
> >+
> >+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
> >+		return -EINVAL;
> >+
> >+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
> >+		return -1;  
> 
> return ENOSPC? Not good at it, but a raw value maybe not a good practice.

This is what gets returned by the command line parser. It only cares if it
is zero or not.

> 
> Also, we'd better do this check before allocation.

What allocation?

> 
> >+
> >+	map = &reserved_mem_table[reserved_mem_count++];
> >+	map->start = start;
> >+	map->size = size;
> >+	strscpy(map->name, name);
> >+	return 0;
> >+}
> >+
> >+/**
> >+ * reserve_mem_find_by_name - Find reserved memory region with a given name
> >+ * @name: The name that is attached to a reserved memory region
> >+ * @start: If found, holds the start address
> >+ * @size: If found, holds the size of the address.
> >+ *
> >+ * Returns: 1 if found or 0 if not found.
> >+ */
> >+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
> >+{
> >+	struct reserve_mem_table *map;
> >+	int i;
> >+
> >+	for (i = 0; i < reserved_mem_count; i++) {
> >+		map = &reserved_mem_table[i];
> >+		if (!map->size)
> >+			continue;
> >+		if (strcmp(name, map->name) == 0) {
> >+			*start = map->start;
> >+			*size = map->size;
> >+			return 1;
> >+		}
> >+	}
> >+	return 0;
> >+}
> >+
> >+/*
> >+ * Parse early_reserve_mem=nn:align:name  
> 
> early_reserve_mem or reserve_mem ?

Oops, that was the original name. I'll change that.

> 
> >+ */
> >+static int __init reserve_mem(char *p)
> >+{
> >+	phys_addr_t start, size, align;

Hmm, I wonder if I should change size and align to unsigned long?

> >+	char *oldp;
> >+	int err;
> >+
> >+	if (!p)
> >+		return -EINVAL;
> >+
> >+	oldp = p;
> >+	size = memparse(p, &p);
> >+	if (p == oldp)
> >+		return -EINVAL;
> >+
> >+	if (*p != ':')
> >+		return -EINVAL;
> >+
> >+	align = memparse(p+1, &p);
> >+	if (*p != ':')
> >+		return -EINVAL;
> >+  
> 
> Better to check if the name is valid here. 

You mean that it has text and is not blank?

> 
> Make sure command line parameters are valid before doing the allocation.

You mean that size is non zero?

I don't know if we care what the align is. Zero is valid.

> 
> >+	start = memblock_phys_alloc(size, align);
> >+	if (!start)
> >+		return -ENOMEM;
> >+
> >+	p++;
> >+	err = reserved_mem_add(start, size, p);
> >+	if (err) {
> >+		memblock_phys_free(start, size);
> >+		return err;
> >+	}
> >+
> >+	p += strlen(p);
> >+
> >+	return *p == '\0' ? 0: -EINVAL;  
> 
> We won't free the memory if return -EINVAL?

I guess we can do this check before the allocation, like you suggested.

Thanks for the review.

-- Steve


> 
> >+}
> >+__setup("reserve_mem=", reserve_mem);
> >+
> > #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
> > static const char * const flagname[] = {
> > 	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
> >-- 
> >2.43.0
> >
> >  
>

Mike Rapoport June 11, 2024, 4:30 p.m. UTC | #5
On Tue, Jun 11, 2024 at 11:12:18AM -0400, Steven Rostedt wrote:
> On Tue, 11 Jun 2024 14:40:29 +0000
> Wei Yang <richard.weiyang@gmail.com> wrote:
> 
> > >+
> > >+	align = memparse(p+1, &p);
> > >+	if (*p != ':')
> > >+		return -EINVAL;
> > >+  
> >
> > Make sure command line parameters are valid before doing the allocation.
> 
> You mean that size is non zero?
> 
> I don't know if we care what the align is. Zero is valid.

memblock won't like zero align, it should be SMP_CACHE_BYTES at least.
No point requiring it from user, just update the alignment if the user passed
zero.
 
> > >+	start = memblock_phys_alloc(size, align);
> > >+	if (!start)
> > >+		return -ENOMEM;
> > >+

Steven Rostedt June 11, 2024, 5:34 p.m. UTC | #6
On Tue, 11 Jun 2024 19:30:47 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> > I don't know if we care what the align is. Zero is valid.  
> 
> memblock won't like zero align, it should be SMP_CACHE_BYTES at least.
> No point requiring it from user, just update the alignment if the user passed
> zero.

Thanks, will do in v4.

-- Steve
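
The zero-alignment fixup discussed above is small enough to sketch in userspace C. This is only an illustration: CACHE_BYTES is a hypothetical stand-in for the kernel's SMP_CACHE_BYTES (whose real value depends on the architecture), and fixup_align is an invented helper name, not something from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for SMP_CACHE_BYTES; 64 is assumed for illustration. */
#define CACHE_BYTES 64u

/* Replace a zero alignment with the minimum instead of rejecting the option. */
uint64_t fixup_align(uint64_t align)
{
	return align ? align : CACHE_BYTES;
}
```

With this in place, a command line such as reserve_mem=2M:0:oops would be handled the same way as one that asks for the minimum alignment explicitly.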
Steven Rostedt June 11, 2024, 7:39 p.m. UTC | #7
On Tue, 11 Jun 2024 11:12:18 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > >+	p++;
> > >+	err = reserved_mem_add(start, size, p);
> > >+	if (err) {
> > >+		memblock_phys_free(start, size);
> > >+		return err;
> > >+	}
> > >+
> > >+	p += strlen(p);
> > >+
> > >+	return *p == '\0' ? 0: -EINVAL;    
> > 
> > We won't free the memory if return -EINVAL?  

I actually copied this from parse_memmap_one() in arch/x86/kernel/e820.c
and now looking at it, it's a pretty stupid check.

It does: p += strlen(p); which requires p ending with '\0'. So this will
likely bug if there is no '\0'.

I'm going to remove this, but still check to make sure that the name has
some length before the allocation.

-- Steve
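
The ordering discussed in this subthread (validate every field of size:align:name first, allocate only afterwards) can be sketched in userspace C as follows. The assumptions are: parse_size is a simplified stand-in for the kernel's memparse() that handles only K, M and G suffixes, struct reserve_req and parse_reserve_mem are hypothetical names, and rejecting a zero size is a guess at the intended behavior rather than something the v2 patch does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE 16	/* mirrors RESERVE_MEM_NAME_SIZE from the patch */

struct reserve_req {
	uint64_t size;
	uint64_t align;
	char name[NAME_SIZE];
};

/* Simplified stand-in for memparse(): a number plus an optional K/M/G suffix. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t val = strtoull(s, end, 0);
	unsigned int shift;

	switch (**end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: return val;
	}
	(*end)++;
	return val << shift;
}

/*
 * Parse "size:align:name" and validate all three fields.
 * Nothing is allocated here; the caller allocates only when 0 is returned.
 */
int parse_reserve_mem(const char *arg, struct reserve_req *req)
{
	char *p;
	size_t len;

	if (!arg)
		return -EINVAL;

	req->size = parse_size(arg, &p);
	if (p == arg || req->size == 0 || *p != ':')
		return -EINVAL;

	req->align = parse_size(p + 1, &p);
	if (*p != ':')
		return -EINVAL;

	/* The name must be non-empty and must fit, including the trailing NUL. */
	p++;
	len = strlen(p);
	if (len == 0 || len >= NAME_SIZE)
		return -EINVAL;

	memcpy(req->name, p, len + 1);
	return 0;
}
```

Because every failure path returns before any memory is reserved, no error path needs a matching free, which is the leak that the review comment about the trailing -EINVAL pointed at.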
Wei Yang June 12, 2024, 7:23 a.m. UTC | #8
On Tue, Jun 11, 2024 at 11:12:18AM -0400, Steven Rostedt wrote:
>On Tue, 11 Jun 2024 14:40:29 +0000
>Wei Yang <richard.weiyang@gmail.com> wrote:
>
>Missed this just before sending out v3 :-p
>
>> >diff --git a/mm/memblock.c b/mm/memblock.c
>> >index d09136e040d3..a8bf0ee9e2b4 100644
>> >--- a/mm/memblock.c
>> >+++ b/mm/memblock.c
>> >@@ -2244,6 +2244,103 @@ void __init memblock_free_all(void)
>> > 	totalram_pages_add(pages);
>> > }
>> > 
>> >+/* Keep a table to reserve named memory */
>> >+#define RESERVE_MEM_MAX_ENTRIES		8
>> >+#define RESERVE_MEM_NAME_SIZE		16  
>>                                         ^
>> Suggest to align with previous line.
>
>It is. But because the patch adds a "+", it pushed the "8" out another tab.
>
>> 
>> >+struct reserve_mem_table {
>> >+	char			name[RESERVE_MEM_NAME_SIZE];
>> >+	unsigned long		start;
>> >+	unsigned long		size;  
>> 
>> phys_addr_t looks more precise?
>
>For just the start variable, correct? I'm OK with updating that.
>

Both start and size. When you look at the definition of memblock_region, both
are defined as phys_addr_t.

>> 
>> >+};
>> >+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
>> >+static int reserved_mem_count;  
>> 
>> Seems no matter we use this feature or not, these memory would be occupied?
>
>Yes, because allocation may screw it up as well. I could add a CONFIG
>around it, so that those that do not want this could configure it out. But
>since it's just a total of (16 + 8 + 8) * 8 = 256 bytes, I'm not sure it's
>much of a worry to add the complexities to save that much space. As the
>code to save it may likely be bigger.
>

If Mike feel good to it, I am ok.

>> 
>> >+
>> >+/* Add wildcard region with a lookup name */
>> >+static int __init reserved_mem_add(unsigned long start, unsigned long size,
>> >+				   const char *name)
>> >+{
>> >+	struct reserve_mem_table *map;
>> >+
>> >+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
>> >+		return -EINVAL;
>> >+
>> >+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
>> >+		return -1;  
>> 
>> return ENOSPC? Not good at it, but a raw value maybe not a good practice.
>
>This is what gets returned by the command line parser. It only cares if it
>is zero or not.
>
>> 
>> Also, we'd better do this check before allocation.
>
>What allocation?
>

You call reserved_mem_add() after memblock_phys_alloc(). 
My suggestion is do those sanity check before calling memblock_phys_alloc().

>> 
>> >+
>> >+	map = &reserved_mem_table[reserved_mem_count++];
>> >+	map->start = start;
>> >+	map->size = size;
>> >+	strscpy(map->name, name);
>> >+	return 0;
>> >+}
>> >+
>> >+/**
>> >+ * reserve_mem_find_by_name - Find reserved memory region with a given name
>> >+ * @name: The name that is attached to a reserved memory region
>> >+ * @start: If found, holds the start address
>> >+ * @size: If found, holds the size of the address.
>> >+ *
>> >+ * Returns: 1 if found or 0 if not found.
>> >+ */
>> >+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
>> >+{
>> >+	struct reserve_mem_table *map;
>> >+	int i;
>> >+
>> >+	for (i = 0; i < reserved_mem_count; i++) {
>> >+		map = &reserved_mem_table[i];
>> >+		if (!map->size)
>> >+			continue;
>> >+		if (strcmp(name, map->name) == 0) {
>> >+			*start = map->start;
>> >+			*size = map->size;
>> >+			return 1;
>> >+		}
>> >+	}
>> >+	return 0;
>> >+}
>> >+
>> >+/*
>> >+ * Parse early_reserve_mem=nn:align:name  
>> 
>> early_reserve_mem or reserve_mem ?
>
>Oops, that was the original name. I'll change that.
>
>> 
>> >+ */
>> >+static int __init reserve_mem(char *p)
>> >+{
>> >+	phys_addr_t start, size, align;
>
>Hmm, I wonder if I should change size and align to unsigned long?
>

I grepped the kernel: some callers use u64, some use unsigned long.
I think it is OK to use unsigned long here.

>> >+	char *oldp;
>> >+	int err;
>> >+
>> >+	if (!p)
>> >+		return -EINVAL;
>> >+
>> >+	oldp = p;
>> >+	size = memparse(p, &p);
>> >+	if (p == oldp)
>> >+		return -EINVAL;
>> >+
>> >+	if (*p != ':')
>> >+		return -EINVAL;
>> >+
>> >+	align = memparse(p+1, &p);
>> >+	if (*p != ':')
>> >+		return -EINVAL;
>> >+  
>> 
>> Better to check if the name is valid here. 
>
>You mean that it has text and is not blank?
>
>> 
>> Make sure command line parameters are valid before doing the allocation.
>
>You mean that size is non zero?
>

I mean doing those sanity checks before the real allocation.

>I don't know if we care what the align is. Zero is valid.
>

memblock checks the alignment internally. If it is zero, memblock changes it
to SMP_CACHE_BYTES and calls dump_stack().

>> 
>> >+	start = memblock_phys_alloc(size, align);
>> >+	if (!start)
>> >+		return -ENOMEM;
>> >+
>> >+	p++;
>> >+	err = reserved_mem_add(start, size, p);
>> >+	if (err) {
>> >+		memblock_phys_free(start, size);
>> >+		return err;
>> >+	}
>> >+
>> >+	p += strlen(p);
>> >+
>> >+	return *p == '\0' ? 0: -EINVAL;  
>> 
>> We won't free the memory if return -EINVAL?
>
>I guess we can do this check before the allocation, like you suggested.
>
>Thanks for the review.
>
>-- Steve
>
>
>> 
>> >+}
>> >+__setup("reserve_mem=", reserve_mem);
>> >+
>> > #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
>> > static const char * const flagname[] = {
>> > 	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
>> >-- 
>> >2.43.0
>> >
>> >  
>>

Wei Yang June 12, 2024, 7:30 a.m. UTC | #9
On Thu, Jun 06, 2024 at 11:01:44AM -0400, Steven Rostedt wrote:
>[...]
>+/* Add wildcard region with a lookup name */
>+static int __init reserved_mem_add(unsigned long start, unsigned long size,
>+				   const char *name)
>+{
>+	struct reserve_mem_table *map;
>+
>+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
>+		return -EINVAL;
>+
>+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
>+		return -1;

Another thing come to my mind: could we specify several reserve_mem on the
command line?

If so, we may need to check whether names conflict.

>[...]
>

Steven Rostedt June 12, 2024, 3:28 p.m. UTC | #10
On Wed, 12 Jun 2024 07:30:49 +0000
Wei Yang <richard.weiyang@gmail.com> wrote:

> >+/* Add wildcard region with a lookup name */
> >+static int __init reserved_mem_add(unsigned long start, unsigned long size,
> >+				   const char *name)
> >+{
> >+	struct reserve_mem_table *map;
> >+
> >+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
> >+		return -EINVAL;
> >+
> >+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
> >+		return -1;  
> 
> Another thing come to my mind: could we specify several reserve_mem on the
> command line?

Yes, in fact I have this on my command line that I test with:

  reserve_mem=2M:0:oops ramoops.mem_name=oops ramoops.console_size=0x100000 reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace


> 
> If so, we may need to check whether names conflict.

Yeah, I can add that.

Thanks,

-- Steve
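
A name-conflict check of the kind discussed above could look like the following userspace sketch. It is only an illustration: the table mirrors the patch's fixed-size array, but name_in_use and table_add are hypothetical names, uint64_t stands in for the address type, and the specific error codes (-ENOSPC for a full table, -EBUSY for a duplicate name) are assumptions rather than what any posted version returns.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES	8	/* mirrors RESERVE_MEM_MAX_ENTRIES */
#define NAME_SIZE	16	/* mirrors RESERVE_MEM_NAME_SIZE */

struct entry {
	char name[NAME_SIZE];
	uint64_t start;
	uint64_t size;
};

static struct entry table[MAX_ENTRIES];
static int count;

/* Return 1 if a region with this name has already been recorded. */
int name_in_use(const char *name)
{
	for (int i = 0; i < count; i++) {
		if (strcmp(name, table[i].name) == 0)
			return 1;
	}
	return 0;
}

/* Record a region; refuse bad names, a full table, and duplicate names. */
int table_add(const char *name, uint64_t start, uint64_t size)
{
	size_t len = name ? strlen(name) : 0;
	struct entry *e;

	if (len == 0 || len >= NAME_SIZE)
		return -EINVAL;
	if (count >= MAX_ENTRIES)
		return -ENOSPC;
	if (name_in_use(name))
		return -EBUSY;

	e = &table[count++];
	memcpy(e->name, name, len + 1);
	e->start = start;
	e->size = size;
	return 0;
}
```

For a command line with two options such as reserve_mem=2M:0:oops and reserve_mem=12M:4096:trace, both entries are accepted; a third option that reuses the name oops would be refused. In the real parser, name_in_use would need to run before the memblock allocation so that a rejected duplicate reserves nothing.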
Steven Rostedt June 12, 2024, 3:32 p.m. UTC | #11
On Wed, 12 Jun 2024 07:23:40 +0000
Wei Yang <richard.weiyang@gmail.com> wrote:
> >> >+struct reserve_mem_table {
> >> >+	char			name[RESERVE_MEM_NAME_SIZE];
> >> >+	unsigned long		start;
> >> >+	unsigned long		size;    
> >> 
> >> phys_addr_t looks more precise?  
> >
> >For just the start variable, correct? I'm OK with updating that.
> >  
> 
> Both start and size. When you look at the definition of memblock_region, both
> are defined as phys_addr_t.

I ended up keeping everything phys_addr_t.

> 
> >>   
> >> >+};
> >> >+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
> >> >+static int reserved_mem_count;    
> >> 
> >> Seems no matter we use this feature or not, these memory would be occupied?  
> >
> >Yes, because allocation may screw it up as well. I could add a CONFIG
> >around it, so that those that do not want this could configure it out. But
> >since it's just a total of (16 + 8 + 8) * 8 = 256 bytes, I'm not sure it's
> >much of a worry to add the complexities to save that much space. As the
> >code to save it may likely be bigger.
> >  
> 
> If Mike feel good to it, I am ok.
> 
> >>   
> >> >+
> >> >+/* Add wildcard region with a lookup name */
> >> >+static int __init reserved_mem_add(unsigned long start, unsigned long size,
> >> >+				   const char *name)
> >> >+{
> >> >+	struct reserve_mem_table *map;
> >> >+
> >> >+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
> >> >+		return -EINVAL;
> >> >+
> >> >+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
> >> >+		return -1;    
> >> 
> >> Return -ENOSPC? I am not sure about it, but a raw value may not be good practice.  
> >
> >This is what gets returned by the command line parser. It only cares if it
> >is zero or not.
> >  
> >> 
> >> Also, we'd better do this check before allocation.  
> >
> >What allocation?
> >  
> 
> You call reserved_mem_add() after memblock_phys_alloc().
> My suggestion is to do those sanity checks before calling memblock_phys_alloc().

Yeah, I did add more checks before the allocation happens.
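
As a rough sketch of the "validate everything, then allocate" ordering discussed here, the parsing can be separated from the allocation so that nothing is reserved until size, align, and name have all been checked. This is a userspace approximation under stated assumptions: parse_size() is a hypothetical stand-in for the kernel's memparse(), struct reserve_request and parse_reserve_mem() are made-up names, and MIN_ALIGN uses an assumed value of 64 in place of SMP_CACHE_BYTES.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE	16	/* mirrors RESERVE_MEM_NAME_SIZE */
#define MIN_ALIGN	64ULL	/* assumed stand-in for SMP_CACHE_BYTES */

/* Hypothetical container for a fully validated request */
struct reserve_request {
	unsigned long long	size;
	unsigned long long	align;
	const char		*name;
};

/* Parse a number with an optional K/M/G suffix, roughly like memparse() */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Validate "size:align:name". Returns 0 and fills @req only when every
 * field is usable, so the caller can allocate knowing nothing will fail
 * on a parse error afterwards.
 */
static int parse_reserve_mem(char *p, struct reserve_request *req)
{
	unsigned long long size, align;
	char *end;

	if (!p)
		return -1;

	size = parse_size(p, &end);
	if (end == p || size == 0 || *end != ':')
		return -1;

	align = parse_size(end + 1, &end);
	if (*end != ':')
		return -1;

	/* Zero (or tiny) alignment is raised to the minimum */
	if (align < MIN_ALIGN)
		align = MIN_ALIGN;

	p = end + 1;
	if (!p[0] || strlen(p) >= NAME_SIZE)
		return -1;

	req->size = size;
	req->align = align;
	req->name = p;
	return 0;
}
```

Only after parse_reserve_mem() returns 0 would the caller go on to the allocation step, which removes the need to free the region again when the name turns out to be invalid.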


> 
> >>   
> >> >+
> >> >+	map = &reserved_mem_table[reserved_mem_count++];
> >> >+	map->start = start;
> >> >+	map->size = size;
> >> >+	strscpy(map->name, name);
> >> >+	return 0;
> >> >+}
> >> >+
> >> >+/**
> >> >+ * reserve_mem_find_by_name - Find reserved memory region with a given name
> >> >+ * @name: The name that is attached to a reserved memory region
> >> >+ * @start: If found, holds the start address
> >> >+ * @size: If found, holds the size of the address.
> >> >+ *
> >> >+ * Returns: 1 if found or 0 if not found.
> >> >+ */
> >> >+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
> >> >+{
> >> >+	struct reserve_mem_table *map;
> >> >+	int i;
> >> >+
> >> >+	for (i = 0; i < reserved_mem_count; i++) {
> >> >+		map = &reserved_mem_table[i];
> >> >+		if (!map->size)
> >> >+			continue;
> >> >+		if (strcmp(name, map->name) == 0) {
> >> >+			*start = map->start;
> >> >+			*size = map->size;
> >> >+			return 1;
> >> >+		}
> >> >+	}
> >> >+	return 0;
> >> >+}
> >> >+
> >> >+/*
> >> >+ * Parse early_reserve_mem=nn:align:name    
> >> 
> >> early_reserve_mem or reserve_mem ?  
> >
> >Oops, that was the original name. I'll change that.
> >  
> >>   
> >> >+ */
> >> >+static int __init reserve_mem(char *p)
> >> >+{
> >> >+	phys_addr_t start, size, align;  
> >
> >Hmm, I wonder if I should change size and align to unsigned long?
> >  
> 
> I grep the kernel, some use u64, some use unsigned long. 
> I think it is ok to use unsigned long here.

For consistency, I switched them all to phys_addr_t.

> 
> >> >+	char *oldp;
> >> >+	int err;
> >> >+
> >> >+	if (!p)
> >> >+		return -EINVAL;
> >> >+
> >> >+	oldp = p;
> >> >+	size = memparse(p, &p);
> >> >+	if (p == oldp)
> >> >+		return -EINVAL;
> >> >+
> >> >+	if (*p != ':')
> >> >+		return -EINVAL;
> >> >+
> >> >+	align = memparse(p+1, &p);
> >> >+	if (*p != ':')
> >> >+		return -EINVAL;
> >> >+    
> >> 
> >> Better to check if the name is valid here.   
> >
> >You mean that it has text and is not blank?
> >  
> >> 
> >> Make sure command line parameters are valid before doing the allocation.  
> >
> >You mean that size is non zero?
> >  
> 
> I mean do those sanity checks before the real allocation.

Yep, I hope I caught everything (of course I need to check if the name
exists first).

> 
> >I don't know if we care what the align is. Zero is valid.
> >  
> 
> memblock checks the alignment internally. If it is zero, it changes it to
> SMP_CACHE_BYTES and calls dump_stack().

I saw that and added:

	if (align < SMP_CACHE_BYTES)
		align = SMP_CACHE_BYTES;

so that SMP_CACHE_BYTES will be the minimum alignment.

Thanks for looking at this.

-- Steve

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b600df82669d..4b2f7fb8de66 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5710,6 +5710,26 @@ 
 			them.  If <base> is less than 0x10000, the region
 			is assumed to be I/O ports; otherwise it is memory.
 
+	reserve_mem=	[RAM]
+			Format: nn[KMG]:<align>:<label>
+			Reserve physical memory and label it with a name that
+			other subsystems can use to access it. This is typically
+			used for systems that do not wipe the RAM, and this command
+			line will try to reserve the same physical memory on
+			soft reboots. Note, it is not guaranteed to be the same
+			location. For example, if KASLR places the kernel at the
+			location of where the RAM reservation was from a previous
+			boot, the new reservation will be at a different location.
+			Any subsystem using this feature must add a way to verify
+			that the contents of the physical memory are from a previous
+			boot, as there may be cases where the memory will not be
+			located at the same location.
+
+			The format is size:align:label. For example, to request
+			12 megabytes with 4096-byte alignment for ramoops:
+
+			reserve_mem=12M:4096:oops ramoops.mem_name=oops
+
 	reservetop=	[X86-32,EARLY]
 			Format: nn[KMG]
 			Reserves a hole at the top of the kernel virtual
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..b4455cc02f2c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4263,4 +4263,6 @@  static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
 void vma_pgtable_walk_begin(struct vm_area_struct *vma);
 void vma_pgtable_walk_end(struct vm_area_struct *vma);
 
+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..a8bf0ee9e2b4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2244,6 +2244,103 @@  void __init memblock_free_all(void)
 	totalram_pages_add(pages);
 }
 
+/* Keep a table to reserve named memory */
+#define RESERVE_MEM_MAX_ENTRIES		8
+#define RESERVE_MEM_NAME_SIZE		16
+struct reserve_mem_table {
+	char			name[RESERVE_MEM_NAME_SIZE];
+	unsigned long		start;
+	unsigned long		size;
+};
+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];
+static int reserved_mem_count;
+
+/* Add wildcard region with a lookup name */
+static int __init reserved_mem_add(unsigned long start, unsigned long size,
+				   const char *name)
+{
+	struct reserve_mem_table *map;
+
+	if (!name || !name[0] || strlen(name) >= RESERVE_MEM_NAME_SIZE)
+		return -EINVAL;
+
+	if (reserved_mem_count >= RESERVE_MEM_MAX_ENTRIES)
+		return -1;
+
+	map = &reserved_mem_table[reserved_mem_count++];
+	map->start = start;
+	map->size = size;
+	strscpy(map->name, name);
+	return 0;
+}
+
+/**
+ * reserve_mem_find_by_name - Find reserved memory region with a given name
+ * @name: The name that is attached to a reserved memory region
+ * @start: If found, holds the start address
+ * @size: If found, holds the size of the region.
+ *
+ * Returns: 1 if found or 0 if not found.
+ */
+int reserve_mem_find_by_name(const char *name, unsigned long *start, unsigned long *size)
+{
+	struct reserve_mem_table *map;
+	int i;
+
+	for (i = 0; i < reserved_mem_count; i++) {
+		map = &reserved_mem_table[i];
+		if (!map->size)
+			continue;
+		if (strcmp(name, map->name) == 0) {
+			*start = map->start;
+			*size = map->size;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Parse reserve_mem=nn:align:name
+ */
+static int __init reserve_mem(char *p)
+{
+	phys_addr_t start, size, align;
+	char *oldp;
+	int err;
+
+	if (!p)
+		return -EINVAL;
+
+	oldp = p;
+	size = memparse(p, &p);
+	if (p == oldp)
+		return -EINVAL;
+
+	if (*p != ':')
+		return -EINVAL;
+
+	align = memparse(p+1, &p);
+	if (*p != ':')
+		return -EINVAL;
+
+	start = memblock_phys_alloc(size, align);
+	if (!start)
+		return -ENOMEM;
+
+	p++;
+	err = reserved_mem_add(start, size, p);
+	if (err) {
+		memblock_phys_free(start, size);
+		return err;
+	}
+
+	p += strlen(p);
+
+	return *p == '\0' ? 0 : -EINVAL;
+}
+__setup("reserve_mem=", reserve_mem);
+
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
 static const char * const flagname[] = {
 	[ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",