[v2] fs/coredump: Enable dynamic configuration of max file note size

Message ID 20240502145920.5011-1-apais@linux.microsoft.com (mailing list archive)
State New

Commit Message

Allen Pais May 2, 2024, 2:59 p.m. UTC
Introduce the capability to dynamically configure the maximum file
note size for ELF core dumps via sysctl. This enhancement removes
the previous static limit of 4MB, allowing system administrators to
adjust the size based on system-specific requirements or constraints.

- Remove hardcoded `MAX_FILE_NOTE_SIZE` from `fs/binfmt_elf.c`.
- Define `max_file_note_size` in `fs/coredump.c` with an initial value
  set to 4MB.
- Declare `max_file_note_size` as an external variable in
  `include/linux/coredump.h`.
- Add a new sysctl entry in `kernel/sysctl.c` to manage this setting
  at runtime.

$ sysctl -a | grep max_file_note_size
kernel.max_file_note_size = 4194304

$ sysctl -n kernel.max_file_note_size
4194304

$ echo 519304 > /proc/sys/kernel/max_file_note_size

$ sysctl -n kernel.max_file_note_size
519304

Why is this being done?
We have observed that during a crash when there are more than 65k mmaps
in memory, the existing fixed limit on the size of the ELF notes section
becomes a bottleneck. The notes section quickly reaches its capacity,
leading to incomplete memory segment information in the resulting coredump.
This truncation compromises the utility of the coredumps, as crucial
information about the memory state at the time of the crash might be
omitted.

Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>

---
Changes in v2:
   - Move new sysctl to fs/coredump.c [Luis & Kees]
   - Rename max_file_note_size to core_file_note_size_max [Kees]
   - Capture "why this is being done?" in the commit message [Luis & Kees]
---
 fs/binfmt_elf.c          |  3 +--
 fs/coredump.c            | 10 ++++++++++
 include/linux/coredump.h |  1 +
 3 files changed, 12 insertions(+), 2 deletions(-)
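
For a rough sense of why "more than 65k mmaps" runs into the 4MB limit, the
check in fill_files_note() can be modelled in userspace. This is only a
sketch, not kernel code. The names_ofs formula is taken from the hunk
context in this patch; the 64-bytes-per-VMA initial size guess is an
assumption based on my reading of fill_files_note() and is not part of the
quoted diff. All function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default limit from the patch: 4 MiB. */
#define DEFAULT_FILE_NOTE_SIZE (4u * 1024 * 1024)

/*
 * ASSUMPTION: the initial buffer guess is 64 bytes per VMA. This line is
 * not shown in the quoted diff; treat the constant as illustrative.
 */
static uint64_t initial_note_guess(uint64_t vma_count)
{
	return vma_count * 64;
}

/*
 * Offset of the file names inside the NT_FILE note, as in the hunk
 * context: two header words, then three words per mapping.
 * word_size is sizeof(long) on the target (8 on 64-bit).
 */
static uint64_t names_offset(uint64_t count, uint64_t word_size)
{
	return (2 + 3 * count) * word_size;
}

/* Mirrors the "size >= limit" paranoia check. */
static int note_too_large(uint64_t size, uint64_t limit)
{
	assert(limit > 0);
	return size >= limit;
}
```

Under these assumptions, 65536 VMAs give an initial guess of exactly 4 MiB,
which the ">=" comparison rejects, while 65535 VMAs still pass the first
check. The fixed-size part of the note alone (names_offset) is about 1.5 MiB
for 65536 mappings on a 64-bit target.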

Comments

Kees Cook May 2, 2024, 5:50 p.m. UTC | #1
On Thu, May 02, 2024 at 02:59:20PM +0000, Allen Pais wrote:
> Introduce the capability to dynamically configure the maximum file
> note size for ELF core dumps via sysctl. This enhancement removes
> the previous static limit of 4MB, allowing system administrators to
> adjust the size based on system-specific requirements or constraints.
> 
> - Remove hardcoded `MAX_FILE_NOTE_SIZE` from `fs/binfmt_elf.c`.
> - Define `max_file_note_size` in `fs/coredump.c` with an initial value
>   set to 4MB.
> - Declare `max_file_note_size` as an external variable in
>   `include/linux/coredump.h`.
> - Add a new sysctl entry in `kernel/sysctl.c` to manage this setting
>   at runtime.
> 
> $ sysctl -a | grep max_file_note_size
> kernel.max_file_note_size = 4194304
> 
> $ sysctl -n kernel.max_file_note_size
> 4194304
> 
> $echo 519304 > /proc/sys/kernel/max_file_note_size
> 
> $sysctl -n kernel.max_file_note_size
> 519304

The names and paths in the commit log need a refresh here, since they've
changed.

> 
> Why is this being done?
> We have observed that during a crash when there are more than 65k mmaps
> in memory, the existing fixed limit on the size of the ELF notes section
> becomes a bottleneck. The notes section quickly reaches its capacity,
> leading to incomplete memory segment information in the resulting coredump.
> This truncation compromises the utility of the coredumps, as crucial
> information about the memory state at the time of the crash might be
> omitted.

Thanks for adding this!

> 
> Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> 
> ---
> Changes in v2:
>    - Move new sysctl to fs/coredump.c [Luis & Kees]
>    - rename max_file_note_size to core_file_note_size_max [kees]
>    - Capture "why this is being done?" int he commit message [Luis & Kees]
> ---
>  fs/binfmt_elf.c          |  3 +--
>  fs/coredump.c            | 10 ++++++++++
>  include/linux/coredump.h |  1 +
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 5397b552fbeb..6aebd062b92b 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1564,7 +1564,6 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
>  	fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
>  }
>  
> -#define MAX_FILE_NOTE_SIZE (4*1024*1024)
>  /*
>   * Format of NT_FILE note:
>   *
> @@ -1592,7 +1591,7 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
>  
>  	names_ofs = (2 + 3 * count) * sizeof(data[0]);
>   alloc:
> -	if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
> +	if (size >= core_file_note_size_max) /* paranoia check */
>  		return -EINVAL;

I wonder, given the purpose of this sysctl, if it would be a
discoverability improvement to include a pr_warn_once() before the
EINVAL? Like:

	/* paranoia check */
	if (size >= core_file_note_size_max) {
		pr_warn_once("coredump Note size too large: %zu (does kernel.core_file_note_size_max sysctl need adjustment?)\n", size);
		return -EINVAL;
	}

What do folks think? (I can't imagine tracking down this problem
originally was much fun, for example.)
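
The behaviour being suggested can be sketched in userspace as follows. This
is a stand-in, not kernel code: a static flag imitates pr_warn_once(), and
warn_count, warn_once_note_size() and check_note_size() are hypothetical
names that exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustration-only counter so the once-only behaviour can be observed. */
static unsigned int warn_count;

/* Userspace stand-in for pr_warn_once(): print only on the first call. */
static void warn_once_note_size(size_t size)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warn_count++;
	fprintf(stderr,
		"coredump: note size too large: %zu (does kernel.core_file_note_size_max need adjustment?)\n",
		size);
}

/* Reject oversized notes, warning the first time it happens. */
static int check_note_size(size_t size, size_t limit)
{
	assert(limit > 0); /* a zero limit would reject every note */

	if (size >= limit) {
		warn_once_note_size(size);
		return -EINVAL;
	}
	return 0;
}
```

Every oversized request still fails with -EINVAL, but the log line appears
only once, so a process that crashes repeatedly would not flood the log.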

>  	size = round_up(size, PAGE_SIZE);
>  	/*
> diff --git a/fs/coredump.c b/fs/coredump.c
> index be6403b4b14b..a312be48030f 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -56,10 +56,13 @@
>  static bool dump_vma_snapshot(struct coredump_params *cprm);
>  static void free_vma_snapshot(struct coredump_params *cprm);
>  
> +#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> +
>  static int core_uses_pid;
>  static unsigned int core_pipe_limit;
>  static char core_pattern[CORENAME_MAX_SIZE] = "core";
>  static int core_name_size = CORENAME_MAX_SIZE;
> +unsigned int core_file_note_size_max = MAX_FILE_NOTE_SIZE;
>  
>  struct core_name {
>  	char *corename;
> @@ -1020,6 +1023,13 @@ static struct ctl_table coredump_sysctls[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname       = "core_file_note_size_max",
> +		.data           = &core_file_note_size_max,
> +		.maxlen         = sizeof(unsigned int),
> +		.mode           = 0644,
> +		.proc_handler   = proc_douintvec,
> +	},
>  };
>  
>  static int __init init_fs_coredump_sysctls(void)
> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> index d3eba4360150..14c057643e7f 100644
> --- a/include/linux/coredump.h
> +++ b/include/linux/coredump.h
> @@ -46,6 +46,7 @@ static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
>  #endif
>  
>  #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
> +extern unsigned int core_file_note_size_max;
>  extern void validate_coredump_safety(void);
>  #else
>  static inline void validate_coredump_safety(void) {}
> -- 
> 2.17.1

Otherwise, yes, this looks good to me.
Allen May 2, 2024, 8:03 p.m. UTC | #2
On Thu, May 2, 2024 at 10:50 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, May 02, 2024 at 02:59:20PM +0000, Allen Pais wrote:
> > Introduce the capability to dynamically configure the maximum file
> > note size for ELF core dumps via sysctl. This enhancement removes
> > the previous static limit of 4MB, allowing system administrators to
> > adjust the size based on system-specific requirements or constraints.
> >
> > - Remove hardcoded `MAX_FILE_NOTE_SIZE` from `fs/binfmt_elf.c`.
> > - Define `max_file_note_size` in `fs/coredump.c` with an initial value
> >   set to 4MB.
> > - Declare `max_file_note_size` as an external variable in
> >   `include/linux/coredump.h`.
> > - Add a new sysctl entry in `kernel/sysctl.c` to manage this setting
> >   at runtime.
> >
> > $ sysctl -a | grep max_file_note_size
> > kernel.max_file_note_size = 4194304
> >
> > $ sysctl -n kernel.max_file_note_size
> > 4194304
> >
> > $echo 519304 > /proc/sys/kernel/max_file_note_size
> >
> > $sysctl -n kernel.max_file_note_size
> > 519304
>
> The names and paths in the commit log need a refresh here, since they've
> changed.

Will fix it in v3.
>
> >
> > Why is this being done?
> > We have observed that during a crash when there are more than 65k mmaps
> > in memory, the existing fixed limit on the size of the ELF notes section
> > becomes a bottleneck. The notes section quickly reaches its capacity,
> > leading to incomplete memory segment information in the resulting coredump.
> > This truncation compromises the utility of the coredumps, as crucial
> > information about the memory state at the time of the crash might be
> > omitted.
>
> Thanks for adding this!
>
> >
> > Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
> > Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> >
> > ---
> > Changes in v2:
> >    - Move new sysctl to fs/coredump.c [Luis & Kees]
> >    - rename max_file_note_size to core_file_note_size_max [kees]
> >    - Capture "why this is being done?" int he commit message [Luis & Kees]
> > ---
> >  fs/binfmt_elf.c          |  3 +--
> >  fs/coredump.c            | 10 ++++++++++
> >  include/linux/coredump.h |  1 +
> >  3 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 5397b552fbeb..6aebd062b92b 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -1564,7 +1564,6 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
> >       fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
> >  }
> >
> > -#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> >  /*
> >   * Format of NT_FILE note:
> >   *
> > @@ -1592,7 +1591,7 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
> >
> >       names_ofs = (2 + 3 * count) * sizeof(data[0]);
> >   alloc:
> > -     if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
> > +     if (size >= core_file_note_size_max) /* paranoia check */
> >               return -EINVAL;
>
> I wonder, given the purpose of this sysctl, if it would be a
> discoverability improvement to include a pr_warn_once() before the
> EINVAL? Like:
>
>         /* paranoia check */
>         if (size >= core_file_note_size_max) {
>                 pr_warn_once("coredump Note size too large: %zu (does kernel.core_file_note_size_max sysctl need adjustment?\n", size);
>                 return -EINVAL;
>         }
>
> What do folks think? (I can't imagine tracking down this problem
> originally was much fun, for example.)

I think this would really be helpful. I will go ahead and add this if
there's no objection from anyone.

Also, I haven't received a reply from Luis. Do you think we need to
add a ceiling?

+#define MAX_FILE_NOTE_SIZE (4*1024*1024)
+#define MAX_ALLOWED_NOTE_SIZE (16*1024*1024) // Define a reasonable max cap
.....

+       {
+               .procname       = "core_file_note_size_max",
+               .data           = &core_file_note_size_max,
+               .maxlen         = sizeof(unsigned int),
+               .mode           = 0644,
+               .proc_handler   = proc_core_file_note_size_max,
+       },
 };

+int proc_core_file_note_size_max(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int error = proc_douintvec(table, write, buffer, lenp, ppos);
+
+	/* Revert to default if out of bounds */
+	if (write && (core_file_note_size_max < MAX_FILE_NOTE_SIZE ||
+		      core_file_note_size_max > MAX_ALLOWED_NOTE_SIZE))
+		core_file_note_size_max = MAX_FILE_NOTE_SIZE;
+	return error;
+}


Or, should we go ahead with the current patch (with the warning added)?

Thanks,
Allen
>
> >       size = round_up(size, PAGE_SIZE);
> >       /*
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index be6403b4b14b..a312be48030f 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -56,10 +56,13 @@
> >  static bool dump_vma_snapshot(struct coredump_params *cprm);
> >  static void free_vma_snapshot(struct coredump_params *cprm);
> >
> > +#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> > +
> >  static int core_uses_pid;
> >  static unsigned int core_pipe_limit;
> >  static char core_pattern[CORENAME_MAX_SIZE] = "core";
> >  static int core_name_size = CORENAME_MAX_SIZE;
> > +unsigned int core_file_note_size_max = MAX_FILE_NOTE_SIZE;
> >
> >  struct core_name {
> >       char *corename;
> > @@ -1020,6 +1023,13 @@ static struct ctl_table coredump_sysctls[] = {
> >               .mode           = 0644,
> >               .proc_handler   = proc_dointvec,
> >       },
> > +     {
> > +             .procname       = "core_file_note_size_max",
> > +             .data           = &core_file_note_size_max,
> > +             .maxlen         = sizeof(unsigned int),
> > +             .mode           = 0644,
> > +             .proc_handler   = proc_douintvec,
> > +     },
> >  };
> >
> >  static int __init init_fs_coredump_sysctls(void)
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index d3eba4360150..14c057643e7f 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -46,6 +46,7 @@ static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
> >  #endif
> >
> >  #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
> > +extern unsigned int core_file_note_size_max;
> >  extern void validate_coredump_safety(void);
> >  #else
> >  static inline void validate_coredump_safety(void) {}
> > --
> > 2.17.1
>
> Otherwise, yes, this looks good to me.
>
> --
> Kees Cook
Luis Chamberlain May 2, 2024, 9:35 p.m. UTC | #3
On Thu, May 02, 2024 at 01:03:52PM -0700, Allen wrote:
> +int proc_core_file_note_size_max(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos) {
> +    int error = proc_douintvec(table, write, buffer, lenp, ppos);
> +    if (write && (core_file_note_size_max < MAX_FILE_NOTE_SIZE ||
> core_file_note_size_max > MAX_ALLOWED_NOTE_SIZE))
> +        core_file_note_size_max = MAX_FILE_NOTE_SIZE;  // Revert to
> default if out of bounds
> +    return error;
> +}

There's already a proc helper which lets you set min / max.

  Luis
Kees Cook May 2, 2024, 10:47 p.m. UTC | #4
On Thu, May 02, 2024 at 01:03:52PM -0700, Allen wrote:
> On Thu, May 2, 2024 at 10:50 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Thu, May 02, 2024 at 02:59:20PM +0000, Allen Pais wrote:
> > > Introduce the capability to dynamically configure the maximum file
> > > note size for ELF core dumps via sysctl. This enhancement removes
> > > the previous static limit of 4MB, allowing system administrators to
> > > adjust the size based on system-specific requirements or constraints.
> > >
> > > - Remove hardcoded `MAX_FILE_NOTE_SIZE` from `fs/binfmt_elf.c`.
> > > - Define `max_file_note_size` in `fs/coredump.c` with an initial value
> > >   set to 4MB.
> > > - Declare `max_file_note_size` as an external variable in
> > >   `include/linux/coredump.h`.
> > > - Add a new sysctl entry in `kernel/sysctl.c` to manage this setting
> > >   at runtime.
> > >
> > > $ sysctl -a | grep max_file_note_size
> > > kernel.max_file_note_size = 4194304
> > >
> > > $ sysctl -n kernel.max_file_note_size
> > > 4194304
> > >
> > > $echo 519304 > /proc/sys/kernel/max_file_note_size
> > >
> > > $sysctl -n kernel.max_file_note_size
> > > 519304
> >
> > The names and paths in the commit log need a refresh here, since they've
> > changed.
> 
> Will fix it in v3.
> >
> > >
> > > Why is this being done?
> > > We have observed that during a crash when there are more than 65k mmaps
> > > in memory, the existing fixed limit on the size of the ELF notes section
> > > becomes a bottleneck. The notes section quickly reaches its capacity,
> > > leading to incomplete memory segment information in the resulting coredump.
> > > This truncation compromises the utility of the coredumps, as crucial
> > > information about the memory state at the time of the crash might be
> > > omitted.
> >
> > Thanks for adding this!
> >
> > >
> > > Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
> > > Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> > >
> > > ---
> > > Changes in v2:
> > >    - Move new sysctl to fs/coredump.c [Luis & Kees]
> > >    - rename max_file_note_size to core_file_note_size_max [kees]
> > >    - Capture "why this is being done?" int he commit message [Luis & Kees]
> > > ---
> > >  fs/binfmt_elf.c          |  3 +--
> > >  fs/coredump.c            | 10 ++++++++++
> > >  include/linux/coredump.h |  1 +
> > >  3 files changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > > index 5397b552fbeb..6aebd062b92b 100644
> > > --- a/fs/binfmt_elf.c
> > > +++ b/fs/binfmt_elf.c
> > > @@ -1564,7 +1564,6 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
> > >       fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
> > >  }
> > >
> > > -#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> > >  /*
> > >   * Format of NT_FILE note:
> > >   *
> > > @@ -1592,7 +1591,7 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
> > >
> > >       names_ofs = (2 + 3 * count) * sizeof(data[0]);
> > >   alloc:
> > > -     if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
> > > +     if (size >= core_file_note_size_max) /* paranoia check */
> > >               return -EINVAL;
> >
> > I wonder, given the purpose of this sysctl, if it would be a
> > discoverability improvement to include a pr_warn_once() before the
> > EINVAL? Like:
> >
> >         /* paranoia check */
> >         if (size >= core_file_note_size_max) {
> >                 pr_warn_once("coredump Note size too large: %zu (does kernel.core_file_note_size_max sysctl need adjustment?\n", size);
> >                 return -EINVAL;
> >         }
> >
> > What do folks think? (I can't imagine tracking down this problem
> > originally was much fun, for example.)
> 
>  I think this would really be helpful. I will go ahead and add this if
> there's no objection from anyone.
> 
> Also, I haven't received a reply from Luis, do you think we need to
> add a ceiling?
> 
> +#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> +#define MAX_ALLOWED_NOTE_SIZE (16*1024*1024) // Define a reasonable max cap
> .....
> 
> +       {
> +               .procname       = "core_file_note_size_max",
> +               .data           = &core_file_note_size_max,
> +               .maxlen         = sizeof(unsigned int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_core_file_note_size_max,
> +       },
>  };
> 
> +int proc_core_file_note_size_max(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos) {
> +    int error = proc_douintvec(table, write, buffer, lenp, ppos);
> +    if (write && (core_file_note_size_max < MAX_FILE_NOTE_SIZE ||
> core_file_note_size_max > MAX_ALLOWED_NOTE_SIZE))
> +        core_file_note_size_max = MAX_FILE_NOTE_SIZE;  // Revert to
> default if out of bounds
> +    return error;
> +}
> 
> 
> Or, should we go ahead with the current patch(with the warning added)?

Let's add a ceiling just to avoid really pathological behavior. We got
this far with 4M, so having a new ceiling seems reasonable. And for
implementing it, see proc_douintvec_minmax.

-Kees
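
As a sketch of what a bounded write would mean for this sysctl: with
proc_douintvec_minmax the bounds are supplied through the .extra1 and .extra2
fields of the ctl_table entry, and, as far as I understand the helper, an
out-of-range write fails with an error and leaves the stored value alone,
rather than silently reverting to the default as the hand-written handler
above does. The userspace model below only illustrates that difference; the
names are hypothetical, and the 4MB floor and 16MB ceiling are the values
floated in this thread, not settled ones.

```c
#include <assert.h>
#include <errno.h>

#define NOTE_SIZE_MIN (4u * 1024 * 1024)   /* current default, from the patch */
#define NOTE_SIZE_MAX (16u * 1024 * 1024)  /* ceiling value floated in the thread */

/* Stand-in for the sysctl-backed variable. */
static unsigned int note_size_limit = NOTE_SIZE_MIN;

/*
 * Model of a bounded sysctl write: values outside [min, max] are rejected
 * and the stored value is left untouched.
 * ASSUMPTION: -ERANGE is used here to mirror my understanding of
 * proc_douintvec_minmax; check the kernel source before relying on it.
 */
static int write_note_size_limit(unsigned int val, unsigned int min,
				 unsigned int max)
{
	assert(min <= max);

	if (val < min || val > max)
		return -ERANGE;
	note_size_limit = val;
	return 0;
}
```

With these bounds, writing 8388608 succeeds, while the 519304 used in the
commit message example would be rejected because it falls below the floor.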

Patch

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5397b552fbeb..6aebd062b92b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1564,7 +1564,6 @@  static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
 	fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
 }
 
-#define MAX_FILE_NOTE_SIZE (4*1024*1024)
 /*
  * Format of NT_FILE note:
  *
@@ -1592,7 +1591,7 @@  static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
 
 	names_ofs = (2 + 3 * count) * sizeof(data[0]);
  alloc:
-	if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
+	if (size >= core_file_note_size_max) /* paranoia check */
 		return -EINVAL;
 	size = round_up(size, PAGE_SIZE);
 	/*
diff --git a/fs/coredump.c b/fs/coredump.c
index be6403b4b14b..a312be48030f 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -56,10 +56,13 @@ 
 static bool dump_vma_snapshot(struct coredump_params *cprm);
 static void free_vma_snapshot(struct coredump_params *cprm);
 
+#define MAX_FILE_NOTE_SIZE (4*1024*1024)
+
 static int core_uses_pid;
 static unsigned int core_pipe_limit;
 static char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
+unsigned int core_file_note_size_max = MAX_FILE_NOTE_SIZE;
 
 struct core_name {
 	char *corename;
@@ -1020,6 +1023,13 @@  static struct ctl_table coredump_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname       = "core_file_note_size_max",
+		.data           = &core_file_note_size_max,
+		.maxlen         = sizeof(unsigned int),
+		.mode           = 0644,
+		.proc_handler   = proc_douintvec,
+	},
 };
 
 static int __init init_fs_coredump_sysctls(void)
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index d3eba4360150..14c057643e7f 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -46,6 +46,7 @@  static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
 #endif
 
 #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
+extern unsigned int core_file_note_size_max;
 extern void validate_coredump_safety(void);
 #else
 static inline void validate_coredump_safety(void) {}