
[v3,0/2] binfmt_elf, coredump: Log the reason of the failed core dumps

Message ID 20240718182743.1959160-1-romank@linux.microsoft.com (mailing list archive)

Message

Roman Kisel July 18, 2024, 6:27 p.m. UTC
A powerful way to diagnose crashes is to analyze the core dump produced upon
the failure. Missing or malformed core dump files hinder these investigations.
I'd like to propose changes that add logging as to why the kernel would not
finish writing out the core dump file.

To help in diagnosing the user mode helper not writing out the entire coredump
contents, the changes also log short statistics on the dump collection. I'd
advocate for keeping this at the info level on these grounds.

For validation, I built the kernel and a simple user space to exercise the new
code.

[V3]
  - Standardized the existing logging to report TGID and comm consistently
  - Fixed compiler warnings on 32-bit systems (used %zd in the format strings)
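
To illustrate the %zd item above, here is a minimal user-space sketch, not
code from the series. The helper name format_written and the message text are
made up for the example. It relies only on standard C: %zd and %zu follow the
width of size_t on the target, while %ld matches only where the size types
happen to be long, which is the usual source of format warnings on 32-bit
builds.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Hypothetical helper, not taken from the patch: format a short
 * "bytes written" message. %zd is used for the ssize_t argument and
 * %zu for the size_t argument, so the format string is correct on
 * both 32-bit and 64-bit targets without casts.
 */
static int format_written(char *buf, size_t len, ssize_t written, size_t total)
{
	assert(buf != NULL);
	return snprintf(buf, len, "wrote %zd of %zu bytes", written, total);
}
```

A caller would pass a buffer and its size, for example
format_written(buf, sizeof(buf), ret, total), where ret may be negative.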

[V2]
  https://lore.kernel.org/all/20240712215223.605363-1-romank@linux.microsoft.com/
  - Used _ratelimited to avoid spamming the system log
  - Added comm and PID to the log messages
  - Added logging to the failure paths in dump_interrupted, dump_skip, and dump_emit
  - Fixed compiler warnings produced when CONFIG_COREDUMP is disabled
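
The idea behind the first two V2 items can be sketched in user space as
follows. This is only an illustration under stated assumptions: struct
ratelimit, ratelimit_ok, log_dump_failure, and the message layout are all
hypothetical and are not the kernel's ratelimit implementation or the format
used by the series. The sketch allows at most `burst` messages per `interval`
seconds and tags each message with a PID and comm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter state: at most `burst` messages per window. */
struct ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
	bool started;	/* false until the first call */
};

/* Return true if a message may be emitted at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	assert(rl != NULL);
	/* Open a new window on first use or once the old one has expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
		rl->started = true;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/* Hypothetical helper: log a failure with PID and comm, if allowed. */
static bool log_dump_failure(struct ratelimit *rl, long now, int pid,
			     const char *comm, const char *reason)
{
	if (!ratelimit_ok(rl, now))
		return false;
	fprintf(stderr, "coredump: %d(%s): %s\n", pid, comm, reason);
	return true;
}
```

With interval = 5 and burst = 2, two failures in the same window are logged,
further ones are counted as missed, and logging resumes once the window
expires.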

[V1]
  https://lore.kernel.org/all/20240617234133.1167523-1-romank@linux.microsoft.com/

Roman Kisel (2):
  coredump: Standartize and fix logging
  binfmt_elf, coredump: Log the reason of the failed core dumps

 fs/binfmt_elf.c          |  48 +++++++++----
 fs/coredump.c            | 150 +++++++++++++++++++++++++++------------
 include/linux/coredump.h |  30 +++++++-
 kernel/signal.c          |  21 +++++-
 4 files changed, 188 insertions(+), 61 deletions(-)


base-commit: 831bcbcead6668ebf20b64fdb27518f1362ace3a

Comments

Kees Cook July 19, 2024, 5:26 p.m. UTC | #1
On Thu, Jul 18, 2024 at 11:27:23AM -0700, Roman Kisel wrote:
> [...]

This looks good to me! I'll put this in -next once the merge window
closes. Thanks!

-Kees
Roman Kisel July 22, 2024, 7:33 p.m. UTC | #2
On 7/19/2024 10:26 AM, Kees Cook wrote:
> On Thu, Jul 18, 2024 at 11:27:23AM -0700, Roman Kisel wrote:
>> [...]
> 
> This looks good to me! I'll put this in -next once the merge window
> closes. Thanks!
> 
Kees, thank you for your guidance!

> -Kees
>
Kees Cook Aug. 6, 2024, 4:33 a.m. UTC | #3
On Thu, 18 Jul 2024 11:27:23 -0700, Roman Kisel wrote:
> A powerful way to diagnose crashes is to analyze the core dump produced upon
> the failure. Missing or malformed core dump files hinder these investigations.
> I'd like to propose changes that add logging as to why the kernel would not
> finish writing out the core dump file.
> 
> To help in diagnosing the user mode helper not writing out the entire coredump
> contents, the changes also log short statistics on the dump collection. I'd
> advocate for keeping this at the info level on these grounds.
> 
> [...]

Applied to for-next/execve, thanks!

[1/2] coredump: Standartize and fix logging
      https://git.kernel.org/kees/c/c114e9948c2b
[2/2] binfmt_elf, coredump: Log the reason of the failed core dumps
      https://git.kernel.org/kees/c/fb97d2eb542f

Take care,