diff mbox series

[1/2] check: detect and preserve all coredumps made by a test

Message ID 166500907546.887104.248083399669088204.stgit@magnolia (mailing list archive)
State New, archived
Headers show
Series fstests: improve coredump capture and storage | expand

Commit Message

Darrick J. Wong Oct. 5, 2022, 10:31 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

If someone sets kernel.core_uses_pid (or kernel.core_pattern), any
coredumps generated by fstests might have names that are longer than
just "core".  Since the pid isn't all that useful by itself, let's
record the coredumps by hash when we save them, so that we don't waste
space storing identical crash dumps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check     |   26 ++++++++++++++++++++++----
 common/rc |   16 ++++++++++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)

Comments

Zorro Lang Oct. 7, 2022, 5:18 a.m. UTC | #1
On Wed, Oct 05, 2022 at 03:31:15PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If someone sets kernel.core_uses_pid (or kernel.core_pattern), any
> coredumps generated by fstests might have names that are longer than
> just "core".  Since the pid isn't all that useful by itself, let's
> record the coredumps by hash when we save them, so that we don't waste
> space storing identical crash dumps.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  check     |   26 ++++++++++++++++++++++----
>  common/rc |   16 ++++++++++++++++
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/check b/check
> index af23572ccc..654d986b27 100755
> --- a/check
> +++ b/check
> @@ -913,11 +913,19 @@ function run_section()
>  			sts=$?
>  		fi
>  
> -		if [ -f core ]; then
> -			_dump_err_cont "[dumped core]"
> -			mv core $RESULT_BASE/$seqnum.core
> +		# If someone sets kernel.core_pattern or kernel.core_uses_pid,
> +		# coredumps generated by fstests might have a longer name than
> +		# just "core".  Use globbing to find the most common patterns,
> +		# assuming there are no other coredump capture packages set up.
> +		local cores=0
> +		for i in core core.*; do

I'm wondering if it should be "for i in core*" ? The coredump file only can be
"core" with dot ".", can it with "-" or "_" or others?


> +			test -f "$i" || continue
> +			if ((cores++ == 0)); then
> +				_dump_err_cont "[dumped core]"
> +			fi
> +			_save_coredump "$i"
>  			tc_status="fail"
> -		fi
> +		done
>  
>  		if [ -f $seqres.notrun ]; then
>  			$timestamp && _timestamp
> @@ -950,6 +958,16 @@ function run_section()
>  			# of the check script itself.
>  			(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
>  			_check_dmesg || tc_status="fail"
> +
> +			# Save any coredumps from the post-test fs checks
> +			for i in core core.*; do
> +				test -f "$i" || continue
> +				if ((cores++ == 0)); then
> +					_dump_err_cont "[dumped core]"
> +				fi
> +				_save_coredump "$i"
> +				tc_status="fail"
> +			done
>  		fi
>  
>  		# Reload the module after each test to check for leaks or
> diff --git a/common/rc b/common/rc
> index d1f3d56bf8..9750d06a9a 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4948,6 +4948,22 @@ _create_file_sized()
>  	return $ret
>  }
>  
> +_save_coredump()
> +{
> +	local path="$1"
> +
> +	local core_hash="$(_md5_checksum "$path")"
> +	local out_file="$RESULT_BASE/$seqnum.core.$core_hash"
> +
> +	if [ -s "$out_file" ]; then
> +		rm -f "$path"
> +		return
> +	fi
> +	rm -f "$out_file"
> +
> +	mv "$path" "$out_file"
> +}
> +
>  init_rc
>  
>  ################################################################################
>
Darrick J. Wong Oct. 7, 2022, 9:29 p.m. UTC | #2
On Fri, Oct 07, 2022 at 01:18:55PM +0800, Zorro Lang wrote:
> On Wed, Oct 05, 2022 at 03:31:15PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If someone sets kernel.core_uses_pid (or kernel.core_pattern), any
> > coredumps generated by fstests might have names that are longer than
> > just "core".  Since the pid isn't all that useful by itself, let's
> > record the coredumps by hash when we save them, so that we don't waste
> > space storing identical crash dumps.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  check     |   26 ++++++++++++++++++++++----
> >  common/rc |   16 ++++++++++++++++
> >  2 files changed, 38 insertions(+), 4 deletions(-)
> > 
> > 
> > diff --git a/check b/check
> > index af23572ccc..654d986b27 100755
> > --- a/check
> > +++ b/check
> > @@ -913,11 +913,19 @@ function run_section()
> >  			sts=$?
> >  		fi
> >  
> > -		if [ -f core ]; then
> > -			_dump_err_cont "[dumped core]"
> > -			mv core $RESULT_BASE/$seqnum.core
> > +		# If someone sets kernel.core_pattern or kernel.core_uses_pid,
> > +		# coredumps generated by fstests might have a longer name than
> > +		# just "core".  Use globbing to find the most common patterns,
> > +		# assuming there are no other coredump capture packages set up.
> > +		local cores=0
> > +		for i in core core.*; do
> 
> I'm wondering if it should be "for i in core*" ? The coredump file only can be
> "core" with dot ".", can it with "-" or "_" or others?

The ".$pid" pattern is encoded in the kernel function format_corename:

	/* Backward compatibility with core_uses_pid:
	 *
	 * If core_pattern does not include a %p (as is the default)
	 * and core_uses_pid is set, then .%pid will be appended to
	 * the filename. Do not do this for piped commands. */
	if (!ispipe && !pid_in_pattern && core_uses_pid) {
		err = cn_printf(cn, ".%d", task_tgid_vnr(current));
		if (err)
			return err;
	}

Note that even this is an incomplete solution because the sysadmin could
configure a totally different corename pattern, pipe it to another
program, or whatever, and fstests fails to capture any dumps at all.

Longer term it'd be theoretically nice to turn core_pattern into a
cgroup-manageable tunable and teach ./check to capture all the
coredumps, but I am not volunteering to write that much new
infrastructure. :/

--D

> 
> > +			test -f "$i" || continue
> > +			if ((cores++ == 0)); then
> > +				_dump_err_cont "[dumped core]"
> > +			fi
> > +			_save_coredump "$i"
> >  			tc_status="fail"
> > -		fi
> > +		done
> >  
> >  		if [ -f $seqres.notrun ]; then
> >  			$timestamp && _timestamp
> > @@ -950,6 +958,16 @@ function run_section()
> >  			# of the check script itself.
> >  			(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
> >  			_check_dmesg || tc_status="fail"
> > +
> > +			# Save any coredumps from the post-test fs checks
> > +			for i in core core.*; do
> > +				test -f "$i" || continue
> > +				if ((cores++ == 0)); then
> > +					_dump_err_cont "[dumped core]"
> > +				fi
> > +				_save_coredump "$i"
> > +				tc_status="fail"
> > +			done
> >  		fi
> >  
> >  		# Reload the module after each test to check for leaks or
> > diff --git a/common/rc b/common/rc
> > index d1f3d56bf8..9750d06a9a 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -4948,6 +4948,22 @@ _create_file_sized()
> >  	return $ret
> >  }
> >  
> > +_save_coredump()
> > +{
> > +	local path="$1"
> > +
> > +	local core_hash="$(_md5_checksum "$path")"
> > +	local out_file="$RESULT_BASE/$seqnum.core.$core_hash"
> > +
> > +	if [ -s "$out_file" ]; then
> > +		rm -f "$path"
> > +		return
> > +	fi
> > +	rm -f "$out_file"
> > +
> > +	mv "$path" "$out_file"
> > +}
> > +
> >  init_rc
> >  
> >  ################################################################################
> > 
>
diff mbox series

Patch

diff --git a/check b/check
index af23572ccc..654d986b27 100755
--- a/check
+++ b/check
@@ -913,11 +913,19 @@  function run_section()
 			sts=$?
 		fi
 
-		if [ -f core ]; then
-			_dump_err_cont "[dumped core]"
-			mv core $RESULT_BASE/$seqnum.core
+		# If someone sets kernel.core_pattern or kernel.core_uses_pid,
+		# coredumps generated by fstests might have a longer name than
+		# just "core".  Use globbing to find the most common patterns,
+		# assuming there are no other coredump capture packages set up.
+		local cores=0
+		for i in core core.*; do
+			test -f "$i" || continue
+			if ((cores++ == 0)); then
+				_dump_err_cont "[dumped core]"
+			fi
+			_save_coredump "$i"
 			tc_status="fail"
-		fi
+		done
 
 		if [ -f $seqres.notrun ]; then
 			$timestamp && _timestamp
@@ -950,6 +958,16 @@  function run_section()
 			# of the check script itself.
 			(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
 			_check_dmesg || tc_status="fail"
+
+			# Save any coredumps from the post-test fs checks
+			for i in core core.*; do
+				test -f "$i" || continue
+				if ((cores++ == 0)); then
+					_dump_err_cont "[dumped core]"
+				fi
+				_save_coredump "$i"
+				tc_status="fail"
+			done
 		fi
 
 		# Reload the module after each test to check for leaks or
diff --git a/common/rc b/common/rc
index d1f3d56bf8..9750d06a9a 100644
--- a/common/rc
+++ b/common/rc
@@ -4948,6 +4948,22 @@  _create_file_sized()
 	return $ret
 }
 
+_save_coredump()
+{
+	local path="$1"
+
+	local core_hash="$(_md5_checksum "$path")"
+	local out_file="$RESULT_BASE/$seqnum.core.$core_hash"
+
+	if [ -s "$out_file" ]; then
+		rm -f "$path"
+		return
+	fi
+	rm -f "$out_file"
+
+	mv "$path" "$out_file"
+}
+
 init_rc
 
 ################################################################################