diff mbox series

[RFC,v2,2/2] trace2: don't overload target directories

Message ID a779e272df958702c0df06ab58f1f6d6f8086a30.1564771000.git.steadmon@google.com (mailing list archive)
State New, archived
Headers show
Series trace2: don't overload target directories | expand

Commit Message

Josh Steadmon Aug. 2, 2019, 10:02 p.m. UTC
trace2 can write files into a target directory. With heavy usage, this
directory can fill up with files, causing difficulty for
trace-processing systems.

This patch adds a config option (trace2.maxFiles) to set a maximum
number of files that trace2 will write to a target directory. The
following behavior is enabled when the maxFiles is set to a positive
integer:
  When trace2 would write a file to a target directory, first check
  whether or not the directory is overloaded. A directory is overloaded
  if there is a sentinel file declaring an overload, or if the number of
  files exceeds trace2.maxFiles. If the latter, create a sentinel file
  to speed up later overload checks.

The assumption is that a separate trace-processing system is dealing
with the generated traces; once it processes and removes the sentinel
file, it should be safe to generate new trace files again.

The default value for trace2.maxFiles is zero, which disables the
overload check.

The config can also be overridden with a new environment variable:
GIT_TRACE2_MAX_FILES.

Potential future work:
* Write a message into the sentinel file (should match the requested
  trace2 output format).
* Add a performance test to make sure that contention between multiple
  processes all writing to the same target directory does not become an
  issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/config/trace2.txt |  6 +++
 t/t0210-trace2-normal.sh        | 19 ++++++++
 trace2/tr2_dst.c                | 86 +++++++++++++++++++++++++++++++++
 trace2/tr2_sysenv.c             |  3 ++
 trace2/tr2_sysenv.h             |  2 +
 5 files changed, 116 insertions(+)

Comments

Jeff Hostetler Aug. 5, 2019, 3:34 p.m. UTC | #1
On 8/2/2019 6:02 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.
> 
> This patch adds a config option (trace2.maxFiles) to set a maximum
> number of files that trace2 will write to a target directory. The
> following behavior is enabled when the maxFiles is set to a positive
> integer:
>    When trace2 would write a file to a target directory, first check
>    whether or not the directory is overloaded. A directory is overloaded
>    if there is a sentinel file declaring an overload, or if the number of
>    files exceeds trace2.maxFiles. If the latter, create a sentinel file
>    to speed up later overload checks.
> 
> The assumption is that a separate trace-processing system is dealing
> with the generated traces; once it processes and removes the sentinel
> file, it should be safe to generate new trace files again.
> 
> The default value for trace2.maxFiles is zero, which disables the
> overload check.
> 
> The config can also be overridden with a new environment variable:
> GIT_TRACE2_MAX_FILES.
> 
> Potential future work:
> * Write a message into the sentinel file (should match the requested
>    trace2 output format).
> * Add a performance test to make sure that contention between multiple
>    processes all writing to the same target directory does not become an
>    issue.


This looks much nicer than the V1 version.  Having it be a
real feature rather than a test feature helps.

I don't see anything wrong with this.  I do worry about the
overhead a bit.  If you really have that many files in the
target directory, having every command count them at startup
might be an issue.

As an alternative, you might consider doing something like
this:

[] have an option to make the target directory path expand to
    something like "<path>/yyyymmdd/" and create the per-process
    files as "<path>/yyyymmdd/<sid>".

If there are 0, 1 or 2 directories, logging is enabled.
We assume that the post-processor is keeping up and all is well.
We need to allow 2 so that we continue to log around midnight.

If there are 3 or more directories, logging is disabled.
The post-processor is more than 24 hours behind for whatever
reason.  We assume here that the post-processor will process
and delete the oldest-named directory, so it is a valid measure
of the backlog.

I suggest "yyyymmdd" here for simplicity in this discussion
as daily log rotation is common.  If that's still overloading,
you could make it a longer prefix of the <sid>.  And include
the hour, for example.

I suggest 3 as the cutoff lower bound, because we need to allow
2 for midnight rotation.  But you may want to increase it to
allow for someone to be offline for a long weekend, for example.

Anyway, this is just a suggestion.  It would give you the
throttling, but without the need for every command to count
the contents of the target directory.

And it would still allow your post-processor to operate in
near real-time on the contents of the current day's target
directory or to hang back if that causes too much contention.

Feel free to ignore this :-)

Jeff
SZEDER Gábor Aug. 5, 2019, 6:01 p.m. UTC | #2
On Fri, Aug 02, 2019 at 03:02:35PM -0700, Josh Steadmon wrote:
> +test_expect_success "don't overload target directory" '
> +	mkdir trace_target_dir &&
> +	test_when_finished "rm -r trace_target_dir" &&
> +	(
> +		GIT_TRACE2_MAX_FILES=5 &&
> +		export GIT_TRACE2_MAX_FILES &&
> +		cd trace_target_dir &&
> +		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
> +		xargs touch <../expected_filenames.txt &&
> +		cd .. &&
> +		ls trace_target_dir >first_ls_output.txt &&
> +		test_cmp expected_filenames.txt first_ls_output.txt &&

Nit: what's the purpose of this 'ls' and 'test_cmp'?

It looks like they check that xargs created all the files it was told
to create.  I think that this falls into the category "We are not in
the business of verifying that the world given to us sanely works."
and is unnecessary.

> +		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
> +	) &&
> +	echo git-trace2-overload >>expected_filenames.txt &&
> +	ls trace_target_dir >second_ls_output.txt &&
> +	test_cmp expected_filenames.txt second_ls_output.txt
> +'
> +
>  test_done
Josh Steadmon Aug. 5, 2019, 6:09 p.m. UTC | #3
On 2019.08.05 20:01, SZEDER Gábor wrote:
> On Fri, Aug 02, 2019 at 03:02:35PM -0700, Josh Steadmon wrote:
> > +test_expect_success "don't overload target directory" '
> > +	mkdir trace_target_dir &&
> > +	test_when_finished "rm -r trace_target_dir" &&
> > +	(
> > +		GIT_TRACE2_MAX_FILES=5 &&
> > +		export GIT_TRACE2_MAX_FILES &&
> > +		cd trace_target_dir &&
> > +		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
> > +		xargs touch <../expected_filenames.txt &&
> > +		cd .. &&
> > +		ls trace_target_dir >first_ls_output.txt &&
> > +		test_cmp expected_filenames.txt first_ls_output.txt &&
> 
> Nit: what's the purpose of this 'ls' and 'test_cmp'?
> 
> It looks like they check that xargs created all the files it was told
> to create.  I think that this falls into the category "We are not in
> the business of verifying that the world given to us sanely works."
> and is unnecessary.

Understood. Will remove in V3.
Josh Steadmon Aug. 5, 2019, 6:17 p.m. UTC | #4
On 2019.08.05 11:34, Jeff Hostetler wrote:
> 
> 
> On 8/2/2019 6:02 PM, Josh Steadmon wrote:
> > trace2 can write files into a target directory. With heavy usage, this
> > directory can fill up with files, causing difficulty for
> > trace-processing systems.
> > 
> > This patch adds a config option (trace2.maxFiles) to set a maximum
> > number of files that trace2 will write to a target directory. The
> > following behavior is enabled when the maxFiles is set to a positive
> > integer:
> >    When trace2 would write a file to a target directory, first check
> >    whether or not the directory is overloaded. A directory is overloaded
> >    if there is a sentinel file declaring an overload, or if the number of
> >    files exceeds trace2.maxFiles. If the latter, create a sentinel file
> >    to speed up later overload checks.
> > 
> > The assumption is that a separate trace-processing system is dealing
> > with the generated traces; once it processes and removes the sentinel
> > file, it should be safe to generate new trace files again.
> > 
> > The default value for trace2.maxFiles is zero, which disables the
> > overload check.
> > 
> > The config can also be overridden with a new environment variable:
> > GIT_TRACE2_MAX_FILES.
> > 
> > Potential future work:
> > * Write a message into the sentinel file (should match the requested
> >    trace2 output format).
> > * Add a performance test to make sure that contention between multiple
> >    processes all writing to the same target directory does not become an
> >    issue.
> 
> 
> This looks much nicer than the V1 version.  Having it be a
> real feature rather than a test feature helps.
> 
> I don't see anything wrong with this.  I do worry about the
> overhead a bit.  If you really have that many files in the
> target directory, having every command count them at startup
> might be an issue.
> 
> As an alternative, you might consider doing something like
> this:
> 
> [] have an option to make the target directory path expand to
>    something like "<path>/yyyymmdd/" and create the per-process
>    files as "<path>/yyyymmdd/<sid>".
> 
> If there are 0, 1 or 2 directories, logging is enabled.
> We assume that the post-processor is keeping up and all is well.
> We need to allow 2 so that we continue to log around midnight.
> 
> If there are 3 or more directories, logging is disabled.
> The post-processor is more than 24 hours behind for whatever
> reason.  We assume here that the post-processor will process
> and delete the oldest-named directory, so it is a valid measure
> of the backlog.
> 
> I suggest "yyyymmdd" here for simplicity in this discussion
> as daily log rotation is common.  If that's still overloading,
> you could make it a longer prefix of the <sid>.  And include
> the hour, for example.
> 
> I suggest 3 as the cutoff lower bound, because we need to allow
> 2 for midnight rotation.  But you may want to increase it to
> allow for someone to be offline for a long weekend, for example.
> 
> Anyway, this is just a suggestion.  It would give you the
> throttling, but without the need for every command to count
> the contents of the target directory.
> 
> And it would still allow your post-processor to operate in
> near real-time on the contents of the current day's target
> directory or to hang back if that causes too much contention.
> 
> Feel free to ignore this :-)
> 
> Jeff

This does sound reasonable. I'll talk with our collection team and see
if this would work without requiring any changes on their side. If not,
I'll probably take a stab at this when I'm back from vacation.

Thanks!
diff mbox series

Patch

diff --git a/Documentation/config/trace2.txt b/Documentation/config/trace2.txt
index 2edbfb02fe..4ce0b9a6d1 100644
--- a/Documentation/config/trace2.txt
+++ b/Documentation/config/trace2.txt
@@ -54,3 +54,9 @@  trace2.destinationDebug::
 	By default, these errors are suppressed and tracing is
 	silently disabled.  May be overridden by the
 	`GIT_TRACE2_DST_DEBUG` environment variable.
+
+trace2.maxFiles::
+	Integer.  When writing trace files to a target directory, do not
+	write additional traces if we would exceed this many files. Instead,
+	write a sentinel file that will block further tracing to this
+	directory. Defaults to 0, which disables this check.
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index ce7574edb1..59b9560109 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -186,4 +186,23 @@  test_expect_success 'using global config with include' '
 	test_cmp expect actual
 '
 
+test_expect_success "don't overload target directory" '
+	mkdir trace_target_dir &&
+	test_when_finished "rm -r trace_target_dir" &&
+	(
+		GIT_TRACE2_MAX_FILES=5 &&
+		export GIT_TRACE2_MAX_FILES &&
+		cd trace_target_dir &&
+		test_seq $GIT_TRACE2_MAX_FILES >../expected_filenames.txt &&
+		xargs touch <../expected_filenames.txt &&
+		cd .. &&
+		ls trace_target_dir >first_ls_output.txt &&
+		test_cmp expected_filenames.txt first_ls_output.txt &&
+		GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0
+	) &&
+	echo git-trace2-overload >>expected_filenames.txt &&
+	ls trace_target_dir >second_ls_output.txt &&
+	test_cmp expected_filenames.txt second_ls_output.txt
+'
+
 test_done
diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c
index 5dda0ca1cd..40ec03c2d7 100644
--- a/trace2/tr2_dst.c
+++ b/trace2/tr2_dst.c
@@ -1,3 +1,5 @@ 
+#include <dirent.h>
+
 #include "cache.h"
 #include "trace2/tr2_dst.h"
 #include "trace2/tr2_sid.h"
@@ -8,6 +10,19 @@ 
  */
 #define MAX_AUTO_ATTEMPTS 10
 
+/*
+ * Sentinel file used to detect when we're overloading a directory with too many
+ * trace files.
+ */
+#define OVERLOAD_SENTINEL_NAME "git-trace2-overload"
+
+/*
+ * When set to zero, disables directory overload checking. Otherwise, controls
+ * how many files we can write to a directory before entering overload mode.
+ * This can be overridden via the TR2_SYSENV_MAX_FILES setting.
+ */
+static int tr2env_max_files = 0;
+
 static int tr2_dst_want_warning(void)
 {
 	static int tr2env_dst_debug = -1;
@@ -32,6 +47,67 @@  void tr2_dst_trace_disable(struct tr2_dst *dst)
 	dst->need_close = 0;
 }
 
+/*
+ * Check to make sure we're not overloading the target directory with too many
+ * files. First get the threshold (if present) from the config or envvar. If
+ * it's zero or unset, disable this check.  Next check for the presence of a
+ * sentinel file, then check file count. If we are overloaded, create the
+ * sentinel file if it doesn't already exist.
+ *
+ * We expect that some trace processing system is gradually collecting files
+ * from the target directory; after it removes the sentinel file we'll start
+ * writing traces again.
+ */
+static int tr2_dst_overloaded(const char *tgt_prefix)
+{
+	int file_count = 0, max_files = 0, ret = 0;
+	const char *max_files_var;
+	DIR *dirp;
+	struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT;
+	struct stat statbuf;
+
+	strbuf_addstr(&path, tgt_prefix);
+	if (!is_dir_sep(path.buf[path.len - 1])) {
+		strbuf_addch(&path, '/');
+	}
+
+	/* Get the config or envvar and decide if we should continue this check */
+	max_files_var = tr2_sysenv_get(TR2_SYSENV_MAX_FILES);
+	if (max_files_var && *max_files_var && ((max_files = atoi(max_files_var)) >= 0))
+		tr2env_max_files = max_files;
+
+	if (!tr2env_max_files) {
+		ret = 0;
+		goto cleanup;
+	}
+
+	/* check sentinel */
+	strbuf_addstr(&sentinel_path, path.buf);
+	strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME);
+	if (!stat(sentinel_path.buf, &statbuf)) {
+		ret = 1;
+		goto cleanup;
+	}
+
+	/* check file count */
+	dirp = opendir(path.buf);
+	while (file_count < tr2env_max_files && dirp && readdir(dirp))
+		file_count++;
+	if (dirp)
+		closedir(dirp);
+
+	if (file_count >= tr2env_max_files) {
+		creat(sentinel_path.buf, S_IRUSR | S_IWUSR);
+		ret = 1;
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&path);
+	strbuf_release(&sentinel_path);
+	return ret;
+}
+
 static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 {
 	int fd;
@@ -50,6 +126,16 @@  static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix)
 	strbuf_addstr(&path, sid);
 	base_path_len = path.len;
 
+	if (tr2_dst_overloaded(tgt_prefix)) {
+		strbuf_release(&path);
+		if (tr2_dst_want_warning())
+			warning("trace2: not opening %s trace file due to too "
+				"many files in target directory %s",
+				tr2_sysenv_display_name(dst->sysenv_var),
+				tgt_prefix);
+		return 0;
+	}
+
 	for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) {
 		if (attempt_count > 0) {
 			strbuf_setlen(&path, base_path_len);
diff --git a/trace2/tr2_sysenv.c b/trace2/tr2_sysenv.c
index 5958cfc424..3c3792eca2 100644
--- a/trace2/tr2_sysenv.c
+++ b/trace2/tr2_sysenv.c
@@ -49,6 +49,9 @@  static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
 				       "trace2.perftarget" },
 	[TR2_SYSENV_PERF_BRIEF]    = { "GIT_TRACE2_PERF_BRIEF",
 				       "trace2.perfbrief" },
+
+	[TR2_SYSENV_MAX_FILES]     = { "GIT_TRACE2_MAX_FILES",
+				       "trace2.maxfiles" },
 };
 /* clang-format on */
 
diff --git a/trace2/tr2_sysenv.h b/trace2/tr2_sysenv.h
index 8dd82a7a56..d4364a7b85 100644
--- a/trace2/tr2_sysenv.h
+++ b/trace2/tr2_sysenv.h
@@ -24,6 +24,8 @@  enum tr2_sysenv_variable {
 	TR2_SYSENV_PERF,
 	TR2_SYSENV_PERF_BRIEF,
 
+	TR2_SYSENV_MAX_FILES,
+
 	TR2_SYSENV_MUST_BE_LAST
 };