Message ID | 99e4a0fe409a236d210d95e54cd03fce61daa291.1564438745.git.steadmon@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] trace2: don't overload target directories | expand |
On 7/29/2019 6:20 PM, Josh Steadmon wrote: > trace2 can write files into a target directory. With heavy usage, this > directory can fill up with files, causing difficulty for > trace-processing systems. > > When trace2 would write a file to a target directory, first check > whether or not the directory is overloaded. A directory is overloaded if > there is a sentinel file declaring an overload, or if the number of > files exceeds a threshold. If the latter, create a sentinel file to > speed up later overload checks. > > The file count threshold is currently set to 1M files, but this can be > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT. 1 million seems like a LOT, and the environment variable seems to be only for testing. * If the variable is only for testing, then it should start with GIT_TEST_ * Are we sure 1 million is the right number? I would imagine even 10,000 starting to be a problem. How would a user adjust this value if they are having problems before 1,000,000? > The assumption is that a separate trace-processing system is dealing > with the generated traces; once it processes and removes the sentinel > file, it should be safe to generate new trace files again. This matches the model that you (Google) are using for collecting logs. I'll trust your expertise here in how backed up these logs become. I imagine that someone working without a network connection for a long time would be likely to run into this problem. [snip] > +test_expect_success "don't overload target directory" ' > + GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 && For testing, does this need to be 100? Could it be 5? > + export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT && To avoid leakage to other (future) tests, should these be in a subshell? > + test_when_finished "rm -r trace_target_dir" && > + mkdir trace_target_dir && > + test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt && > + xargs touch < expected_filenames.txt && nit: no space between redirection and filename. > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt && > + test_cmp expected_filenames.txt first_ls_output.txt && > + GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 && > + echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt && > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt && > + test_cmp expected_filenames.txt second_ls_output.txt > +' [snip] > +/* > + * Check to make sure we're not overloading the target directory with too many > + * files. First check for the presence of a sentinel file, then check file > + * count. If we are overloaded, create the sentinel file if it doesn't already > + * exist. > + * > + * We expect that some trace processing system is gradually collecting files > + * from the target directory; after it removes the sentinel file we'll start > + * writing traces again. > + */ > +static int tr2_dst_overloaded(const char *tgt_prefix) > +{ > + int file_count = 0, overload_file_count = 0; > + char *test_threshold_val; > + DIR *dirp; > + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT; > + struct stat statbuf; > + > + strbuf_addstr(&path, tgt_prefix); > + if (!is_dir_sep(path.buf[path.len - 1])) { > + strbuf_addch(&path, '/'); > + } > + > + /* check sentinel */ > + strbuf_addstr(&sentinel_path, path.buf); > + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME); > + if (!stat(sentinel_path.buf, &statbuf)) { > + strbuf_release(&path); > + return 1; > + } > + > + /* check if we're overriding the threshold (e.g., for testing) */ > + test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT"); > + if (test_threshold_val) > + overload_file_count = atoi(test_threshold_val); > + if (overload_file_count <= 0) > + overload_file_count = OVERLOAD_FILE_COUNT; > + > + > + /* check file count */ > + dirp = opendir(path.buf); > + while (file_count < overload_file_count && dirp && readdir(dirp)) > + file_count++; > + if (dirp) > + closedir(dirp); > + > + if (file_count >= overload_file_count) { > + creat(sentinel_path.buf, S_IRUSR | S_IWUSR); > + /* TODO: Write a target-specific message? */ Perhaps leave the TODO out of the code? I did see it in your commit message. > + strbuf_release(&path); > + return 1; > + } > + > + strbuf_release(&path); > + return 0; > +} > + > static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > { > int fd; > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > strbuf_addstr(&path, sid); > base_path_len = path.len; > > + if (tr2_dst_overloaded(tgt_prefix)) { > + strbuf_release(&path); > + if (tr2_dst_want_warning()) > + warning("trace2: not opening %s trace file due to too " > + "many files in target directory %s", > + tr2_sysenv_display_name(dst->sysenv_var), > + tgt_prefix); > + return 0; > + } > + > for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) { > if (attempt_count > 0) { > strbuf_setlen(&path, base_path_len); > Overall, this looks correct and the test is very clear. Seems to be a helpful feature! I only have the nits mentioned above. Thanks! -Stolee
On 7/29/2019 6:20 PM, Josh Steadmon wrote: > trace2 can write files into a target directory. With heavy usage, this > directory can fill up with files, causing difficulty for > trace-processing systems. > > When trace2 would write a file to a target directory, first check > whether or not the directory is overloaded. A directory is overloaded if > there is a sentinel file declaring an overload, or if the number of > files exceeds a threshold. If the latter, create a sentinel file to > speed up later overload checks. Something about this idea bothers me, but I can't quite put my finger on it. You're filling a directory with thousands of files while (hopefully simultaneously) having a post-processor/aggregator app read and delete them. I understand that if the aggregator falls behind or isn't running, the files will just accumulate and that the total number of files is the problem. But I have to wonder if contention on that directory is going to be a bottleneck and/or a source of problems. That is, you'll have one process reading and deleting and one or more Git processes scanning/counting/creating. It seems like there might be opportunity for some kinds of races here. It have to wonder if it would be better to do some kind of directory rotation rather than create a marker file. Alternatively, I think it would be better to not have the marker file be inside the directory, but rather have a lock file somewhere to temporarily disable tracing. Then your stat() call would not need to effectively search the large directory. Maybe make this "<dirname>.lock" as a peer to "<dirname>/", for example. > > The file count threshold is currently set to 1M files, but this can be > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT. > > The assumption is that a separate trace-processing system is dealing > with the generated traces; once it processes and removes the sentinel > file, it should be safe to generate new trace files again. > > Potential future work: > * Write a message into the sentinel file (should match the requested > trace2 output format). > * Make the overload threshold (and the whole overload feature) > configurable. I'm wondering if we should just make this setting another value in `tr2_sysenv_settings[]` rather than a *_TEST_* env var. That would give you both env and system/global config support, since I'm assuming you'd eventually want to have this be in the user's global config with the other trace2 settings. All of your tests could be expressed in terms of this new setting and we wouldn't need this new test env var. > > Signed-off-by: Josh Steadmon <steadmon@google.com> > --- > t/t0210-trace2-normal.sh | 15 ++++++++ > trace2/tr2_dst.c | 81 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 96 insertions(+) > > diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh > index ce7574edb1..e8a03e9212 100755 > --- a/t/t0210-trace2-normal.sh > +++ b/t/t0210-trace2-normal.sh > @@ -186,4 +186,19 @@ test_expect_success 'using global config with include' ' > test_cmp expect actual > ' > > +test_expect_success "don't overload target directory" ' > + GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 && > + export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT && > + test_when_finished "rm -r trace_target_dir" && > + mkdir trace_target_dir && > + test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt && > + xargs touch < expected_filenames.txt && > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt && > + test_cmp expected_filenames.txt first_ls_output.txt && > + GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 && > + echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt && > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt && > + test_cmp expected_filenames.txt second_ls_output.txt > +' > + > test_done > diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c > index 5dda0ca1cd..3286297918 100644 > --- a/trace2/tr2_dst.c > +++ b/trace2/tr2_dst.c > @@ -1,3 +1,5 @@ > +#include <dirent.h> > + > #include "cache.h" > #include "trace2/tr2_dst.h" > #include "trace2/tr2_sid.h" > @@ -8,6 +10,18 @@ > */ > #define MAX_AUTO_ATTEMPTS 10 > > +/* > + * Sentinel file used to detect when we're overloading a directory with too many > + * trace files. > + */ > +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload" > + > +/* > + * How many files we can write to a directory before entering overload mode. > + * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT > + */ > +#define OVERLOAD_FILE_COUNT 1000000 > + > static int tr2_dst_want_warning(void) > { > static int tr2env_dst_debug = -1; > @@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst) > dst->need_close = 0; > } > > +/* > + * Check to make sure we're not overloading the target directory with too many > + * files. First check for the presence of a sentinel file, then check file > + * count. If we are overloaded, create the sentinel file if it doesn't already > + * exist. > + * > + * We expect that some trace processing system is gradually collecting files > + * from the target directory; after it removes the sentinel file we'll start > + * writing traces again. > + */ > +static int tr2_dst_overloaded(const char *tgt_prefix) > +{ > + int file_count = 0, overload_file_count = 0; > + char *test_threshold_val; > + DIR *dirp; > + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT; > + struct stat statbuf; > + > + strbuf_addstr(&path, tgt_prefix); > + if (!is_dir_sep(path.buf[path.len - 1])) { > + strbuf_addch(&path, '/'); > + } > + > + /* check sentinel */ > + strbuf_addstr(&sentinel_path, path.buf); > + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME); > + if (!stat(sentinel_path.buf, &statbuf)) { > + strbuf_release(&path); Also release sentinel_path ? (And in both of the return statements below.) > + return 1; > + } > + > + /* check if we're overriding the threshold (e.g., for testing) */ > + test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT"); > + if (test_threshold_val) > + overload_file_count = atoi(test_threshold_val); > + if (overload_file_count <= 0) > + overload_file_count = OVERLOAD_FILE_COUNT; > + > + > + /* check file count */ > + dirp = opendir(path.buf); > + while (file_count < overload_file_count && dirp && readdir(dirp)) > + file_count++; > + if (dirp) > + closedir(dirp); > + > + if (file_count >= overload_file_count) { > + creat(sentinel_path.buf, S_IRUSR | S_IWUSR); > + /* TODO: Write a target-specific message? */ > + strbuf_release(&path); > + return 1; > + } > + > + strbuf_release(&path); > + return 0; > +} > + > static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > { > int fd; > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > strbuf_addstr(&path, sid); > base_path_len = path.len; > > + if (tr2_dst_overloaded(tgt_prefix)) { > + strbuf_release(&path); > + if (tr2_dst_want_warning()) > + warning("trace2: not opening %s trace file due to too " > + "many files in target directory %s", > + tr2_sysenv_display_name(dst->sysenv_var), > + tgt_prefix); > + return 0; > + } > + > for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) { > if (attempt_count > 0) { > strbuf_setlen(&path, base_path_len); > hope this helps, Jeff
On 7/29/2019 6:20 PM, Josh Steadmon wrote: > trace2 can write files into a target directory. With heavy usage, this > directory can fill up with files, causing difficulty for > trace-processing systems. I'm routing data in my org to a daemon via a Named Pipe or UD Socket, so I'm not seeing the thousands of files problems that you're seeing. However, we were being overwhelmed with lots of "uninteresting" commands and so I added some whitelisting to my post-processing daemon. For example, I want to know about checkout and push times -- I really don't care about rev-parse or config times or other such minor commands. I went one step further and allow either "(cmd_name)" or the pair "(cmd_name, cmd_mode)". This lets me select all checkouts and limit checkouts to branch-changing ones, for example. I drop any events in my post-processor that does not match any of my whitelist patterns. Perhaps you could run a quick histogram and see if something would be useful to pre-filter the data. That is, if we had whitelisting within git.exe itself, would you still have too much data and/or would you still need the overload feature that you've proposed in this RFC? Thanks, Jeff
On 2019.07.30 09:29, Derrick Stolee wrote: > On 7/29/2019 6:20 PM, Josh Steadmon wrote: > > trace2 can write files into a target directory. With heavy usage, this > > directory can fill up with files, causing difficulty for > > trace-processing systems. > > > > When trace2 would write a file to a target directory, first check > > whether or not the directory is overloaded. A directory is overloaded if > > there is a sentinel file declaring an overload, or if the number of > > files exceeds a threshold. If the latter, create a sentinel file to > > speed up later overload checks. > > > > The file count threshold is currently set to 1M files, but this can be > > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT. > > 1 million seems like a LOT, and the environment variable seems to be only > for testing. > > * If the variable is only for testing, then it should start with GIT_TEST_ > > * Are we sure 1 million is the right number? I would imagine even 10,000 > starting to be a problem. How would a user adjust this value if they > are having problems before 1,000,000? Yeah. I think we've only had reports of trouble starting around 5 million files. This definitely feels more like a config variable, but on the other hand I thought there was some resistance towards adding the somewhat special early-initialization trace config variables. So for the first revision I figured I'd just throw out a constant and see if there were any objections. If people feel like it's OK to add this to the early trace2 config options, then I'd be happy to do that in V2. > > The assumption is that a separate trace-processing system is dealing > > with the generated traces; once it processes and removes the sentinel > > file, it should be safe to generate new trace files again. > > This matches the model that you (Google) are using for collecting logs. > I'll trust your expertise here in how backed up these logs become. I > imagine that someone working without a network connection for a long > time would be likely to run into this problem. > > [snip] > > > +test_expect_success "don't overload target directory" ' > > + GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 && > > For testing, does this need to be 100? Could it be 5? Sure, changed to 5 in V2. > > + export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT && > > To avoid leakage to other (future) tests, should these be in a subshell? Yes, thanks for the catch. Fixed in V2. > > + test_when_finished "rm -r trace_target_dir" && > > + mkdir trace_target_dir && > > + test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt && > > + xargs touch < expected_filenames.txt && > > nit: no space between redirection and filename. Fixed in V2. > > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt && > > + test_cmp expected_filenames.txt first_ls_output.txt && > > + GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 && > > + echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt && > > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt && > > + test_cmp expected_filenames.txt second_ls_output.txt > > +' > > [snip] > > > +/* > > + * Check to make sure we're not overloading the target directory with too many > > + * files. First check for the presence of a sentinel file, then check file > > + * count. If we are overloaded, create the sentinel file if it doesn't already > > + * exist. > > + * > > + * We expect that some trace processing system is gradually collecting files > > + * from the target directory; after it removes the sentinel file we'll start > > + * writing traces again. > > + */ > > +static int tr2_dst_overloaded(const char *tgt_prefix) > > +{ > > + int file_count = 0, overload_file_count = 0; > > + char *test_threshold_val; > > + DIR *dirp; > > + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT; > > + struct stat statbuf; > > + > > + strbuf_addstr(&path, tgt_prefix); > > + if (!is_dir_sep(path.buf[path.len - 1])) { > > + strbuf_addch(&path, '/'); > > + } > > + > > + /* check sentinel */ > > + strbuf_addstr(&sentinel_path, path.buf); > > + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME); > > + if (!stat(sentinel_path.buf, &statbuf)) { > > + strbuf_release(&path); > > + return 1; > > + } > > + > > + /* check if we're overriding the threshold (e.g., for testing) */ > > + test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT"); > > + if (test_threshold_val) > > + overload_file_count = atoi(test_threshold_val); > > + if (overload_file_count <= 0) > > + overload_file_count = OVERLOAD_FILE_COUNT; > > + > > + > > + /* check file count */ > > + dirp = opendir(path.buf); > > + while (file_count < overload_file_count && dirp && readdir(dirp)) > > + file_count++; > > + if (dirp) > > + closedir(dirp); > > + > > + if (file_count >= overload_file_count) { > > + creat(sentinel_path.buf, S_IRUSR | S_IWUSR); > > + /* TODO: Write a target-specific message? */ > > Perhaps leave the TODO out of the code? I did see it in your commit message. Fixed in V2. > > + strbuf_release(&path); > > + return 1; > > + } > > + > > + strbuf_release(&path); > > + return 0; > > +} > > + > > static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > > { > > int fd; > > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > > strbuf_addstr(&path, sid); > > base_path_len = path.len; > > > > + if (tr2_dst_overloaded(tgt_prefix)) { > > + strbuf_release(&path); > > + if (tr2_dst_want_warning()) > > + warning("trace2: not opening %s trace file due to too " > > + "many files in target directory %s", > > + tr2_sysenv_display_name(dst->sysenv_var), > > + tgt_prefix); > > + return 0; > > + } > > + > > for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) { > > if (attempt_count > 0) { > > strbuf_setlen(&path, base_path_len); > > > > Overall, this looks correct and the test is very clear. Seems to be a helpful feature! > > I only have the nits mentioned above. Thanks for the review!
On 2019.07.30 12:46, Jeff Hostetler wrote: > > > On 7/29/2019 6:20 PM, Josh Steadmon wrote: > > trace2 can write files into a target directory. With heavy usage, this > > directory can fill up with files, causing difficulty for > > trace-processing systems. > > > > When trace2 would write a file to a target directory, first check > > whether or not the directory is overloaded. A directory is overloaded if > > there is a sentinel file declaring an overload, or if the number of > > files exceeds a threshold. If the latter, create a sentinel file to > > speed up later overload checks. > > Something about this idea bothers me, but I can't quite put my finger > on it. You're filling a directory with thousands of files while > (hopefully simultaneously) having a post-processor/aggregator app > read and delete them. I understand that if the aggregator falls > behind or isn't running, the files will just accumulate and that the > total number of files is the problem. But I have to wonder if > contention on that directory is going to be a bottleneck and/or > a source of problems. That is, you'll have one process reading and > deleting and one or more Git processes scanning/counting/creating. > It seems like there might be opportunity for some kinds of races > here. Yeah, this probably deserves some performance testing. I'll see what I can arrange prior to sending out V2. I don't think that racing issue is that big a deal, as there's not a ton of difference if we stop tracing after (say) 10**6 - 100 files vs 10**6 + 100 or whatever. But contention from multiple processes could definitely slow things down. > It have to wonder if it would be better to do some kind of directory > rotation rather than create a marker file. > > Alternatively, I think it would be better to not have the marker > file be inside the directory, but rather have a lock file somewhere > to temporarily disable tracing. Then your stat() call would not need > to effectively search the large directory. Maybe make this > "<dirname>.lock" as a peer to "<dirname>/", for example. Yeah, this would make the stat() faster. But we'd have to add logic either to git or to the collection system to clean up the .lock files after we empty out the target directory. I'll have to think about this and see if the collection team is open to this. > > The file count threshold is currently set to 1M files, but this can be > > overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT. > > > > The assumption is that a separate trace-processing system is dealing > > with the generated traces; once it processes and removes the sentinel > > file, it should be safe to generate new trace files again. > > > > Potential future work: > > * Write a message into the sentinel file (should match the requested > > trace2 output format). > > * Make the overload threshold (and the whole overload feature) > > configurable. > > I'm wondering if we should just make this setting another > value in `tr2_sysenv_settings[]` rather than a *_TEST_* env var. > > That would give you both env and system/global config support, > since I'm assuming you'd eventually want to have this be in > the user's global config with the other trace2 settings. > > All of your tests could be expressed in terms of this new setting > and we wouldn't need this new test env var. I do think this makes more sense as a config variable, but as I mentioned in my reply to Stolee, my impression was that we were somewhat hesitant to add new trace config variables if they have to go through the special early config initialization. If that impression is wrong I'll add the config var. > > > > Signed-off-by: Josh Steadmon <steadmon@google.com> > > --- > > t/t0210-trace2-normal.sh | 15 ++++++++ > > trace2/tr2_dst.c | 81 ++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 96 insertions(+) > > > > diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh > > index ce7574edb1..e8a03e9212 100755 > > --- a/t/t0210-trace2-normal.sh > > +++ b/t/t0210-trace2-normal.sh > > @@ -186,4 +186,19 @@ test_expect_success 'using global config with include' ' > > test_cmp expect actual > > ' > > +test_expect_success "don't overload target directory" ' > > + GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 && > > + export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT && > > + test_when_finished "rm -r trace_target_dir" && > > + mkdir trace_target_dir && > > + test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt && > > + xargs touch < expected_filenames.txt && > > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt && > > + test_cmp expected_filenames.txt first_ls_output.txt && > > + GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 && > > + echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt && > > + ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt && > > + test_cmp expected_filenames.txt second_ls_output.txt > > +' > > + > > test_done > > diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c > > index 5dda0ca1cd..3286297918 100644 > > --- a/trace2/tr2_dst.c > > +++ b/trace2/tr2_dst.c > > @@ -1,3 +1,5 @@ > > +#include <dirent.h> > > + > > #include "cache.h" > > #include "trace2/tr2_dst.h" > > #include "trace2/tr2_sid.h" > > @@ -8,6 +10,18 @@ > > */ > > #define MAX_AUTO_ATTEMPTS 10 > > +/* > > + * Sentinel file used to detect when we're overloading a directory with too many > > + * trace files. > > + */ > > +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload" > > + > > +/* > > + * How many files we can write to a directory before entering overload mode. > > + * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT > > + */ > > +#define OVERLOAD_FILE_COUNT 1000000 > > + > > static int tr2_dst_want_warning(void) > > { > > static int tr2env_dst_debug = -1; > > @@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst) > > dst->need_close = 0; > > } > > +/* > > + * Check to make sure we're not overloading the target directory with too many > > + * files. First check for the presence of a sentinel file, then check file > > + * count. If we are overloaded, create the sentinel file if it doesn't already > > + * exist. > > + * > > + * We expect that some trace processing system is gradually collecting files > > + * from the target directory; after it removes the sentinel file we'll start > > + * writing traces again. > > + */ > > +static int tr2_dst_overloaded(const char *tgt_prefix) > > +{ > > + int file_count = 0, overload_file_count = 0; > > + char *test_threshold_val; > > + DIR *dirp; > > + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT; > > + struct stat statbuf; > > + > > + strbuf_addstr(&path, tgt_prefix); > > + if (!is_dir_sep(path.buf[path.len - 1])) { > > + strbuf_addch(&path, '/'); > > + } > > + > > + /* check sentinel */ > > + strbuf_addstr(&sentinel_path, path.buf); > > + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME); > > + if (!stat(sentinel_path.buf, &statbuf)) { > > + strbuf_release(&path); > > Also release sentinel_path ? > (And in both of the return statements below.) > > > + return 1; > > + } > > + > > + /* check if we're overriding the threshold (e.g., for testing) */ > > + test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT"); > > + if (test_threshold_val) > > + overload_file_count = atoi(test_threshold_val); > > + if (overload_file_count <= 0) > > + overload_file_count = OVERLOAD_FILE_COUNT; > > + > > + > > + /* check file count */ > > + dirp = opendir(path.buf); > > + while (file_count < overload_file_count && dirp && readdir(dirp)) > > + file_count++; > > + if (dirp) > > + closedir(dirp); > > + > > + if (file_count >= overload_file_count) { > > + creat(sentinel_path.buf, S_IRUSR | S_IWUSR); > > + /* TODO: Write a target-specific message? */ > > + strbuf_release(&path); > > + return 1; > > + } > > + > > + strbuf_release(&path); > > + return 0; > > +} > > + > > static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > > { > > int fd; > > @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) > > strbuf_addstr(&path, sid); > > base_path_len = path.len; > > + if (tr2_dst_overloaded(tgt_prefix)) { > > + strbuf_release(&path); > > + if (tr2_dst_want_warning()) > > + warning("trace2: not opening %s trace file due to too " > > + "many files in target directory %s", > > + tr2_sysenv_display_name(dst->sysenv_var), > > + tgt_prefix); > > + return 0; > > + } > > + > > for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) { > > if (attempt_count > 0) { > > strbuf_setlen(&path, base_path_len); > > > > hope this helps, > Jeff >
On 2019.07.30 12:46, Jeff Hostetler wrote: [snip] > Also release sentinel_path ? > (And in both of the return statements below.) Thanks for the catch. Fixed in V2.
On 2019.07.30 14:00, Jeff Hostetler wrote: > > > On 7/29/2019 6:20 PM, Josh Steadmon wrote: > > trace2 can write files into a target directory. With heavy usage, this > > directory can fill up with files, causing difficulty for > > trace-processing systems. > > I'm routing data in my org to a daemon via a Named Pipe or UD Socket, > so I'm not seeing the thousands of files problems that you're seeing. > However, we were being overwhelmed with lots of "uninteresting" commands > and so I added some whitelisting to my post-processing daemon. For > example, I want to know about checkout and push times -- I really don't > care about rev-parse or config times or other such minor commands. > > I went one step further and allow either "(cmd_name)" or > the pair "(cmd_name, cmd_mode)". This lets me select all checkouts > and limit checkouts to branch-changing ones, for example. I drop > any events in my post-processor that does not match any of my whitelist > patterns. > > Perhaps you could run a quick histogram and see if something would > be useful to pre-filter the data. That is, if we had whitelisting > within git.exe itself, would you still have too much data and/or > would you still need the overload feature that you've proposed in > this RFC? Our problem is not so much with the volume of data as the fact that a few special users have a ton of git invocations in rapid succession, each of which creates a new trace file. If we had a more synchronous collection system like yours it would probably not be so much of a challenge. But the massive number of files created in a short timeframe revealed some inefficiencies in our collection system (which are thankfully being addressed by the owners of that code). But we probably still want some sort of overload prevention feature as long as we're using the target directory approach.
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh index ce7574edb1..e8a03e9212 100755 --- a/t/t0210-trace2-normal.sh +++ b/t/t0210-trace2-normal.sh @@ -186,4 +186,19 @@ test_expect_success 'using global config with include' ' test_cmp expect actual ' +test_expect_success "don't overload target directory" ' + GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT=100 && + export GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT && + test_when_finished "rm -r trace_target_dir" && + mkdir trace_target_dir && + test_seq $GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT | sed "s#^#trace_target_dir/#" | sort > expected_filenames.txt && + xargs touch < expected_filenames.txt && + ls trace_target_dir | sed "s#^#trace_target_dir/#" > first_ls_output.txt && + test_cmp expected_filenames.txt first_ls_output.txt && + GIT_TRACE2="$(pwd)/trace_target_dir" test-tool trace2 001return 0 && + echo "trace_target_dir/git-trace2-overload" >> expected_filenames.txt && + ls trace_target_dir | sed "s#^#trace_target_dir/#" > second_ls_output.txt && + test_cmp expected_filenames.txt second_ls_output.txt +' + test_done diff --git a/trace2/tr2_dst.c b/trace2/tr2_dst.c index 5dda0ca1cd..3286297918 100644 --- a/trace2/tr2_dst.c +++ b/trace2/tr2_dst.c @@ -1,3 +1,5 @@ +#include <dirent.h> + #include "cache.h" #include "trace2/tr2_dst.h" #include "trace2/tr2_sid.h" @@ -8,6 +10,18 @@ */ #define MAX_AUTO_ATTEMPTS 10 +/* + * Sentinel file used to detect when we're overloading a directory with too many + * trace files. + */ +#define OVERLOAD_SENTINEL_NAME "git-trace2-overload" + +/* + * How many files we can write to a directory before entering overload mode. + * This can be overridden with the envvar GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT + */ +#define OVERLOAD_FILE_COUNT 1000000 + static int tr2_dst_want_warning(void) { static int tr2env_dst_debug = -1; @@ -32,6 +46,63 @@ void tr2_dst_trace_disable(struct tr2_dst *dst) dst->need_close = 0; } +/* + * Check to make sure we're not overloading the target directory with too many + * files. First check for the presence of a sentinel file, then check file + * count. If we are overloaded, create the sentinel file if it doesn't already + * exist. + * + * We expect that some trace processing system is gradually collecting files + * from the target directory; after it removes the sentinel file we'll start + * writing traces again. + */ +static int tr2_dst_overloaded(const char *tgt_prefix) +{ + int file_count = 0, overload_file_count = 0; + char *test_threshold_val; + DIR *dirp; + struct strbuf path = STRBUF_INIT, sentinel_path = STRBUF_INIT; + struct stat statbuf; + + strbuf_addstr(&path, tgt_prefix); + if (!is_dir_sep(path.buf[path.len - 1])) { + strbuf_addch(&path, '/'); + } + + /* check sentinel */ + strbuf_addstr(&sentinel_path, path.buf); + strbuf_addstr(&sentinel_path, OVERLOAD_SENTINEL_NAME); + if (!stat(sentinel_path.buf, &statbuf)) { + strbuf_release(&path); + return 1; + } + + /* check if we're overriding the threshold (e.g., for testing) */ + test_threshold_val = getenv("GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT"); + if (test_threshold_val) + overload_file_count = atoi(test_threshold_val); + if (overload_file_count <= 0) + overload_file_count = OVERLOAD_FILE_COUNT; + + + /* check file count */ + dirp = opendir(path.buf); + while (file_count < overload_file_count && dirp && readdir(dirp)) + file_count++; + if (dirp) + closedir(dirp); + + if (file_count >= overload_file_count) { + creat(sentinel_path.buf, S_IRUSR | S_IWUSR); + /* TODO: Write a target-specific message? */ + strbuf_release(&path); + return 1; + } + + strbuf_release(&path); + return 0; +} + static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) { int fd; @@ -50,6 +121,16 @@ static int tr2_dst_try_auto_path(struct tr2_dst *dst, const char *tgt_prefix) strbuf_addstr(&path, sid); base_path_len = path.len; + if (tr2_dst_overloaded(tgt_prefix)) { + strbuf_release(&path); + if (tr2_dst_want_warning()) + warning("trace2: not opening %s trace file due to too " + "many files in target directory %s", + tr2_sysenv_display_name(dst->sysenv_var), + tgt_prefix); + return 0; + } + for (attempt_count = 0; attempt_count < MAX_AUTO_ATTEMPTS; attempt_count++) { if (attempt_count > 0) { strbuf_setlen(&path, base_path_len);
trace2 can write files into a target directory. With heavy usage, this directory can fill up with files, causing difficulty for trace-processing systems. When trace2 would write a file to a target directory, first check whether or not the directory is overloaded. A directory is overloaded if there is a sentinel file declaring an overload, or if the number of files exceeds a threshold. If the latter, create a sentinel file to speed up later overload checks. The file count threshold is currently set to 1M files, but this can be overridden for testing with GIT_TRACE2_TEST_OVERLOAD_FILE_COUNT. The assumption is that a separate trace-processing system is dealing with the generated traces; once it processes and removes the sentinel file, it should be safe to generate new trace files again. Potential future work: * Write a message into the sentinel file (should match the requested trace2 output format). * Make the overload threshold (and the whole overload feature) configurable. Signed-off-by: Josh Steadmon <steadmon@google.com> --- t/t0210-trace2-normal.sh | 15 ++++++++ trace2/tr2_dst.c | 81 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 96 insertions(+)