Message ID | 7e8f1aed1138ab2a52a8957ac95895ac9effd933.1586789126.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Integrate changed-path Bloom filters with 'git blame' | expand |
On Mon, Apr 13, 2020 at 02:45:24PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- > graph file whenever "git commit" is run, ensuring that we always > have an updated commit-graph throughout the test suite. The > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was > introduced to write the changed-path Bloom filters whenever "git > commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH > trick doesn't launch a separate process and instead writes it > directly. > > Update the "git commit" builtin to write changed-path Bloom filters > when both GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS > are enabled. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > commit-graph.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/commit-graph.c b/commit-graph.c > index 77668629e27..3e8f36ce5c8 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1968,6 +1968,8 @@ int write_commit_graph(struct object_directory *odb, > ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; > ctx->split_opts = split_opts; > ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; > + if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0)) > + ctx->changed_paths = 1; Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' environment variables and potentially ignoring its arguments, but given the discussion we had in v1, I don't feel strongly enough to recommend that you change this. For what it's worth, I think that 'write_commit_graph' could behave more purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set 'flags' appropriately from the outside, but I can understand also why it's preferred to do it in this style, too. > ctx->total_bloom_filter_data_size = 0; > > if (ctx->split) { > -- > gitgitgadget > Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' > environment variables and potentially ignoring its arguments, but given > the discussion we had in v1, I don't feel strongly enough to recommend > that you change this. > > For what it's worth, I think that 'write_commit_graph' could behave more > purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set > 'flags' appropriately from the outside,... Yeah, I agree that it would be a lot cleaner if that is easy to arrange. I have a suspicion that Derrick already tried and the resulting code looked messier and was discarded?
On 4/13/2020 6:21 PM, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > >> Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' >> environment variables and potentially ignoring its arguments, but given >> the discussion we had in v1, I don't feel strongly enough to recommend >> that you change this. >> >> For what it's worth, I think that 'write_commit_graph' could behave more >> purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set >> 'flags' appropriately from the outside,... > > Yeah, I agree that it would be a lot cleaner if that is easy to > arrange. I have a suspicion that Derrick already tried and the > resulting code looked messier and was discarded? Perhaps I could fix both concerns by 1. Use a macro instead of a library call. 2. Check the _CHANGED_PATHS variable in the macro. The macro would look like this: #define git_test_write_commit_graph_or_die() \ if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0)) { \ int flags = 0; \ if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0)) \ flags = COMMIT_GRAPH_WRITE_BLOOM_FILTERS; \ if (write_commit_graph_reachable( \ the_repository->objects->odb, flags, NULL)) \ die("failed to write commit-graph under GIT_TEST_COMMIT_GRAPH"); \ } Thanks, -Stolee
Derrick Stolee <stolee@gmail.com> writes: > On 4/13/2020 6:21 PM, Junio C Hamano wrote: >> Taylor Blau <me@ttaylorr.com> writes: >> >>> Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' >>> environment variables and potentially ignoring its arguments, but given >>> the discussion we had in v1, I don't feel strongly enough to recommend >>> that you change this. >>> >>> For what it's worth, I think that 'write_commit_graph' could behave more >>> purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set >>> 'flags' appropriately from the outside,... >> >> Yeah, I agree that it would be a lot cleaner if that is easy to >> arrange. I have a suspicion that Derrick already tried and the >> resulting code looked messier and was discarded? > > Perhaps I could fix both concerns by > > 1. Use a macro instead of a library call. > > 2. Check the _CHANGED_PATHS variable in the macro. I am not sure how use of a macro "fixes" purity, though. And what is the other concern? How widely would this "if we are testing, write out the graph file" call be sprinkled over the codebase? I am hoping that it won't be "everywhere", but only at strategic places (like "just once before we leave a subcommand that creates one or bunch of commits"). And how often would they be called? I am also hoping that it won't be "inside a tight loop". In short, I am wondering if we can promise our codebase that - git_test_write_commit_graph_or_die() calls won't be an eyesore (and/or distraction) for developers too much. - git_env_bool() call won't have performance impact. > The macro would look like this: > > > #define git_test_write_commit_graph_or_die() \ > if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0)) { \ > int flags = 0; \ > if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0)) \ > flags = COMMIT_GRAPH_WRITE_BLOOM_FILTERS; \ > if (write_commit_graph_reachable( \ > the_repository->objects->odb, flags, NULL)) \ > die("failed to write commit-graph under GIT_TEST_COMMIT_GRAPH"); \ > } > > Thanks, > -Stolee
On 4/14/2020 1:26 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> On 4/13/2020 6:21 PM, Junio C Hamano wrote: >>> Taylor Blau <me@ttaylorr.com> writes: >>> >>>> Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' >>>> environment variables and potentially ignoring its arguments, but given >>>> the discussion we had in v1, I don't feel strongly enough to recommend >>>> that you change this. >>>> >>>> For what it's worth, I think that 'write_commit_graph' could behave more >>>> purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set >>>> 'flags' appropriately from the outside,... >>> >>> Yeah, I agree that it would be a lot cleaner if that is easy to >>> arrange. I have a suspicion that Derrick already tried and the >>> resulting code looked messier and was discarded? >> >> Perhaps I could fix both concerns by >> >> 1. Use a macro instead of a library call. >> >> 2. Check the _CHANGED_PATHS variable in the macro. > > I am not sure how use of a macro "fixes" purity, though. And what > is the other concern? The concern was (1) checking the environment and (2) die()ing in the library. > How widely would this "if we are testing, write out the graph file" > call be sprinkled over the codebase? I am hoping that it won't be > "everywhere", but only at strategic places (like "just once before > we leave a subcommand that creates one or bunch of commits"). And > how often would they be called? I am also hoping that it won't be > "inside a tight loop". In short, I am wondering if we can promise > our codebase that > > - git_test_write_commit_graph_or_die() calls won't be an eyesore > (and/or distraction) for developers too much. > > - git_env_bool() call won't have performance impact. I could add a comment to the header file to say "this is only for improving coverage of optional features in the test suite. Do not call this method unless you know what you are doing." My intention right now is to only include this in 'git commit' and 'git merge'. Earlier discussion included thoughts about 'git rebase' and similar, but those are more rarely used when constructing an "example repo" in the test scripts. -Stolee
On Tue, Apr 14, 2020 at 01:40:53PM -0400, Derrick Stolee wrote: > On 4/14/2020 1:26 PM, Junio C Hamano wrote: > > Derrick Stolee <stolee@gmail.com> writes: > > > >> On 4/13/2020 6:21 PM, Junio C Hamano wrote: > >>> Taylor Blau <me@ttaylorr.com> writes: > >>> > >>>> Hmm. I'm not crazy about a library function looking at 'GIT_TEST_*' > >>>> environment variables and potentially ignoring its arguments, but given > >>>> the discussion we had in v1, I don't feel strongly enough to recommend > >>>> that you change this. > >>>> > >>>> For what it's worth, I think that 'write_commit_graph' could behave more > >>>> purely if callers checked 'GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS' and set > >>>> 'flags' appropriately from the outside,... > >>> > >>> Yeah, I agree that it would be a lot cleaner if that is easy to > >>> arrange. I have a suspicion that Derrick already tried and the > >>> resulting code looked messier and was discarded? > >> > >> Perhaps I could fix both concerns by > >> > >> 1. Use a macro instead of a library call. > >> > >> 2. Check the _CHANGED_PATHS variable in the macro. > > > > I am not sure how use of a macro "fixes" purity, though. And what > > is the other concern? > > The concern was (1) checking the environment and (2) die()ing in the > library. I think that we might as well use a plain old function here instead of a macro, especially since the macro is just expanding out to what that function would be if we had written it that way. For what it's worth, my concern was more that I don't mind dying, but I'd rather do so in a calling function, and that by the time we're calling 'write_commit_graph', that we should be using return values instead of 'die()' or 'error()'. > > How widely would this "if we are testing, write out the graph file" > > call be sprinkled over the codebase? I am hoping that it won't be > > "everywhere", but only at strategic places (like "just once before > > we leave a subcommand that creates one or bunch of commits"). And > > how often would they be called? I am also hoping that it won't be > > "inside a tight loop". In short, I am wondering if we can promise > > our codebase that > > > > - git_test_write_commit_graph_or_die() calls won't be an eyesore > > (and/or distraction) for developers too much. > > > > - git_env_bool() call won't have performance impact. > > I could add a comment to the header file to say "this is only for > improving coverage of optional features in the test suite. Do not > call this method unless you know what you are doing." Works for me. > My intention right now is to only include this in 'git commit' > and 'git merge'. Earlier discussion included thoughts about > 'git rebase' and similar, but those are more rarely used when > constructing an "example repo" in the test scripts. Ditto. I think that there's benefit in having it in some places but not all (as Stolee pointed out earlier in the thread), and that trying to have it everywhere is a fool's errand. > -Stolee Thanks, Taylor
diff --git a/commit-graph.c b/commit-graph.c index 77668629e27..3e8f36ce5c8 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1968,6 +1968,8 @@ int write_commit_graph(struct object_directory *odb, ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; ctx->split_opts = split_opts; ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; + if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0)) + ctx->changed_paths = 1; ctx->total_bloom_filter_data_size = 0; if (ctx->split) {