Message ID | 20211005001931.13932-1-chooglen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Use default values from settings instead of config | expand |
On Mon, Oct 04 2021, Glen Choo wrote: > Hi everyone! This patch was created in response to something we observed in > Google, where fsck failed to detect that the commit graph was invalid. We > initially assumed that fsck never checked the commit graph, but it turns out > that it does so only when core.commitgraph is set, even though we set defaults > for "whether to use the commit graph" in the repository settings. > > Instead of using the config, let's use repository settings where > available. Replace core.commitGraph and core.multiPackIndex with their > equivalent repository settings in fsck and gc. > > I don't anticipate any more business logic changes and the previous comments > were focused on testing style, so hopefully this should be good to merge. > > Changes since v2: > * Various small test fixes (thanks Eric!). Most notably, I've used -c instead of > test_config because test_config can affect the values in subsequent tests. > * Rewording fix in patch 3 commit message > * Refactor tests in patch 3 so that we use a single helper function instead of > copy-pasted code > > Changes since v1: > * clean up typo in patch 1 commit message > * document the commits that patches 1 and 2 address > * use test helpers in patch 1 > * rewrite patch 2's tests so that it's easier to tell that each test > does something different > * reword patch 3 commit message to explain the bug > * add tests to patch 3 > > Glen Choo (3): > fsck: verify commit graph when implicitly enabled > fsck: verify multi-pack-index when implictly enabled > gc: perform incremental repack when implictly enabled I've looked this over carefully, it LGTM. I noticed some bugs in this area, but they pre-date your commits. I've pushed a "gc/use-repo-settings" to https://github.com/avar/git.git that you can check out, consider that branch partially as commentary, but feel free to pick/squash anything you find there. The range-diff for that is quoted below. Notes: * This is somewhat of a case of throwing rocks from a glass house, but I think the commit messages could really be shorter/get to the point. I.e. say right away what's being changed and why, instead of giving the reader an overview of the whole control flow. I just adjusted adusted 1/3 in that direction below, I think 2/3 could do with being squashed into it & adjusted, it's really the exact same bug/improvement, just with a different config key that behaves the same. * The "test_unconfig" in your 2/3 is redundant, so is the "rm -rf" of directories that can't exist by that point in 3/3. * I wondered if we could just call prepare_repo_settings() in one place earlier in gc/maintenance instead of adding yet another call deeper in the control flow. I've got a 4/3 below that tries to do that, and (as noted in its commit message) intentionally breaks "gc.c" in a way that our tests don't catch, which is perhaps fallout from 31b1de6a09b? So that's very much on the topic of extra improvements, i.e. your patches don't break that anymore than it is now, but I for one would very much like to see an end-state where we're confident that gc/maintenance do the right thing on each of the tri-states of these boolean variables. It's also a point about how to write reliable tests (again, pre-dating your series). I.e. instead of asserting "we do X when Y is true" have the test assert all of Y=[unset|false|true] in a way that breaks if any are different than expected. One of the tests you're changing has a "graph_git_two_modes" helper does 2/3 of that, maybe it would be easy to extend it? * You use corrupting the graph as a shorthand for "did we run this command?". See test_region for an approach that might be better, i.e. just log with trace2 and grep that to see if we started the relevant command. The test_region helper can't be used as-is for that (you could manually grep the emitted PERF/EVENT output), but perhaps that's easier/more reliable? * This is again, something that pre-dates your patches, but I think following the "is enabled?" behavior here for these variables in "git fsck" is highly questionable. So just some thoughts after having looked at that: Just because your config doen't say "use the graph" doesn't mean another process won't do so, and if the graph is broken they might probably encounter an error. Although I think we *should* have cleaned up those being fatal a while ago). See 43d35618055 (commit-graph write: don't die if the existing graph is corrupt, 2019-03-25). But in general for these auxiliary files (commit-graph, midix, *.rev etc.) I think there's a good argument to be made that fsck should *always* check them, and at most use the config to say something like: Hey, you've got a 100% broken commit-graph/midx/rev in your .git, looks like the core.XYZ enabling it is off *right now* though, so maybe we won't use it. I hope you're feeling lucky... :) 1: fd9a58e2c7b ! 1: c6041f3b633 fsck: verify commit graph when implicitly enabled @@ Metadata ## Commit message ## fsck: verify commit graph when implicitly enabled - the_repository->settings is the preferred way to get certain - settings (such as "core.commitGraph") because it gets default values - from prepare_repo_settings(). However, cmd_fsck() reads the config - directly via git_config_get_bool(), which bypasses these default values. - This causes fsck to ignore the commit graph if "core.commitgraph" is not - explicitly set in the config. This worked fine until commit-graph was - enabled by default in 31b1de6a09 (commit-graph: turn on commit-graph by - default, 2019-08-13). + Change fsck to check the "core_commit_graph" variable set in + "repo-settings.c" instead of reading the "core.commitGraph" variable, + this fixes a bug where we wouldn't start "commit-graph verify" if the + config key was missing, which since 31b1de6a09 (commit-graph: turn on + commit-graph by default, 2019-08-13) has meant that it's on by + default. - Replace git_config_get_bool("core.commitgraph") in fsck with the - equivalent call to the_repository->settings.core_commit_graph. - - The expected behavior is that fsck respects the config value when it is - set, but uses the default value when it is unset. For example, for - core.commitGraph, there are three cases: - - - Config value is set to true -> enabled - - Config value is set to false -> disabled - - Config value is unset -> enabled - - As such, tests cover all three cases. + Add tests to "t5318-commit-graph.sh" to check that this works, using + the detection of a corrupt commit-graph as a method of seeing if + "commit-graph verify" was invoked. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> + Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## builtin/fsck.c ## @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix) 2: 9d6813bc280 ! 2: d11209ebfe1 fsck: verify multi-pack-index when implictly enabled @@ Commit message Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> + Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## builtin/fsck.c ## @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix) @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify incorrect offset' ' "incorrect object offset" \ - "git -c core.multipackindex=true fsck" + "git -c core.multiPackIndex=true fsck" && -+ test_unconfig core.multiPackIndex && + test_must_fail git fsck && + git -c core.multiPackIndex=false fsck ' 3: 260f46568c6 ! 3: 017e2003e85 gc: perform incremental repack when implictly enabled @@ Commit message Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> + Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## builtin/gc.c ## @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts) @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto +} + +test_expect_success 'maintenance.incremental-repack.auto' ' -+ rm -rf incremental-repack-true && + git init incremental-repack-true && + ( + cd incremental-repack-true && @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto +' + +test_expect_success 'maintenance.incremental-repack.auto (when config is unset)' ' -+ rm -rf incremental-repack-unset && + git init incremental-repack-unset && + ( + cd incremental-repack-unset && -: ----------- > 4: 04d1527f180 WIP gc/maintenance: just call prepare_repo_settings() earlier -- >8 -- Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier Consolidate the various calls to prepare_repo_settings() to happen at the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit intentionally breaks things, we seem to be lacking test coverage for cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13)? Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/gc.c | 5 +---- t/t5318-commit-graph.sh | 2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index 26709311601..f59dbedc1fe 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix) clean_pack_garbage(); } - prepare_repo_settings(the_repository); if (the_repository->settings.gc_write_commit_graph == 1) write_commit_graph_reachable(the_repository->objects->odb, !quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0, @@ -860,7 +859,6 @@ static int run_write_commit_graph(struct maintenance_run_opts *opts) static int maintenance_task_commit_graph(struct maintenance_run_opts *opts) { - prepare_repo_settings(the_repository); if (!the_repository->settings.core_commit_graph) return 0; @@ -1052,7 +1050,6 @@ static int incremental_repack_auto_condition(void) int incremental_repack_auto_limit = 10; int count = 0; - prepare_repo_settings(the_repository); if (!the_repository->settings.core_multi_pack_index) return 0; @@ -1167,7 +1164,6 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts) static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts) { - prepare_repo_settings(the_repository); if (!the_repository->settings.core_multi_pack_index) { warning(_("skipping incremental-repack task because core.multiPackIndex is disabled")); return 0; @@ -2492,6 +2488,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix) (argc == 2 && !strcmp(argv[1], "-h"))) usage(builtin_maintenance_usage); + prepare_repo_settings(the_repository); if (!strcmp(argv[1], "run")) return maintenance_run(argc - 1, argv + 1, prefix); if (!strcmp(argv[1], "start")) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4e4450af1ef..98123191e1e 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -368,7 +368,7 @@ test_expect_success 'check that gc computes commit-graph' ' git commit-graph write --reachable && cp $objdir/info/commit-graph commit-graph-before-gc && git reset --hard HEAD~1 && - git config gc.writeCommitGraph true && + # BUG: Due to 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13)? git gc && cp $objdir/info/commit-graph commit-graph-after-gc && ! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
On 10/5/2021 7:57 AM, Ævar Arnfjörð Bjarmason wrote: > Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier > > Consolidate the various calls to prepare_repo_settings() to happen at > the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit > intentionally breaks things, we seem to be lacking test coverage for > cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on > commit-graph by default, 2019-08-13)? > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > builtin/gc.c | 5 +---- > t/t5318-commit-graph.sh | 2 +- > 2 files changed, 2 insertions(+), 5 deletions(-) > > diff --git a/builtin/gc.c b/builtin/gc.c > index 26709311601..f59dbedc1fe 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > clean_pack_garbage(); > } > > - prepare_repo_settings(the_repository); > if (the_repository->settings.gc_write_commit_graph == 1) I think that removing these calls is dangerous. prepare_repo_settings() already returns immediately if the repository already has its settings populated. The pattern of "call prepare before using a setting" is a safe way to future-proof the check from a movement of the call. Putting that potential-future-problem aside, I don't see how this change is a _benefit_ other than fewer lines of code, which is not a quality measure in itself. Thanks, -Stolee
On Tue, Oct 05 2021, Derrick Stolee wrote: > On 10/5/2021 7:57 AM, Ævar Arnfjörð Bjarmason wrote: > >> Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier >> >> Consolidate the various calls to prepare_repo_settings() to happen at >> the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit >> intentionally breaks things, we seem to be lacking test coverage for >> cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on >> commit-graph by default, 2019-08-13)? >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> builtin/gc.c | 5 +---- >> t/t5318-commit-graph.sh | 2 +- >> 2 files changed, 2 insertions(+), 5 deletions(-) >> >> diff --git a/builtin/gc.c b/builtin/gc.c >> index 26709311601..f59dbedc1fe 100644 >> --- a/builtin/gc.c >> +++ b/builtin/gc.c >> @@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix) >> clean_pack_garbage(); >> } >> >> - prepare_repo_settings(the_repository); >> if (the_repository->settings.gc_write_commit_graph == 1) > > I think that removing these calls is dangerous. prepare_repo_settings() > already returns immediately if the repository already has its settings > populated. The pattern of "call prepare before using a setting" is a > safe way to future-proof the check from a movement of the call. > > Putting that potential-future-problem aside, I don't see how this > change is a _benefit_ other than fewer lines of code, which is not a > quality measure in itself. Very dangerous, yes :) As I noted this hunk intentionaly breaks the "git gc" command. Search for "I wondered if we could just call prepare_repo_settings[...]" upthread. I.e. I was commenting on how Glen's patch makes an attempt to test this tri-state config for some cases, but not others, and that in this case we can break "git gc" without any existing test catching it. Which as noted might have started happening when you flipped the core.commitGraph default, but I didn't test or confirm that, it just seemed the most likely place to start digging from a quick "git log -G...". Anyway, as also noted I think this series is fine as-is, just some feedback in case Glen or anyone would be interested in either doing a v4 or a follow-up, i.e. when I tested this I found that some code/tests in this area were (probably) either fragile or already broken.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > I've looked this over carefully, it LGTM. I noticed some bugs in this > area, but they pre-date your commits. I've pushed a > "gc/use-repo-settings" to https://github.com/avar/git.git that you can > check out, consider that branch partially as commentary, but feel free > to pick/squash anything you find there. Your review is very thoughtful, thank you :) > * This is somewhat of a case of throwing rocks from a glass house, but > I think the commit messages could really be shorter/get to the > point. I.e. say right away what's being changed and why, instead of > giving the reader an overview of the whole control flow. > > I just adjusted adusted 1/3 in that direction below, I think 2/3 > could do with being squashed into it & adjusted, it's really the > exact same bug/improvement, just with a different config key that > behaves the same. In hindsight, I think this is how I should have initially structured my commits because this improves the signal-noise ratio. I'll keep it in mind in the future for sure. I'm hesitant to send a v4, but I can be persuaded to do so :) > * The "test_unconfig" in your 2/3 is redundant, so is the "rm -rf" of > directories that can't exist by that point in 3/3. These are redundant, but they serve a defensive purpose. By being extra defensive, I was hoping to make it clear to the reader that the test is *guaranteed* not to conflict with/rely on previous state and reduce the number of greps needed. > * You use corrupting the graph as a shorthand for "did we run this > command?". See test_region for an approach that might be better, > i.e. just log with trace2 and grep that to see if we started the > relevant command. The test_region helper can't be used as-is for that > (you could manually grep the emitted PERF/EVENT output), but perhaps > that's easier/more reliable? My intention here was not to assert "did git fsck call the function I expected?", but something more along the lines of "did git fsck catch the corrupted commit-graph?". Calling the function seems like an implementation detail, but the actual desired behavior is that git fsck catches the breakage. Perhaps the test title should be reflect what I meant, like - git fsck (checks commit-graph when config set to true) + git fsck detects corrupted commit-graph > * This is again, something that pre-dates your patches, but I think > following the "is enabled?" behavior here for these variables in "git > fsck" is highly questionable. So just some thoughts after having > looked at that: I think it depends on how we think about core.commitGraph (and others). Do we want Git to be able to pretend that commit-graph doesn't exist at all? If so, then we should skip the check in git fsck. Alternatively, is commit-graph actually part of Git's core offering? If so, core.commitGraph is more like a way to opt *out* of some commit-graph behavior, but fsck should still verify our core data structures. Given that we default to true, it sounds like more of the latter, but I'm not sure what use case this serves. > Just because your config doen't say "use the graph" doesn't mean > another process won't do so, and if the graph is broken they might > probably encounter an error. While that's true, I think that the user assumes a certain degree of risk if they are using different commands with different config values. I don't think it's unreasonable to say that "git -c core.commitGraph=false fsck" won't catch issues that will arise in "git -c core.commitGraph=true foo". > But in general for these auxiliary files (commit-graph, midix, *.rev > etc.) I think there's a good argument to be made that fsck should > *always* check them, and at most use the config to say something > like: > > Hey, you've got a 100% broken commit-graph/midx/rev in your .git, > looks like the core.XYZ enabling it is off *right now* though, so > maybe we won't use it. I hope you're feeling lucky... I think "always check" sounds like the best way to maximize visibility for users, though I don't think core.commitGraph should control the severity of the warning/error. If we pursue "always check", I think we could adopt fsck.<msg-id> instead.
Glen Choo <chooglen@google.com> writes: > Hi everyone! This patch was created in response to something we observed in > Google, where fsck failed to detect that the commit graph was invalid. We > initially assumed that fsck never checked the commit graph, but it turns out > that it does so only when core.commitgraph is set, even though we set defaults > for "whether to use the commit graph" in the repository settings. With this merged to 'seen', the CI job with the extra set of GIT_TEST_X settings fail. When this topic is excluded, with all the other topics in flight in 'seen', everything seems to be OK. For which GIT_TEST_X environment variables to set and export while testing to trigger the problem, see [*1*] For a successful test run of 'seen' without this topic, see [*2*] For the test log of the failing run with this topic, see [*3*]; you'd need to be logged into GitHub to see the details of the errors (e.g. click on "regular (linux-gcc...)" with red X sign on the left hand side, then open "Run ci/print-test-failures.sh" and look for "not ok"). [References] *1* https://github.com/git/git/runs/3843549095?check_suite_focus=true#step:4:1677 export GIT_TEST_SPLIT_INDEX=yes export GIT_TEST_MERGE_ALGORITHM=recursive export GIT_TEST_FULL_IN_PACK_ARRAY=true export GIT_TEST_OE_SIZE=10 export GIT_TEST_OE_DELTA_SIZE=5 export GIT_TEST_COMMIT_GRAPH=1 export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1 export GIT_TEST_MULTI_PACK_INDEX=1 export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 export GIT_TEST_ADD_I_USE_BUILTIN=1 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master export GIT_TEST_WRITE_REV_INDEX=1 export GIT_TEST_CHECKOUT_WORKERS=2 *2* https://github.com/git/git/actions/runs/1322907901 (feff65d) A successful CI run of 'seen' without gc/use-repo-settings. *3* https://github.com/git/git/actions/runs/1322842689 (54a31af) CI run of 'seen' with gc/use-repo-settings that fails. The commits that is in the failing 'seen' but not in the succeeding tree are those from this topic, as can be seen here: $ git shortlog --no-merges 54a31af ^feff65d Glen Choo (3): fsck: verify commit graph when implicitly enabled fsck: verify multi-pack-index when implictly enabled gc: perform incremental repack when implictly enabled
Junio C Hamano <gitster@pobox.com> writes: > With this merged to 'seen', the CI job with the extra set of > GIT_TEST_X settings fail. When this topic is excluded, with > all the other topics in flight in 'seen', everything seems to > be OK. Thanks for the feedback, I did two CI runs on my fork in response: * [1] uses the same tree as seen. Looks like osx-clang passes but linux-gcc fails. * [2] is rebased onto master. Again, osx-clang passes but linux-gcc fails. This should be enough for me to hunt down the issue, though I would appreciate any hints if anyone has any :) [1] https://github.com/chooglen/git/runs/3862097340?check_suite_focus=true [2] https://github.com/chooglen/git/runs/3862670035?check_suite_focus=true
Glen Choo <chooglen@google.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> With this merged to 'seen', the CI job with the extra set of >> GIT_TEST_X settings fail. When this topic is excluded, with >> all the other topics in flight in 'seen', everything seems to >> be OK. > > Thanks for the feedback, I did two CI runs on my fork in response: > > * [1] uses the same tree as seen. Looks like osx-clang passes but > linux-gcc fails. > * [2] is rebased onto master. Again, osx-clang passes but linux-gcc > fails. > > This should be enough for me to hunt down the issue, though I would > appreciate any hints if anyone has any :) Using the set of GIT_TEST_FOO... settings I cited in the Reference#1, in the message you are responding to, will let you reproduce the issue locally on glinux boxes.
Junio C Hamano <gitster@pobox.com> writes: > Using the set of GIT_TEST_FOO... settings I cited in the > Reference#1, in the message you are responding to, will let you > reproduce the issue locally on glinux boxes. Ah, I didn't notice that ci/run-build-and-tests.sh uses different settings for different environments. With those settings, I can reproduce the failure on my OS X machine. Thanks so much!