mbox series

[v3,0/3] Use default values from settings instead of config

Message ID 20211005001931.13932-1-chooglen@google.com (mailing list archive)
Headers show
Series Use default values from settings instead of config | expand

Message

Glen Choo Oct. 5, 2021, 12:19 a.m. UTC
Hi everyone! This patch was created in response to something we observed in
Google, where fsck failed to detect that the commit graph was invalid. We
initially assumed that fsck never checked the commit graph, but it turns out
that it does so only when core.commitgraph is set, even though we set defaults
for "whether to use the commit graph" in the repository settings.

Instead of using the config, let's use repository settings where
available. Replace core.commitGraph and core.multiPackIndex with their
equivalent repository settings in fsck and gc.

I don't anticipate any more business logic changes and the previous comments
were focused on testing style, so hopefully this should be good to merge.

Changes since v2:
* Various small test fixes (thanks Eric!). Most notably, I've used -c instead of
  test_config because test_config can affect the values in subsequent tests.
* Rewording fix in patch 3 commit message
* Refactor tests in patch 3 so that we use a single helper function instead of
  copy-pasted code

Changes since v1:
* clean up typo in patch 1 commit message 
* document the commits that patches 1 and 2 address
* use test helpers in patch 1
* rewrite patch 2's tests so that it's easier to tell that each test
  does something different
* reword patch 3 commit message to explain the bug
* add tests to patch 3

Glen Choo (3):
  fsck: verify commit graph when implicitly enabled
  fsck: verify multi-pack-index when implictly enabled
  gc: perform incremental repack when implictly enabled

 builtin/fsck.c              |  5 +++--
 builtin/gc.c                |  5 ++---
 t/t5318-commit-graph.sh     | 23 ++++++++++++++++++++++-
 t/t5319-multi-pack-index.sh |  5 ++++-
 t/t7900-maintenance.sh      | 28 ++++++++++++++++++++++++----
 5 files changed, 55 insertions(+), 11 deletions(-)

Range-diff against v2:
1:  97ab2bb529 ! 1:  2f9ff949e6 fsck: verify commit graph when implicitly enabled
    @@ t/t5318-commit-graph.sh: test_expect_success 'detect incorrect chunk count' '
      	cd "$TRASH_DIRECTORY/full" &&
      	git fsck &&
      	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
    -@@ t/t5318-commit-graph.sh: test_expect_success 'git fsck (checks commit-graph)' '
    - 	test_must_fail git fsck
    - '
    - 
    + 		"incorrect checksum" &&
    + 	cp commit-graph-pre-write-test $objdir/info/commit-graph &&
    ++	test_must_fail git -c core.commitGraph=true fsck
    ++'
    ++
     +test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
     +	cd "$TRASH_DIRECTORY/full" &&
     +	git fsck &&
     +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
     +		"incorrect checksum" &&
     +	cp commit-graph-pre-write-test $objdir/info/commit-graph &&
    -+	test_config core.commitGraph false &&
    -+	git fsck
    ++	git -c core.commitGraph=false fsck
     +'
     +
     +test_expect_success 'git fsck (checks commit-graph when config unset)' '
    ++	cd "$TRASH_DIRECTORY/full" &&
     +	test_when_finished "git config core.commitGraph true" &&
     +
    -+	cd "$TRASH_DIRECTORY/full" &&
     +	git fsck &&
     +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
     +		"incorrect checksum" &&
     +	test_unconfig core.commitGraph &&
     +	cp commit-graph-pre-write-test $objdir/info/commit-graph &&
    -+	test_must_fail git fsck
    -+'
    -+
    - test_expect_success 'setup non-the_repository tests' '
    - 	rm -rf repo &&
    - 	git init repo &&
    + 	test_must_fail git fsck
    + '
    + 
2:  ace92453ca ! 2:  b13ca2a695 fsck: verify multi-pack-index when implictly enabled
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify incorrect offset' '
      	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\377" $objdir \
      		"incorrect object offset" \
     -		"git -c core.multipackindex=true fsck"
    -+		"git -c core.multipackindex=true fsck" &&
    -+		test_must_fail git fsck &&
    -+		git -c core.multipackindex=false fsck
    ++		"git -c core.multiPackIndex=true fsck" &&
    ++	test_unconfig core.multiPackIndex &&
    ++	test_must_fail git fsck &&
    ++	git -c core.multiPackIndex=false fsck
      '
      
      test_expect_success 'corrupt MIDX is not reused' '
3:  d6959d61c1 < -:  ---------- gc: perform incremental repack when implictly enabled
-:  ---------- > 3:  76943aff80 gc: perform incremental repack when implictly enabled

Comments

Ævar Arnfjörð Bjarmason Oct. 5, 2021, 11:57 a.m. UTC | #1
On Mon, Oct 04 2021, Glen Choo wrote:

> Hi everyone! This patch was created in response to something we observed in
> Google, where fsck failed to detect that the commit graph was invalid. We
> initially assumed that fsck never checked the commit graph, but it turns out
> that it does so only when core.commitgraph is set, even though we set defaults
> for "whether to use the commit graph" in the repository settings.
>
> Instead of using the config, let's use repository settings where
> available. Replace core.commitGraph and core.multiPackIndex with their
> equivalent repository settings in fsck and gc.
>
> I don't anticipate any more business logic changes and the previous comments
> were focused on testing style, so hopefully this should be good to merge.
>
> Changes since v2:
> * Various small test fixes (thanks Eric!). Most notably, I've used -c instead of
>   test_config because test_config can affect the values in subsequent tests.
> * Rewording fix in patch 3 commit message
> * Refactor tests in patch 3 so that we use a single helper function instead of
>   copy-pasted code
>
> Changes since v1:
> * clean up typo in patch 1 commit message 
> * document the commits that patches 1 and 2 address
> * use test helpers in patch 1
> * rewrite patch 2's tests so that it's easier to tell that each test
>   does something different
> * reword patch 3 commit message to explain the bug
> * add tests to patch 3
>
> Glen Choo (3):
>   fsck: verify commit graph when implicitly enabled
>   fsck: verify multi-pack-index when implictly enabled
>   gc: perform incremental repack when implictly enabled

I've looked this over carefully, it LGTM. I noticed some bugs in this
area, but they pre-date your commits. I've pushed a
"gc/use-repo-settings" to https://github.com/avar/git.git that you can
check out, consider that branch partially as commentary, but feel free
to pick/squash anything you find there.

The range-diff for that is quoted below. Notes:

 * This is somewhat of a case of throwing rocks from a glass house, but
   I think the commit messages could really be shorter/get to the
   point. I.e. say right away what's being changed and why, instead of
   giving the reader an overview of the whole control flow.

   I just adjusted adusted 1/3 in that direction below, I think 2/3
   could do with being squashed into it & adjusted, it's really the
   exact same bug/improvement, just with a different config key that
   behaves the same.

 * The "test_unconfig" in your 2/3 is redundant, so is the "rm -rf" of
   directories that can't exist by that point in 3/3.

 * I wondered if we could just call prepare_repo_settings() in one place
   earlier in gc/maintenance instead of adding yet another call deeper
   in the control flow. I've got a 4/3 below that tries to do that, and
   (as noted in its commit message) intentionally breaks "gc.c" in a way
   that our tests don't catch, which is perhaps fallout from
   31b1de6a09b?

   So that's very much on the topic of extra improvements, i.e. your
   patches don't break that anymore than it is now, but I for one would
   very much like to see an end-state where we're confident that
   gc/maintenance do the right thing on each of the tri-states of these
   boolean variables.

   It's also a point about how to write reliable tests (again,
   pre-dating your series). I.e. instead of asserting "we do X when Y is
   true" have the test assert all of Y=[unset|false|true] in a way that
   breaks if any are different than expected.

   One of the tests you're changing has a "graph_git_two_modes" helper
   does 2/3 of that, maybe it would be easy to extend it?

 * You use corrupting the graph as a shorthand for "did we run this
   command?". See test_region for an approach that might be better,
   i.e. just log with trace2 and grep that to see if we started the
   relevant command. The test_region helper can't be used as-is for that
   (you could manually grep the emitted PERF/EVENT output), but perhaps
   that's easier/more reliable?

 * This is again, something that pre-dates your patches, but I think
   following the "is enabled?" behavior here for these variables in "git
   fsck" is highly questionable. So just some thoughts after having
   looked at that:

   Just because your config doen't say "use the graph" doesn't mean
   another process won't do so, and if the graph is broken they might
   probably encounter an error.

   Although I think we *should* have cleaned up those being fatal a
   while ago). See 43d35618055 (commit-graph write: don't die if the
   existing graph is corrupt, 2019-03-25).

   But in general for these auxiliary files (commit-graph, midix, *.rev
   etc.) I think there's a good argument to be made that fsck should
   *always* check them, and at most use the config to say something
   like:

       Hey, you've got a 100% broken commit-graph/midx/rev in your .git,
       looks like the core.XYZ enabling it is off *right now* though, so
       maybe we won't use it.  I hope you're feeling lucky...

   :)


1:  fd9a58e2c7b ! 1:  c6041f3b633 fsck: verify commit graph when implicitly enabled
    @@ Metadata
      ## Commit message ##
         fsck: verify commit graph when implicitly enabled
     
    -    the_repository->settings is the preferred way to get certain
    -    settings (such as "core.commitGraph") because it gets default values
    -    from prepare_repo_settings(). However, cmd_fsck() reads the config
    -    directly via git_config_get_bool(), which bypasses these default values.
    -    This causes fsck to ignore the commit graph if "core.commitgraph" is not
    -    explicitly set in the config. This worked fine until commit-graph was
    -    enabled by default in 31b1de6a09 (commit-graph: turn on commit-graph by
    -    default, 2019-08-13).
    +    Change fsck to check the "core_commit_graph" variable set in
    +    "repo-settings.c" instead of reading the "core.commitGraph" variable,
    +    this fixes a bug where we wouldn't start "commit-graph verify" if the
    +    config key was missing, which since 31b1de6a09 (commit-graph: turn on
    +    commit-graph by default, 2019-08-13) has meant that it's on by
    +    default.
     
    -    Replace git_config_get_bool("core.commitgraph") in fsck with the
    -    equivalent call to the_repository->settings.core_commit_graph.
    -
    -    The expected behavior is that fsck respects the config value when it is
    -    set, but uses the default value when it is unset. For example, for
    -    core.commitGraph, there are three cases:
    -
    -    - Config value is set to true -> enabled
    -    - Config value is set to false -> disabled
    -    - Config value is unset -> enabled
    -
    -    As such, tests cover all three cases.
    +    Add tests to "t5318-commit-graph.sh" to check that this works, using
    +    the detection of a corrupt commit-graph as a method of seeing if
    +    "commit-graph verify" was invoked.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
2:  9d6813bc280 ! 2:  d11209ebfe1 fsck: verify multi-pack-index when implictly enabled
    @@ Commit message
     
         Signed-off-by: Glen Choo <chooglen@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify incorrect offset' '
      		"incorrect object offset" \
     -		"git -c core.multipackindex=true fsck"
     +		"git -c core.multiPackIndex=true fsck" &&
    -+	test_unconfig core.multiPackIndex &&
     +	test_must_fail git fsck &&
     +	git -c core.multiPackIndex=false fsck
      '
3:  260f46568c6 ! 3:  017e2003e85 gc: perform incremental repack when implictly enabled
    @@ Commit message
     
         Signed-off-by: Glen Choo <chooglen@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/gc.c ##
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +}
     +
     +test_expect_success 'maintenance.incremental-repack.auto' '
    -+	rm -rf incremental-repack-true &&
     +	git init incremental-repack-true &&
     +	(
     +		cd incremental-repack-true &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +'
     +
     +test_expect_success 'maintenance.incremental-repack.auto (when config is unset)' '
    -+	rm -rf incremental-repack-unset &&
     +	git init incremental-repack-unset &&
     +	(
     +		cd incremental-repack-unset &&
-:  ----------- > 4:  04d1527f180 WIP gc/maintenance: just call prepare_repo_settings() earlier

-- >8 --
Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier

Consolidate the various calls to prepare_repo_settings() to happen at
the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit
intentionally breaks things, we seem to be lacking test coverage for
cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on
commit-graph by default, 2019-08-13)?

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/gc.c            | 5 +----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 26709311601..f59dbedc1fe 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	prepare_repo_settings(the_repository);
 	if (the_repository->settings.gc_write_commit_graph == 1)
 		write_commit_graph_reachable(the_repository->objects->odb,
 					     !quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
@@ -860,7 +859,6 @@ static int run_write_commit_graph(struct maintenance_run_opts *opts)
 
 static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
 {
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_commit_graph)
 		return 0;
 
@@ -1052,7 +1050,6 @@ static int incremental_repack_auto_condition(void)
 	int incremental_repack_auto_limit = 10;
 	int count = 0;
 
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_multi_pack_index)
 		return 0;
 
@@ -1167,7 +1164,6 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 
 static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts)
 {
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_multi_pack_index) {
 		warning(_("skipping incremental-repack task because core.multiPackIndex is disabled"));
 		return 0;
@@ -2492,6 +2488,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	    (argc == 2 && !strcmp(argv[1], "-h")))
 		usage(builtin_maintenance_usage);
 
+	prepare_repo_settings(the_repository);
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
 	if (!strcmp(argv[1], "start"))
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4e4450af1ef..98123191e1e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -368,7 +368,7 @@ test_expect_success 'check that gc computes commit-graph' '
 	git commit-graph write --reachable &&
 	cp $objdir/info/commit-graph commit-graph-before-gc &&
 	git reset --hard HEAD~1 &&
-	git config gc.writeCommitGraph true &&
+	# BUG: Due to 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13)?
 	git gc &&
 	cp $objdir/info/commit-graph commit-graph-after-gc &&
 	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
Derrick Stolee Oct. 5, 2021, 5:43 p.m. UTC | #2
On 10/5/2021 7:57 AM, Ævar Arnfjörð Bjarmason wrote:

> Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier
> 
> Consolidate the various calls to prepare_repo_settings() to happen at
> the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit
> intentionally breaks things, we seem to be lacking test coverage for
> cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on
> commit-graph by default, 2019-08-13)?
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/gc.c            | 5 +----
>  t/t5318-commit-graph.sh | 2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 26709311601..f59dbedc1fe 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  		clean_pack_garbage();
>  	}
>  
> -	prepare_repo_settings(the_repository);
>  	if (the_repository->settings.gc_write_commit_graph == 1)

I think that removing these calls is dangerous. prepare_repo_settings()
already returns immediately if the repository already has its settings
populated. The pattern of "call prepare before using a setting" is a
safe way to future-proof the check from a movement of the call.

Putting that potential-future-problem aside, I don't see how this
change is a _benefit_ other than fewer lines of code, which is not a
quality measure in itself.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason Oct. 5, 2021, 7:10 p.m. UTC | #3
On Tue, Oct 05 2021, Derrick Stolee wrote:

> On 10/5/2021 7:57 AM, Ævar Arnfjörð Bjarmason wrote:
>
>> Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier
>> 
>> Consolidate the various calls to prepare_repo_settings() to happen at
>> the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit
>> intentionally breaks things, we seem to be lacking test coverage for
>> cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on
>> commit-graph by default, 2019-08-13)?
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  builtin/gc.c            | 5 +----
>>  t/t5318-commit-graph.sh | 2 +-
>>  2 files changed, 2 insertions(+), 5 deletions(-)
>> 
>> diff --git a/builtin/gc.c b/builtin/gc.c
>> index 26709311601..f59dbedc1fe 100644
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  		clean_pack_garbage();
>>  	}
>>  
>> -	prepare_repo_settings(the_repository);
>>  	if (the_repository->settings.gc_write_commit_graph == 1)
>
> I think that removing these calls is dangerous. prepare_repo_settings()
> already returns immediately if the repository already has its settings
> populated. The pattern of "call prepare before using a setting" is a
> safe way to future-proof the check from a movement of the call.
>
> Putting that potential-future-problem aside, I don't see how this
> change is a _benefit_ other than fewer lines of code, which is not a
> quality measure in itself.

Very dangerous, yes :) As I noted this hunk intentionaly breaks the "git
gc" command. Search for "I wondered if we could just call
prepare_repo_settings[...]" upthread.

I.e. I was commenting on how Glen's patch makes an attempt to test this
tri-state config for some cases, but not others, and that in this case
we can break "git gc" without any existing test catching it.

Which as noted might have started happening when you flipped the
core.commitGraph default, but I didn't test or confirm that, it just
seemed the most likely place to start digging from a quick "git log
-G...".

Anyway, as also noted I think this series is fine as-is, just some
feedback in case Glen or anyone would be interested in either doing a v4
or a follow-up, i.e. when I tested this I found that some code/tests in
this area were (probably) either fragile or already broken.
Glen Choo Oct. 5, 2021, 10:25 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I've looked this over carefully, it LGTM. I noticed some bugs in this
> area, but they pre-date your commits. I've pushed a
> "gc/use-repo-settings" to https://github.com/avar/git.git that you can
> check out, consider that branch partially as commentary, but feel free
> to pick/squash anything you find there.

Your review is very thoughtful, thank you :)

>  * This is somewhat of a case of throwing rocks from a glass house, but
>    I think the commit messages could really be shorter/get to the
>    point. I.e. say right away what's being changed and why, instead of
>    giving the reader an overview of the whole control flow.
>
>    I just adjusted adusted 1/3 in that direction below, I think 2/3
>    could do with being squashed into it & adjusted, it's really the
>    exact same bug/improvement, just with a different config key that
>    behaves the same.

In hindsight, I think this is how I should have initially structured my
commits because this improves the signal-noise ratio. I'll keep it in
mind in the future for sure. I'm hesitant to send a v4, but I can be
persuaded to do so :)

>  * The "test_unconfig" in your 2/3 is redundant, so is the "rm -rf" of
>    directories that can't exist by that point in 3/3.

These are redundant, but they serve a defensive purpose. By being extra
defensive, I was hoping to make it clear to the reader that the test is
*guaranteed* not to conflict with/rely on previous state and reduce the
number of greps needed.

>  * You use corrupting the graph as a shorthand for "did we run this
>    command?". See test_region for an approach that might be better,
>    i.e. just log with trace2 and grep that to see if we started the
>    relevant command. The test_region helper can't be used as-is for that
>    (you could manually grep the emitted PERF/EVENT output), but perhaps
>    that's easier/more reliable?

My intention here was not to assert "did git fsck call the function I
expected?", but something more along the lines of "did git fsck catch the
corrupted commit-graph?". Calling the function seems like an
implementation detail, but the actual desired behavior is that git fsck
catches the breakage.

Perhaps the test title should be reflect what I meant, like

  - git fsck (checks commit-graph when config set to true)
  + git fsck detects corrupted commit-graph

>  * This is again, something that pre-dates your patches, but I think
>    following the "is enabled?" behavior here for these variables in "git
>    fsck" is highly questionable. So just some thoughts after having
>    looked at that:

I think it depends on how we think about core.commitGraph (and others).
Do we want Git to be able to pretend that commit-graph doesn't exist at
all? If so, then we should skip the check in git fsck. Alternatively, is
commit-graph actually part of Git's core offering? If so,
core.commitGraph is more like a way to opt *out* of some commit-graph
behavior, but fsck should still verify our core data structures. Given
that we default to true, it sounds like more of the latter, but I'm not
sure what use case this serves.

>    Just because your config doen't say "use the graph" doesn't mean
>    another process won't do so, and if the graph is broken they might
>    probably encounter an error.

While that's true, I think that the user assumes a certain degree of
risk if they are using different commands with different config values.
I don't think it's unreasonable to say that "git -c
core.commitGraph=false fsck" won't catch issues that will arise in "git
-c core.commitGraph=true foo".

>    But in general for these auxiliary files (commit-graph, midix, *.rev
>    etc.) I think there's a good argument to be made that fsck should
>    *always* check them, and at most use the config to say something
>    like:
>
>        Hey, you've got a 100% broken commit-graph/midx/rev in your .git,
>        looks like the core.XYZ enabling it is off *right now* though, so
>        maybe we won't use it.  I hope you're feeling lucky...

I think "always check" sounds like the best way to maximize visibility
for users, though I don't think core.commitGraph should control the
severity of the warning/error. If we pursue "always check", I think we
could adopt fsck.<msg-id> instead.
Junio C Hamano Oct. 9, 2021, 7:24 a.m. UTC | #5
Glen Choo <chooglen@google.com> writes:

> Hi everyone! This patch was created in response to something we observed in
> Google, where fsck failed to detect that the commit graph was invalid. We
> initially assumed that fsck never checked the commit graph, but it turns out
> that it does so only when core.commitgraph is set, even though we set defaults
> for "whether to use the commit graph" in the repository settings.

With this merged to 'seen', the CI job with the extra set of
GIT_TEST_X settings fail.  When this topic is excluded, with
all the other topics in flight in 'seen', everything seems to
be OK.

For which GIT_TEST_X environment variables to set and export while
testing to trigger the problem, see [*1*]

For a successful test run of 'seen' without this topic, see [*2*]

For the test log of the failing run with this topic, see [*3*];
you'd need to be logged into GitHub to see the details of the errors
(e.g. click on "regular (linux-gcc...)" with red X sign on the left
hand side, then open "Run ci/print-test-failures.sh" and look for
"not ok").


[References]

*1* https://github.com/git/git/runs/3843549095?check_suite_focus=true#step:4:1677

    export GIT_TEST_SPLIT_INDEX=yes
    export GIT_TEST_MERGE_ALGORITHM=recursive
    export GIT_TEST_FULL_IN_PACK_ARRAY=true
    export GIT_TEST_OE_SIZE=10
    export GIT_TEST_OE_DELTA_SIZE=5
    export GIT_TEST_COMMIT_GRAPH=1
    export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
    export GIT_TEST_MULTI_PACK_INDEX=1
    export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
    export GIT_TEST_ADD_I_USE_BUILTIN=1
    export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
    export GIT_TEST_WRITE_REV_INDEX=1
    export GIT_TEST_CHECKOUT_WORKERS=2

*2* https://github.com/git/git/actions/runs/1322907901 (feff65d)

    A successful CI run of 'seen' without gc/use-repo-settings.

*3* https://github.com/git/git/actions/runs/1322842689 (54a31af)

    CI run of 'seen' with gc/use-repo-settings that fails.

    The commits that is in the failing 'seen' but not in the
    succeeding tree are those from this topic, as can be seen here:

    $ git shortlog --no-merges 54a31af ^feff65d
    Glen Choo (3):
          fsck: verify commit graph when implicitly enabled
          fsck: verify multi-pack-index when implictly enabled
          gc: perform incremental repack when implictly enabled
Glen Choo Oct. 11, 2021, 7:58 p.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> With this merged to 'seen', the CI job with the extra set of
> GIT_TEST_X settings fail.  When this topic is excluded, with
> all the other topics in flight in 'seen', everything seems to
> be OK.

Thanks for the feedback, I did two CI runs on my fork in response:

* [1] uses the same tree as seen. Looks like osx-clang passes but
  linux-gcc fails.
* [2] is rebased onto master. Again, osx-clang passes but linux-gcc
  fails.

This should be enough for me to hunt down the issue, though I would
appreciate any hints if anyone has any :)

[1] https://github.com/chooglen/git/runs/3862097340?check_suite_focus=true
[2] https://github.com/chooglen/git/runs/3862670035?check_suite_focus=true
Junio C Hamano Oct. 11, 2021, 8:08 p.m. UTC | #7
Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> With this merged to 'seen', the CI job with the extra set of
>> GIT_TEST_X settings fail.  When this topic is excluded, with
>> all the other topics in flight in 'seen', everything seems to
>> be OK.
>
> Thanks for the feedback, I did two CI runs on my fork in response:
>
> * [1] uses the same tree as seen. Looks like osx-clang passes but
>   linux-gcc fails.
> * [2] is rebased onto master. Again, osx-clang passes but linux-gcc
>   fails.
>
> This should be enough for me to hunt down the issue, though I would
> appreciate any hints if anyone has any :)

Using the set of GIT_TEST_FOO... settings I cited in the
Reference#1, in the message you are responding to, will let you
reproduce the issue locally on glinux boxes.
Glen Choo Oct. 11, 2021, 8:48 p.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:

> Using the set of GIT_TEST_FOO... settings I cited in the
> Reference#1, in the message you are responding to, will let you
> reproduce the issue locally on glinux boxes.

Ah, I didn't notice that ci/run-build-and-tests.sh uses different
settings for different environments. With those settings, I can
reproduce the failure on my OS X machine.

Thanks so much!