diff mbox series

[RFC] repo-settings: set defaults even when not in a repo

Message ID 1b27e0b115f858a422e0a2891688227be8f3db01.1648055915.git.steadmon@google.com (mailing list archive)
State New, archived
Headers show
Series [RFC] repo-settings: set defaults even when not in a repo | expand

Commit Message

Josh Steadmon March 23, 2022, 6:03 p.m. UTC
prepare_repo_settings() initializes a `struct repository` with various
default config options and settings read from a repository-local config
file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
in git repos), prepare_repo_settings was changed to issue a BUG() if it
is called by a process whose CWD is not a Git repository. This approach
was suggested in [1].

This breaks fuzz-commit-graph, which attempts to parse arbitrary
fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.

Fix this by refactoring prepare_repo_settings() such that it sets
default options unconditionally; if its process is in a Git repository,
it will also load settings from the local config. This eliminates the
need for a BUG() when not in a repository.

Concerns:

Are any callers strictly dependent on having a BUG() here? I suspect
that the worst that would happen is that rather than this BUG(), the
caller would later hit its own BUG() or die(), so I do not think this is
a blocker. Additionally, every builtin that directly calls
prepare_repo_settings is either marked as RUN_SETUP, which means we
would die() prior to calling it anyway, or checks on its own before
calling it (builtin/diff.c). There are several callers in library code,
though, and I have not tracked down how all of those are used.

Alternatives considered:

Setting up a valid repository for fuzz testing would avoid triggering
this bug, but would unacceptably slow down the test cases.

Refactoring parse_commit_graph() in such a way that the fuzz test has an
alternate entry point that avoids calling prepare_repo_settings() might
be possible, but would be a much larger change than this one. It would
also run the risk that the changes would be so extensive that the fuzzer
would be merely testing its own custom commit-graph implementation,
rather than the one that's actually used in the real world.

[1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 repo-settings.c | 111 ++++++++++++++++++++++++------------------------
 1 file changed, 55 insertions(+), 56 deletions(-)


base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7

Comments

Derrick Stolee March 23, 2022, 7:22 p.m. UTC | #1
On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
> 
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
> 
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.

I think you have the right idea and this can work.
 
> -	/* Booleans config or default, cascades to other settings */
> -	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> -	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> +	if (r->gitdir) {
> +		/* Booleans config or default, cascades to other settings */
> +		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> +		repo_cfg_bool(r, "feature.experimental", &experimental, 0);

However, this giant if block is going to be a bit unwieldy,
especially as we add settings in the future.

Ignoring whitespace, this is your diff:

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..d154b37007 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
-
 	if (r->settings.initialized++)
 		return;
 
@@ -28,6 +25,7 @@ void prepare_repo_settings(struct repository *r)
 	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
 	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
 
+	if (r->gitdir) {
 		/* Booleans config or default, cascades to other settings */
 		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
 		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
@@ -50,16 +48,6 @@ void prepare_repo_settings(struct repository *r)
 		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);

(Adding the if)

-	/*
-	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
-	 * either it *or* the config sets
-	 * r->settings.core_multi_pack_index if true. We don't take
-	 * the environment variable if it exists (even if false) over
-	 * any config, as in most other cases.
-	 */
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		r->settings.core_multi_pack_index = 1;
-
 		/*
 		 * Non-boolean config
 		 */
@@ -93,6 +81,17 @@ void prepare_repo_settings(struct repository *r)
 			else
 				die("unknown fetch negotiation algorithm '%s'", strval);
 		}
+	}
+
+	/*
+	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
+	 * either it *or* the config sets
+	 * r->settings.core_multi_pack_index if true. We don't take
+	 * the environment variable if it exists (even if false) over
+	 * any config, as in most other cases.
+	 */
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+		r->settings.core_multi_pack_index = 1;
 
(Moving this to group with other no-dir settings.)

I think what you're really trying to do is

	if (r->gitdir)
		(Load settings that require a repo)

	(Load settings that work without a repo.)

I think that you'd be better off extracting the two types of config
into helper methods (prepare_gitdir_settings()/prepare_nodir_settings()?)
across two patches, then in a third removing the BUG and inserting the
if (r->gitdir) above.

Does that seem reasonable?

Thanks,
-Stolee
Taylor Blau March 23, 2022, 7:52 p.m. UTC | #2
On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > prepare_repo_settings() initializes a `struct repository` with various
> > default config options and settings read from a repository-local config
> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > is called by a process whose CWD is not a Git repository. This approach
> > was suggested in [1].
> >
> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > fuzzing-engine-provided bytes as a commit graph file.
> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > since we run the fuzz tests without a valid repository, we are hitting
> > the BUG() from 44c7e62 for every test case.
> >
> > Fix this by refactoring prepare_repo_settings() such that it sets
> > default options unconditionally; if its process is in a Git repository,
> > it will also load settings from the local config. This eliminates the
> > need for a BUG() when not in a repository.
>
> I think you have the right idea and this can work.

Hmmm. To me this feels like bending over backwards in
`prepare_repo_settings()` to accommodate one particular caller. I'm not
necessarily opposed to it, but it does feel strange to make
`prepare_repo_settings()` a noop here, since I would expect that any
callers who do want to call `prepare_repo_settings()` are likely
convinced that they are inside of a repository, and it probably should
be a BUG() if they aren't.

I was initially thinking that Josh's alternative of introducing and
calling a lower-level version of `prepare_commit_graph()` that doesn't
require the use of any repository settings would make sense.

But when I started looking at implementing it, I became confused at how
this is supposed to work at all without using a repository. We depend on
the values parsed from:

  - commitGraph.generationVersion
  - commitGraph.readChangedPaths

to determine which part(s) of the commit graph we do and don't read.

Looking around, I think I probably inadvertently broke this in
ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
2020-09-09). But prior to ab14d0676c, neither of those settings existed,
so parsing the commit graph was a pure function of the commit graph's
contents alone, and didn't rely on the existence of a repository.

We could pretend as if `commitGraph.generationVersion` is always "2" and
`commitGraph.readChangedPaths` is always "true", and I think that would
still give us good-enough coverage.

I assume that libFuzzer doesn't give us a convenient way to stub out a
repository. Maybe we should have a variant of `parse_commit_graph` that
instead takes a `struct repo_settings` filled out with the defaults?

We could use that to teach libFuzzer about additional states that the
commit graph parser can be in, too (though probably outside the scope of
this patch).

I tried to sketch all of this out, which seems to work. Let me know what
you think:

--- 8< ---

diff --git a/commit-graph.c b/commit-graph.c
index adffd020dd..3d5aa536c2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);

-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,

 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}

-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();

 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,7 +400,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);

-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
@@ -412,7 +410,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 			graph->read_generation_data = 1;
 	}

-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2299,7 +2297,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;

 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 2e3ac35237..1fdca7a927 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);

 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..07ef4643d8 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;

 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	g = parse_commit_graph_settings(the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..a7e5d10b27 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }

+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -43,6 +50,7 @@ void prepare_repo_settings(struct repository *r)

 	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
diff --git a/repository.h b/repository.h
index e29f361703..6c40ea3f1b 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;

 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

--- >8 ---

Thanks,
Taylor
Victoria Dye March 23, 2022, 8:11 p.m. UTC | #3
Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
> 
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
> 
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.
> 

I think the decision of whether to go with this approach or the alternative
listed below depends on the validity of a 'repository' without a gitdir. 

As far as I can tell, there is an implicit conflict between the changes in:

1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
   2020-09-09)
2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
   2021-12-06) (as you pointed out in your message)

The former says that commit-graph should use a repository setting (implying
it needs a valid repository), and the latter says that you need a valid
gitdir to get repository settings.

So to me, how to proceed depends on whether a repository can be "valid"
without a gitdir or not:

* If it is valid (or not-completely-broken/semi-valid - i.e., not a BUG()),
  then your approach is probably fine - the 'BUG()' should be removed from
  'prepare_repo_settings()'. But, to make sure we don't run into this
  question again, 'prepare_repo_settings()' should be audited for any
  settings/setup that implicitly require a gitdir and updated accordingly,
  along with a comment somewhere saying that the 'gitdir' isn't required.
* If it is not valid, then this use in 'fuzz-commit-graph' tests is fragile
  (even if it's not a problem right now) - 'prepare_repo_settings()' could
  start needing the gitdir, or "the_hash_algo" could become invalid, or some
  other problem could arise from the invalid repo and the tests would break.

I don't have a good answer to this question, but someone with more
experience in this area might be able to help.

> Concerns:
> 
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.
> 
> Alternatives considered:
> 
> Setting up a valid repository for fuzz testing would avoid triggering
> this bug, but would unacceptably slow down the test cases.
> 
> Refactoring parse_commit_graph() in such a way that the fuzz test has an
> alternate entry point that avoids calling prepare_repo_settings() might
> be possible, but would be a much larger change than this one. It would
> also run the risk that the changes would be so extensive that the fuzzer
> would be merely testing its own custom commit-graph implementation,
> rather than the one that's actually used in the real world.
> 

If the 'repository' really is *completely* invalid without a gitdir, I don't
think the refactor of 'commit-graph.c' would necessarily be too bad -
certainly not so extensive that it's not testing "real world" behavior:

* Make 'parse_commit_graph()' only call 'prepare_repo_settings()' if the
  repo has a valid gitdir (guarded like 'git diff' is)
* Guard all use use of 'r' in 'parse_commit_graph()' based on the existence
  of 'gitdir', with fallbacks on e.g. global config settings. From what I
  can see, the only usage is:
    * The check for 'r->settings.commit_graph_read_changed_paths'
    * The call to 'get_configured_generation_version()'
    * The use of 'the_hash_algo->raw_sz'

> [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  repo-settings.c | 111 ++++++++++++++++++++++++------------------------
>  1 file changed, 55 insertions(+), 56 deletions(-)
> 
> diff --git a/repo-settings.c b/repo-settings.c
> index b4fbd16cdc..d154b37007 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
>  	char *strval;
>  	int manyfiles;
>  
> -	if (!r->gitdir)
> -		BUG("Cannot add settings for uninitialized repository");
> -
>  	if (r->settings.initialized++)
>  		return;
>  
> @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r)
>  	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
>  	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
>  
> -	/* Booleans config or default, cascades to other settings */
> -	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> -	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> +	if (r->gitdir) {
> +		/* Booleans config or default, cascades to other settings */
> +		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> +		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
>  
> -	/* Defaults modified by feature.* */
> -	if (experimental) {
> -		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -	}
> -	if (manyfiles) {
> -		r->settings.index_version = 4;
> -		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> -	}
> +		/* Defaults modified by feature.* */
> +		if (experimental) {
> +			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +		}
> +		if (manyfiles) {
> +			r->settings.index_version = 4;
> +			r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> +		}
> +
> +		/* Boolean config or default, does not cascade (simple)  */
> +		repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> +		repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> +		repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> +		repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> +		repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> +		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> +		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
>  
> -	/* Boolean config or default, does not cascade (simple)  */
> -	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> -	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> -	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> -	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> -	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> -	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> -	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
> +		/*
> +		 * Non-boolean config
> +		 */
> +		if (!repo_config_get_int(r, "index.version", &value))
> +			r->settings.index_version = value;
> +
> +		if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> +			int v = git_parse_maybe_bool(strval);
> +
> +			/*
> +			 * If it's set to "keep", or some other non-boolean
> +			 * value then "v < 0". Then we do nothing and keep it
> +			 * at the default of UNTRACKED_CACHE_KEEP.
> +			 */
> +			if (v >= 0)
> +				r->settings.core_untracked_cache = v ?
> +					UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> +			free(strval);
> +		}
> +
> +		if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> +			int fetch_default = r->settings.fetch_negotiation_algorithm;
> +			if (!strcasecmp(strval, "skipping"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +			else if (!strcasecmp(strval, "noop"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> +			else if (!strcasecmp(strval, "consecutive"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> +			else if (!strcasecmp(strval, "default"))
> +				r->settings.fetch_negotiation_algorithm = fetch_default;
> +			else
> +				die("unknown fetch negotiation algorithm '%s'", strval);
> +		}
> +	}
>  
>  	/*
>  	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
> @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r)
>  	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
>  		r->settings.core_multi_pack_index = 1;
>  
> -	/*
> -	 * Non-boolean config
> -	 */
> -	if (!repo_config_get_int(r, "index.version", &value))
> -		r->settings.index_version = value;
> -
> -	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> -		int v = git_parse_maybe_bool(strval);
> -
> -		/*
> -		 * If it's set to "keep", or some other non-boolean
> -		 * value then "v < 0". Then we do nothing and keep it
> -		 * at the default of UNTRACKED_CACHE_KEEP.
> -		 */
> -		if (v >= 0)
> -			r->settings.core_untracked_cache = v ?
> -				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> -		free(strval);
> -	}
> -
> -	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> -		int fetch_default = r->settings.fetch_negotiation_algorithm;
> -		if (!strcasecmp(strval, "skipping"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -		else if (!strcasecmp(strval, "noop"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> -		else if (!strcasecmp(strval, "consecutive"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> -		else if (!strcasecmp(strval, "default"))
> -			r->settings.fetch_negotiation_algorithm = fetch_default;
> -		else
> -			die("unknown fetch negotiation algorithm '%s'", strval);
> -	}
> -
>  	/*
>  	 * This setting guards all index reads to require a full index
>  	 * over a sparse index. After suitable guards are placed in the
> 
> base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
Junio C Hamano March 23, 2022, 8:51 p.m. UTC | #4
Josh Steadmon <steadmon@google.com> writes:

> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
>
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.

I think the right approach for such a breakage is to ensure it is in
a repository, not to force prepare_repo_settings() to lie.  In the
day-to-day real code paths the end user uses, we do want to catch
mistakes in our code that calls prepare_repo_settings() when we are
not in a repository.
Junio C Hamano March 23, 2022, 8:54 p.m. UTC | #5
Victoria Dye <vdye@github.com> writes:

> I think the decision of whether to go with this approach or the alternative
> listed below depends on the validity of a 'repository' without a gitdir. 
>
> As far as I can tell, there is an implicit conflict between the changes in:
>
> 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
>    2020-09-09)
> 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
>    2021-12-06) (as you pointed out in your message)
>
> The former says that commit-graph should use a repository setting (implying
> it needs a valid repository), and the latter says that you need a valid
> gitdir to get repository settings.
>
> So to me, how to proceed depends on whether a repository can be "valid"
> without a gitdir or not:

Sorry, I do not get it.  What does "a repository without a git dir"
look like?  It does not make any sense to me.  A repository without
working tree, I can understand.
Victoria Dye March 23, 2022, 9:19 p.m. UTC | #6
Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> I think the decision of whether to go with this approach or the alternative
>> listed below depends on the validity of a 'repository' without a gitdir. 
>>
>> As far as I can tell, there is an implicit conflict between the changes in:
>>
>> 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
>>    2020-09-09)
>> 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
>>    2021-12-06) (as you pointed out in your message)
>>
>> The former says that commit-graph should use a repository setting (implying
>> it needs a valid repository), and the latter says that you need a valid
>> gitdir to get repository settings.
>>
>> So to me, how to proceed depends on whether a repository can be "valid"
>> without a gitdir or not:
> 
> Sorry, I do not get it.  What does "a repository without a git dir"
> look like?  It does not make any sense to me.  A repository without
> working tree, I can understand.

I think that answers my question - if it doesn't make sense to have a
"repository without a git dir", then the code shouldn't allow
'the_repository' to be used without a valid 'the_repository.gitdir'. The
'BUG()' in 'prepare_repo_settings()' enforces that condition right now, so
removing it like this RFC does would make the tests fragile and the code
more prone to future bugs.

That said, if it's conceptually sensible for 'fuzz-commit-graph' to work
without a repository, then it could be updated to get its defaults from
somewhere other than 'the_repository' when the repo doesn't exist (like what
I mentioned in [1] ("If the 'repository' really is..."), or Taylor's
suggestion in [2]).

[1] https://lore.kernel.org/git/fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com/
[2] https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
Josh Steadmon March 28, 2022, 7:15 p.m. UTC | #7
On 2022.03.23 15:52, Taylor Blau wrote:
> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> > On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > > prepare_repo_settings() initializes a `struct repository` with various
> > > default config options and settings read from a repository-local config
> > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > > is called by a process whose CWD is not a Git repository. This approach
> > > was suggested in [1].
> > >
> > > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > > fuzzing-engine-provided bytes as a commit graph file.
> > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > > since we run the fuzz tests without a valid repository, we are hitting
> > > the BUG() from 44c7e62 for every test case.
> > >
> > > Fix this by refactoring prepare_repo_settings() such that it sets
> > > default options unconditionally; if its process is in a Git repository,
> > > it will also load settings from the local config. This eliminates the
> > > need for a BUG() when not in a repository.
> >
> > I think you have the right idea and this can work.
> 
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.
> 
> I was initially thinking that Josh's alternative of introducing and
> calling a lower-level version of `prepare_commit_graph()` that doesn't
> require the use of any repository settings would make sense.
> 
> But when I started looking at implementing it, I became confused at how
> this is supposed to work at all without using a repository. We depend on
> the values parsed from:
> 
>   - commitGraph.generationVersion
>   - commitGraph.readChangedPaths
> 
> to determine which part(s) of the commit graph we do and don't read.
> 
> Looking around, I think I probably inadvertently broke this in
> ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> so parsing the commit graph was a pure function of the commit graph's
> contents alone, and didn't rely on the existence of a repository.

Yeah, I have not done a great job keeping the fuzzers up to date with
commit-graph changes :(.

> We could pretend as if `commitGraph.generationVersion` is always "2" and
> `commitGraph.readChangedPaths` is always "true", and I think that would
> still give us good-enough coverage.

It might also be worthwhile for the fuzzer to test each interesting
combination of settings, using the same arbitrary input.

> I assume that libFuzzer doesn't give us a convenient way to stub out a
> repository. Maybe we should have a variant of `parse_commit_graph` that
> instead takes a `struct repo_settings` filled out with the defaults?
> 
> We could use that to teach libFuzzer about additional states that the
> commit graph parser can be in, too (though probably outside the scope of
> this patch).
> 
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:

Yes, your patch looks like a much smaller change than I feared would be
the case for a parse_commit_graph refactor. I'll test it out, and
probably add some additional fuzzer improvements on top. Thanks for the
patch!
Josh Steadmon March 28, 2022, 7:53 p.m. UTC | #8
On 2022.03.23 15:52, Taylor Blau wrote:
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:

BTW, is it OK if I include your Signed-off-by on this when I send my V2?
Taylor Blau March 29, 2022, 1:21 a.m. UTC | #9
On Mon, Mar 28, 2022 at 12:15:53PM -0700, Josh Steadmon wrote:
> > Looking around, I think I probably inadvertently broke this in
> > ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> > 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> > so parsing the commit graph was a pure function of the commit graph's
> > contents alone, and didn't rely on the existence of a repository.
>
> Yeah, I have not done a great job keeping the fuzzers up to date with
> commit-graph changes :(.

I think that puts you and I in the same boat, since the original
breakage from ab14d0676c blames back to me. I'm sorry that I didn't
notice that my change had broken the fuzzing code at the time, and I
appreciate you working on fixing it!

> > We could pretend as if `commitGraph.generationVersion` is always "2" and
> > `commitGraph.readChangedPaths` is always "true", and I think that would
> > still give us good-enough coverage.
>
> It might also be worthwhile for the fuzzer to test each interesting
> combination of settings, using the same arbitrary input.

Definitely. I don't think it hurts to just focus on getting the common
case ("2", "true") working again. And if libFuzzer makes it easy-ish to
test more of the possible input space, I'm all for it.

Thanks,
Taylor
Taylor Blau March 29, 2022, 1:22 a.m. UTC | #10
On Mon, Mar 28, 2022 at 12:53:33PM -0700, Josh Steadmon wrote:
> On 2022.03.23 15:52, Taylor Blau wrote:
> > I tried to sketch all of this out, which seems to work. Let me know what
> > you think:
>
> BTW, is it OK if I include your Signed-off-by on this when I send my V2?

Yes, absolutely. Thanks for asking; it's fine to include my
Signed-off-by on any patches / diffs that I send to the mailing list.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason March 29, 2022, 9:03 a.m. UTC | #11
On Wed, Mar 23 2022, Taylor Blau wrote:

>  	/* Boolean config or default, does not cascade (simple)  */
>  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);

If this ends up being kept let's add it to its own commented "section",
the comment 2-lines above it is now incorrect.
Ævar Arnfjörð Bjarmason March 29, 2022, 9:04 a.m. UTC | #12
On Wed, Mar 23 2022, Taylor Blau wrote:

> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> > prepare_repo_settings() initializes a `struct repository` with various
>> > default config options and settings read from a repository-local config
>> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> > is called by a process whose CWD is not a Git repository. This approach
>> > was suggested in [1].
>> >
>> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> > fuzzing-engine-provided bytes as a commit graph file.
>> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> > since we run the fuzz tests without a valid repository, we are hitting
>> > the BUG() from 44c7e62 for every test case.
>> >
>> > Fix this by refactoring prepare_repo_settings() such that it sets
>> > default options unconditionally; if its process is in a Git repository,
>> > it will also load settings from the local config. This eliminates the
>> > need for a BUG() when not in a repository.
>>
>> I think you have the right idea and this can work.
>
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.

I think adding that BUG() was overzelous in the first place, per
https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;

I don't see what purpose it solves to be this overly anal in this code,
and 44c7e62e51e (repo-settings: prepare_repo_settings only in git repos,
2021-12-06) just discusses "what" and not "why".

I think a perfectly fine solution to this is just to revert it:
	
	diff --git a/repo-settings.c b/repo-settings.c
	index b4fbd16cdcc..e162c1479bf 100644
	--- a/repo-settings.c
	+++ b/repo-settings.c
	@@ -18,7 +18,7 @@ void prepare_repo_settings(struct repository *r)
	 	int manyfiles;
	 
	 	if (!r->gitdir)
	-		BUG("Cannot add settings for uninitialized repository");
	+		return;
	 
	 	if (r->settings.initialized++)
	 		return;

I have that in my local integration branch, because I ended up wanting
to add prepare_repo_settings() to usage.c, which may or may not run
inside a repo (and maybe we'll have that config, maybe not).

But really, in common-main.c we do a initialize_the_repository(), so a
"struct repository" is already a thing we have before we get to the
"RUN_SETUP_GENTLY" or whatever in git.c, and a bunch of things all over
the place assume that it's the equivalent of { 0 }-initialized.

If we actually want to turn repository.[ch] into some strict API where
"Tho Shalt Not Use the_repository unless" we're actually in a repo
surely we should have it be NULL then, and to add that BUG() to the
likes of initialize_the_repository().

Except I think there's no point in that, and it would just lead to
pointless churn, so why do it for the settings in particular? Why can't
they just be { 0 }-init'd too?

If some caller cares about the distinction between r->settings being
like it is because of us actually having a repo, or us using the
defaults why can't they just check r->gitdir themselves?

For the rest the default of "just provide the defaults then" is a much
saner API.

I think *maybe* what this actually wanted to do was to bridge the gap
between "startup_info->have_repository" and a caller in builtin/ calling
prepare_repo_settings(), i.e. that it was a logic error to have that
RUN_SETUP_GENTLY caller do that.

I can see how that *might* be useful as some sanity assertion, but then
maybe we could add a more narrow BUG() just for that case, even having a
builtin_prepare_repo_settings() wrapper in builtin.h or whatever.
Taylor Blau March 30, 2022, 2:26 a.m. UTC | #13
On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> >  	/* Boolean config or default, does not cascade (simple)  */
> >  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
>
> If this ends up being kept let's add it to its own commented "section",
> the comment 2-lines above it is now incorrect.

Thanks for pointing it out; indeed that comment is definitely stale with
respect to the newer code. This patch was just a sketch of an
alternative approach for Josh to consider, so I can't guarantee that it
isn't immune to nit-picks ;).

I think that Josh is planning on picking this up, so hopefully he
doesn't mind applying these and any other touch-ups locally before
resubmitting.

(Josh, if you do: thank you very much in advance!)

Thanks,
Taylor
Taylor Blau March 30, 2022, 2:34 a.m. UTC | #14
On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> >> > prepare_repo_settings() initializes a `struct repository` with various
> >> > default config options and settings read from a repository-local config
> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> >> > is called by a process whose CWD is not a Git repository. This approach
> >> > was suggested in [1].
> >> >
> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> >> > fuzzing-engine-provided bytes as a commit graph file.
> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> >> > since we run the fuzz tests without a valid repository, we are hitting
> >> > the BUG() from 44c7e62 for every test case.
> >> >
> >> > Fix this by refactoring prepare_repo_settings() such that it sets
> >> > default options unconditionally; if its process is in a Git repository,
> >> > it will also load settings from the local config. This eliminates the
> >> > need for a BUG() when not in a repository.
> >>
> >> I think you have the right idea and this can work.
> >
> > Hmmm. To me this feels like bending over backwards in
> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
> > necessarily opposed to it, but it does feel strange to make
> > `prepare_repo_settings()` a noop here, since I would expect that any
> > callers who do want to call `prepare_repo_settings()` are likely
> > convinced that they are inside of a repository, and it probably should
> > be a BUG() if they aren't.
>
> I think adding that BUG() was overzelous in the first place, per
> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;

I think Junio raised a good point in

    https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/

, though some of the detail was lost in 44c7e62e51 (repo-settings:
prepare_repo_settings only in git repos, 2021-12-06).

> I have that in my local integration branch, because I ended up wanting
> to add prepare_repo_settings() to usage.c, which may or may not run
> inside a repo (and maybe we'll have that config, maybe not).

I see what you're saying, though I think we would be equally OK to have
a default value of the repo_settings struct that we could rely on. I
said some of this back in

    https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/

, namely the parts around "I would expect that any callers who do want
to call `prepare_repo_settings()` are likely convinced that they are
inside of a repository, and it probably should be a BUG() if they
aren't."

Thinking in terms of your message, though, I think the distinction (from
my perspective, at least) is between (a) using something called
_repo_-settings in a non-repo context, and (b) calling a function which
is supposed to fill in its values from a repository (which the caller
implicitly expects to exist).

Neither feel _good_ to me, but (b) feels worse, since it is making it OK
to operate in a likely-unexpected context with respect to the caller's
expectations.

Anyway, I think that we are pretty far into the weeds, and it's likely
time to turn around. I don't have that strong a feeling either way, and
in all honesty either approach is probably just fine.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason March 30, 2022, 5:38 p.m. UTC | #15
On Tue, Mar 29 2022, Taylor Blau wrote:

> On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Mar 23 2022, Taylor Blau wrote:
>>
>> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> >> > prepare_repo_settings() initializes a `struct repository` with various
>> >> > default config options and settings read from a repository-local config
>> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> >> > is called by a process whose CWD is not a Git repository. This approach
>> >> > was suggested in [1].
>> >> >
>> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> >> > fuzzing-engine-provided bytes as a commit graph file.
>> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> >> > since we run the fuzz tests without a valid repository, we are hitting
>> >> > the BUG() from 44c7e62 for every test case.
>> >> >
>> >> > Fix this by refactoring prepare_repo_settings() such that it sets
>> >> > default options unconditionally; if its process is in a Git repository,
>> >> > it will also load settings from the local config. This eliminates the
>> >> > need for a BUG() when not in a repository.
>> >>
>> >> I think you have the right idea and this can work.
>> >
>> > Hmmm. To me this feels like bending over backwards in
>> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
>> > necessarily opposed to it, but it does feel strange to make
>> > `prepare_repo_settings()` a noop here, since I would expect that any
>> > callers who do want to call `prepare_repo_settings()` are likely
>> > convinced that they are inside of a repository, and it probably should
>> > be a BUG() if they aren't.
>>
>> I think adding that BUG() was overzelous in the first place, per
>> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;
>
> I think Junio raised a good point in
>
>     https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
>
> , though some of the detail was lost in 44c7e62e51 (repo-settings:
> prepare_repo_settings only in git repos, 2021-12-06).
>
>> I have that in my local integration branch, because I ended up wanting
>> to add prepare_repo_settings() to usage.c, which may or may not run
>> inside a repo (and maybe we'll have that config, maybe not).
>
> I see what you're saying, though I think we would be equally OK to have
> a default value of the repo_settings struct that we could rely on. I
> said some of this back in
>
>     https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
>
> , namely the parts around "I would expect that any callers who do want
> to call `prepare_repo_settings()` are likely convinced that they are
> inside of a repository, and it probably should be a BUG() if they
> aren't."
>
> Thinking in terms of your message, though, I think the distinction (from
> my perspective, at least) is between (a) using something called
> _repo_-settings in a non-repo context, and (b) calling a function which
> is supposed to fill in its values from a repository (which the caller
> implicitly expects to exist).
>
> Neither feel _good_ to me, but (b) feels worse, since it is making it OK
> to operate in a likely-unexpected context with respect to the caller's
> expectations.

I agree that it's a bit iffy. I'm basically advocating for treating
"the_repository->settings" as though it's a new "the_config" or
whatever.

Maybe we'd be better off just making that move, or having
the_repository->settings contain only settings relevant to cases where
we only have a repository.

But I think predicating useful uses of it on that refactoring is
overdoing it a bit, especially as I think your "(b)" concern here is
already something we deal with when it comes to
initialize_the_repository() and checks for
"the_repository->gitdir".

Can't we just have callers that really care about the distinction check
"->gitdir" instead? As they're already doing in some cases already?

Or just:

    git mv {repo,global}-settings.c

Since that's what it seems to want to be anyway.

> Anyway, I think that we are pretty far into the weeds, and it's likely
> time to turn around. I don't have that strong a feeling either way, and
> in all honesty either approach is probably just fine.

*nod*
Junio C Hamano March 30, 2022, 8:14 p.m. UTC | #16
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Or just:
>
>     git mv {repo,global}-settings.c
>
> Since that's what it seems to want to be anyway.

Hmph, care to elaborate a bit more on "seems to"?

Here is my take

 - The code makes extensive use of repo_cfg_bool(), which is a thin
   wrapper around repo_config_get_bool(); despite its name, it is
   not about reading from the configuration file of that repository
   and nowhere else.  It can be affected by global configuration.

 - Other uses of repo_config_get_*() it uses is the same way.

So, it wants to grab a set of configuration that would +apply+ to
this specific instance of "struct repository".

But that is quite different from "give us settings that would apply
in general, I do not have a specific repository in mind", which is
what "global-settings.c" would imply at least to me.

And in order for the "this specific instance" to make sense, the
caller should have made sure that it is indeed a repository.
Lifting that BUG() from the code path not only smells sloppy way to
work around some corner case code that does not prepare the
repository properly, but does not make much sense, at least to me.
In exchange for scrapping the safety to help a caller that forgets
to prepare repository before it is ready to call this function, what
are we gaining?

I went back to the thread-starter message and re-read its
justification.  It talks about:

> Concerns:
>
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.

Asking for existing callers being dependent on having a BUG() is a
pure nonsense.  The existing callers are there in shipped versions
of Git exactly because they do things correctly not to hit the BUG(),
so BY DEFINITION, they do not care if the BUG() is there or not.

So that is not "a blocker", but is a non-argument to ask if existing
code paths care if the BUG() is gone.

What BUG() is protecting us against is a careless developer who
writes a new code or alters an existing code path that ends up
making the control flow in such a way that a proper set-up of the
repository structure is bypassed by mistake before calling this
function.  The function is call-once by r->settings.initialized
guarding it, calling it and then doing a set-up will result in an
unexplainable bug even if the caller tries to compensate by calling
it twice, as r->settings that is set incorrectly will be sticky.

Having said all that, I can be pursuaded to consider an approach to
allow callers to explicitly ask for running outside repository, just
like the more strict setup_git_directory() for majority of callers
has looser setup_git_directory_gently() counterpart.  The current
callers should retain the "you must have discovered gitdir" there,
but a special purpose code that is not even Git (like fuzzer) can
say

    prepare_repo_settings_gently(r, &nongit_ok);

instead.

diff --git c/repo-settings.c w/repo-settings.c
index b4fbd16cdc..c492bc7671 100644
--- c/repo-settings.c
+++ w/repo-settings.c
@@ -10,15 +10,24 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }
 
-void prepare_repo_settings(struct repository *r)
+void prepare_repo_settings_gently(struct repository *r, int *nongit)
 {
 	int experimental;
 	int value;
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
+	if (!r->gitdir) {
+		/*
+		 * The caller can pass nongit (out paremeter) to ask if r is already
+		 * initialized (and act on it after this function returns).
+		 */
+		if (!nongit)
+			BUG("Cannot add settings for uninitialized repository");
+		*nongit = 1;
+	} else if (nongit) {
+		*nongit = 0;
+	}
 
 	if (r->settings.initialized++)
 		return;
diff --git c/repository.h w/repository.h
index e29f361703..98f6ec12cc 100644
--- c/repository.h
+++ w/repository.h
@@ -222,7 +222,8 @@ int repo_read_index_unmerged(struct repository *);
  */
 void repo_update_index_if_able(struct repository *, struct lock_file *);
 
-void prepare_repo_settings(struct repository *r);
+#define prepare_repo_settings(r) prepare_repo_settings_gently((r), NULL)
+void prepare_repo_settings_gently(struct repository *r, int *nongit);
 
 /*
  * Return 1 if upgrade repository format to target_version succeeded,
Josh Steadmon April 9, 2022, 6:33 a.m. UTC | #17
On 2022.03.29 22:26, Taylor Blau wrote:
> On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Mar 23 2022, Taylor Blau wrote:
> >
> > >  	/* Boolean config or default, does not cascade (simple)  */
> > >  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > > +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
> >
> > If this ends up being kept let's add it to its own commented "section",
> > the comment 2-lines above it is now incorrect.
> 
> Thanks for pointing it out; indeed that comment is definitely stale with
> respect to the newer code. This patch was just a sketch of an
> alternative approach for Josh to consider, so I can't guarantee that it
> isn't immune to nit-picks ;).
> 
> I think that Josh is planning on picking this up, so hopefully he
> doesn't mind applying these and any other touch-ups locally before
> resubmitting.
> 
> (Josh, if you do: thank you very much in advance!)
> 
> Thanks,
> Taylor

Done in V2, to be sent out shortly.
diff mbox series

Patch

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..d154b37007 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,9 +17,6 @@  void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
-
 	if (r->settings.initialized++)
 		return;
 
@@ -28,27 +25,63 @@  void prepare_repo_settings(struct repository *r)
 	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
 	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
 
-	/* Booleans config or default, cascades to other settings */
-	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
-	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
+	if (r->gitdir) {
+		/* Booleans config or default, cascades to other settings */
+		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
+		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
 
-	/* Defaults modified by feature.* */
-	if (experimental) {
-		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
-	}
-	if (manyfiles) {
-		r->settings.index_version = 4;
-		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
-	}
+		/* Defaults modified by feature.* */
+		if (experimental) {
+			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
+		}
+		if (manyfiles) {
+			r->settings.index_version = 4;
+			r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
+		}
+
+		/* Boolean config or default, does not cascade (simple)  */
+		repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+		repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
+		repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
+		repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+		repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
+		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
+		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
 
-	/* Boolean config or default, does not cascade (simple)  */
-	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
-	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
-	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
-	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
-	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
-	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
-	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+		/*
+		 * Non-boolean config
+		 */
+		if (!repo_config_get_int(r, "index.version", &value))
+			r->settings.index_version = value;
+
+		if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
+			int v = git_parse_maybe_bool(strval);
+
+			/*
+			 * If it's set to "keep", or some other non-boolean
+			 * value then "v < 0". Then we do nothing and keep it
+			 * at the default of UNTRACKED_CACHE_KEEP.
+			 */
+			if (v >= 0)
+				r->settings.core_untracked_cache = v ?
+					UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
+			free(strval);
+		}
+
+		if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
+			int fetch_default = r->settings.fetch_negotiation_algorithm;
+			if (!strcasecmp(strval, "skipping"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
+			else if (!strcasecmp(strval, "noop"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
+			else if (!strcasecmp(strval, "consecutive"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
+			else if (!strcasecmp(strval, "default"))
+				r->settings.fetch_negotiation_algorithm = fetch_default;
+			else
+				die("unknown fetch negotiation algorithm '%s'", strval);
+		}
+	}
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
@@ -60,40 +93,6 @@  void prepare_repo_settings(struct repository *r)
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
 		r->settings.core_multi_pack_index = 1;
 
-	/*
-	 * Non-boolean config
-	 */
-	if (!repo_config_get_int(r, "index.version", &value))
-		r->settings.index_version = value;
-
-	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
-		int v = git_parse_maybe_bool(strval);
-
-		/*
-		 * If it's set to "keep", or some other non-boolean
-		 * value then "v < 0". Then we do nothing and keep it
-		 * at the default of UNTRACKED_CACHE_KEEP.
-		 */
-		if (v >= 0)
-			r->settings.core_untracked_cache = v ?
-				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
-		free(strval);
-	}
-
-	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
-		int fetch_default = r->settings.fetch_negotiation_algorithm;
-		if (!strcasecmp(strval, "skipping"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
-		else if (!strcasecmp(strval, "noop"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
-		else if (!strcasecmp(strval, "consecutive"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
-		else if (!strcasecmp(strval, "default"))
-			r->settings.fetch_negotiation_algorithm = fetch_default;
-		else
-			die("unknown fetch negotiation algorithm '%s'", strval);
-	}
-
 	/*
 	 * This setting guards all index reads to require a full index
 	 * over a sparse index. After suitable guards are placed in the