diff mbox series

[v3,3/8] config.c: create config_reader and the_reader

Message ID 72774fd08f3eb9ff1d449814637e584692ba2bfc.1680025914.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 0c60285147441fbf93e435b4b022fe362b616a5a
Headers show
Series config.c: use struct for config reading state | expand

Commit Message

Glen Choo March 28, 2023, 5:51 p.m. UTC
From: Glen Choo <chooglen@google.com>

Create "struct config_reader" to hold the state of the config source
currently being read. Then, create a static instance of it,
"the_reader", and use "the_reader.source" to replace references to
"cf_global" in public functions.

This doesn't create much immediate benefit (since we're mostly replacing
static variables with a bigger static variable), but it prepares us for
a future where this state doesn't have to be global; "struct
config_reader" (or a similar struct) could be provided by the caller, or
constructed internally by a function like "do_config_from()".

A more typical approach would be to put this struct on "the_repository",
but that's a worse fit for this use case since config reading is not
scoped to a repository. E.g. we can read config before the repository is
known ("read_very_early_config()"), blatantly ignore the repo
("read_protected_config()"), or read only from a file
("git_config_from_file()"). This is especially evident in t5318 and
t9210, where test-tool and scalar parse config but don't fully
initialize "the_repository".

We could have also replaced the references to "cf_global" in callback
functions (which are the only ones left), but we'll eventually plumb
"the_reader" through the callback "*data" arg, so that would be
unnecessary churn. Until we remove "cf_global" altogether, add logic to
"config_reader_*_source()" to keep "cf_global" and "the_reader.source"
in sync.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 config.c | 82 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 31 deletions(-)

Comments

Ævar Arnfjörð Bjarmason March 29, 2023, 10:41 a.m. UTC | #1
On Tue, Mar 28 2023, Glen Choo via GitGitGadget wrote:

> From: Glen Choo <chooglen@google.com>
>
> Create "struct config_reader" to hold the state of the config source
> currently being read. Then, create a static instance of it,
> "the_reader", and use "the_reader.source" to replace references to
> "cf_global" in public functions.
>
> This doesn't create much immediate benefit (since we're mostly replacing
> static variables with a bigger static variable), but it prepares us for
> a future where this state doesn't have to be global; "struct
> config_reader" (or a similar struct) could be provided by the caller, or
> constructed internally by a function like "do_config_from()".
>
> A more typical approach would be to put this struct on "the_repository",
> but that's a worse fit for this use case since config reading is not
> scoped to a repository. E.g. we can read config before the repository is
> known ("read_very_early_config()"), blatantly ignore the repo
> ("read_protected_config()"), or read only from a file
> ("git_config_from_file()"). This is especially evident in t5318 and
> t9210, where test-tool and scalar parse config but don't fully
> initialize "the_repository".

I don't mean to just rehash previous discussion
(i.e. https://lore.kernel.org/git/230307.86wn3szrzu.gmgdl@evledraar.gmail.com/
and downthread). I get that you think sticking this in a "struct
repository *" here isn't clean, and would prefer to not conflate the
two.

Fair enough.

But I think this paragraph still does a bad job of justifying this
direction with reference to existing code.

Why? Because from reading it you get the impression that with
read_very_early_config() and read_protected_config() "config reading is
not scoped to a repository", but "scoped to" is doing a *lot* of work
here.

At the start of read_very_early_config() we do:

	struct config_options opts = { 0 };
	[...]
	opts.ignore_repo = 1;
	opts.ignore_worktree = 1;

And then call config_with_options(), which does:

	struct config_include_data inc = CONFIG_INCLUDE_INIT;

And that struct has:

	struct git_config_source *config_source;

Which in turn has:

	/* The repository if blob is not NULL; leave blank for the_repository */
	struct repository *repo;
	const char *blob;

The read_protected_config() is then another thin wrapper for
config_with_options().

So, so far the reader might be genuinely confused, since we already have
a "repo" in scope why can't we use it for this cache? Even if just
reading the system config etc.

For *those* cases I think what I *think* you're going for is that while
we have a "struct repository" already, we don't want to use it for our
"cache", and instead have a file-scoped one.

Personally, I don't see how it's cleaner to always use a file-scope
rather than piggy-back on the global we almost always have (or provide a
fallback), but let's not get on that topic again :)

Now, the case that *is* special on the other hand is
git_config_from_file(), there we really don't have a "repository" at
all, as it never gets the "struct config_include_data inc", or a
"git_config_source".

But if we dig a bit into those cases there's 3x users of
git_config_from_file() outside of config.c itself:

 * setup.c, to read only repo's "config.worktree"
 * setup.c, to read only repo "config"
 * sequencer.c, to read "sequencer/opts"

For the former two, I think the only thing that's needed is something
like this, along with a corresponding change to
do_git_config_sequence():

	diff --git a/config.h b/config.h
	index 7606246531a..b8a3de4eb93 100644
	--- a/config.h
	+++ b/config.h
	@@ -85,7 +85,10 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
	 
	 struct config_options {
	        unsigned int respect_includes : 1;
	+       unsigned int ignore_system : 1;
	+       unsigned int ignore_global : 1;
	        unsigned int ignore_repo : 1;
	+       unsigned int ignore_local : 1;
	        unsigned int ignore_worktree : 1;
	        unsigned int ignore_cmdline : 1;
	        unsigned int system_gently : 1;

I.e. we actually *do* have a repo there, we just haven't bridged the gap
of "ignore most of its config" so we can use config_with_options()
there.

The sequencer.c case is trickier, but presumably for such isolated
reading we could have a lower-level function which would return the
equivalent of a "key_value_info" on errors or whatever.


Anyway, I'm fine with this direction for now, but given the above & my
previous RFC
https://lore.kernel.org/git/RFC-cover-0.5-00000000000-20230317T042408Z-avarab@gmail.com/
I can't help but think we're taking two steps forward & one step
backwards for some of this.

I.e. are we assuming no "repo", but per the above we really do have one,
but we just don't pass it because we don't have a "read only the
worktree config part", or whatever?

Ditto the line number relaying for builtin/config.c, which as my RFC
showed we have one or two API users that care, which we can just
convert...
Junio C Hamano March 29, 2023, 6:57 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But I think this paragraph still does a bad job of justifying this
> direction with reference to existing code.

I thought it read reasonably well, if not perfect, and do not think
I am capable of rewriting it better, unfortunately.

Care to suggest a better rewrite?

> 	 struct config_options {
> 	        unsigned int respect_includes : 1;
> 	+       unsigned int ignore_system : 1;
> 	+       unsigned int ignore_global : 1;
> 	        unsigned int ignore_repo : 1;
> 	+       unsigned int ignore_local : 1;
> 	        unsigned int ignore_worktree : 1;
> 	        unsigned int ignore_cmdline : 1;
> 	        unsigned int system_gently : 1;

That does look (I am not sure about _local bit, though) well
organized, but I suspect that it can be left for a follow-on
clean-up series, perhaps?

Thanks.
Glen Choo March 29, 2023, 8:02 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>>> I.e. we actually *do* have a repo there, we just haven't bridged the gap
>>> of "ignore most of its config" so we can use config_with_options()
>>> there.
>> 	 struct config_options {
>> 	        unsigned int respect_includes : 1;
>> 	+       unsigned int ignore_system : 1;
>> 	+       unsigned int ignore_global : 1;
>> 	        unsigned int ignore_repo : 1;
>> 	+       unsigned int ignore_local : 1;
>> 	        unsigned int ignore_worktree : 1;
>> 	        unsigned int ignore_cmdline : 1;
>> 	        unsigned int system_gently : 1;
>
> That does look (I am not sure about _local bit, though) well
> organized, but I suspect that it can be left for a follow-on
> clean-up series, perhaps?

Makes sense, I did suggest something similar previously:

  https://lore.kernel.org/git/kl6ly1oze7wb.fsf@chooglen-macbookpro.roam.corp.google.com

But I think that's a follow up series for sure.
Glen Choo March 30, 2023, 5:51 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Mar 28 2023, Glen Choo via GitGitGadget wrote:
>> A more typical approach would be to put this struct on "the_repository",
>> but that's a worse fit for this use case since config reading is not
>> scoped to a repository. E.g. we can read config before the repository is
>> known ("read_very_early_config()"), blatantly ignore the repo
>> ("read_protected_config()"), or read only from a file
>> ("git_config_from_file()"). This is especially evident in t5318 and
>> t9210, where test-tool and scalar parse config but don't fully
>> initialize "the_repository".
>
> [...]
>
> But I think this paragraph still does a bad job of justifying this
> direction with reference to existing code.
>
> Why? Because from reading it you get the impression that with
> read_very_early_config() and read_protected_config() "config reading is
> not scoped to a repository", but "scoped to" is doing a *lot* of work
> here.
>
> [...]
>
> So, so far the reader might be genuinely confused, since we already have
> a "repo" in scope why can't we use it for this cache? Even if just
> reading the system config etc.
>
> For *those* cases I think what I *think* you're going for is that while
> we have a "struct repository" already, we don't want to use it for our
> "cache", and instead have a file-scoped one.

I was probably unclear, bleh. I intended "repository" to mean 'the thing
users interact with', not "struct repository". At any rate, the major
use case I'm concerned with is 'reading config from a file', where the
repository really isn't relevant at all (more on that later).

> Personally, I don't see how it's cleaner to always use a file-scope
> rather than piggy-back on the global we almost always have (or provide a
> fallback), but let's not get on that topic again :)

Piggybacking is probably less intrusive, but I'm not sure it results in
a coherent interface. The _only_ use of git_config_source.repo is to
read config from blobs in config_with_options() (which we need to read
.gitmodules from commits, not the working copy). After that, we don't
actually propagate the "struct repository" at all (because it's not
needed), and I think it makes sense to keep it that way.

> Now, the case that *is* special on the other hand is
> git_config_from_file(), there we really don't have a "repository" at
> all, as it never gets the "struct config_include_data inc", or a
> "git_config_source".
>
> But if we dig a bit into those cases there's 3x users of
> git_config_from_file() outside of config.c itself:
>
>  * setup.c, to read only repo's "config.worktree"
>  * setup.c, to read only repo "config"
>  * sequencer.c, to read "sequencer/opts"

We should also include git_config_from_file_with_options() (which is
basically the same thing), which adds one more caller:

* bundle-uri.c, to read bundle URI files

> For the former two, I think the only thing that's needed is something
> like this, along with a corresponding change to
> do_git_config_sequence():
>
> 	diff --git a/config.h b/config.h
> 	index 7606246531a..b8a3de4eb93 100644
> 	--- a/config.h
> 	+++ b/config.h
> 	@@ -85,7 +85,10 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
> 	 
> 	 struct config_options {
> 	        unsigned int respect_includes : 1;
> 	+       unsigned int ignore_system : 1;
> 	+       unsigned int ignore_global : 1;
> 	        unsigned int ignore_repo : 1;
> 	+       unsigned int ignore_local : 1;
> 	        unsigned int ignore_worktree : 1;
> 	        unsigned int ignore_cmdline : 1;
> 	        unsigned int system_gently : 1;
>
> I.e. we actually *do* have a repo there, we just haven't bridged the gap
> of "ignore most of its config" so we can use config_with_options()
> there.

I'm ambivalent on this. On the one hand, you're not wrong to say that
there probably _is_ a repository that we just happen to not care about,
and maybe it makes sense for config_with_options() to see a "struct
repository". On the other, I'm still quite convinced that the "struct
repository" that we already have just happens to be there by accident
(because "struct git_config_source" is a union of unrelated things), and
I don't think we should be piggybacking onto that.

> The sequencer.c case is trickier, but presumably for such isolated
> reading we could have a lower-level function which would return the
> equivalent of a "key_value_info" on errors or whatever.

bundle-uri.c falls into this case of 'read a file in config syntax' too.
For this reason, I see at least two layers to the config API:

- Parsing a file in config syntax, i.e. the "lower level" API
- Reading Git-specific config (understanding where config is located,
  caching it, etc), i.e. the "higher level" API

We have in-tree callers for _both_ of these layers, and I think that's
appropriate. IOW I don't think we necessarily need to hide the "lower
level" API inside of config.c and expose only the "higher level" API
in-tree [1], which was the impression I got from some of your RFC
patches.

Separating the layers like this also makes it possible to expose the
"lower" level to out-of-tree callers in a sensible way. To parse a
file in a given syntax, a caller shouldn't need to know about
repositories and whatnot. That's exactly what this series is trying to
prepare for, and being principled about the 'config reading cache' is
essential to get this sort of separation.

> I.e. are we assuming no "repo", but per the above we really do have one,
> but we just don't pass it because we don't have a "read only the
> worktree config part", or whatever?

This was addressed above.

> Ditto the line number relaying for builtin/config.c, which as my RFC
> showed we have one or two API users that care, which we can just
> convert...

builtin/config.c is a weird case that I think needs some refactoring,
e.g. there's

- git config -l, which will list all of the git config ("higher level")
- git config -l -f <file>, which lists config from just a file ("lower
  level")

but it uses config_with_options() in both cases! It works because
config_with_options() can switch between "read a subset the git config"
and "read just this file", but it's pretty gross, and we sometimes get
it wrong. (see https://lore.kernel.org/git/xmqqzg9kew1q.fsf@gitster.g/
as an example of how --global is a bit broken).

Your suggestion to convert that (also made upthread, but I can't find
the link for some reason...) to something that uses config_set sounds
pretty reasonable [1].

> Anyway, I'm fine with this direction for now, but given the above & my
> previous RFC
> https://lore.kernel.org/git/RFC-cover-0.5-00000000000-20230317T042408Z-avarab@gmail.com/
> I can't help but think we're taking two steps forward & one step
> backwards for some of this.

Thanks. I appreciate the feedback, nevertheless; I think it's bringing
us closer to a good conclusion.

FWIW I'm working on a followup that will take _many_ steps forward by
adjusting config_fn_t. I don't know how that will pan out, so I
appreciate checking in this series, which is at least a marginal
improvement over the status quo.

[1] "struct config_set" doesn't fall very neatly into the "lower" and
"higher" level API discussion. It's useful to be able to read config
into some in-memory cache (in-tree and out-of-tree), though that isn't
as "low level" as parsing config (without caching). That will probably
be a good follow up to my work to _just_ parse config.
diff mbox series

Patch

diff --git a/config.c b/config.c
index 6627fad71cf..3a28b397c4d 100644
--- a/config.c
+++ b/config.c
@@ -51,6 +51,16 @@  struct config_source {
 };
 #define CONFIG_SOURCE_INIT { 0 }
 
+struct config_reader {
+	struct config_source *source;
+};
+/*
+ * Where possible, prefer to accept "struct config_reader" as an arg than to use
+ * "the_reader". "the_reader" should only be used if that is infeasible, e.g. in
+ * a public function.
+ */
+static struct config_reader the_reader;
+
 /*
  * These variables record the "current" config source, which
  * can be accessed by parsing callbacks.
@@ -66,6 +76,9 @@  struct config_source {
  * at the variables, it's either a bug for it to be called in the first place,
  * or it's a function which can be reused for non-config purposes, and should
  * fall back to some sane behavior).
+ *
+ * FIXME "cf_global" has been replaced by "the_reader.source", remove
+ * "cf_global" once we plumb "the_reader" through all of the callback functions.
  */
 static struct config_source *cf_global;
 static struct key_value_info *current_config_kvi;
@@ -80,19 +93,24 @@  static struct key_value_info *current_config_kvi;
  */
 static enum config_scope current_parsing_scope;
 
-static inline void config_reader_push_source(struct config_source *top)
+static inline void config_reader_push_source(struct config_reader *reader,
+					     struct config_source *top)
 {
-	top->prev = cf_global;
-	cf_global = top;
+	top->prev = reader->source;
+	reader->source = top;
+	/* FIXME remove this when cf_global is removed. */
+	cf_global = reader->source;
 }
 
-static inline struct config_source *config_reader_pop_source()
+static inline struct config_source *config_reader_pop_source(struct config_reader *reader)
 {
 	struct config_source *ret;
-	if (!cf_global)
+	if (!reader->source)
 		BUG("tried to pop config source, but we weren't reading config");
-	ret = cf_global;
-	cf_global = cf_global->prev;
+	ret = reader->source;
+	reader->source = reader->source->prev;
+	/* FIXME remove this when cf_global is removed. */
+	cf_global = reader->source;
 	return ret;
 }
 
@@ -732,7 +750,7 @@  int git_config_from_parameters(config_fn_t fn, void *data)
 	struct config_source source = CONFIG_SOURCE_INIT;
 
 	source.origin_type = CONFIG_ORIGIN_CMDLINE;
-	config_reader_push_source(&source);
+	config_reader_push_source(&the_reader, &source);
 
 	env = getenv(CONFIG_COUNT_ENVIRONMENT);
 	if (env) {
@@ -790,7 +808,7 @@  out:
 	strbuf_release(&envvar);
 	strvec_clear(&to_free);
 	free(envw);
-	config_reader_pop_source();
+	config_reader_pop_source(&the_reader);
 	return ret;
 }
 
@@ -1325,7 +1343,7 @@  int git_config_int(const char *name, const char *value)
 {
 	int ret;
 	if (!git_parse_int(value, &ret))
-		die_bad_number(cf_global, name, value);
+		die_bad_number(the_reader.source, name, value);
 	return ret;
 }
 
@@ -1333,7 +1351,7 @@  int64_t git_config_int64(const char *name, const char *value)
 {
 	int64_t ret;
 	if (!git_parse_int64(value, &ret))
-		die_bad_number(cf_global, name, value);
+		die_bad_number(the_reader.source, name, value);
 	return ret;
 }
 
@@ -1341,7 +1359,7 @@  unsigned long git_config_ulong(const char *name, const char *value)
 {
 	unsigned long ret;
 	if (!git_parse_ulong(value, &ret))
-		die_bad_number(cf_global, name, value);
+		die_bad_number(the_reader.source, name, value);
 	return ret;
 }
 
@@ -1349,7 +1367,7 @@  ssize_t git_config_ssize_t(const char *name, const char *value)
 {
 	ssize_t ret;
 	if (!git_parse_ssize_t(value, &ret))
-		die_bad_number(cf_global, name, value);
+		die_bad_number(the_reader.source, name, value);
 	return ret;
 }
 
@@ -1955,7 +1973,8 @@  int git_default_config(const char *var, const char *value, void *cb)
  * fgetc, ungetc, ftell of top need to be initialized before calling
  * this function.
  */
-static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
+static int do_config_from(struct config_reader *reader,
+			  struct config_source *top, config_fn_t fn, void *data,
 			  const struct config_options *opts)
 {
 	int ret;
@@ -1966,22 +1985,23 @@  static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
 	top->total_len = 0;
 	strbuf_init(&top->value, 1024);
 	strbuf_init(&top->var, 1024);
-	config_reader_push_source(top);
+	config_reader_push_source(reader, top);
 
 	ret = git_parse_source(top, fn, data, opts);
 
 	/* pop config-file parsing state stack */
 	strbuf_release(&top->value);
 	strbuf_release(&top->var);
-	config_reader_pop_source();
+	config_reader_pop_source(reader);
 
 	return ret;
 }
 
-static int do_config_from_file(config_fn_t fn,
-		const enum config_origin_type origin_type,
-		const char *name, const char *path, FILE *f,
-		void *data, const struct config_options *opts)
+static int do_config_from_file(struct config_reader *reader,
+			       config_fn_t fn,
+			       const enum config_origin_type origin_type,
+			       const char *name, const char *path, FILE *f,
+			       void *data, const struct config_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 	int ret;
@@ -1996,15 +2016,15 @@  static int do_config_from_file(config_fn_t fn,
 	top.do_ftell = config_file_ftell;
 
 	flockfile(f);
-	ret = do_config_from(&top, fn, data, opts);
+	ret = do_config_from(reader, &top, fn, data, opts);
 	funlockfile(f);
 	return ret;
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data)
 {
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, NULL);
+	return do_config_from_file(&the_reader, fn, CONFIG_ORIGIN_STDIN, "",
+				   NULL, stdin, data, NULL);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2018,8 +2038,8 @@  int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 		BUG("filename cannot be NULL");
 	f = fopen_or_warn(filename, "r");
 	if (f) {
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
-					  filename, f, data, opts);
+		ret = do_config_from_file(&the_reader, fn, CONFIG_ORIGIN_FILE,
+					  filename, filename, f, data, opts);
 		fclose(f);
 	}
 	return ret;
@@ -2048,7 +2068,7 @@  int git_config_from_mem(config_fn_t fn,
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
 
-	return do_config_from(&top, fn, data, opts);
+	return do_config_from(&the_reader, &top, fn, data, opts);
 }
 
 int git_config_from_blob_oid(config_fn_t fn,
@@ -3797,8 +3817,8 @@  const char *current_config_origin_type(void)
 	int type;
 	if (current_config_kvi)
 		type = current_config_kvi->origin_type;
-	else if(cf_global)
-		type = cf_global->origin_type;
+	else if(the_reader.source)
+		type = the_reader.source->origin_type;
 	else
 		BUG("current_config_origin_type called outside config callback");
 
@@ -3843,8 +3863,8 @@  const char *current_config_name(void)
 	const char *name;
 	if (current_config_kvi)
 		name = current_config_kvi->filename;
-	else if (cf_global)
-		name = cf_global->name;
+	else if (the_reader.source)
+		name = the_reader.source->name;
 	else
 		BUG("current_config_name called outside config callback");
 	return name ? name : "";
@@ -3863,7 +3883,7 @@  int current_config_line(void)
 	if (current_config_kvi)
 		return current_config_kvi->linenr;
 	else
-		return cf_global->linenr;
+		return the_reader.source->linenr;
 }
 
 int lookup_config(const char **mapping, int nr_mapping, const char *var)