Message ID | 4920d3c474375abb39ed163c5ed6138a5e5dccc6.1566863604.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | config: make config_with_options() handle any repo | expand |
On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > Currently, config_with_options() relies on the global the_repository > when it has to configure from a blob. Not really reading the patch, but my last experience with moving config.c away from the_repo [1] shows that there are more hidden dependencies, in git_path() and particularly the git_config_clear() call in git_config_set_multivar_... Not really sure if those deps really affect your goals or not. Have a look at that branch, filtering on config.c for more info (and if you want to pick up some patches from that, you have my sign-off). [1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees -- Duy
Hi, Duy On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote: > > On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > Currently, config_with_options() relies on the global the_repository > > when it has to configure from a blob. > > Not really reading the patch, but my last experience with moving > config.c away from the_repo [1] shows that there are more hidden > dependencies, in git_path() and particularly the git_config_clear() > call in git_config_set_multivar_... Not really sure if those deps > really affect your goals or not. Have a look at that branch, filtering > on config.c for more info (and if you want to pick up some patches > from that, you have my sign-off). Thanks for the advice. Indeed, I see now that do_git_config_sequence() may call git_pathdup(), which relies on the_repo. For my use in patch 2/2, repo_config_with_options() won't ever get to call do_git_config_sequence(), so that's fine. But in other use cases it may have to, so I'll need to check that. > [1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees > > -- > Duy
On Tue, Aug 27, 2019 at 8:46 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > Hi, Duy > > On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote: > > > > On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > > > > > > Currently, config_with_options() relies on the global the_repository > > > when it has to configure from a blob. > > > > Not really reading the patch, but my last experience with moving > > config.c away from the_repo [1] shows that there are more hidden > > dependencies, in git_path() and particularly the git_config_clear() > > call in git_config_set_multivar_... Not really sure if those deps > > really affect your goals or not. Have a look at that branch, filtering > > on config.c for more info (and if you want to pick up some patches > > from that, you have my sign-off). > > Thanks for the advice. Indeed, I see now that do_git_config_sequence() > may call git_pathdup(), which relies on the_repo. For my use in patch > 2/2, repo_config_with_options() won't ever get to call > do_git_config_sequence(), so that's fine. But in other use cases it > may have to, so I'll need to check that. While working on this, I think I may have found a bug: The repo_read_config() function takes a repository R as parameter and calls this chain of functions: repo_read_config(struct repository *R) > config_with_options() > do_git_config_sequence() > git_pathdup("config.worktree") Shouldn't, however, the last call consider R instead of using the_repository? i.e., use repo_git_path(R, "config.worktree"), instead? If so, how could we get R there? I mean, we could pass it through this chain, but the chain already passes a "struct config_options", which carries the "commondir" and "git_dir" fields. So it would probably be confusing to have them and an extra repository parameter (which also has "commondir" and "git_dir"), right? Any ideas on how to better approach this? > > [1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees > > > > -- > > Duy
On Thu, Aug 29, 2019 at 11:24 AM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Tue, Aug 27, 2019 at 8:46 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > Hi, Duy > > > > On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote: > > > > > > On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares > > > <matheus.bernardino@usp.br> wrote: > > > > > > > > Currently, config_with_options() relies on the global the_repository > > > > when it has to configure from a blob. > > > > > > Not really reading the patch, but my last experience with moving > > > config.c away from the_repo [1] shows that there are more hidden > > > dependencies, in git_path() and particularly the git_config_clear() > > > call in git_config_set_multivar_... Not really sure if those deps > > > really affect your goals or not. Have a look at that branch, filtering > > > on config.c for more info (and if you want to pick up some patches > > > from that, you have my sign-off). > > > > Thanks for the advice. Indeed, I see now that do_git_config_sequence() > > may call git_pathdup(), which relies on the_repo. For my use in patch > > 2/2, repo_config_with_options() won't ever get to call > > do_git_config_sequence(), so that's fine. But in other use cases it > > may have to, so I'll need to check that. > > While working on this, I think I may have found a bug: The > repo_read_config() function takes a repository R as parameter and > calls this chain of functions: > > repo_read_config(struct repository *R) > config_with_options() > > do_git_config_sequence() > git_pathdup("config.worktree") > > Shouldn't, however, the last call consider R instead of using > the_repository? i.e., use repo_git_path(R, "config.worktree"), > instead? Yes. You just found one of the plenty traps because the_repository is still hidden in many core functions. > If so, how could we get R there? I mean, we could pass it through this > chain, but the chain already passes a "struct config_options", which > carries the "commondir" and "git_dir" fields. So it would probably be > confusing to have them and an extra repository parameter (which also > has "commondir" and "git_dir"), right? Any ideas on how to better > approach this? I would change 'struct config_options' to carry 'struct repository' which also contains git_dir and other info inside. Though I have no idea how big that change would be (didn't check the code). Config code relies on plenty callbacks without "void *cb_data" so relying on global state is the only way in some cases.
On Thu, Aug 29, 2019 at 04:31:34PM +0700, Duy Nguyen wrote: > > If so, how could we get R there? I mean, we could pass it through this > > chain, but the chain already passes a "struct config_options", which > > carries the "commondir" and "git_dir" fields. So it would probably be > > confusing to have them and an extra repository parameter (which also > > has "commondir" and "git_dir"), right? Any ideas on how to better > > approach this? > > I would change 'struct config_options' to carry 'struct repository' > which also contains git_dir and other info inside. Though I have no > idea how big that change would be (didn't check the code). Config code > relies on plenty callbacks without "void *cb_data" so relying on > global state is the only way in some cases. I'm not sure about that, at least for this particular git_pathdup(). We pass along the git_dir because we might not have a repository struct yet (i.e., when reading config before repo discovery has happened). So it might be that this case should actually be making a path out of $git_dir/config.worktree (but I'm not 100% sure, as I don't know the ins and outs of worktree config files). I'm sure there are other gotchas in the config code, though, related to things for which we _do_ need a repository. E.g., include_by_branch() looks at the_repository, and should use a repository struct matching the git_dir we're looking at (though it may be acceptable to bail during early pre-repo-initialization config and just disallow branch includes, which is what happens now). -Peff
On Thu, Aug 29, 2019 at 11:00 AM Jeff King <peff@peff.net> wrote: > > On Thu, Aug 29, 2019 at 04:31:34PM +0700, Duy Nguyen wrote: > > > > If so, how could we get R there? I mean, we could pass it through this > > > chain, but the chain already passes a "struct config_options", which > > > carries the "commondir" and "git_dir" fields. So it would probably be > > > confusing to have them and an extra repository parameter (which also > > > has "commondir" and "git_dir"), right? Any ideas on how to better > > > approach this? > > > > I would change 'struct config_options' to carry 'struct repository' > > which also contains git_dir and other info inside. Though I have no > > idea how big that change would be (didn't check the code). Config code > > relies on plenty callbacks without "void *cb_data" so relying on > > global state is the only way in some cases. > > I'm not sure about that, at least for this particular git_pathdup(). We > pass along the git_dir because we might not have a repository struct yet > (i.e., when reading config before repo discovery has happened). Yes, I think read_early_config(), for example, may call config_with_options() before the_repo is initialized. > So it might be that this case should actually be making a path out of > $git_dir/config.worktree (but I'm not 100% sure, as I don't know the ins > and outs of worktree config files). Makes sense, config.worktree files are per-worktree, which have different git_dir's, right? > I'm sure there are other gotchas in the config code, though, related to > things for which we _do_ need a repository. E.g., include_by_branch() > looks at the_repository, and should use a repository struct matching the > git_dir we're looking at (though it may be acceptable to bail during > early pre-repo-initialization config and just disallow branch includes, > which is what happens now). I think config_with_options() is another example of a place where we should have a reference to a repo (but we currently don't). When configuring from a given blob, it will call git_config_from_blob_ref(), which calls get_oid() and read_object_file(). Both of these functions will use the_repo by default. But the git_dir and commondir fields passed to config_with_options() through 'struct config_options' may not refer to the_repo, right? I'm not sure what is the best solution to this, though. I mean, we could add a 'struct repository' in 'struct config_options', but as you already pointed out, some callers might not have a repository struct yet... > -Peff
On Thu, Aug 29, 2019 at 11:44 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > I'm sure there are other gotchas in the config code, though, related to > > things for which we _do_ need a repository. E.g., include_by_branch() > > looks at the_repository, and should use a repository struct matching the > > git_dir we're looking at (though it may be acceptable to bail during > > early pre-repo-initialization config and just disallow branch includes, > > which is what happens now). > > I think config_with_options() is another example of a place where we > should have a reference to a repo (but we currently don't). When > configuring from a given blob, it will call > git_config_from_blob_ref(), which calls get_oid() and > read_object_file(). Both of these functions will use the_repo by > default. But the git_dir and commondir fields passed to > config_with_options() through 'struct config_options' may not refer to > the_repo, right? > > I'm not sure what is the best solution to this, though. I mean, we > could add a 'struct repository' in 'struct config_options', but as you > already pointed out, some callers might not have a repository struct > yet... Early setup code has always been special (there's a lot of stuff you don't have access too). Ideally we could have a lower level API that takes git_dir and git_commondir only, no access to 'struct repository'. This is used for early access. And we have a higher level API that only takes struct repo and pass repo->gitdir down to the that lowlevel one. But I guess that's not the reality we're in. Since early setup code is special, perhaps you could make 'struct config_options' take both git_dir and struct repo, but not both at the same time. Early setup code sets repo to NULL and git_dir something else. Other code always leave git_dir and git_common_dir to NULL, documented to say those are for early setup only. PS. Again still not looking at the code so I may just be talking rubbish here.
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index 7d20716c32..ad99353df8 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt @@ -47,7 +47,7 @@ will first feed the user-wide one to the callback, and then the repo-specific one; by overwriting, the higher-priority repo-specific value is left at the end). -The `config_with_options` function lets the caller examine config +The `repo_config_with_options` function lets the caller examine config while adjusting some of the default behavior of `git_config`. It should almost never be used by "regular" Git code that is looking up configuration variables. It is intended for advanced callers like @@ -65,6 +65,9 @@ Specify options to adjust the behavior of parsing config files. See `struct config_options` in `config.h` for details. As an example: regular `git_config` sets `opts.respect_includes` to `1` by default. +The `config_with_options` macro is provided as an alias for +`repo_config_with_options`, using 'the_repository' by default. + Reading Specific Files ---------------------- diff --git a/config.c b/config.c index 3900e4947b..dc116f2582 100644 --- a/config.c +++ b/config.c @@ -1624,17 +1624,16 @@ int git_config_from_mem(config_fn_t fn, return do_config_from(&top, fn, data, opts); } -int git_config_from_blob_oid(config_fn_t fn, - const char *name, - const struct object_id *oid, - void *data) +int git_config_from_blob_oid(struct repository *r, config_fn_t fn, + const char *name, const struct object_id *oid, + void *data) { enum object_type type; char *buf; unsigned long size; int ret; - buf = read_object_file(oid, &type, &size); + buf = repo_read_object_file(r, oid, &type, &size); if (!buf) return error(_("unable to load config blob object '%s'"), name); if (type != OBJ_BLOB) { @@ -1649,15 +1648,14 @@ int git_config_from_blob_oid(config_fn_t fn, return ret; } -static int git_config_from_blob_ref(config_fn_t fn, - const char *name, - void *data) +static int git_config_from_blob_ref(struct repository *r, config_fn_t fn, + const char *name, void *data) { struct object_id oid; - if (get_oid(name, &oid) < 0) + if (repo_get_oid(r, name, &oid) < 0) return error(_("unable to resolve config blob '%s'"), name); - return git_config_from_blob_oid(fn, name, &oid, data); + return git_config_from_blob_oid(r, fn, name, &oid, data); } const char *git_etc_gitconfig(void) @@ -1751,9 +1749,9 @@ static int do_git_config_sequence(const struct config_options *opts, return ret; } -int config_with_options(config_fn_t fn, void *data, - struct git_config_source *config_source, - const struct config_options *opts) +int repo_config_with_options(struct repository *r, config_fn_t fn, void *data, + struct git_config_source *config_source, + const struct config_options *opts) { struct config_include_data inc = CONFIG_INCLUDE_INIT; @@ -1774,7 +1772,7 @@ int config_with_options(config_fn_t fn, void *data, else if (config_source && config_source->file) return git_config_from_file(fn, config_source->file, data); else if (config_source && config_source->blob) - return git_config_from_blob_ref(fn, config_source->blob, data); + return git_config_from_blob_ref(r, fn, config_source->blob, data); return do_git_config_sequence(opts, fn, data); } diff --git a/config.h b/config.h index f0ed464004..33ce46ecc3 100644 --- a/config.h +++ b/config.h @@ -82,16 +82,22 @@ int git_config_from_mem(config_fn_t fn, const char *name, const char *buf, size_t len, void *data, const struct config_options *opts); -int git_config_from_blob_oid(config_fn_t fn, const char *name, - const struct object_id *oid, void *data); +int git_config_from_blob_oid(struct repository *r, config_fn_t fn, + const char *name, const struct object_id *oid, + void *data); void git_config_push_parameter(const char *text); int git_config_from_parameters(config_fn_t fn, void *data); void read_early_config(config_fn_t cb, void *data); void read_very_early_config(config_fn_t cb, void *data); void git_config(config_fn_t fn, void *); -int config_with_options(config_fn_t fn, void *, - struct git_config_source *config_source, - const struct config_options *opts); +int repo_config_with_options(struct repository *r, config_fn_t fn, void *data, + struct git_config_source *config_source, + const struct config_options *opts); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define config_with_options(fn, data, config_source, opts) \ + repo_config_with_options(the_repository, fn, data, config_source, opts) +#endif + int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); int git_parse_maybe_bool(const char *); diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 2ee702ecf7..53ecc975bf 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -142,3 +142,14 @@ expression H; - format_commit_message( + repo_format_commit_message(the_repository, E, F, G, H); + +@@ +expression E; +expression F; +expression G; +expression H; +@@ +- config_with_options( ++ repo_config_with_options(the_repository, + E, F, G, H); + diff --git a/submodule-config.c b/submodule-config.c index 4264ee216f..1d28b17071 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -681,7 +681,7 @@ void gitmodules_config_oid(const struct object_id *commit_oid) submodule_cache_check_init(the_repository); if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) { - git_config_from_blob_oid(gitmodules_cb, rev.buf, + git_config_from_blob_oid(the_repository, gitmodules_cb, rev.buf, &oid, the_repository); } strbuf_release(&rev);
Currently, config_with_options() relies on the global the_repository when it has to configure from a blob. A possible way to bypass this limitation, when working with arbitrary repos, is to add their object directories into the in-memory alternates list. That's what submodule-config.c::config_from_gitmodules() does, for example. But this approach can negatively affect performance and memory. So introduce repo_config_with_options() which takes a struct repository as an additional parameter and pass it down in the call stack. Also, leave the original config_with_options() as a macro behind NO_THE_REPOSITORY_COMPATIBILITY_MACROS. Finally, adjust documentation and add a rule to coccinelle to reflect the change. Note: the following patch will take care of actually using the added function in place of config_with_options() at config_from_gitmodules(). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/technical/api-config.txt | 5 +++- config.c | 26 +++++++++---------- config.h | 16 ++++++++---- .../coccinelle/the_repository.pending.cocci | 11 ++++++++ submodule-config.c | 2 +- 5 files changed, 39 insertions(+), 21 deletions(-)