Message ID | RFC-cover-0.5-00000000000-20230317T042408Z-avarab@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | bypass config.c global state with configset | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Ævar Arnfjörð Bjarmason (5): > config.h: move up "struct key_value_info" > config.c: use "enum config_origin_type", not "int" > config API: add a config_origin_type_name() helper > config.c: refactor configset_iter() > config API: add and use a repo_config_kvi() I haven't reached the end of 5/5 but it is a shame to see that these obviously good and trivial clean-up patches like [1-4/5] have to be buried in an RFC patch and benefit of applying them alone becomes unclear only because it is done in an area that is actively worked on by others. The latter is unfortunately inherent to the approach of "commenting by patches" but hopefully we can see them reappear when the tree is quiescent. Thanks.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > That libification url > (https://github.com/git/git/compare/master...chooglen:git:config-lib-parsing) > doesn't work for me, and I didn't find a branch with that name in your > published repo. So maybe you've already done all this > post-libification work... Argh, sorry, I renamed the branch. https://github.com/git/git/compare/master...chooglen:git:config/read-without-globals > ...in any case. This RFC expands a bit on my comments on the v1 (at > [1] and upthread). It doesn't get all the way there, but with the > small change in 5/5 we've gotten rid of current_config_line(), the > 1-4/5 are trivial pre-refactorings to make that diff smaller > (e.g. moving the "struct key_value_info" around in config.h). Thanks. I haven't read through it yet, but it sounds promising :)
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Maybe it still makes sense to go for this "the_reader" intermediate > step, but I can't help but think that we could just go for it all in > one leap, and that you've just got stuck on thinking that you needed > to change "config_fn_t" for all its callers. > > As the 5/5 here shows we have various orthagonal uses of the > "config_fn_t" in config.c, and can just implement a new callback type > for the edge cases where we need the file & line info. > > This still leave the current_config_name() etc, which > e.g. builtin/config.c still uses. In your series you've needed to add > the new "reader" parameter for everything from do_config_from(), but > if we're doing that can't we instead just go straight to passing a > "struct key_value_info *" (perhaps with an added "name" field) all the > way down, replacing "cf->linenr" etc? In the end state, I also think we should be passing "struct key_value_info *" around instead of "cf", so I think we are seeing "the_reader" in the same way (as a transitional state). I considered the "repo_config_kvi() + config_fn_kvi_t" as well, but I rejected it (before discussion on the list, whoops) because I didn't want to add _yet another_ set of parallel config APIs, e.g. we already have repo_config(), git_config(), configset*(), git_config_from_file*(). Multiplying that by 2 to add *_kvi() seems like way too much, especially when it seems clear that our current definition of config_fn_t has some problems. Maybe we could deprecate the non-*_kvi(), and have both as a transitional state? It might work, but I think biting the bullet and changing config_fn_t would be easier actually. I'll try applying your series on top of my 1/8 (and maybe 7/8, see next reply) and extending it to cover "cf" (instead of just current_config_kvi()) to see whether the *_kvi() approach saves a lot of unnecessary plumbing. I'd still feel very strongly about getting rid of all of the non *_kvi() versions, though, but maybe that would happen in a cleanup topic. > Similarly, you mention git_parse_int() wanting to report a filename > and/or line number. I'm aware that it can do that, but it doesn't do > so in the common case, e.g.: > > git -c format.filenameMaxLength=abc log > fatal: bad numeric config value 'abc' for 'format.filenamemaxlength': invalid unit > > And the same goes for writing it to e.g. ~/.gitconfig. It's only if > you use "git config --file" or similar that we'll report a filename. That's true, but I think that's a bug, not a feature. See 7/8 [1] where I addressed it. 1. https://lore.kernel.org/git/3c83d9535a037653c7de2d462a4df3a3c43a9308.1678925506.git.gitgitgadget@gmail.com/ > We can just make config_set_callback() and configset_iter() > non-static, so e.g. the builtin/config.c caller that implements > "--show-origin" can keep its config_with_options(...) call, but > instead of "streaming" the config, it'll buffer it up into a > configset. Hm, so to extrapolate, we could make it so that nobody outside of config.c uses the *config_from_file() APIs directly. Instead, all reads get buffered up into a configset. That might not be a bad idea. It would definitely help with some of your goals of config API surface reduction. This would be friendlier in cases where we were already creating custom configsets (I know we have some of those, but I don't recall where), but in cases where we were reading the file directly (e.g. builtin/config.c), we'd be taking a memory and runtime hit. I'm not sure how I (or others) feel about that yet.
Glen Choo <chooglen@google.com> writes: >> This still leave the current_config_name() etc, which >> e.g. builtin/config.c still uses. In your series you've needed to add >> the new "reader" parameter for everything from do_config_from(), but >> if we're doing that can't we instead just go straight to passing a >> "struct key_value_info *" (perhaps with an added "name" field) all the >> way down, replacing "cf->linenr" etc? > > In the end state, I also think we should be passing "struct > key_value_info *" around instead of "cf", so I think we are seeing > "the_reader" in the same way (as a transitional state). > > I considered the "repo_config_kvi() + config_fn_kvi_t" as well, but I > rejected it (before discussion on the list, whoops) because I didn't > want to add _yet another_ set of parallel config APIs, e.g. we already > have repo_config(), git_config(), configset*(), > git_config_from_file*(). Multiplying that by 2 to add *_kvi() seems like > way too much, especially when it seems clear that our current definition > of config_fn_t has some problems. > > Maybe we could deprecate the non-*_kvi(), and have both as a > transitional state? It might work, but I think biting the bullet and > changing config_fn_t would be easier actually. > > I'll try applying your series on top of my 1/8 (and maybe 7/8, see next > reply) and extending it to cover "cf" (instead of just > current_config_kvi()) to see whether the *_kvi() approach saves a lot of > unnecessary plumbing. I'd still feel very strongly about getting rid of > all of the non *_kvi() versions, though, but maybe that would happen in > a cleanup topic. As mentioned above, I think having both "config_fn_t" and config_kvi_fn_t" would make sense if we found a good way to extend it to the functions that use "cf" (parsing config syntax), not the ones that use "current_config_kvi" (iterating a config set). I think technical difficulty is not a barrier here: - Constructing a "struct key_value_info" out of "cf" is trivial - Supporting both "config_fn_t" and "config_kvi_fn_t" with one implementation is also doable in theory. One approach would be to use only *_kvi() internally, and then adapt the external "config_fn_t" like: struct adapt_nonkvi { config_fn_t fn; void *data; }; static int adapt_nonkvi_fn(const char *key, const char *value, struct key_value_info *kvi UNUSED, void *cb) { struct adapt_nonkvi *adapt = cb; return adapt->fn(key, value, adapt->data); } The real cost is that there are so many functions we'd need to adapt (I counted 12 functions that accept config_fn_t in config.h). I think I got through about 30% of it before thinking that it was too much work to try to avoid adjusting config_fn_t. I still strongly believe that we shouldn't have both config_fn_t and config_kvi_fn_t in the long run, and we should converge on one. It's plausible that if we support both as an intermediate state, we'll never do the actual cleanup, so I think the extra cost is not worth it.
On Fri, Mar 17 2023, Glen Choo wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > [...] >> Similarly, you mention git_parse_int() wanting to report a filename >> and/or line number. I'm aware that it can do that, but it doesn't do >> so in the common case, e.g.: >> >> git -c format.filenameMaxLength=abc log >> fatal: bad numeric config value 'abc' for 'format.filenamemaxlength': invalid unit >> >> And the same goes for writing it to e.g. ~/.gitconfig. It's only if >> you use "git config --file" or similar that we'll report a filename. > > That's true, but I think that's a bug, not a feature. See 7/8 [1] where > I addressed it. > > 1. https://lore.kernel.org/git/3c83d9535a037653c7de2d462a4df3a3c43a9308.1678925506.git.gitgitgadget@gmail.com/ > >> We can just make config_set_callback() and configset_iter() >> non-static, so e.g. the builtin/config.c caller that implements >> "--show-origin" can keep its config_with_options(...) call, but >> instead of "streaming" the config, it'll buffer it up into a >> configset. > > Hm, so to extrapolate, we could make it so that nobody outside of > config.c uses the *config_from_file() APIs directly. Instead, all reads > get buffered up into a configset. That might not be a bad idea. It would > definitely help with some of your goals of config API surface reduction. > > This would be friendlier in cases where we were already creating custom > configsets (I know we have some of those, but I don't recall where), but > in cases where we were reading the file directly (e.g. > builtin/config.c), we'd be taking a memory and runtime hit. I'm not sure > how I (or others) feel about that yet. I'm pretty sure that in the end this wouldn't matter, i.e. the time it takes to parse the config is trivial, and the users of these APIs like "git config -l --show-origin" aren't performance-senitive. But for the general case & if you're concerned about this a trivial addition on top of what I suggested would be to pass a streaming callback to config_set_callback(), i.e. you could get the values you'd get from configset_iter() as we parse them.