mbox series

[v3,0/8] config.c: use struct for config reading state

Message ID pull.1463.v3.git.git.1680025914.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series config.c: use struct for config reading state | expand

Message

Linus Arver via GitGitGadget March 28, 2023, 5:51 p.m. UTC
Note to Junio: 8/8 (which renames "cs" -> "set") conflicts with
ab/config-multi-and-nonbool. I previously said that I'd rebase this, but
presumably a remerge-diff is more ergonomic + flexible (let me know if I'm
mistaken), so I'll send a remerge-diff in a reply (I don't trust GGG not to
mangle the patch :/).

After sending this out, I'll see if cocci can make it easy enough to change
config_fn_t. If so, I'll probably go with that approach for the future
'libified' patches, which should also line up nicely with what Ævar
suggested.

= Changes in v3

 * Adjust the *_push() function to be unconditional (like the *_pop()
   function)

 * Various commit message and comment cleanups

= Changes in v2

 * To reduce churn, don't rename "struct config_source cf" to "cs" early in
   the series. Instead, rename the global "cf" to "cf_global", and leave the
   existing "cf"s untouched.
   
   Introduce 8/8 to get rid of the confusing acronym "struct config_source
   cf", but I don't mind ejecting it if it's too much churn.

 * Adjust 5/8 so to pass "struct config_reader" through args instead of
   "*data". v1 made the mistake of thinking "*data" was being passed to a
   callback, but it wasn't.

 * Add a 7/8 to fix a bug in die_bad_number(). I included this because it
   overlaps a little bit with the refactor here, but I don't mind ejecting
   this either.

 * Assorted BUG() message clarifications.

= Description

This series prepares for config.[ch] to be libified as as part of the
libification effort that Emily described in [1]. One of the first goals is
to read config from a file, but the trouble with how config.c is written
today is that all reading operations rely on global state, so before turning
that into a library, we'd want to make that state non-global.

This series doesn't remove all of the global state, but it gets us closer to
that goal by extracting the global config reading state into "struct
config_reader" and plumbing it through the config reading machinery. This
makes it possible to reuse the config machinery without global state, and to
enforce some constraints on "struct config_reader", which makes it more
predictable and easier to remove in the long run.

This process is very similar to how we've plumbed "struct repository" and
other 'context objects' in the past, except:

 * The global state (named "the_reader") for the git process lives in a
   config.c static variable, and not on "the_repository". See 3/6 for the
   rationale.

 * I've stopped short of adding "struct config_reader" to config.h public
   functions, since that would affect non-config.c callers.

Additionally, I've included a bugfix for die_bad_number() that became clear
as I did this refactor.

= Leftover bits

We still need a global "the_reader" because config callbacks are reading
auxiliary information about the config (e.g. line number, file name) via
global functions (e.g. current_config_line(), current_config_name()). This
is either because the callback uses this info directly (like
builtin/config.c printing the filename and scope of the value) or for error
reporting (like git_parse_int() reporting the filename of the value it
failed to parse).

If we had a way to plumb the state from "struct config_reader" to the config
callback functions, we could initialize "struct config_reader" in the config
machinery whenever we read config (instead of asking the caller to
initialize "struct config_reader" themselves), and config reading could
become a thread-safe operation. There isn't an obvious way to plumb this
state to config callbacks without adding an additional arg to config_fn_t
and incurring a lot of churn, but if we start replacing "config_fn_t" with
the configset API (which we've independently wanted for some time), this may
become feasible.

And if we do this, "struct config_reader" itself will probably become
obsolete, because we'd be able to plumb only the relevant state for the
current operation, e.g. if we are parsing a config file, we'd pass only the
config file parsing state, instead of "struct config_reader", which also
contains config set iterating state. In such a scenario, we'd probably want
to pass "struct key_value_info" to the config callback, since that's all the
callback should be interested in anyway. Interestingly, this was proposed by
Junio back in [2], and we didn't do this back then out of concern for the
churn (just like in v1).

[1]
https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com
[2]
https://lore.kernel.org/git/CAPc5daV6bdUKS-ExHmpT4Ppy2S832NXoyPw7aOLP7fG=WrBPgg@mail.gmail.com/

Glen Choo (8):
  config.c: plumb config_source through static fns
  config.c: don't assign to "cf_global" directly
  config.c: create config_reader and the_reader
  config.c: plumb the_reader through callbacks
  config.c: remove current_config_kvi
  config.c: remove current_parsing_scope
  config: report cached filenames in die_bad_number()
  config.c: rename "struct config_source cf"

 config.c               | 584 ++++++++++++++++++++++++-----------------
 config.h               |   1 +
 t/helper/test-config.c |  17 ++
 t/t1308-config-set.sh  |   9 +
 4 files changed, 368 insertions(+), 243 deletions(-)


base-commit: dadc8e6dacb629f46aee39bde90b6f09b73722eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1463%2Fchooglen%2Fconfig%2Fstructify-reading-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1463/chooglen/config/structify-reading-v3
Pull-Request: https://github.com/git/git/pull/1463

Range-diff vs v2:

 1:  75d0f0efb79 = 1:  75d0f0efb79 config.c: plumb config_source through static fns
 2:  7555da0b0e0 ! 2:  39db7d8596a config.c: don't assign to "cf_global" directly
     @@ config.c: static struct key_value_info *current_config_kvi;
       
      +static inline void config_reader_push_source(struct config_source *top)
      +{
     -+	if (cf_global)
     -+		top->prev = cf_global;
     ++	top->prev = cf_global;
      +	cf_global = top;
      +}
      +
 3:  4347896f0a4 ! 3:  72774fd08f3 config.c: create config_reader and the_reader
     @@ config.c: static struct key_value_info *current_config_kvi;
      +static inline void config_reader_push_source(struct config_reader *reader,
      +					     struct config_source *top)
       {
     --	if (cf_global)
     --		top->prev = cf_global;
     +-	top->prev = cf_global;
      -	cf_global = top;
     -+	if (reader->source)
     -+		top->prev = reader->source;
     ++	top->prev = reader->source;
      +	reader->source = top;
      +	/* FIXME remove this when cf_global is removed. */
      +	cf_global = reader->source;
     @@ config.c: static struct key_value_info *current_config_kvi;
      -	cf_global = cf_global->prev;
      +	ret = reader->source;
      +	reader->source = reader->source->prev;
     -+	/* FIXME remove this when cf is removed. */
     ++	/* FIXME remove this when cf_global is removed. */
      +	cf_global = reader->source;
       	return ret;
       }
 4:  22b69971749 ! 4:  e02dddd560f config.c: plumb the_reader through callbacks
     @@ config.c: static struct config_reader the_reader;
       
       /*
      @@ config.c: static inline void config_reader_push_source(struct config_reader *reader,
     - 	if (reader->source)
     - 		top->prev = reader->source;
     + {
     + 	top->prev = reader->source;
       	reader->source = top;
      -	/* FIXME remove this when cf_global is removed. */
      -	cf_global = reader->source;
     @@ config.c: static inline struct config_source *config_reader_pop_source(struct co
       		BUG("tried to pop config source, but we weren't reading config");
       	ret = reader->source;
       	reader->source = reader->source->prev;
     --	/* FIXME remove this when cf is removed. */
     +-	/* FIXME remove this when cf_global is removed. */
      -	cf_global = reader->source;
       	return ret;
       }
 5:  afb6e3e318d ! 5:  c79eaf74f89 config.c: remove current_config_kvi
     @@ config.c: static enum config_scope current_parsing_scope;
       {
      +	if (reader->config_kvi)
      +		BUG("source should not be set while iterating a config set");
     - 	if (reader->source)
     - 		top->prev = reader->source;
     + 	top->prev = reader->source;
       	reader->source = top;
     + }
      @@ config.c: static inline struct config_source *config_reader_pop_source(struct config_reade
       	return ret;
       }
 6:  a57e35163ae = 6:  05d9ffa21f6 config.c: remove current_parsing_scope
 7:  3c83d9535a0 ! 7:  eb843e6f08d config: report cached filenames in die_bad_number()
     @@ Commit message
      
          Fix this by refactoring the current_config_* functions into variants
          that don't BUG() when we aren't reading config, and using the resulting
     -    functions in die_bad_number(). Refactoring is needed because "git config
     -    --get[-regexp] --type=int" parses the int value _after_ parsing the
     -    config file, which will run into the BUG().
     +    functions in die_bad_number(). "git config --get[-regexp] --type=int"
     +    cannot use the non-refactored version because it parses the int value
     +    _after_ parsing the config file, which would run into the BUG().
      
     -    Also, plumb "struct config_reader" into the new functions. This isn't
     -    necessary per se, but this generalizes better, so it might help us avoid
     -    yet another refactor.
     +    Since the refactored functions aren't public, they use "struct
     +    config_reader".
      
          1. https://lore.kernel.org/git/20160518223712.GA18317@sigill.intra.peff.net/
      
 8:  9aec9092fdf = 8:  ab800aa104c config.c: rename "struct config_source cf"

Comments

Glen Choo March 28, 2023, 6 p.m. UTC | #1
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Note to Junio: 8/8 (which renames "cs" -> "set") conflicts with
> ab/config-multi-and-nonbool. I previously said that I'd rebase this, but
> presumably a remerge-diff is more ergonomic + flexible (let me know if I'm
> mistaken), so I'll send a remerge-diff in a reply (I don't trust GGG not to
> mangle the patch :/).


  diff --git a/config.c b/config.c
  remerge CONFLICT (content): Merge conflict in config.c
  index b17658e1ba..159c404d0c 100644
  --- a/config.c
  +++ b/config.c
  @@ -2351,15 +2351,9 @@ void read_very_early_config(config_fn_t cb, void *data)
    config_with_options(cb, data, NULL, &opts);
  }

  -<<<<<<< HEAD
  -static struct config_set_element *configset_find_element(struct config_set *set, const char *key)
  -||||||| c000d91638
  -static struct config_set_element *configset_find_element(struct config_set *cs, const char *key)
  -=======
  RESULT_MUST_BE_USED
  -static int configset_find_element(struct config_set *cs, const char *key,
  +static int configset_find_element(struct config_set *set, const char *key,
            struct config_set_element **dest)
  ->>>>>>> gitster/ab/config-multi-and-nonbool
  {
    struct config_set_element k;
    struct config_set_element *found_entry;
  @@ -2392,15 +2386,9 @@ static int configset_add_value(struct config_reader *reader,
    struct key_value_info *kv_info = xmalloc(sizeof(*kv_info));
    int ret;

  -<<<<<<< HEAD
  -	e = configset_find_element(set, key);
  -||||||| c000d91638
  -	e = configset_find_element(cs, key);
  -=======
  -	ret = configset_find_element(cs, key, &e);
  +	ret = configset_find_element(set, key, &e);
    if (ret)
      return ret;
  ->>>>>>> gitster/ab/config-multi-and-nonbool
    /*
    * Since the keys are being fed by git_config*() callback mechanism, they
    * are already normalized. So simply add them without any further munging.
  @@ -2510,40 +2498,21 @@ int git_configset_get_value(struct config_set *set, const char *key, const char
    * queried key in the files of the configset, the value returned will be the last
    * value in the value list for that key.
    */
  -<<<<<<< HEAD
  -	values = git_configset_get_value_multi(set, key);
  -||||||| c000d91638
  -	values = git_configset_get_value_multi(cs, key);
  -=======
  -	if ((ret = git_configset_get_value_multi(cs, key, &values)))
  +	if ((ret = git_configset_get_value_multi(set, key, &values)))
      return ret;
  ->>>>>>> gitster/ab/config-multi-and-nonbool

    assert(values->nr > 0);
    *value = values->items[values->nr - 1].string;
    return 0;
  }

  -<<<<<<< HEAD
  -const struct string_list *git_configset_get_value_multi(struct config_set *set, const char *key)
  -||||||| c000d91638
  -const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
  -=======
  -int git_configset_get_value_multi(struct config_set *cs, const char *key,
  +int git_configset_get_value_multi(struct config_set *set, const char *key,
            const struct string_list **dest)
  ->>>>>>> gitster/ab/config-multi-and-nonbool
  -{
  -<<<<<<< HEAD
  -	struct config_set_element *e = configset_find_element(set, key);
  -	return e ? &e->value_list : NULL;
  -||||||| c000d91638
  -	struct config_set_element *e = configset_find_element(cs, key);
  -	return e ? &e->value_list : NULL;
  -=======
  +{
    struct config_set_element *e;
    int ret;

  -	if ((ret = configset_find_element(cs, key, &e)))
  +	if ((ret = configset_find_element(set, key, &e)))
      return ret;
    else if (!e)
      return 1;
  @@ -2557,12 +2526,12 @@ static int check_multi_string(struct string_list_item *item, void *util)
    return item->string ? 0 : config_error_nonbool(util);
  }

  -int git_configset_get_string_multi(struct config_set *cs, const char *key,
  +int git_configset_get_string_multi(struct config_set *set, const char *key,
            const struct string_list **dest)
  {
    int ret;

  -	if ((ret = git_configset_get_value_multi(cs, key, dest)))
  +	if ((ret = git_configset_get_value_multi(set, key, dest)))
      return ret;
    if ((ret = for_each_string_list((struct string_list *)*dest,
            check_multi_string, (void *)key)))
  @@ -2571,17 +2540,16 @@ int git_configset_get_string_multi(struct config_set *cs, const char *key,
    return 0;
  }

  -int git_configset_get(struct config_set *cs, const char *key)
  +int git_configset_get(struct config_set *set, const char *key)
  {
    struct config_set_element *e;
    int ret;

  -	if ((ret = configset_find_element(cs, key, &e)))
  +	if ((ret = configset_find_element(set, key, &e)))
      return ret;
    else if (!e)
      return 1;
    return 0;
  ->>>>>>> gitster/ab/config-multi-and-nonbool
  }

  int git_configset_get_string(struct config_set *set, const char *key, char **dest)
Junio C Hamano March 28, 2023, 8:14 p.m. UTC | #2
Glen Choo <chooglen@google.com> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Note to Junio: 8/8 (which renames "cs" -> "set") conflicts with
>> ab/config-multi-and-nonbool. I previously said that I'd rebase this, but
>> presumably a remerge-diff is more ergonomic + flexible (let me know if I'm
>> mistaken), so I'll send a remerge-diff in a reply (I don't trust GGG not to
>> mangle the patch :/).

A patch that is indented by two places would not be mechanically
applicable anyway and even without such indentation, the hunk-side
markers like c000d91638, gitster/ab/config-multi-and-nonbool, etc.
you have in the patch would be different from conflicts I would see,
so it wouldn't be very useful.  There should probably be a mode
where remerge-diff would omit these labels to help mechanical
application of such a "patch".

In any case, remerge-diff certainly helps eyeballing the result.

In this case, because the difference between v2 and v3 does not
overlap with the area that v2 (or v3 for that matter) would conflict
with other topics in flight, my rerere database can clealy reuse the
recorded resolution for conflicts between this iteration and the
topics in flight just fine.  I'll push out the integration result
after I am done with other topics.

Thanks.
Glen Choo March 28, 2023, 9:24 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> Note to Junio: 8/8 (which renames "cs" -> "set") conflicts with
>>> ab/config-multi-and-nonbool. I previously said that I'd rebase this, but
>>> presumably a remerge-diff is more ergonomic + flexible (let me know if I'm
>>> mistaken), so I'll send a remerge-diff in a reply (I don't trust GGG not to
>>> mangle the patch :/).
>
> A patch that is indented by two places would not be mechanically
> applicable anyway and even without such indentation, the hunk-side
> markers like c000d91638, gitster/ab/config-multi-and-nonbool, etc.
> you have in the patch would be different from conflicts I would see,
> so it wouldn't be very useful.  There should probably be a mode
> where remerge-diff would omit these labels to help mechanical
> application of such a "patch".

Ah, bummer :(

I was thinking about that as well. remerge-diff actually outputs
left and right in "--format=reference", which I then manually modified
to reflect the conflicted version in the working tree, giving the final
result here. I wonder what remerge-diff or "git apply" could do instead.