[v3,0/5] config-parse: create config parsing library

Message ID	cover.1695330852.git.steadmon@google.com (mailing list archive)
Headers	show Return-Path: <git-owner@vger.kernel.org> Date: Thu, 21 Sep 2023 14:17:19 -0700 In-Reply-To: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com> Mime-Version: 1.0 References: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com> Message-ID: <cover.1695330852.git.steadmon@google.com> Subject: [PATCH v3 0/5] config-parse: create config parsing library From: Josh Steadmon <steadmon@google.com> To: git@vger.kernel.org Cc: jonathantanmy@google.com, calvinwan@google.com, glencbz@gmail.com, gitster@pobox.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	config-parse: create config parsing library \| expand [v3,0/5] config-parse: create config parsing library [v3,1/5] config: split out config_parse_options [v3,2/5] config: split do_event() into start and flush operations [v3,3/5] config: report config parse errors using cb [v3,4/5] config.c: accept config_parse_options in git_config_from_stdin [v3,5/5] config-parse: split library out of config.[c\|h]

Josh Steadmon Sept. 21, 2023, 9:17 p.m. UTC

Config parsing no longer uses global state as of gc/config-context, so the
natural next step for libification is to turn that into its own library.
This series starts that process by moving config parsing into
config-parse.[c|h] so that other programs can include this functionality
without pulling in all of config.[c|h].

Open questions:
- How do folks feel about the do_event() refactor in patches 2 & 3?

Changes since v2:
- Added patch 2/5 to refactor do_event() into start_event() and
  flush_event().
- In patch 3/5, we can now add do_event_and_flush() to immediately run
  an event callback, rather than having to do_event() twice in a row.

Changes since v1.5:
- Dropped patch 1/5: config: return positive from git_config_parse_key()


Glen Choo (4):
  config: split out config_parse_options
  config: report config parse errors using cb
  config.c: accept config_parse_options in git_config_from_stdin
  config-parse: split library out of config.[c|h]

Josh Steadmon (1):
  config: split do_event() into start and flush operations

 Makefile           |   1 +
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config-parse.c     | 601 +++++++++++++++++++++++++++++++++++++++++
 config-parse.h     | 155 +++++++++++
 config.c           | 658 ++++-----------------------------------------
 config.h           | 134 +--------
 fsck.c             |   4 +-
 submodule-config.c |   9 +-
 9 files changed, 836 insertions(+), 734 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

Range-diff against v2:
1:  5c676fbac3 ! 1:  fa55b7836f config: split out config_parse_options
    @@ Metadata
      ## Commit message ##
         config: split out config_parse_options
     
    -    "struct config_options" is a disjoint set of options options used by the
    -    config parser (e.g. event listners) and options used by
    -    config_with_options() (e.g. to handle includes, choose which config
    -    files to parse). Split parser-only options into config_parse_options.
    +    "struct config_options" is a disjoint set of options used by the config
    +    parser (e.g. event listeners) and options used by config_with_options()
    +    (e.g. to handle includes, choose which config files to parse). Split
    +    parser-only options into config_parse_options.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## bundle-uri.c ##
-:  ---------- > 2:  8a1463c223 config: split do_event() into start and flush operations
2:  cb92a1f2e3 ! 3:  a888045c04 config: report config parse errors using cb
    @@ Commit message
         CONFIG_ERROR_UNSET and the config_source 'default', since all callers
         are now expected to specify the error handling they want.
     
    +    Add a new "do_event_and_flush" function for running event callbacks
    +    immediately, where the event does not need to calculate an end offset.
    +
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## builtin/config.c ##
    @@ config.c: static int add_remote_url(const char *var, const char *value,
      
      	opts = *inc->opts;
      	opts.unconditional_remote_url = 1;
    +@@ config.c: static int do_event(struct config_source *cs, enum config_event_t type,
    + 	return 0;
    + }
    + 
    ++static int do_event_and_flush(struct config_source *cs,
    ++			      enum config_event_t type,
    ++			      struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	/*
    ++	 * Not actually EOF, but this indicates we don't have a valid event
    ++	 * to flush next time around.
    ++	 */
    ++	data->previous_type = CONFIG_EVENT_EOF;
    ++
    ++	return 0;
    ++}
    ++
    + static void kvi_from_source(struct config_source *cs,
    + 			    enum config_scope scope,
    + 			    struct key_value_info *out)
     @@ config.c: static void kvi_from_source(struct config_source *cs,
      	out->path = cs->path;
      }
    @@ config.c: static int git_parse_source(struct config_source *cs, config_fn_t fn,
     -
     -	free(error_msg);
     -	return error_return;
    -+	/*
    -+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    -+	 * in order for our callback to receive the error event, we have to call
    -+	 * do_event twice
    -+	 */
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     +	return -1;
      }
      
    @@ config.c: void read_early_config(config_fn_t cb, void *data)
      void read_very_early_config(config_fn_t cb, void *data)
      {
     -	struct config_options opts = { 0 };
    +-
    +-	opts.respect_includes = 1;
    +-	opts.ignore_repo = 1;
    +-	opts.ignore_worktree = 1;
    +-	opts.ignore_cmdline = 1;
    +-	opts.system_gently = 1;
     +	struct config_options opts = {
    ++		.respect_includes = 1,
    ++		.ignore_repo = 1,
    ++		.ignore_worktree = 1,
    ++		.ignore_cmdline = 1,
    ++		.system_gently = 1,
     +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
     +	};
      
    - 	opts.respect_includes = 1;
    - 	opts.ignore_repo = 1;
    + 	config_with_options(cb, data, NULL, NULL, &opts);
    + }
     @@ config.c: int git_configset_get_pathname(struct config_set *set, const char *key, const ch
      /* Functions use to read configuration from a repository */
      static void repo_read_config(struct repository *repo)
    @@ config.c: int git_configset_get_pathname(struct config_set *set, const char *key
      
      	opts.respect_includes = 1;
      	opts.commondir = repo->commondir;
    -@@ config.c: int repo_config_get_pathname(struct repository *repo,
    - static void read_protected_config(void)
    - {
    - 	struct config_options opts = {
    --		.respect_includes = 1,
    --		.ignore_repo = 1,
    --		.ignore_worktree = 1,
    --		.system_gently = 1,
    +@@ config.c: static void read_protected_config(void)
    + 		.ignore_repo = 1,
    + 		.ignore_worktree = 1,
    + 		.system_gently = 1,
     +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
      	};
      
    -+	opts.respect_includes = 1;
    -+	opts.ignore_repo = 1;
    -+	opts.ignore_worktree = 1;
    -+	opts.system_gently = 1;
     +
      	git_configset_init(&protected_config);
      	config_with_options(config_set_callback, &protected_config, NULL,
3:  e034d0780c = 4:  49d4b64991 config.c: accept config_parse_options in git_config_from_stdin
4:  74c5dcd5a2 ! 5:  e59ca992d0 config-parse: split library out of config.[c|h]
    @@ config-parse.c (new)
     +	const struct config_parse_options *opts;
     +};
     +
    -+static int do_event(struct config_source *cs, enum config_event_t type,
    -+		    struct parse_event_data *data)
    ++static size_t get_corrected_offset(struct config_source *cs,
    ++				   enum config_event_t type)
     +{
    -+	size_t offset;
    -+
    -+	if (!data->opts || !data->opts->event_fn)
    -+		return 0;
    ++	size_t offset = cs->do_ftell(cs);
     +
    -+	if (type == CONFIG_EVENT_WHITESPACE &&
    -+	    data->previous_type == type)
    -+		return 0;
    -+
    -+	offset = cs->do_ftell(cs);
     +	/*
     +	 * At EOF, the parser always "inserts" an extra '\n', therefore
     +	 * the end offset of the event is the current file position, otherwise
    @@ config-parse.c (new)
     +	 */
     +	if (type != CONFIG_EVENT_EOF)
     +		offset--;
    ++	return offset;
    ++}
    ++
    ++static void start_event(struct config_source *cs, enum config_event_t type,
    ++		       struct parse_event_data *data)
    ++{
    ++	data->previous_type = type;
    ++	data->previous_offset = get_corrected_offset(cs, type);
    ++}
    ++
    ++static int flush_event(struct config_source *cs, enum config_event_t type,
    ++		       struct parse_event_data *data)
    ++{
    ++	if (!data->opts || !data->opts->event_fn)
    ++		return 0;
    ++
    ++	if (type == CONFIG_EVENT_WHITESPACE &&
    ++	    data->previous_type == type)
    ++		return 0;
     +
     +	if (data->previous_type != CONFIG_EVENT_EOF &&
     +	    data->opts->event_fn(data->previous_type, data->previous_offset,
    -+				 offset, cs, data->opts->event_fn_data) < 0)
    ++				 get_corrected_offset(cs, type), cs,
    ++				 data->opts->event_fn_data) < 0)
     +		return -1;
     +
    -+	data->previous_type = type;
    -+	data->previous_offset = offset;
    ++	return 1;
    ++}
    ++
    ++static int do_event(struct config_source *cs, enum config_event_t type,
    ++		    struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	return 0;
    ++}
    ++
    ++static int do_event_and_flush(struct config_source *cs,
    ++			      enum config_event_t type,
    ++			      struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	/*
    ++	 * Not actually EOF, but this indicates we don't have a valid event
    ++	 * to flush next time around.
    ++	 */
    ++	data->previous_type = CONFIG_EVENT_EOF;
     +
     +	return 0;
     +}
    @@ config-parse.c (new)
     +		if (get_value(cs, kvi, fn, data, var) < 0)
     +			break;
     +	}
    -+	/*
    -+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    -+	 * in order for our callback to receive the error event, we have to call
    -+	 * do_event twice
    -+	 */
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++
    ++	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     +	return -1;
     +}
     +
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -	const struct config_parse_options *opts;
     -};
     -
    --static int do_event(struct config_source *cs, enum config_event_t type,
    --		    struct parse_event_data *data)
    +-static size_t get_corrected_offset(struct config_source *cs,
    +-				   enum config_event_t type)
     -{
    --	size_t offset;
    --
    --	if (!data->opts || !data->opts->event_fn)
    --		return 0;
    +-	size_t offset = cs->do_ftell(cs);
     -
    --	if (type == CONFIG_EVENT_WHITESPACE &&
    --	    data->previous_type == type)
    --		return 0;
    --
    --	offset = cs->do_ftell(cs);
     -	/*
     -	 * At EOF, the parser always "inserts" an extra '\n', therefore
     -	 * the end offset of the event is the current file position, otherwise
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -	 */
     -	if (type != CONFIG_EVENT_EOF)
     -		offset--;
    +-	return offset;
    +-}
    +-
    +-static void start_event(struct config_source *cs, enum config_event_t type,
    +-		       struct parse_event_data *data)
    +-{
    +-	data->previous_type = type;
    +-	data->previous_offset = get_corrected_offset(cs, type);
    +-}
    +-
    +-static int flush_event(struct config_source *cs, enum config_event_t type,
    +-		       struct parse_event_data *data)
    +-{
    +-	if (!data->opts || !data->opts->event_fn)
    +-		return 0;
    +-
    +-	if (type == CONFIG_EVENT_WHITESPACE &&
    +-	    data->previous_type == type)
    +-		return 0;
     -
     -	if (data->previous_type != CONFIG_EVENT_EOF &&
     -	    data->opts->event_fn(data->previous_type, data->previous_offset,
    --				 offset, cs, data->opts->event_fn_data) < 0)
    +-				 get_corrected_offset(cs, type), cs,
    +-				 data->opts->event_fn_data) < 0)
     -		return -1;
     -
    --	data->previous_type = type;
    --	data->previous_offset = offset;
    +-	return 1;
    +-}
    +-
    +-static int do_event(struct config_source *cs, enum config_event_t type,
    +-		    struct parse_event_data *data)
    +-{
    +-	int maybe_ret;
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	start_event(cs, type, data);
    +-
    +-	return 0;
    +-}
    +-
    +-static int do_event_and_flush(struct config_source *cs,
    +-			      enum config_event_t type,
    +-			      struct parse_event_data *data)
    +-{
    +-	int maybe_ret;
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	start_event(cs, type, data);
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	/*
    +-	 * Not actually EOF, but this indicates we don't have a valid event
    +-	 * to flush next time around.
    +-	 */
    +-	data->previous_type = CONFIG_EVENT_EOF;
     -
     -	return 0;
     -}
    @@ config.c: int git_config_err_fn(enum config_event_t type, size_t begin_offset UN
     -			break;
     -	}
     -
    --	/*
    --	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    --	 * in order for our callback to receive the error event, we have to call
    --	 * do_event twice
    --	 */
    --	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    --	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    +-	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     -	return -1;
     -}
     -

base-commit: aa9166bcc0ba654fc21f198a30647ec087f733ed

Junio C Hamano Oct. 17, 2023, 5:13 p.m. UTC | #1

Josh Steadmon <steadmon@google.com> writes:

> Config parsing no longer uses global state as of gc/config-context, so the
> natural next step for libification is to turn that into its own library.
> This series starts that process by moving config parsing into
> config-parse.[c|h] so that other programs can include this functionality
> without pulling in all of config.[c|h].

This has been in list archive collecting dust.  It is unfortunate
that not many people appear to be interested in reviewing others'
patches?

> Open questions:
> - How do folks feel about the do_event() refactor in patches 2 & 3?

I gave a quick re-read and found that the code after patch 2 made it
easier to see how config.c::do_event() does its thing (even though
the patch text of that exact step was somehow a bit hard to follow).

However, the helper added by patch 3, do_event_and_flush(), that
duplicates exactly what do_event() does, is hard to reason about, at
least for me.  It returns early without setting .previous_type to
EOF and the value returned from the helper signals if that is the
case (the two early return points both return what flush_event()
gave us), but the only caller of the helper does not even inspect
the return value, unlike all the callers of do_event(), which also
looks a bit fishy.

Thanks.

Taylor Blau Oct. 23, 2023, 7:34 p.m. UTC | #2

On Tue, Oct 17, 2023 at 10:13:49AM -0700, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
>
> > Open questions:
> > - How do folks feel about the do_event() refactor in patches 2 & 3?
>
> I gave a quick re-read and found that the code after patch 2 made it
> easier to see how config.c::do_event() does its thing (even though
> the patch text of that exact step was somehow a bit hard to follow).
>
> However, the helper added by patch 3, do_event_and_flush(), that
> duplicates exactly what do_event() does, is hard to reason about, at
> least for me.  It returns early without setting .previous_type to
> EOF and the value returned from the helper signals if that is the
> case (the two early return points both return what flush_event()
> gave us), but the only caller of the helper does not even inspect
> the return value, unlike all the callers of do_event(), which also
> looks a bit fishy.

I had similar thoughts while reviewing.

But I am not sure that I agree that this series is moving us in the
right direction necessarily. Or at least I am not convinced that
shipping the intermediate state is worth doing before we have callers
that could drop '#include "config.h"' for just the parser.

This feels like churn that does not yield a tangible pay-off, at least
in the sense that the refactoring and code movement delivers us
something that we can substantively use today.

I dunno.

Thanks,
Taylor

Junio C Hamano Oct. 23, 2023, 8:13 p.m. UTC | #3

Taylor Blau <me@ttaylorr.com> writes:

> This feels like churn that does not yield a tangible pay-off, at least
> in the sense that the refactoring and code movement delivers us
> something that we can substantively use today.
>
> I dunno.

That matches something I felt but was too polite to say aloud ;-)

Jonathan Tan Oct. 24, 2023, 10:50 p.m. UTC | #4

Taylor Blau <me@ttaylorr.com> writes:
> But I am not sure that I agree that this series is moving us in the
> right direction necessarily. Or at least I am not convinced that
> shipping the intermediate state is worth doing before we have callers
> that could drop '#include "config.h"' for just the parser.
> 
> This feels like churn that does not yield a tangible pay-off, at least
> in the sense that the refactoring and code movement delivers us
> something that we can substantively use today.
> 
> I dunno.
> 
> Thanks,
> Taylor

Thanks for calling this out. We do want our changes to be good for both
the libification and the non-libification cases as much as possible. As
it is, I do agree that since we won't have callers that can use the new
parser header (I think the likeliest cause of having such a caller is
if we have a "interpret-config" command, like "interpret-trailers"), we
probably shouldn't merge this (at least, the last 2 patches).

I think patches 1-3 are still usable (they make some internals of config
parsing less confusing) but I'm also OK if we hold off on them until
we find a compelling use case that motivates refactoring on the config
parser.

Josh Steadmon Oct. 25, 2023, 7:37 p.m. UTC | #5

On 2023.10.24 15:50, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > But I am not sure that I agree that this series is moving us in the
> > right direction necessarily. Or at least I am not convinced that
> > shipping the intermediate state is worth doing before we have callers
> > that could drop '#include "config.h"' for just the parser.
> > 
> > This feels like churn that does not yield a tangible pay-off, at least
> > in the sense that the refactoring and code movement delivers us
> > something that we can substantively use today.
> > 
> > I dunno.
> > 
> > Thanks,
> > Taylor
> 
> Thanks for calling this out. We do want our changes to be good for both
> the libification and the non-libification cases as much as possible. As
> it is, I do agree that since we won't have callers that can use the new
> parser header (I think the likeliest cause of having such a caller is
> if we have a "interpret-config" command, like "interpret-trailers"), we
> probably shouldn't merge this (at least, the last 2 patches).
> 
> I think patches 1-3 are still usable (they make some internals of config
> parsing less confusing) but I'm also OK if we hold off on them until
> we find a compelling use case that motivates refactoring on the config
> parser.

Thanks everyone for the revived discussion here. I think I agree, this
series is not going in the right direction. Additionally, our internal
use case for this change has evaporated, so let's just drop the series.
We can pick it up again later if interest returns.

Junio C Hamano Oct. 27, 2023, 1:04 p.m. UTC | #6

Josh Steadmon <steadmon@google.com> writes:

> Thanks everyone for the revived discussion here. I think I agree, this
> series is not going in the right direction. Additionally, our internal
> use case for this change has evaporated, so let's just drop the series.
> We can pick it up again later if interest returns.

OK.  Let's scrap it for now.

The "internal use case" behind a proposed feature changing so
quickly is a bit worrying.  What is good for this project should
ideally be good for everybody, not only for satisfying a particular
$CORP needs of the day.  But I think the idea of giving enhanced
visibility into stakeholder companies directions and priorities
Emily (I think?)  floated during the contributors' summit may help
reduce such a risk, hopefully.

Thanks.

[v3,0/5] config-parse: create config parsing library

Message

Comments