mbox series

[00/10] config API: make "multi" safe, fix numerous segfaults

Message ID cover-00.10-00000000000-20221026T151328Z-avarab@gmail.com (mailing list archive)
Headers show
Series config API: make "multi" safe, fix numerous segfaults | expand

Message

Ævar Arnfjörð Bjarmason Oct. 26, 2022, 3:35 p.m. UTC
This series is a follow-up to an earlier RFC Stolee sent about making
the *_multi() config API return non-NULL, and instead give you an
empty string list[1].

I also think that part of the config API is a wart, but that we should
go for a different solution. It's the only config function that
doesn't return an "int" indicating whether we found the key.

Code that wants to use the values should then check that return
value. I.e. Stolee's version allows you to do:

	const struct string_list *list = git_config_get_value_multi(key);
	for_each_string_list_item(item, list) { ... found = 1 ... }

Whereas in this proposal we instead do (same as for non-multi):

	if (!git_config_get_const_value_multi(key, &list))
		for_each_string_list_item(item, list) { ... found = 1 ... }

Mid-series that's made nicer by adjusting the string_list API to have
sensible "const"'s (so we don't need catsing), and using utility
functions from there. I.e. the recently added code in builtin/gc.c
becomes (Stolee's at [3]):

	if (!git_config_get_knownkey_value_multi(key, &list))
		found = unsorted_string_list_has_string(list, maintpath);

But anyway.

Once I started poking at this approach I discovered that we have a
much larger issue here than whether the top-level value is NULL or an empty list.

As noted I don't think it's an issue that the *_multi() returns NULL
if we have no key, that's easy to handle.

But *_multi() doesn't have the equivalent of a wrapper that coerces
the values on the list into one of our types (as in "git config
--type=<type>").

A not so well known edge case in our config format (see 8/10) is that
value-less keys are represented as NULL's, and a "struct string_list"
is perfectly happy to have a "char *string" member that's NULL.

I.e.:

	[a]key=x
	[a]key
	[a]key=y

Is represented as:

	{ "x", NULL, "y" }

Not, as existing code apparently expected:

	{ "x", "", "y" }

As a result existing code using *_multi() would segfault when reading
config like that. See 9/10, which fixes segfaults in 6 commits as old
as from 2015, and a couple of recent ones: One in the last release,
and one on "master" but not released yet.

The fix is thoroughly boring, we just start doing for *_multi() what
we've been doing for other config since Junio's 2008 fix (see 9/10) to
fix the same issue for non-multi config variables.

I.e. we provide a safer "I want strings, please" variant of the API,
just for *_multi(). At the culmination of this topic only the
test-helper uses the underlying unsafe API, and only because it needs
to check that we're still parsing the config correctly.

1. https://lore.kernel.org/git/pull.1369.git.1664287711.gitgitgadget@gmail.com/
2. https://lore.kernel.org/git/220928.868rm3w9d4.gmgdl@evledraar.gmail.com
3. https://lore.kernel.org/git/e06cb4df081bc2222731f9185a22ed7ad67e3814.1664287711.git.gitgitgadget@gmail.com/

Ævar Arnfjörð Bjarmason (10):
  config API: have *_multi() return an "int" and take a "dest"
  for-each-repo: error on bad --config
  config API: mark *_multi() with RESULT_MUST_BE_USED
  string-list API: mark "struct_string_list" to "for_each_string_list"
    const
  string-list API: make has_string() and list_lookup() "const"
  builtin/gc.c: use "unsorted_string_list_has_string()" where
    appropriate
  config API: add and use "lookup_value" functions
  config tests: add "NULL" tests for *_get_value_multi()
  config API: add "string" version of *_value_multi(), fix segfaults
  for-each-repo: with bad config, don't conflate <path> and <cmd>

 builtin/for-each-repo.c        |  14 ++-
 builtin/gc.c                   |  29 +-----
 builtin/log.c                  |   6 +-
 builtin/submodule--helper.c    |   7 +-
 builtin/worktree.c             |   3 +-
 config.c                       | 164 +++++++++++++++++++++++++++++----
 config.h                       | 110 ++++++++++++++++++++--
 pack-bitmap.c                  |   7 +-
 string-list.c                  |   6 +-
 string-list.h                  |   6 +-
 submodule.c                    |   3 +-
 t/helper/test-config.c         |   6 +-
 t/t0068-for-each-repo.sh       |  19 ++++
 t/t1308-config-set.sh          |  30 ++++++
 t/t4202-log.sh                 |  15 +++
 t/t5310-pack-bitmaps.sh        |  21 +++++
 t/t7004-tag.sh                 |  17 ++++
 t/t7413-submodule-is-active.sh |  16 ++++
 t/t7900-maintenance.sh         |  38 ++++++++
 versioncmp.c                   |  18 +++-
 20 files changed, 451 insertions(+), 84 deletions(-)

Comments

Junio C Hamano Oct. 27, 2022, 8:12 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> I also think that part of the config API is a wart, but that we should
> go for a different solution. It's the only config function that
> doesn't return an "int" indicating whether we found the key.

Overall I saw some things to like in the series, but was not
impressed by others.  The _multi() thing in the earliest patch is a
welcome change, giving an option to call nonbool() is a good idea
(but I have doubts about the exectuion), and "does the key exist?"
may be a good thing to have.  Others ranged between "Meh?" to "it
might be good, but why does it have to be done here now?".

Thanks.