[5/7] bundle-uri: parse bundle list in config format

Message ID	1d1bd9c710327b4d705cfede017771da7fb6ec52.1661181174.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <1d1bd9c710327b4d705cfede017771da7fb6ec52.1661181174.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1333.git.1661181174.gitgitgadget@gmail.com> References: <pull.1333.git.1661181174.gitgitgadget@gmail.com> Date: Mon, 22 Aug 2022 15:12:52 +0000 Subject: [PATCH 5/7] bundle-uri: parse bundle list in config format Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, newren@gmail.com, avarab@gmail.com, mjcheetham@outlook.com, steadmon@google.com, Derrick Stolee <derrickstolee@github.com>, Derrick Stolee <derrickstolee@github.com> Precedence: bulk From: Derrick Stolee <derrickstolee@github.com>
Series	Bundle URIs III: Parse and download from bundle lists \| expand [0/7] Bundle URIs III: Parse and download from bundle lists [1/7] bundle-uri: create bundle_list struct and helpers [2/7] bundle-uri: create base key-value pair parsing [3/7] bundle-uri: create "key=value" line parsing [4/7] bundle-uri: unit test "key=value" parsing [5/7] bundle-uri: parse bundle list in config format [6/7] bundle-uri: limit recursion depth for bundle lists [7/7] bundle-uri: fetch a list of bundles

Message ID

1d1bd9c710327b4d705cfede017771da7fb6ec52.1661181174.git.gitgitgadget@gmail.com (mailing list archive)

State

New, archived

Headers

Message-Id: 
 <1d1bd9c710327b4d705cfede017771da7fb6ec52.1661181174.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1333.git.1661181174.gitgitgadget@gmail.com>
References: <pull.1333.git.1661181174.gitgitgadget@gmail.com>
Date: Mon, 22 Aug 2022 15:12:52 +0000
Subject: [PATCH 5/7] bundle-uri: parse bundle list in config format
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, newren@gmail.com,
        avarab@gmail.com, mjcheetham@outlook.com, steadmon@google.com,
        Derrick Stolee <derrickstolee@github.com>,
        Derrick Stolee <derrickstolee@github.com>
Precedence: bulk
From: Derrick Stolee <derrickstolee@github.com>

Series

Bundle URIs III: Parse and download from bundle lists | expand

Commit Message

Derrick Stolee Aug. 22, 2022, 3:12 p.m. UTC

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create parse_bundle_list_in_config_format() to parse a file in config
format and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to call git_config_from_file_with_options() because the
default action for git_config_from_file() is to die() on a parsing
error. The current warning isn't particularly helpful if it arises to a
user, but it will be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version. I would rather have a slightly confusing
test helper than complicated product code.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 29 +++++++++++++++++++++
 bundle-uri.h                | 10 ++++++++
 t/helper/test-bundle-uri.c  | 45 ++++++++++++++++++++++++++-------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 125 insertions(+), 9 deletions(-)

Comments

Junio C Hamano Aug. 22, 2022, 7:25 p.m. UTC | #1

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> To allow for the incremental fetch case, teach Git to understand a
> bundle list that could be advertised at an independent bundle URI. Such
> a bundle list is likely to be inspected by human readers, even if only
> by the bundle provider creating the list. For this reason, we can take
> our expected "key=value" pairs and instead format them using Git config
> format.

"can" does not explain why it is a good idea.  "As a sequence of
key=value pairs is a lot more dense and harder to read than the
configuration file format, let's declare that it is the format we
use in a file that holds a bundle-list" would be.

I do not personally buy it, though.  As I hinted in an earlier step,
some trait we associate with our configuration fioe format, like the
"last one wins" semantics, are undesirable ones, so even if we reuse
the appearance of the text, the semantics would have to become
different (including "syntax errors lead to die()" mentioned
elsewhere in the proposed log message).

> Update 'test-tool bundle-uri' to take this config file format as input.
> It uses a filename instead of stdin because there is no existing way to
> parse a FILE pointer in the config machinery. Using
> git_config_from_mem() is overly complicated and more likely to introduce
> bugs than this simpler version. I would rather have a slightly confusing
> test helper than complicated product code.

All the troubles described above seem to come from the initial
mistake to try reusing the configuration file parser or reusing the
configuration file format, at least to me.

Derrick Stolee Aug. 23, 2022, 4:43 p.m. UTC | #2

On 8/22/2022 3:25 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> To allow for the incremental fetch case, teach Git to understand a
>> bundle list that could be advertised at an independent bundle URI. Such
>> a bundle list is likely to be inspected by human readers, even if only
>> by the bundle provider creating the list. For this reason, we can take
>> our expected "key=value" pairs and instead format them using Git config
>> format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.

This "more dense and harder to read" was definitely my intention for
wanting a different format. 

> I do not personally buy it, though.  As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

The points you made earlier about "last one wins" semantics are the
biggest road-blocks to using the config file format, from what I've read
so far. We could change those semantics to be different from my current
implementation which respects the "last one wins" rule, and then that
makes the config format match not as closely. That burden of avoiding
multiple key values is not on the end-user but the bundle provider to
match the new expectations. (There might be something we should be careful
about when advertising the bundle list from our Git config in the
'bundle-uri' command in the next series.)

The "syntax errors lead to die()" is mitigated by using
CONFIG_ERROR_ERROR, which is what I meant by "Be careful to call..." I
should have been more clear that we are _not_ going to die() based on the
remote data. We might write an error message and then abort the bundle
download.

With all of these points in mind, I'd still prefer to use the config file
format as described in the design document. If you still don't agree, then
I'll change the format to be key=value pairs split with newlines, and
update the design document accordingly.

Thanks,
-Stolee

Jonathan Tan Aug. 31, 2022, 10:18 p.m. UTC | #3

Junio C Hamano <gitster@pobox.com> writes:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > To allow for the incremental fetch case, teach Git to understand a
> > bundle list that could be advertised at an independent bundle URI. Such
> > a bundle list is likely to be inspected by human readers, even if only
> > by the bundle provider creating the list. For this reason, we can take
> > our expected "key=value" pairs and instead format them using Git config
> > format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
> 
> I do not personally buy it, though.  As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

One reason for using the configuration file format (which perhaps could
have been better explained in the commit message) is that we plan to
have a way for a repo to advertise a list of bundles during fetch.
I think that config is a natural place to put that, even with its "last
one wins" semantics.

It could be argued that we can just put a single URI in config and only
allow advertising of a single URI (and then use a different format for
the bundle lists with semantics that are stricter than "last one wins"),
but that seems unnecessarily restrictive (and would make the client make
one more network request). And if we're advertising multiple bundles, it
seems reasonable to make all bundle lists have the same format (whether
they are in config or in a separate file).

Teng Long Sept. 1, 2022, 8:05 a.m. UTC | #4

Derrick Stolee <derrickstolee@github.com> writes:

> diff --git a/bundle-uri.h b/bundle-uri.h
> index 41a1510a4ac..294ac804140 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -71,6 +71,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  struct FILE;
>  void print_bundle_list(FILE *fp, struct bundle_list *list);
>
> +/**
> + * A bundle URI may point to a bundle list where the key=value
> + * pairs are provided in config file format. This method is
> + * exposed publicly for testing purposes.
> + */
> +
> +int parse_bundle_list_in_config_format(const char *uri,
> +				       const char *filename,
> +				       struct bundle_list *list);
> +

Although the comment clarifies the purpose of why to introduce
"parse_bundle_list_in_config_format", but I think this API is useful if finally
config format is supported. So far, we have a API names "bundle_uri_parse_line"
which is used to parsing key-value pairs and package into bundle list, I think
maybe we should rename the API name from "parse_bundle_list_in_config_format" to
"bundle_uri_parse_config_format", maybe better in my opinion for more consistent
naming. I think it doesnt break anything, feel free to accept or remain.

Thanks.

diff --git a/bundle-uri.c b/bundle-uri.c
index d56c5e33d5f..dca88ed1e89 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@ 
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -172,6 +173,34 @@  static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	list->mode = BUNDLE_MODE_NONE;
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static int find_temp_filename(struct strbuf *name)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 41a1510a4ac..294ac804140 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -71,6 +71,16 @@  int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 5cb0c9196fa..23ce0eebca3 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,27 +4,52 @@ 
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
-	const char *usage[] = {
+	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <in",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	struct option options[] = {
 		OPT_END(),
 	};
+	const char **usage = key_value_usage;
 	struct strbuf sb = STRBUF_INIT;
 	struct bundle_list list;
 	int err = 0;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	while (strbuf_getline(&sb, stdin) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf) < 0)
-			err = error("bad line: '%s'", sb.buf);
+
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc)
+			goto usage;
+		while (strbuf_getline(&sb, stdin) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf) < 0)
+				err = error("bad line: '%s'", sb.buf);
+		}
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = parse_bundle_list_in_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
 
@@ -55,7 +80,9 @@  int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 675c1f1d2f4..dd9dc36bfd7 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -88,4 +88,54 @@  test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = <unknown>
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	warning: bundle list at '\''<uri>'\'' has no mode
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done

[5/7] bundle-uri: parse bundle list in config format

Commit Message

Comments

Patch