mbox series

[v3,0/9] Bundle URIs III: Parse and download from bundle lists

Message ID pull.1333.v3.git.1664886860.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Bundle URIs III: Parse and download from bundle lists | expand

Message

Linus Arver via GitGitGadget Oct. 4, 2022, 12:34 p.m. UTC
This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is
a URI to a bundle file. This series adds the capability of downloading and
parsing a bundle list and then downloading the URIs in that list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way:

 1. Patches 1-2 are cleanups from the previous part. The first was
    recommended by Teng Long and the second allows us to simplify our bundle
    list data structure slightly.

 2. Patches 3-4 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 5-6 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 7 teaches Git to parse Git config files into bundle lists.

 5. Patches 8-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists.

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri= ' where points to a bundle list instead of a single bundle
file.

As outlined in the design document [1], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [2]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [3]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v3
=============

 * Fixed a comment about a return value of -1.
 * Fixed and tested scenario where early URIs fail in "any" mode and Git
   should try the rest of the list.
 * Instead of using 'success_count' and 'failure_count', use the iterator
   return value to terminate the "all" mode loop early.


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation is more clearly indicating that the bundle.*
   section has no effect in the repository config (at the moment, which will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.

Thanks,

 * Stolee

Derrick Stolee (7):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: fetch a list of bundles
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 bundle-uri.c                    | 449 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 143 ++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 13 files changed, 976 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v2:

  1:  2ca431e6c37 <  -:  ----------- bundle-uri: short-circuit capability parsing
  2:  ee6c4b824c2 =  1:  48beccb0f5e bundle-uri: use plain string in find_temp_filename()
  3:  d9812440594 =  2:  f0c4457951c bundle-uri: create bundle_list struct and helpers
  4:  70daef66833 !  3:  430e01cd2a4 bundle-uri: create base key-value pair parsing
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
       
      +/**
      + * Given a key-value pair, update the state of the given bundle list.
     -+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
     ++ * Returns 0 if the key-value pair is understood. Returns -1 if the key
      + * is not understood or the value is malformed.
      + */
      +MAYBE_UNUSED
  5:  4df3f834029 !  4:  cd915d57f3b bundle-uri: create "key=value" line parsing
     @@ Commit message
      
       ## bundle-uri.c ##
      @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
     -  * Returns 0 if the key-value pair is understood. Returns 1 if the key
     +  * Returns 0 if the key-value pair is understood. Returns -1 if the key
        * is not understood or the value is malformed.
        */
      -MAYBE_UNUSED
  6:  91c5b58f011 !  5:  4d8cac67f66 bundle-uri: unit test "key=value" parsing
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +
       /**
        * Given a key-value pair, update the state of the given bundle list.
     -  * Returns 0 if the key-value pair is understood. Returns 1 if the key
     +  * Returns 0 if the key-value pair is understood. Returns -1 if the key
      
       ## bundle-uri.h ##
      @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list,
  7:  1492b8f5ef0 =  6:  0ecae3a44b3 bundle-uri: parse bundle list in config format
  8:  b5d570082fa =  7:  7e6b32313b0 bundle-uri: limit recursion depth for bundle lists
  9:  a6ab8f7c699 !  8:  46799648b4c bundle-uri: fetch a list of bundles
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +	 */
      +	if (!res)
      +		ctx->count++;
     -+	return res;
     ++
     ++	/*
     ++	 * In BUNDLE_MODE_ANY, we need to continue iterating until we find
     ++	 * a bundle that works, so do not signal a failure here.
     ++	 */
     ++	return ctx->mode == BUNDLE_MODE_ANY ? 0 : res;
      +}
      +
      +static int download_bundle_list(struct repository *r,
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
       	return result;
       }
       
     -+struct attempt_unbundle_context {
     -+	struct repository *r;
     -+	int success_count;
     -+	int failure_count;
     -+};
     -+
     ++/**
     ++ * This loop iterator breaks the loop with nonzero return code on the
     ++ * first successful unbundling of a bundle.
     ++ */
      +static int attempt_unbundle(struct remote_bundle_info *info, void *data)
      +{
     -+	struct attempt_unbundle_context *ctx = data;
     ++	struct repository *r = data;
      +
     -+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
     -+		ctx->success_count++;
     ++	if (!info->file || info->unbundled)
     ++		return 0;
     ++
     ++	if (!unbundle_from_file(r, info->file)) {
      +		info->unbundled = 1;
     -+	} else {
     -+		ctx->failure_count++;
     ++		return 1;
      +	}
      +
      +	return 0;
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      +static int unbundle_all_bundles(struct repository *r,
      +				struct bundle_list *list)
      +{
     -+	int last_success_count = -1;
     -+	struct attempt_unbundle_context ctx = {
     -+		.r = r,
     -+	};
     -+
      +	/*
      +	 * Iterate through all bundles looking for ones that can
      +	 * successfully unbundle. If any succeed, then perhaps another
      +	 * will succeed in the next attempt.
     ++	 *
     ++	 * Keep in mind that a non-zero result for the loop here means
     ++	 * the loop terminated early on a successful unbundling, which
     ++	 * signals that we can try again.
      +	 */
     -+	while (last_success_count < ctx.success_count) {
     -+		last_success_count = ctx.success_count;
     -+
     -+		ctx.success_count = 0;
     -+		ctx.failure_count = 0;
     -+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
     -+	}
     -+
     -+	if (ctx.success_count)
     -+		git_config_set_multivar_gently("log.excludedecoration",
     -+						"refs/bundle/",
     -+						"refs/bundle/",
     -+						CONFIG_FLAGS_FIXED_VALUE |
     -+						CONFIG_FLAGS_MULTI_REPLACE);
     -+
     -+	if (ctx.failure_count)
     -+		warning(_("failed to unbundle %d bundles"),
     -+			ctx.failure_count);
     ++	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
      +
      +	return 0;
      +}
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' '
      +		uri = file://$(pwd)/clone-from/bundle-4.bundle
      +	EOF
      +
     -+	git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file &&
     -+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
     -+	do
     -+		git -C clone-list-file rev-parse $oid || return 1
     -+	done
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-list-file cat-file --batch-check <oids
      +'
      +
     ++test_expect_success 'clone bundle list (file, any mode)' '
     ++	cat >bundle-list <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = any
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = $HTTPD_URL/bundle-0.bundle
     ++
     ++	[bundle "bundle-1"]
     ++		uri = $HTTPD_URL/bundle-1.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = $HTTPD_URL/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-any-file cat-file --batch-check <oids
     ++'
      +
       #########################################################################
       # HTTP tests begin here
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
      +		uri = $HTTPD_URL/bundle-4.bundle
      +	EOF
      +
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http &&
     -+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
     -+	do
     -+		git -C clone-list-http rev-parse $oid || return 1
     -+	done
     ++	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-list-http cat-file --batch-check <oids
     ++'
     ++
     ++test_expect_success 'clone bundle list (HTTP, any mode)' '
     ++	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     ++	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = any
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = $HTTPD_URL/bundle-0.bundle
     ++
     ++	[bundle "bundle-1"]
     ++		uri = $HTTPD_URL/bundle-1.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = $HTTPD_URL/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-any-http cat-file --batch-check <oids
      +'
      +
       # Do not add tests here unless they use the HTTP server, as they will
  -:  ----------- >  9:  d84544859e4 bundle-uri: suppress stderr from remote-https