mbox series

[v5,00/12] Bundle URIs III: Parse and download from bundle lists

Message ID pull.1333.v5.git.1665579160.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Bundle URIs III: Parse and download from bundle lists | expand

Message

Philippe Blain via GitGitGadget Oct. 12, 2022, 12:52 p.m. UTC
This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is
a URI to a bundle file. This series adds the capability of downloading and
parsing a bundle list and then downloading the URIs in that list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way (updated for v4):

 1. Patch 1 is a cleanup from the previous part. This allows us to simplify
    our bundle list data structure slightly.

 2. Patches 2-3 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 6 teaches Git to parse Git config files into bundle lists.

 5. Patches 7-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists. This also fixes
    a previous bug when running verify_bundle() multiple times in the same
    process, as it did not clear the PREREQ_MARK flag upon leaving (see
    patch 8).

 6. Patches 10-12 suppress unhelpful warnings from user visibility.

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri= ' where points to a bundle list instead of a single bundle
file.

As outlined in the design document [1], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [2]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [3]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v5
=============

 * The bug about verify_bundle() not working multile times in the same
   process is fixed without removing the revision walk. Instead, more flags
   needed to be removed as the method cleaned up after itself.


Updates in v4
=============

 * Properly updated the patch outline.

 * Jonathan Tan asked for more tests, and this revealed some interesting
   behaviors which I have now either fixed or made explicit:
   
   1. In "all" mode, we try to download and apply all bundles. Do not fail
      if a single bundle download fails.
   2. Previously, not all bundles were being applied, and this was noticed
      by the added checks for the refs/bundles/* refs at the end of the
      tests. This revealed the need for removing the reachability walk from
      verify_bundle() since the written refs/bundles/* refs were not being
      picked up by the loose ref cache. Since removing the reachability walk
      seemed like the faster (for users) option, I went that direction.
   3. While running those tests and examining the output carefully, I
      noticed several error messages related to missing prerequisites due to
      attempting unbundling in a random order. This doesn't appear in the
      later creationToken version, so I hadn't noticed it at the tip of my
      local work. These messages are removed with a new quiet mode for
      verify_bundle().


Updates in v3
=============

 * Fixed a comment about a return value of -1.
 * Fixed and tested scenario where early URIs fail in "any" mode and Git
   should try the rest of the list.
 * Instead of using 'success_count' and 'failure_count', use the iterator
   return value to terminate the "all" mode loop early.


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation is more clearly indicating that the bundle.*
   section has no effect in the repository config (at the moment, which will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.

Thanks,

 * Stolee

Derrick Stolee (10):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle: properly clear all revision flags
  bundle-uri: fetch a list of bundles
  bundle: add flags to verify_bundle()
  bundle-uri: quiet failed unbundlings
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 builtin/bundle.c                |   5 +-
 bundle-uri.c                    | 458 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 bundle.c                        |  42 +--
 bundle.h                        |  15 +-
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 275 +++++++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 transport.c                     |   2 +-
 17 files changed, 1156 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v4:

  1:  48beccb0f5e =  1:  48beccb0f5e bundle-uri: use plain string in find_temp_filename()
  2:  f0c4457951c =  2:  f0c4457951c bundle-uri: create bundle_list struct and helpers
  3:  430e01cd2a4 =  3:  430e01cd2a4 bundle-uri: create base key-value pair parsing
  4:  cd915d57f3b =  4:  cd915d57f3b bundle-uri: create "key=value" line parsing
  5:  4d8cac67f66 =  5:  4d8cac67f66 bundle-uri: unit test "key=value" parsing
  6:  0ecae3a44b3 =  6:  0ecae3a44b3 bundle-uri: parse bundle list in config format
  7:  7e6b32313b0 =  7:  7e6b32313b0 bundle-uri: limit recursion depth for bundle lists
  -:  ----------- >  8:  8dc5a8e4e63 bundle: properly clear all revision flags
  9:  6b9c764c6b3 =  9:  51e9b8474fb bundle-uri: fetch a list of bundles
  8:  83f2cd893a4 ! 10:  fba3a4a117e bundle: add flags to verify_bundle(), skip walk
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    bundle: add flags to verify_bundle(), skip walk
     +    bundle: add flags to verify_bundle()
      
     -    The verify_bundle() method checks if a bundle can be applied to a given
     -    repository. This not only verifies that certain commits exist in the
     -    repository, but Git also checks that these commits are reachable.
     -
     -    This behavior dates back to the original git-bundle builtin written in
     -    2e0afafebd8 (Add git-bundle: move objects and references by archive,
     -    2007-02-22), but the message does not go into detail why the
     -    reachability check is important.
     -
     -    Since verify_bundle() is called from unbundle(), we need to add an
     -    option to pipe the flags through that method.
     -
     -    When unbundling from a list of bundles, Git will create refs that point
     -    to the tips of the latest bundle, which makes this reachability walk
     -    succeed, in theory. However, the loose refs cache does not get
     -    invalidated and hence the reachability walk fails. By disabling the
     -    reachability walk in the bundle URI code, we can get around this
     -    reachability check.
     +    The verify_bundle() method has a 'verbose' option, but we will want to
     +    extend this method to have more granular control over its output. First,
     +    replace this 'verbose' option with a new 'flags' option with a single
     +    possible value: VERIFY_BUNDLE_VERBOSE.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +	 * a reachable ref pointing to the new tips, which will reach
      +	 * the prerequisite commits.
      +	 */
     -+	if ((result = unbundle(r, &header, bundle_fd, NULL,
     -+			       VERIFY_BUNDLE_SKIP_REACHABLE)))
     ++	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
       		return 1;
       
       	/*
     @@ bundle.c: static int list_refs(struct string_list *r, int argc, const char **arg
       	/*
       	 * Do fast check, then if any prereqs are missing then go line by line
      @@ bundle.c: int verify_bundle(struct repository *r,
     - 			error("%s", message);
       		error("%s %s", oid_to_hex(oid), name);
       	}
     --	if (revs.pending.nr != p->nr)
     -+	if (revs.pending.nr != p->nr ||
     -+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
     - 		goto cleanup;
     - 	req_nr = revs.pending.nr;
     - 	setup_revisions(2, argv, &revs, NULL);
     -@@ bundle.c: int verify_bundle(struct repository *r,
     - 			clear_commit_marks(commit, ALL_REV_FLAGS);
     - 	}
       
      -	if (verbose) {
      +	if (flags & VERIFY_BUNDLE_VERBOSE) {
     @@ bundle.h: int read_bundle_header_fd(int fd, struct bundle_header *header,
      +
      +enum verify_bundle_flags {
      +	VERIFY_BUNDLE_VERBOSE = (1 << 0),
     -+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
      +};
      +
      +int verify_bundle(struct repository *r, struct bundle_header *header,
 10:  1cae3096624 ! 11:  2e0bfa834f1 bundle-uri: quiet failed unbundlings
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## builtin/bundle.c ##
     +@@ builtin/bundle.c: static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
     + 	}
     + 	close(bundle_fd);
     + 	if (verify_bundle(the_repository, &header,
     +-			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
     ++			  quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) {
     + 		ret = 1;
     + 		goto cleanup;
     + 	}
     +
       ## bundle-uri.c ##
      @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
     + 	 * a reachable ref pointing to the new tips, which will reach
       	 * the prerequisite commits.
       	 */
     - 	if ((result = unbundle(r, &header, bundle_fd, NULL,
     --			       VERIFY_BUNDLE_SKIP_REACHABLE)))
     -+			       VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET)))
     +-	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
     ++	if ((result = unbundle(r, &header, bundle_fd, NULL,
     ++			       VERIFY_BUNDLE_QUIET)))
       		return 1;
       
       	/*
     @@ bundle.h: int create_bundle(struct repository *r, const char *path,
       
       enum verify_bundle_flags {
       	VERIFY_BUNDLE_VERBOSE = (1 << 0),
     --	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
     -+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1),
     -+	VERIFY_BUNDLE_QUIET = (1 << 2),
     ++	VERIFY_BUNDLE_QUIET = (1 << 1),
       };
       
       int verify_bundle(struct repository *r, struct bundle_header *header,
 11:  52a575f8a69 = 12:  5729ff2af4b bundle-uri: suppress stderr from remote-https

Comments

Derrick Stolee Oct. 26, 2022, 2:34 p.m. UTC | #1
On 10/12/2022 8:52 AM, Derrick Stolee via GitGitGadget wrote:
> This is the third series building the bundle URI feature. It is built on top
> of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is
> a URI to a bundle file. This series adds the capability of downloading and
> parsing a bundle list and then downloading the URIs in that list.
> 
> The core functionality of bundle lists is implemented by creating data
> structures from a list of key-value pairs. These pairs can come from a
> plain-text file in Git config format, but in the future, we will support the
> list being supplied by packet lines over Git's protocol v2 in the
> 'bundle-uri' command (reserved for the next series).

This version has been available for a while now without comment. Could
we consider it for merging to 'next' soon?

I want to wait for this series to merge into 'master' before sending
part IV on top, which advertises bundle URIs over protocol v2.

Thanks,
-Stolee
Junio C Hamano Oct. 26, 2022, 4:06 p.m. UTC | #2
Derrick Stolee <derrickstolee@github.com> writes:

> This version has been available for a while now without comment. Could
> we consider it for merging to 'next' soon?

Could somebody who has reviewed it fully give an Ack (or two)?  I
know an earlier rounds had some comments, but after v3 things have
quieted down.

I know the change from v4 to v5 has good improvements, but do not
claim to have read the other parts in detail.

Thanks.