@@ -84,6 +84,13 @@ transfer.bundleURI::
using bundles to bootstap is possible. Defaults to `true`,
i.e. bundle-uri is tried whenever a server offers it.
+transfer.bundleURI.downloader::
+ When set to `<program>` will be invoked when
+ `transfer.bundleURI` is in effect to download URIs containing
+ bundles. Expected to take one `URI` as an argument, and to
+ emit the bundle on STDOUT. Defaults to "curl --silent --output
+ - --". I.e. we'll invoke "curl --silent --output - -- <URI>".
+
transfer.injectBundleURI::
Allows for the injection of `bundle-uri` lines into the
protocol v2 transport dialog (see `protocol.version` in
@@ -1121,12 +1121,18 @@ static int get_bundle_uri(struct string_list_item *item, unsigned int nth,
const char *uri = item->string;
FILE *out;
int out_fd;
+ const char *tmp;
strvec_push(&cmd.args, "curl");
strvec_push(&cmd.args, "--silent");
strvec_push(&cmd.args, "--output");
strvec_push(&cmd.args, "-");
strvec_push(&cmd.args, "--");
+ if (!git_config_get_string_tmp("transfer.bundleURI.downloader", &tmp)) {
+ strvec_clear(&cmd.args);
+ strvec_push(&cmd.args, tmp);
+ cmd.use_shell = 1;
+ }
strvec_push(&cmd.args, item->string);
cmd.git_cmd = 0;
cmd.no_stdin = 1;
As noted in a preceding commit we really should be using libcurl's C API by default in get_bundle_uri(), but testing with a command-line program can be very handy, and useful e.g. to implement custom or ad-hoc caching. E.g. using part of the recipe noted in a preceding commit to create the "git-master-only.bdl" and "git-master-to-next.bdl" files, we can implement a trivial caching shellscript as: cat >get-bundle.sh <<-\EOF && #!/bin/sh set -xe uri="$1" bundle_cache_key () { echo "Computing cache key for URI '$1' (only getting the header)" >&2 curl --silent --output - -- "$1" | sed -n -e '/^$/q' -e 'p' | git hash-object --stdin } get_cached_bundle_uri() { cache_key=$(bundle_cache_key "$1") path="/tmp/bundle-cache-$cache_key.bdl" if test -e "$path" then echo "Using cache '$path' for URI '$1'" >&2 cat "$path" else echo "Downloading bundle URI $1" >&2 curl --silent --output - -- "$uri" | tee "$path" fi } get_cached_bundle_uri "$1" EOF chmod +x get-bundle.sh && rm -rf /tmp/git.git && ./git \ -c protocol.version=2 \ -c fetch.uriProtocols=file \ -c transfer.bundleURI.downloader=./get-bundle.sh \ -c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \ -c transfer.injectBundleURI="file:///tmp/git-master-to-next.bdl" \ clone --bare --no-tags --single-branch --branch next --template= \ --verbose --verbose \ https://github.com/git/git.git /tmp/git.git Now, clearly that specific example is rather pointless. We're getting a local file anyway, so "cat"-ing another local file doesn't make any difference, it's even slightly slower & more redundant as we're having to get it twice with "curl". But the point is that this can be trivially improved for use in any arbitrary custom caching strategy. E.g.: * A less dumber implementation that would stream the remote URL, check the header as we go, and disconnect if we've got that content locally. * Ditto, but using an ETag or other strategy. * N boxes could share a cache an NFS with a shared mount, or N disconnected git processes could use a common cache without the need for a front-line HTTP proxy server. * It would be trivial to extend this to guard against a "thundering herd" (e.g. concurrent CI) downloading the same bundle N times. As soon as we'd get the header we'd create a $cache_key.lock as we download the rest, and other concurrent clients spotting that would wait, then eventually cache "$cache_key". Still racy as N clients could download the header in parallel, but way less so (the header will be a tiny part of the payload). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Documentation/config/transfer.txt | 7 +++++++ fetch-pack.c | 6 ++++++ 2 files changed, 13 insertions(+)