diff mbox series

[RFC,v2,16/36] bundle-uri: make the download program configurable

Message ID RFC-patch-v2-16.36-632c68b224f-20220418T165545Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series bundle-uri: a "dumb CDN" for git + TOC format | expand

Commit Message

Ævar Arnfjörð Bjarmason April 18, 2022, 5:23 p.m. UTC
As noted in a preceding commit we really should be using libcurl's C
API by default in get_bundle_uri(), but testing with a command-line
program can be very handy, and useful e.g. to implement custom or
ad-hoc caching.

E.g. using part of the recipe noted in a preceding commit to create
the "git-master-only.bdl" and "git-master-to-next.bdl" files, we can
implement a trivial caching shellscript as:

	cat >get-bundle.sh <<-\EOF &&
	#!/bin/sh
	set -xe

	uri="$1"

	bundle_cache_key () {
		echo "Computing cache key for URI '$1' (only getting the header)" >&2

		curl --silent --output - -- "$1" |
		sed -n -e '/^$/q' -e 'p' |
		git hash-object --stdin
	}

	get_cached_bundle_uri() {
		cache_key=$(bundle_cache_key "$1")

		path="/tmp/bundle-cache-$cache_key.bdl"

		if test -e "$path"
		then
			echo "Using cache '$path' for URI '$1'" >&2
			cat "$path"
		else
			echo "Downloading bundle URI $1" >&2
			curl --silent --output - -- "$uri" | tee "$path"
		fi
	}

	get_cached_bundle_uri "$1"
	EOF
	chmod +x get-bundle.sh &&
	rm -rf /tmp/git.git &&
	./git \
		-c protocol.version=2 \
		-c fetch.uriProtocols=file \
		-c transfer.bundleURI.downloader=./get-bundle.sh \
		-c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \
		-c transfer.injectBundleURI="file:///tmp/git-master-to-next.bdl" \
		clone --bare --no-tags --single-branch --branch next --template= \
		--verbose --verbose \
		https://github.com/git/git.git /tmp/git.git

Now, clearly that specific example is rather pointless. We're getting
a local file anyway, so "cat"-ing another local file doesn't make any
difference, it's even slightly slower & more redundant as we're having
to get it twice with "curl".

But the point is that this can be trivially improved for use in any
arbitrary custom caching strategy. E.g.:

 * A less dumber implementation that would stream the remote URL,
   check the header as we go, and disconnect if we've got that content
   locally.
 * Ditto, but using an ETag or other strategy.
 * N boxes could share a cache an NFS with a shared mount, or N
   disconnected git processes could use a common cache without the
   need for a front-line HTTP proxy server.

 * It would be trivial to extend this to guard against a "thundering
   herd" (e.g. concurrent CI) downloading the same bundle N times. As
   soon as we'd get the header we'd create a $cache_key.lock as we
   download the rest, and other concurrent clients spotting that would
   wait, then eventually cache "$cache_key".

   Still racy as N clients could download the header in parallel, but
   way less so (the header will be a tiny part of the payload).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/transfer.txt | 7 +++++++
 fetch-pack.c                      | 6 ++++++
 2 files changed, 13 insertions(+)
diff mbox series

Patch

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index ae85ca5760b..5310cd96cb9 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -84,6 +84,13 @@  transfer.bundleURI::
 	using bundles to bootstap is possible. Defaults to `true`,
 	i.e. bundle-uri is tried whenever a server offers it.
 
+transfer.bundleURI.downloader::
+	When set to `<program>` will be invoked when
+	`transfer.bundleURI` is in effect to download URIs containing
+	bundles. Expected to take one `URI` as an argument, and to
+	emit the bundle on STDOUT. Defaults to "curl --silent --output
+	- --". I.e. we'll invoke "curl --silent --output - -- <URI>".
+
 transfer.injectBundleURI::
 	Allows for the injection of `bundle-uri` lines into the
 	protocol v2 transport dialog (see `protocol.version` in
diff --git a/fetch-pack.c b/fetch-pack.c
index 316fb2fd65d..7e696142c4d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1121,12 +1121,18 @@  static int get_bundle_uri(struct string_list_item *item, unsigned int nth,
 	const char *uri = item->string;
 	FILE *out;
 	int out_fd;
+	const char *tmp;
 
 	strvec_push(&cmd.args, "curl");
 	strvec_push(&cmd.args, "--silent");
 	strvec_push(&cmd.args, "--output");
 	strvec_push(&cmd.args, "-");
 	strvec_push(&cmd.args, "--");
+	if (!git_config_get_string_tmp("transfer.bundleURI.downloader", &tmp)) {
+		strvec_clear(&cmd.args);
+		strvec_push(&cmd.args, tmp);
+		cmd.use_shell = 1;
+	}
 	strvec_push(&cmd.args, item->string);
 	cmd.git_cmd = 0;
 	cmd.no_stdin = 1;