fetch-pack: make packfile URIs work with transfer.fsckobjects

Message ID 20200814193234.3072139-1-jonathantanmy@google.com (mailing list archive)
State New, archived

Commit Message

Jonathan Tan Aug. 14, 2020, 7:32 p.m. UTC
When fetching with packfile URIs and transfer.fsckobjects=1, use the
--fsck-objects flag instead of --strict when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected. (A subsequent connectivity check will be done when
all the packs have been downloaded regardless of whether
transfer.fsckobjects is set.)

This is similar to 98a2ea46c2 ("fetch-pack: do not check links for
partial fetch", 2018-03-15), but for packfile URIs instead of partial
clones.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
The subject is longer than 50 characters but I couldn't find a way to
shorten it, especially since I think it's important to mention packfile
URIs and transfer.fsckobjects. Any suggestions appreciated.
---
 fetch-pack.c           |  2 +-
 t/t5702-protocol-v2.sh | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Aug. 14, 2020, 7:59 p.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> When fetching with packfile URIs and transfer.fsckobjects=1, use the
> --fsck-objects flag instead of --strict when invoking index-pack so
> that links are not checked, only objects. This is because incomplete
> links are expected. (A subsequent connectivity check will be done when
> all the packs have been downloaded regardless of whether
> transfer.fsckobjects is set.)

Good reasoning.  The change looks surprisingly small, thanks to the
existing code that already serves the same need.

I realize that the code's quality (from the point of view of
readability and discoverability) has deteriorated in this area quite
a lot over the past few years, though.  The "from_promisor" field is
set when fetch-pack is run with the corresponding command line
option, but the option is documented nowhere, so it is not
immediately obvious why the new need can be fulfilled by just
piggybacking on the existing codepath.  The meaning of the
only_packfile parameter get_pack() takes is never explained
anywhere, either.

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 7f20eca4f8..66631d0034 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -892,7 +892,7 @@ static int get_pack(struct fetch_pack_args *args,
>  	    : transfer_fsck_objects >= 0
>  	    ? transfer_fsck_objects
>  	    : 0) {
> -		if (args->from_promisor)
> +		if (args->from_promisor || !only_packfile)
>  			/*
>  			 * We cannot use --strict in index-pack because it
>  			 * checks both broken objects and links, but we only

I think this is a good way to work around the "we do not have the full
set of objects until we grab out-of-line packfiles, but we process
the in-protocol packdata before we grab them, so we cannot validate
yet" problem.  My guess is that "only_packfile" means "after reading
this packfile, the repository should be fully complete and we can
afford to check for connectivity", which would always be true for
protocols below v2, which lack the packfile-uri extension (hence the
call to get_pack() in do_fetch_pack() passes a hardcoded 1 in this
parameter).  The v2 codepath in do_fetch_pack_v2() calls get_pack()
with true only when there is no packfile-uri, so we loosen the
validation when we know we will further grab one or more packfiles
out of line.
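
The reasoning above can be modelled with a small shell sketch.  It is
only an illustration of the condition in the hunk, not git's actual
code: the function name index_pack_fsck_flag and its 0/1 arguments are
hypothetical, and it assumes fsck has already been enabled (e.g. via
transfer.fsckobjects).

```shell
# Hypothetical model of the flag choice made in get_pack() once fsck is
# enabled.  Not git's real implementation, which is the C condition
# "args->from_promisor || !only_packfile".
#   $1: from_promisor (0 or 1)
#   $2: only_packfile (0 or 1)
index_pack_fsck_flag () {
	if test "$1" -eq 1 || test "$2" -eq 0
	then
		# Objects may legitimately be missing for now (promisor
		# remote, or more packs to be fetched out of line), so
		# check objects only and leave links alone.
		echo "--fsck-objects"
	else
		# This pack is expected to complete the repository, so
		# both objects and links can be checked.
		echo "--strict"
	fi
}

index_pack_fsck_flag 0 1	# protocol below v2, or v2 without packfile URIs
index_pack_fsck_flag 0 0	# v2 with packfile URIs
index_pack_fsck_flag 1 1	# fetch from a promisor remote
```

Only the first call selects --strict; the other two fall back to
--fsck-objects, which is the loosening the patch introduces for the
packfile-uri case.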

> +test_expect_success 'packfile-uri with transfer.fsckobjects' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
> +	ls http_child/.git/objects/pack/* >filelist &&

Subtle but correct.

> +	test_line_count = 4 filelist
> +'
> +
> +test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	cat >bogus-commit <<EOF &&
> +tree $EMPTY_TREE
> +author Bugs Bunny 1234567890 +0000
> +committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
> +
> +This commit object intentionally broken
> +EOF

Use <<-EOF for readability, please.
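
A minimal sketch of what the request above amounts to, outside the test
suite: with "<<-" the shell strips leading tab characters from the
here-document body and from the terminator line, so the text can be
indented along with the surrounding test.  The tree ID and the temporary
directory are placeholders chosen for this illustration; the real test
expands $EMPTY_TREE.

```shell
# Sketch only: the tree ID below is a placeholder, not a real object name
# (the actual test expands $EMPTY_TREE, provided by git's test library).
tree=PLACEHOLDER_TREE_ID
dir=$(mktemp -d)

# Plain "<<EOF": the body and the terminator must start in column 0.
cat >"$dir/plain" <<EOF
tree $tree
author Bugs Bunny 1234567890 +0000
EOF

# "<<-EOF": leading tabs (tabs only, not spaces) are stripped from the
# body and from the terminator line, so the text can be indented.
cat >"$dir/indented" <<-EOF
	tree $tree
	author Bugs Bunny 1234567890 +0000
	EOF

# Both here-documents produce byte-identical files.
cmp "$dir/plain" "$dir/indented" && echo "identical"
```

Because the stripping applies to tabs only, the indentation inside the
here-document has to use tabs, which matches the indentation style of
the test script.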

> +	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
> +	git -C "$P" branch bogus-branch "$BOGUS" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child 2>error &&
> +	test_i18ngrep "invalid author/committer line - missing email" error
> +'
> +
>  # DO NOT add non-httpd-specific tests here, because the last part of this
>  # test script is only executed when httpd is available and enabled.

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index 7f20eca4f8..66631d0034 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -892,7 +892,7 @@  static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor)
+		if (args->from_promisor || !only_packfile)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 5a60fbe3ed..8c6c67b10d 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -883,6 +883,59 @@  test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
 	test_i18ngrep "pack downloaded from.*does not match expected hash" err
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
+	ls http_child/.git/objects/pack/* >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	cat >bogus-commit <<EOF &&
+tree $EMPTY_TREE
+author Bugs Bunny 1234567890 +0000
+committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
+
+This commit object intentionally broken
+EOF
+	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
+	git -C "$P" branch bogus-branch "$BOGUS" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>error &&
+	test_i18ngrep "invalid author/committer line - missing email" error
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.