Message ID | 20200814193234.3072139-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | fetch-pack: make packfile URIs work with transfer.fsckobjects |
Jonathan Tan <jonathantanmy@google.com> writes:

> When fetching with packfile URIs and transfer.fsckobjects=1, use the
> --fsck-objects instead of the --strict flag when invoking index-pack so
> that links are not checked, only objects. This is because incomplete
> links are expected. (A subsequent connectivity check will be done when
> all the packs have been downloaded regardless of whether
> transfer.fsckobjects is set.)

Good reasoning.

The change looks surprisingly small, thanks to the existing need
already. I realize that the code's quality (from readability and
discoverability's point of view) has deteriorated in this area quite
a lot over the past few years, though.

The "from_promisor" field is set when fetch-pack is run with the
corresponding command line option, but the option is documented
nowhere, so it is not immediately obvious why the new need can be
fulfilled by just piggybacking on the existing codepath. The
meaning of the only_packfile parameter get_pack() takes is never
explained anywhere, either.

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 7f20eca4f8..66631d0034 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -892,7 +892,7 @@ static int get_pack(struct fetch_pack_args *args,
>  	    : transfer_fsck_objects >= 0
>  	    ? transfer_fsck_objects
>  	    : 0) {
> -		if (args->from_promisor)
> +		if (args->from_promisor || !only_packfile)
>  			/*
>  			 * We cannot use --strict in index-pack because it
>  			 * checks both broken objects and links, but we only

I think this is a good way to work around the "we do not have full
set of objects until we grab out-of-line packfiles, but we process
the in-protocol packdata before we grab them, so we cannot validate
yet" problem.

My guess is that "only_packfile" means "after reading this packfile,
the repository should be fully complete and we can afford to check
for connectivity", which would always be true for protocols below v2
that lack the packfile-uri extension (hence the call to get_pack()
in do_fetch_pack() passes a hardcoded 1 in this parameter). The v2
codepath in do_fetch_pack_v2() calls get_pack() with true only when
there is no packfile-uri, so we loosen the validation when we know
we will further grab one or more packfiles out of line.

> +test_expect_success 'packfile-uri with transfer.fsckobjects' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
> +	ls http_child/.git/objects/pack/* >filelist &&

Subtle but correct.

> +	test_line_count = 4 filelist
> +'
> +
> +test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	cat >bogus-commit <<EOF &&
> +tree $EMPTY_TREE
> +author Bugs Bunny 1234567890 +0000
> +committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
> +
> +This commit object intentionally broken
> +EOF

Use <<-EOF for readability, please.

> +	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
> +	git -C "$P" branch bogus-branch "$BOGUS" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child 2>error &&
> +	test_i18ngrep "invalid author/committer line - missing email" error
> +'
> +
>  # DO NOT add non-httpd-specific tests here, because the last part of this
>  # test script is only executed when httpd is available and enabled.
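
For readers trying to follow the reviewer's reading of from_promisor and only_packfile, the condition touched by the patch can be modelled as a small standalone function. This is only a sketch, not code from fetch-pack.c: the struct fetch_args_sketch type, the index_pack_check_flag() name and the fsck_enabled parameter are hypothetical, and the sketch ignores any other arguments the real get_pack() may hand to index-pack.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one field of the fetch arguments used here. */
struct fetch_args_sketch {
	int from_promisor;	/* mirrors args->from_promisor in the patch */
};

/*
 * Pick the index-pack checking flag.
 *
 * fsck_enabled models the outcome of the fsckobjects configuration
 * (an assumption of this sketch; the real code reads config variables).
 * only_packfile is nonzero when no further packfiles are expected
 * out of line, following the reviewer's interpretation.
 *
 * Returns NULL when no object checking was requested.
 */
const char *index_pack_check_flag(const struct fetch_args_sketch *args,
				  int only_packfile, int fsck_enabled)
{
	if (!fsck_enabled)
		return NULL;

	/*
	 * Links cannot be verified yet if objects may legitimately be
	 * missing: either the remote is a promisor, or more packfiles
	 * are still to be downloaded via packfile URIs.
	 */
	if (args->from_promisor || !only_packfile)
		return "--fsck-objects";

	/* The pack is expected to be complete: check objects and links. */
	return "--strict";
}
```

With this model, a fetch that is not from a promisor and has only_packfile set to 1 gets "--strict", while the packfile-URI case (only_packfile set to 0) falls back to "--fsck-objects", which is the behaviour the patch description asks for.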
diff --git a/fetch-pack.c b/fetch-pack.c
index 7f20eca4f8..66631d0034 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -892,7 +892,7 @@ static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor)
+		if (args->from_promisor || !only_packfile)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 5a60fbe3ed..8c6c67b10d 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -883,6 +883,59 @@ test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
 	test_i18ngrep "pack downloaded from.*does not match expected hash" err
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
+	ls http_child/.git/objects/pack/* >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	cat >bogus-commit <<EOF &&
+tree $EMPTY_TREE
+author Bugs Bunny 1234567890 +0000
+committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
+
+This commit object intentionally broken
+EOF
+	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
+	git -C "$P" branch bogus-branch "$BOGUS" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>error &&
+	test_i18ngrep "invalid author/committer line - missing email" error
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
When fetching with packfile URIs and transfer.fsckobjects=1, use the
--fsck-objects instead of the --strict flag when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected. (A subsequent connectivity check will be done when
all the packs have been downloaded regardless of whether
transfer.fsckobjects is set.)

This is similar to 98a2ea46c2 ("fetch-pack: do not check links for
partial fetch", 2018-03-15), but for packfile URIs instead of partial
clones.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
The subject is longer than 50 characters but I couldn't find a way to
shorten it, especially since I think it's important to mention packfile
URIs and transfer.fsckobjects. Any suggestions appreciated.
---
 fetch-pack.c           |  2 +-
 t/t5702-protocol-v2.sh | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)