Message ID | 20191105185619.207173-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] fetch: remove fetch_if_missing=0 | expand |
On Tue, Nov 5, 2019 at 1:56 PM Jonathan Tan <jonathantanmy@google.com> wrote: > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh > @@ -296,6 +296,76 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly > +setup_triangle () { > + rm -rf big-blob.txt server client promisor-remote && > + > + printf "line %d\n" $(test_seq 1 100) >big-blob.txt Broken &&-chain. > + # Create a server with 2 commits: a commit with a big blob and a child > + # commit with an incremental change. Also, create a partial clone > + # client that only contains the first commit. > + git init server && > + git -C server config --local uploadpack.allowfilter 1 && > + cp big-blob.txt server && > + git -C server add big-blob.txt && > + git -C server commit -m "initial" && > + git clone --bare --filter=tree:0 "file://$(pwd)/server" client && > + echo another line >>server/big-blob.txt && > + git -C server commit -am "append line to big blob" && > + > + # Create a promisor remote that only contains the blob from the first > + # commit, and set it as the promisor remote of client. Thus, whenever > + # the client lazy fetches, the lazy fetch will succeed only if it is > + # for this blob. > + git init promisor-remote && > + test_commit -C promisor-remote one && # so that ref advertisement is not empty > + git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 && > + git -C promisor-remote hash-object -w --stdin <big-blob.txt && > + git -C client remote set-url origin "file://$(pwd)/promisor-remote" > +}
Jonathan Tan <jonathantanmy@google.com> writes: > In fetch_pack() (and all functions it calls), pass > OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be > a tree or blob that we do not want to be lazy-fetched even if it is > absent. Thus, the only lazy-fetches occurring for trees and blobs are > when resolving deltas. > > Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove > this, and also add a test ensuring that such objects are not > lazy-fetched. (We might be able to remove fetch_if_missing=0 from other > places too, but I have limited myself to builtin/fetch.c in this commit > because I have not written tests for the other commands yet.) > > Note that commits and tags may still be lazy-fetched. I limited myself > to objects that could be trees or blobs here because Git does not > support creating such commit- and tag-excluding clones yet, and even if > such a clone were manually created, Git does not have good support for > fetching a single commit (when fetching a commit, it and all its > ancestors would be sent). > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Changes from v1: > - Added NEEDSWORK in test, suggested by Jonathan Nieder > - Used printf in test, suggested by Eric Sunshine Nice. The less we use big "plug the machinery to lazily fetch for now" switch, the better, I think. > +setup_triangle () { > + rm -rf big-blob.txt server client promisor-remote && > + > + printf "line %d\n" $(test_seq 1 100) >big-blob.txt I'll tweak this line while queueing with a trailing " &&" (no need to resend only for this).
Jonathan Tan <jonathantanmy@google.com> writes: > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 0c345b5dfe..5ff7367dd7 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map) > * we need all direct targets to exist. > */ > for (r = rm; r; r = r->next) { > - if (!has_object_file(&r->old_oid)) > + if (!has_object_file_with_flags(&r->old_oid, > + OBJECT_INFO_SKIP_FETCH_OBJECT)) > return -1; > } > > @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) > > packet_trace_identity("fetch"); > > - fetch_if_missing = 0; > - > /* Record the command line for the reflog */ > strbuf_addstr(&default_rla, "fetch"); > for (i = 1; i < argc; i++) As this is on a stale codebase, let's base the fix on top of c32ca691c2 or later. The part of the patch for builtin/fetch.c adjusted for that decision would look like attached. An interesting thing is that c32ca691c2^2 that moved the assignment to this big red switch variable around causes 3-way merge to fail in a miserable way. The "moving around" would involve removing from the same location as the rebased patch below removes, plus adding the assignment elsewhere, so "both sides removed the assignment from this hunk, so take that" would correctly leave the original assignment we see in the second hunk above removed, but fails to notice that the assignment made elsewhere (the result of "moving around" patch) is no longer needed, because "c32ca691c2^2 added one and this change does not do anything there, so take the addition" cleanly resolves to an incorrect merge result. builtin/fetch.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 863c858fde..5ff7367dd7 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map) * we need all direct targets to exist. */ for (r = rm; r; r = r->next) { - if (!has_object_file(&r->old_oid)) + if (!has_object_file_with_flags(&r->old_oid, + OBJECT_INFO_SKIP_FETCH_OBJECT)) return -1; } @@ -1822,8 +1823,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) } } - fetch_if_missing = 0; - if (remote) { if (filter_options.choice || has_promisor_remote()) fetch_one_setup_partial(remote);
Junio C Hamano <gitster@pobox.com> writes: > An interesting thing is that c32ca691c2^2 that moved the assignment > to this big red switch variable around causes 3-way merge to fail in > a miserable way. The "moving around" would involve removing from > the same location as the rebased patch below removes, plus adding > the assignment elsewhere, so "both sides removed the assignment from > this hunk, so take that" would correctly leave the original assignment > we see in the second hunk above removed, but fails to notice that > the assignment made elsewhere (the result of "moving around" patch) > is no longer needed, because "c32ca691c2^2 added one and this change > does not do anything there, so take the addition" cleanly resolves > to an incorrect merge result. Total tangent. One frustrating thing is that we do this a bit better at the tree level merge. After read-tree does three-way merge at the tree level, what is passed to the merge-recursive machinery has "side A added" and "side A removed" left unresolved, so that the post-processing phase could try to match them up and say "aha, side A moved that path elsewhere while side B just removed, which is a conflict". I wish if xdiff/xmerge.c could learn a similar trick, but the necessary change feels quite involved, error prone and too magical.
diff --git a/builtin/fetch.c b/builtin/fetch.c index 0c345b5dfe..5ff7367dd7 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map) * we need all direct targets to exist. */ for (r = rm; r; r = r->next) { - if (!has_object_file(&r->old_oid)) + if (!has_object_file_with_flags(&r->old_oid, + OBJECT_INFO_SKIP_FETCH_OBJECT)) return -1; } @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) packet_trace_identity("fetch"); - fetch_if_missing = 0; - /* Record the command line for the reflog */ strbuf_addstr(&default_rla, "fetch"); for (i = 1; i < argc; i++) diff --git a/fetch-pack.c b/fetch-pack.c index 0130b44112..37178e2d34 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator, struct object *o; if (!has_object_file_with_flags(&ref->old_oid, - OBJECT_INFO_QUICK)) + OBJECT_INFO_QUICK | + OBJECT_INFO_SKIP_FETCH_OBJECT)) continue; o = parse_object(the_repository, &ref->old_oid); if (!o) diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 79f7b65f8c..e785fba7ed 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -296,6 +296,76 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly test_i18ngrep "unable to parse sparse filter data in" err ' +setup_triangle () { + rm -rf big-blob.txt server client promisor-remote && + + printf "line %d\n" $(test_seq 1 100) >big-blob.txt + + # Create a server with 2 commits: a commit with a big blob and a child + # commit with an incremental change. Also, create a partial clone + # client that only contains the first commit. + git init server && + git -C server config --local uploadpack.allowfilter 1 && + cp big-blob.txt server && + git -C server add big-blob.txt && + git -C server commit -m "initial" && + git clone --bare --filter=tree:0 "file://$(pwd)/server" client && + echo another line >>server/big-blob.txt && + git -C server commit -am "append line to big blob" && + + # Create a promisor remote that only contains the blob from the first + # commit, and set it as the promisor remote of client. Thus, whenever + # the client lazy fetches, the lazy fetch will succeed only if it is + # for this blob. + git init promisor-remote && + test_commit -C promisor-remote one && # so that ref advertisement is not empty + git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 && + git -C promisor-remote hash-object -w --stdin <big-blob.txt && + git -C client remote set-url origin "file://$(pwd)/promisor-remote" +} + +# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only +# test that "fetch" avoid fetching trees and blobs, but not commits or +# tags. Revisit this if Git is ever taught to support partial clones +# with commits and/or tags filtered out. + +test_expect_success 'fetch lazy-fetches only to resolve deltas' ' + setup_triangle && + + # Exercise to make sure it works. Git will not fetch anything from the + # promisor remote other than for the big blob (because it needs to + # resolve the delta). + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ + fetch "file://$(pwd)/server" master && + + # Verify the assumption that the client needed to fetch the delta base + # to resolve the delta. + git hash-object big-blob.txt >hash && + grep "want $(cat hash)" trace +' + +test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' ' + setup_triangle && + + git -C server config --local protocol.version 2 && + git -C client config --local protocol.version 2 && + git -C promisor-remote config --local protocol.version 2 && + + # Exercise to make sure it works. Git will not fetch anything from the + # promisor remote other than for the big blob (because it needs to + # resolve the delta). + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ + fetch "file://$(pwd)/server" master && + + # Verify that protocol version 2 was used. + grep "fetch< version 2" trace && + + # Verify the assumption that the client needed to fetch the delta base + # to resolve the delta. + git hash-object big-blob.txt >hash && + grep "want $(cat hash)" trace +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd