diff mbox series

[v2] fetch: remove fetch_if_missing=0

Message ID 20191105185619.207173-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] fetch: remove fetch_if_missing=0 | expand

Commit Message

Jonathan Tan Nov. 5, 2019, 6:56 p.m. UTC
In fetch_pack() (and all functions it calls), pass
OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
a tree or blob that we do not want to be lazy-fetched even if it is
absent. Thus, the only lazy-fetches occurring for trees and blobs are
when resolving deltas.

Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
this, and also add a test ensuring that such objects are not
lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
places too, but I have limited myself to builtin/fetch.c in this commit
because I have not written tests for the other commands yet.)

Note that commits and tags may still be lazy-fetched. I limited myself
to objects that could be trees or blobs here because Git does not
support creating such commit- and tag-excluding clones yet, and even if
such a clone were manually created, Git does not have good support for
fetching a single commit (when fetching a commit, it and all its
ancestors would be sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1:
- Added NEEDSWORK in test, suggested by Jonathan Nieder
- Used printf in test, suggested by Eric Sunshine

Range-diff against v1:
1:  f52c8d4c11 ! 1:  8e93cb8b02 fetch: remove fetch_if_missing=0
    @@ Commit message
         ancestors would be sent).
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## builtin/fetch.c ##
     @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
    @@ t/t5616-partial-clone.sh: test_expect_success 'partial clone with unresolvable s
     +setup_triangle () {
     +	rm -rf big-blob.txt server client promisor-remote &&
     +
    -+	touch big-blob.txt &&
    -+	for i in $(seq 1 100)
    -+	do
    -+		echo line $i >>big-blob.txt
    -+	done &&
    ++	printf "line %d\n" $(test_seq 1 100) >big-blob.txt
     +
     +	# Create a server with 2 commits: a commit with a big blob and a child
     +	# commit with an incremental change. Also, create a partial clone
    @@ t/t5616-partial-clone.sh: test_expect_success 'partial clone with unresolvable s
     +	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
     +}
     +
    ++# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only
    ++# test that "fetch" avoid fetching trees and blobs, but not commits or
    ++# tags. Revisit this if Git is ever taught to support partial clones
    ++# with commits and/or tags filtered out.
    ++
     +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
     +	setup_triangle &&
     +

 builtin/fetch.c          |  5 ++-
 fetch-pack.c             |  3 +-
 t/t5616-partial-clone.sh | 70 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 4 deletions(-)

Comments

Eric Sunshine Nov. 5, 2019, 8:06 p.m. UTC | #1
On Tue, Nov 5, 2019 at 1:56 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> @@ -296,6 +296,76 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
> +setup_triangle () {
> +       rm -rf big-blob.txt server client promisor-remote &&
> +
> +       printf "line %d\n" $(test_seq 1 100) >big-blob.txt

Broken &&-chain.

> +       # Create a server with 2 commits: a commit with a big blob and a child
> +       # commit with an incremental change. Also, create a partial clone
> +       # client that only contains the first commit.
> +       git init server &&
> +       git -C server config --local uploadpack.allowfilter 1 &&
> +       cp big-blob.txt server &&
> +       git -C server add big-blob.txt &&
> +       git -C server commit -m "initial" &&
> +       git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
> +       echo another line >>server/big-blob.txt &&
> +       git -C server commit -am "append line to big blob" &&
> +
> +       # Create a promisor remote that only contains the blob from the first
> +       # commit, and set it as the promisor remote of client. Thus, whenever
> +       # the client lazy fetches, the lazy fetch will succeed only if it is
> +       # for this blob.
> +       git init promisor-remote &&
> +       test_commit -C promisor-remote one && # so that ref advertisement is not empty
> +       git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
> +       git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
> +       git -C client remote set-url origin "file://$(pwd)/promisor-remote"
> +}
Junio C Hamano Nov. 6, 2019, 1:45 a.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)
>
> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Changes from v1:
> - Added NEEDSWORK in test, suggested by Jonathan Nieder
> - Used printf in test, suggested by Eric Sunshine

Nice.  The less we use big "plug the machinery to lazily fetch for
now" switch, the better, I think.

> +setup_triangle () {
> +	rm -rf big-blob.txt server client promisor-remote &&
> +
> +	printf "line %d\n" $(test_seq 1 100) >big-blob.txt

I'll tweak this line while queueing with a trailing " &&" (no need
to resend only for this).
Junio C Hamano Nov. 8, 2019, 6:33 a.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0c345b5dfe..5ff7367dd7 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
>  	 * we need all direct targets to exist.
>  	 */
>  	for (r = rm; r; r = r->next) {
> -		if (!has_object_file(&r->old_oid))
> +		if (!has_object_file_with_flags(&r->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))
>  			return -1;
>  	}
>  
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -
>  	/* Record the command line for the reflog */
>  	strbuf_addstr(&default_rla, "fetch");
>  	for (i = 1; i < argc; i++)

As this is on a stale codebase, let's base the fix on top of
c32ca691c2 or later.  The part of the patch for builtin/fetch.c
adjusted for that decision would look like attached.

An interesting thing is that c32ca691c2^2 that moved the assignment
to this big red switch variable around causes 3-way merge to fail in
a miserable way.  The "moving around" would involve removing from
the same location as the rebased patch below removes, plus adding
the assignment elsewhere, so "both sides removed the assignment from
this hunk, so take that" would correctly leave the original assignment
we see in the second hunk above removed, but fails to notice that
the assignment made elsewhere (the result of "moving around" patch)
is no longer needed, because "c32ca691c2^2 added one and this change
does not do anything there, so take the addition" cleanly resolves
to an incorrect merge result.

 builtin/fetch.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 863c858fde..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1822,8 +1823,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	fetch_if_missing = 0;
-
 	if (remote) {
 		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);
Junio C Hamano Nov. 8, 2019, 7:40 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> An interesting thing is that c32ca691c2^2 that moved the assignment
> to this big red switch variable around causes 3-way merge to fail in
> a miserable way.  The "moving around" would involve removing from
> the same location as the rebased patch below removes, plus adding
> the assignment elsewhere, so "both sides removed the assignment from
> this hunk, so take that" would correctly leave the original assignment
> we see in the second hunk above removed, but fails to notice that
> the assignment made elsewhere (the result of "moving around" patch)
> is no longer needed, because "c32ca691c2^2 added one and this change
> does not do anything there, so take the addition" cleanly resolves
> to an incorrect merge result.

Total tangent.  One frustrating thing is that we do this a bit
better at the tree level merge.  After read-tree does three-way
merge at the tree level, what is passed to the merge-recursive
machinery has "side A added" and "side A removed" left unresolved,
so that the post-processing phase could try to match them up and say
"aha, side A moved that path elsewhere while side B just removed,
which is a conflict".  

I wish if xdiff/xmerge.c could learn a similar trick, but the
necessary change feels quite involved, error prone and too magical.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@  static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1755,8 +1756,6 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0130b44112..37178e2d34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -673,7 +673,8 @@  static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 		struct object *o;
 
 		if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK))
+						OBJECT_INFO_QUICK |
+							OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		o = parse_object(the_repository, &ref->old_oid);
 		if (!o)
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 79f7b65f8c..e785fba7ed 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -296,6 +296,76 @@  test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
+setup_triangle () {
+	rm -rf big-blob.txt server client promisor-remote &&
+
+	printf "line %d\n" $(test_seq 1 100) >big-blob.txt
+
+	# Create a server with 2 commits: a commit with a big blob and a child
+	# commit with an incremental change. Also, create a partial clone
+	# client that only contains the first commit.
+	git init server &&
+	git -C server config --local uploadpack.allowfilter 1 &&
+	cp big-blob.txt server &&
+	git -C server add big-blob.txt &&
+	git -C server commit -m "initial" &&
+	git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
+	echo another line >>server/big-blob.txt &&
+	git -C server commit -am "append line to big blob" &&
+
+	# Create a promisor remote that only contains the blob from the first
+	# commit, and set it as the promisor remote of client. Thus, whenever
+	# the client lazy fetches, the lazy fetch will succeed only if it is
+	# for this blob.
+	git init promisor-remote &&
+	test_commit -C promisor-remote one && # so that ref advertisement is not empty
+	git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
+	git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
+	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
+}
+
+# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only
+# test that "fetch" avoid fetching trees and blobs, but not commits or
+# tags. Revisit this if Git is ever taught to support partial clones
+# with commits and/or tags filtered out.
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas' '
+	setup_triangle &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' '
+	setup_triangle &&
+
+	git -C server config --local protocol.version 2 &&
+	git -C client config --local protocol.version 2 &&
+	git -C promisor-remote config --local protocol.version 2 &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify that protocol version 2 was used.
+	grep "fetch< version 2" trace &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd