diff mbox series

[v2,2/2] diff: batch fetching of missing blobs

Message ID 44de02e584f449481e6fb00cf35d74adf0192e9d.1553895166.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series Batch fetching of missing blobs in diff and show | expand

Commit Message

Jonathan Tan March 29, 2019, 9:39 p.m. UTC
When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.

This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                        |  32 +++++++++++
 t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
 2 files changed, 135 insertions(+)
 create mode 100755 t/t4067-diff-partial-clone.sh

Comments

SZEDER Gábor April 4, 2019, 2:47 a.m. UTC | #1
On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote:
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755
> index 0000000000..349851be7d
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +	
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines

These patches and 'cc/multi-promisor' don't seem to work well
together, and all tests checking 'test_line_count = 1 done_lines' in
this test script fail in current 'pu', because there are two
"git> done" lines.

> +'
> +
> +test_expect_success 'diff batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	echo c >server/c &&
> +	echo d >server/d &&
> +	git -C server add c d &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	echo another-a >server/a &&
> +	git -C server add a &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	echo a | git hash-object --stdin >hash-old-a &&
> +	echo another-a | git hash-object --stdin >hash-new-a &&
> +	echo b | git hash-object --stdin >hash-b &&
> +
> +	# Ensure that only a and another-a are fetched.
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +	grep "want $(cat hash-old-a)" trace &&
> +	grep "want $(cat hash-new-a)" trace &&
> +	! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	printf "b\nb\nb\nb\nb\n" >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	rm server/b &&
> +	printf "b\nb\nb\nb\nbX\n" >server/c &&
> +	git -C server add c &&
> +	git -C server commit -a -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> +	grep "similarity index" out &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
> +test_done
> -- 
> 2.21.0.197.gd478713db0
>
Duy Nguyen April 5, 2019, 9:39 a.m. UTC | #2
On Sat, Mar 30, 2019 at 4:40 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When running a command like "git show" or "git diff" in a partial clone,
> batch all missing blobs to be fetched as one request.
>
> This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
> blobs", 2017-12-08), but for another command.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  diff.c                        |  32 +++++++++++
>  t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100755 t/t4067-diff-partial-clone.sh
>
> diff --git a/diff.c b/diff.c
> index ec5c095199..1eccefb4ef 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,6 +25,7 @@
>  #include "packfile.h"
>  #include "parse-options.h"
>  #include "help.h"
> +#include "fetch-object.h"
>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void)
>         QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> +static void add_if_missing(struct oid_array *to_fetch,
> +                          const struct diff_filespec *filespec)
> +{
> +       if (filespec && filespec->oid_valid &&
> +           oid_object_info_extended(the_repository, &filespec->oid, NULL,

I'm quite sure we can pass 'struct repository *' around in diff code
now. I think it's the "repo" field in "struct diff_options". Please
use it and avoid more references to the_repository.

> +                                    OBJECT_INFO_FOR_PREFETCH))
> +               oid_array_append(to_fetch, &filespec->oid);
> +}
> +
>  void diffcore_std(struct diff_options *options)
>  {
> +       if (repository_format_partial_clone) {
> +               /*
> +                * Prefetch the diff pairs that are about to be flushed.
> +                */
> +               int i;
> +               struct diff_queue_struct *q = &diff_queued_diff;
> +               struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +               for (i = 0; i < q->nr; i++) {
> +                       struct diff_filepair *p = q->queue[i];
> +                       add_if_missing(&to_fetch, p->one);
> +                       add_if_missing(&to_fetch, p->two);
> +               }
> +               if (to_fetch.nr)
> +                       /*
> +                        * NEEDSWORK: Consider deduplicating the OIDs sent.
> +                        */
> +                       fetch_objects(repository_format_partial_clone,
> +                                     to_fetch.oid, to_fetch.nr);
> +               oid_array_clear(&to_fetch);
> +       }
> +
>         /* NOTE please keep the following in sync with diff_tree_combined() */
>         if (options->skip_stat_unmatch)
>                 diffcore_skip_stat_unmatch(options);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755
> index 0000000000..349851be7d
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo c >server/c &&
> +       echo d >server/d &&
> +       git -C server add c d &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo another-a >server/a &&
> +       git -C server add a &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       echo a | git hash-object --stdin >hash-old-a &&
> +       echo another-a | git hash-object --stdin >hash-new-a &&
> +       echo b | git hash-object --stdin >hash-b &&
> +
> +       # Ensure that only a and another-a are fetched.
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "want $(cat hash-old-a)" trace &&
> +       grep "want $(cat hash-new-a)" trace &&
> +       ! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       printf "b\nb\nb\nb\nb\n" >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       rm server/b &&
> +       printf "b\nb\nb\nb\nbX\n" >server/c &&
> +       git -C server add c &&
> +       git -C server commit -a -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> +       grep "similarity index" out &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_done
> --
> 2.21.0.197.gd478713db0
>
Johannes Schindelin April 5, 2019, 1:38 p.m. UTC | #3
Hi,

On Thu, 4 Apr 2019, SZEDER Gábor wrote:

> On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote:
> > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> > new file mode 100755
> > index 0000000000..349851be7d
> > --- /dev/null
> > +++ b/t/t4067-diff-partial-clone.sh
> > @@ -0,0 +1,103 @@
> > +#!/bin/sh
> > +
> > +test_description='behavior of diff when reading objects in a partial clone'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'git show batches blobs' '
> > +	test_when_finished "rm -rf server client trace" &&
> > +
> > +	test_create_repo server &&
> > +	echo a >server/a &&
> > +	echo b >server/b &&
> > +	git -C server add a b &&
> > +	git -C server commit -m x &&
> > +
> > +	test_config -C server uploadpack.allowfilter 1 &&
> > +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> > +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> > +
> > +	# Ensure that there is exactly 1 negotiation by checking that there is
> > +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> > +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> > +	grep "git> done" trace >done_lines &&
> > +	test_line_count = 1 done_lines
>
> These patches and 'cc/multi-promisor' don't seem to work well
> together, and all tests checking 'test_line_count = 1 done_lines' in
> this test script fail in current 'pu', because there are two
> "git> done" lines.

I investigated a little further, and it would seem that it is neither this
patch nor the cc/multi-promisor patches that introduce the problem, but
the merge between the two... The latter tries to get away from using the
global variable `repository_format_partial_clone` while this patch
introduces another user.

So that merge between these two branches needs to become an "evil merge"
(or should I say "blessed merge", or even a "merge with a blessing"?). I
have this patch, tentatively, for the `shears/pu` branch [*1*], which
seems to fix the test here:

-- snip --
Subject: [PATCH] fixup??? Merge branch 'cc/multi-promisor' into pu

The `cc/multi-promisor` patch series and the
`jt/batch-fetch-blobs-in-diff` patch series have a semantic conflict:
the former replaces checks for `repository_format_partial_clone` with
checks for `has_promisor_remote()`, while the latter introduces such a
check.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 diff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index fa12b5d04a..86278ce676 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #include "parse-options.h"
 #include "help.h"
 #include "fetch-object.h"
+#include "promisor-remote.h"

 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -6493,7 +6494,7 @@ static void add_if_missing(struct oid_array *to_fetch,

 void diffcore_std(struct diff_options *options)
 {
-	if (repository_format_partial_clone) {
+	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the diff pairs that are about to be flushed.
 		 */
-- snap --

Junio, would you terribly mind adopting this?

Ciao,
Dscho

Footnote *1*: I started again to maintain `shears/*` branches in
https://github.com/git-for-windows/git which essentially are Git for
Windows' patch thicket rebased to all four integration branches of
upstream (or "core") Git. I try my best at keeping the CI green on those,
meaning that I liberally add commits on top that are neither in Git for
Windows' `master` nor in core Git's branches.
Johannes Schindelin April 5, 2019, 2:17 p.m. UTC | #4
Hi Jonathan,

On Fri, 29 Mar 2019, Jonathan Tan wrote:

> When running a command like "git show" or "git diff" in a partial clone,
> batch all missing blobs to be fetched as one request.
>
> This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
> blobs", 2017-12-08), but for another command.

Still makes sense ;-)

> diff --git a/diff.c b/diff.c
> index ec5c095199..1eccefb4ef 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,6 +25,7 @@
>  #include "packfile.h"
>  #include "parse-options.h"
>  #include "help.h"
> +#include "fetch-object.h"
>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void)
>  	QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> +static void add_if_missing(struct oid_array *to_fetch,
> +			   const struct diff_filespec *filespec)
> +{
> +	if (filespec && filespec->oid_valid &&
> +	    oid_object_info_extended(the_repository, &filespec->oid, NULL,
> +				     OBJECT_INFO_FOR_PREFETCH))
> +		oid_array_append(to_fetch, &filespec->oid);
> +}

Thank you for introducing this, in looks more elegant to my eyes than the
previous iteration.

> +
>  void diffcore_std(struct diff_options *options)
>  {
> +	if (repository_format_partial_clone) {
> +		/*
> +		 * Prefetch the diff pairs that are about to be flushed.
> +		 */
> +		int i;
> +		struct diff_queue_struct *q = &diff_queued_diff;
> +		struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +		for (i = 0; i < q->nr; i++) {
> +			struct diff_filepair *p = q->queue[i];
> +			add_if_missing(&to_fetch, p->one);
> +			add_if_missing(&to_fetch, p->two);
> +		}
> +		if (to_fetch.nr)
> +			/*
> +			 * NEEDSWORK: Consider deduplicating the OIDs sent.
> +			 */
> +			fetch_objects(repository_format_partial_clone,
> +				      to_fetch.oid, to_fetch.nr);
> +		oid_array_clear(&to_fetch);
> +	}
> +
>  	/* NOTE please keep the following in sync with diff_tree_combined() */
>  	if (options->skip_stat_unmatch)
>  		diffcore_skip_stat_unmatch(options);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755

Also: thank you very much for introducing this test script, to make sure
that things work as expected. Without it, we would not have detected any
regression wrt multi-promisor.

This iteration looks good to me, thank you so much!
Dscho

> index 0000000000..349851be7d
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	echo c >server/c &&
> +	echo d >server/d &&
> +	git -C server add c d &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	echo another-a >server/a &&
> +	git -C server add a &&
> +	git -C server commit -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	echo a | git hash-object --stdin >hash-old-a &&
> +	echo another-a | git hash-object --stdin >hash-new-a &&
> +	echo b | git hash-object --stdin >hash-b &&
> +
> +	# Ensure that only a and another-a are fetched.
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +	grep "want $(cat hash-old-a)" trace &&
> +	grep "want $(cat hash-new-a)" trace &&
> +	! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	printf "b\nb\nb\nb\nb\n" >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +	rm server/b &&
> +	printf "b\nb\nb\nb\nbX\n" >server/c &&
> +	git -C server add c &&
> +	git -C server commit -a -m x &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +	# Ensure that there is exactly 1 negotiation by checking that there is
> +	# only 1 "done" line sent. ("done" marks the end of negotiation.)
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> +	grep "similarity index" out &&
> +	grep "git> done" trace >done_lines &&
> +	test_line_count = 1 done_lines
> +'
> +
> +test_done
> --
> 2.21.0.197.gd478713db0
>
>
Christian Couder April 7, 2019, 6 a.m. UTC | #5
Hi,

On Fri, Apr 5, 2019 at 3:38 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> On Thu, 4 Apr 2019, SZEDER Gábor wrote:
>
> > On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote:
> > > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> > > new file mode 100755
> > > index 0000000000..349851be7d
> > > --- /dev/null
> > > +++ b/t/t4067-diff-partial-clone.sh
> > > @@ -0,0 +1,103 @@
> > > +#!/bin/sh
> > > +
> > > +test_description='behavior of diff when reading objects in a partial clone'
> > > +
> > > +. ./test-lib.sh
> > > +
> > > +test_expect_success 'git show batches blobs' '
> > > +   test_when_finished "rm -rf server client trace" &&
> > > +
> > > +   test_create_repo server &&
> > > +   echo a >server/a &&
> > > +   echo b >server/b &&
> > > +   git -C server add a b &&
> > > +   git -C server commit -m x &&
> > > +
> > > +   test_config -C server uploadpack.allowfilter 1 &&
> > > +   test_config -C server uploadpack.allowanysha1inwant 1 &&
> > > +   git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> > > +
> > > +   # Ensure that there is exactly 1 negotiation by checking that there is
> > > +   # only 1 "done" line sent. ("done" marks the end of negotiation.)
> > > +   GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> > > +   grep "git> done" trace >done_lines &&
> > > +   test_line_count = 1 done_lines
> >
> > These patches and 'cc/multi-promisor' don't seem to work well
> > together, and all tests checking 'test_line_count = 1 done_lines' in
> > this test script fail in current 'pu', because there are two
> > "git> done" lines.
>
> I investigated a little further, and it would seem that it is neither this
> patch nor the cc/multi-promisor patches that introduce the problem, but
> the merge between the two... The latter tries to get away from using the
> global variable `repository_format_partial_clone` while this patch
> introduces another user.

Thanks for investigating! Yeah, that's part of the problem.

The fix I would suggest is:

diff --git a/diff.c b/diff.c
index f685ab10b5..a2b1241f83 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #include "parse-options.h"
 #include "help.h"
 #include "fetch-object.h"
+#include "promisor-remote.h"

 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch,

 void diffcore_std(struct diff_options *options)
 {
-    if (repository_format_partial_clone) {
+    if (has_promisor_remote()) {
         /*
          * Prefetch the diff pairs that are about to be flushed.
          */
@@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options)
             /*
              * NEEDSWORK: Consider deduplicating the OIDs sent.
              */
-            fetch_objects(repository_format_partial_clone,
-                      to_fetch.oid, to_fetch.nr);
+            promisor_remote_get_direct(to_fetch.oid, to_fetch.nr);
         oid_array_clear(&to_fetch);
     }

I will send a new version with the above soon based on top of
jt/batch-fetch-blobs-in-diff in pu.
Junio C Hamano April 8, 2019, 2:36 a.m. UTC | #6
Christian Couder <christian.couder@gmail.com> writes:

> Thanks for investigating! Yeah, that's part of the problem.
>
> The fix I would suggest is:
>
> diff --git a/diff.c b/diff.c
> index f685ab10b5..a2b1241f83 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #include "parse-options.h"
>  #include "help.h"
>  #include "fetch-object.h"
> +#include "promisor-remote.h"

Thanks.

>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch,
>
>  void diffcore_std(struct diff_options *options)
>  {
> -    if (repository_format_partial_clone) {
> +    if (has_promisor_remote()) {

Hmph, I see quite a few references to the variable disappears
between next and pu.  Is it that in the new world order, nobody
outside the low-level object-access layer should look at the
variable directly, but instead ask the has_promisor_remote()
function?  If so, can we at least document that?  Making it static
(or at least renaming it) would have helped the compiler to notice
this semantic merge conflict better.

> @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options)
>              /*
>               * NEEDSWORK: Consider deduplicating the OIDs sent.
>               */
> -            fetch_objects(repository_format_partial_clone,
> -                      to_fetch.oid, to_fetch.nr);
> +            promisor_remote_get_direct(to_fetch.oid, to_fetch.nr);

Likewise between fetch_objects() and promisor_remote_get_direct().
Shouldn't the underlying fetch_objects be hidden from general
callers?

>          oid_array_clear(&to_fetch);
>      }
>
> I will send a new version with the above soon based on top of
> jt/batch-fetch-blobs-in-diff in pu.
Junio C Hamano April 8, 2019, 5:51 a.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> Thanks for investigating! Yeah, that's part of the problem.
>>
>> The fix I would suggest is:
>>
>> diff --git a/diff.c b/diff.c
>> index f685ab10b5..a2b1241f83 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -26,6 +26,7 @@
>>  #include "parse-options.h"
>>  #include "help.h"
>>  #include "fetch-object.h"
>> +#include "promisor-remote.h"
>
> Thanks.

Together with "if we are forbidding the direct access to the
repository_format_partial_clone variable and the fetch_objects()
funciton, make it clear that is what is being done in the
multi-promisor topic", I think a patch that adds this header should
also remove inclusion of "fetch-object.h".
Junio C Hamano April 8, 2019, 6:03 a.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:

>>>  #include "fetch-object.h"
>>> +#include "promisor-remote.h"
>>
>> Thanks.
>
> Together with "if we are forbidding the direct access to the
> repository_format_partial_clone variable and the fetch_objects()
> funciton, make it clear that is what is being done in the
> multi-promisor topic", I think a patch that adds this header should
> also remove inclusion of "fetch-object.h".

In fact, your topic itself has the same issue.  I'll queue the
following at the tip of the topic tentatively before merging it to
'pu' with the fix we have been discussing around this thread.

Thanks.

-- >8 --
In multi-promisor world, fetch_objects() should not be used
directly; promisor_remote_get_direct() call replaces the helper.

 sha1-file.c    | 1 -
 unpack-trees.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index 715a2b882a..87ea8606f4 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -30,7 +30,6 @@
 #include "mergesort.h"
 #include "quote.h"
 #include "packfile.h"
-#include "fetch-object.h"
 #include "object-store.h"
 #include "promisor-remote.h"
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 47353d85c3..ca0ab68c32 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -16,7 +16,6 @@
 #include "submodule-config.h"
 #include "fsmonitor.h"
 #include "object-store.h"
-#include "fetch-object.h"
 #include "promisor-remote.h"
 
 /*
Christian Couder April 8, 2019, 6:40 a.m. UTC | #9
On Mon, Apr 8, 2019 at 4:36 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> >  #ifdef NO_FAST_WORKING_DIRECTORY
> >  #define FAST_WORKING_DIRECTORY 0
> > @@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch,
> >
> >  void diffcore_std(struct diff_options *options)
> >  {
> > -    if (repository_format_partial_clone) {
> > +    if (has_promisor_remote()) {
>
> Hmph, I see quite a few references to the variable disappears
> between next and pu.  Is it that in the new world order, nobody
> outside the low-level object-access layer should look at the
> variable directly, but instead ask the has_promisor_remote()
> function?

Yeah, repository_format_partial_clone contains the remote configured
in extensions.partialclone, while in the new world order there can be
other promisor remotes than this one, and most of the code is only
interested in asking if we use any promisor remote.

> If so, can we at least document that?  Making it static
> (or at least renaming it) would have helped the compiler to notice
> this semantic merge conflict better.

Ok, I will see if I can make it static (or maybe rename it).

Patch 3f82acbca2 (Use promisor_remote_get_direct() and
has_promisor_remote(), 2019-04-01) in cc/multi-promisor starts with:

    Use promisor_remote_get_direct() and has_promisor_remote()

    Instead of using the repository_format_partial_clone global
    and fetch_objects() directly, let's use has_promisor_remote()
    and promisor_remote_get_direct().

    This way all the configured promisor remotes will be taken
    into account, not only the one specified by
    extensions.partialClone.

I will at least add something telling that in most cases
"repository_format_partial_clone" and fetch_objects() shouldn't be
used directly anymore.

> > @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options)
> >              /*
> >               * NEEDSWORK: Consider deduplicating the OIDs sent.
> >               */
> > -            fetch_objects(repository_format_partial_clone,
> > -                      to_fetch.oid, to_fetch.nr);
> > +            promisor_remote_get_direct(to_fetch.oid, to_fetch.nr);
>
> Likewise between fetch_objects() and promisor_remote_get_direct().
> Shouldn't the underlying fetch_objects be hidden from general
> callers?

I will see if I can do that, though it has to be used in promisor-remote.c.
Christian Couder April 8, 2019, 6:45 a.m. UTC | #10
On Mon, Apr 8, 2019 at 8:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> >>>  #include "fetch-object.h"
> >>> +#include "promisor-remote.h"
> >>
> >> Thanks.
> >
> > Together with "if we are forbidding the direct access to the
> > repository_format_partial_clone variable and the fetch_objects()
> > funciton, make it clear that is what is being done in the
> > multi-promisor topic", I think a patch that adds this header should
> > also remove inclusion of "fetch-object.h".
>
> In fact, your topic itself has the same issue.

Yeah sorry, this is the kind of things I easily forget.

> I'll queue the
> following at the tip of the topic tentatively before merging it to
> 'pu' with the fix we have been discussing around this thread.

Thanks!
Junio C Hamano April 8, 2019, 7:59 a.m. UTC | #11
Christian Couder <christian.couder@gmail.com> writes:

> Patch 3f82acbca2 (Use promisor_remote_get_direct() and
> has_promisor_remote(), 2019-04-01) in cc/multi-promisor starts with:
>
>     Use promisor_remote_get_direct() and has_promisor_remote()
>
>     Instead of using the repository_format_partial_clone global
>     and fetch_objects() directly, let's use has_promisor_remote()
>     and promisor_remote_get_direct().
>
>     This way all the configured promisor remotes will be taken
>     into account, not only the one specified by
>     extensions.partialClone.
>
> I will at least add something telling that in most cases
> "repository_format_partial_clone" and fetch_objects() shouldn't be
> used directly anymore.

Yes, that would be necessary not in the log, but in-code comments
next to now "by-exception-only" API functions and variables.

>> > @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options)
>> >              /*
>> >               * NEEDSWORK: Consider deduplicating the OIDs sent.
>> >               */
>> > -            fetch_objects(repository_format_partial_clone,
>> > -                      to_fetch.oid, to_fetch.nr);
>> > +            promisor_remote_get_direct(to_fetch.oid, to_fetch.nr);
>>
>> Likewise between fetch_objects() and promisor_remote_get_direct().
>> Shouldn't the underlying fetch_objects be hidden from general
>> callers?
>
> I will see if I can do that, though it has to be used in promisor-remote.c.

Does fetch-objects.[ch] even need to exist after multi-promisor
world as an external interface?  

Wouldn't it become an implementation detail of promisor-remote.[ch]
API?
Christian Couder April 8, 2019, 9:56 a.m. UTC | #12
On Mon, Apr 8, 2019 at 9:59 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > I will at least add something telling that in most cases
> > "repository_format_partial_clone" and fetch_objects() shouldn't be
> > used directly anymore.
>
> Yes, that would be necessary not in the log, but in-code comments
> next to now "by-exception-only" API functions and variables.

Ok, I will add in-code comments then, or ...

> >> Likewise between fetch_objects() and promisor_remote_get_direct().
> >> Shouldn't the underlying fetch_objects be hidden from general
> >> callers?
> >
> > I will see if I can do that, though it has to be used in promisor-remote.c.
>
> Does fetch-objects.[ch] even need to exist after multi-promisor
> world as an external interface?
>
> Wouldn't it become an implementation detail of promisor-remote.[ch]
> API?

... yeah, I will try this.
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index ec5c095199..1eccefb4ef 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@ 
 #include "packfile.h"
 #include "parse-options.h"
 #include "help.h"
+#include "fetch-object.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -6366,8 +6367,39 @@  void diffcore_fix_diff_index(void)
 	QSORT(q->queue, q->nr, diffnamecmp);
 }
 
+static void add_if_missing(struct oid_array *to_fetch,
+			   const struct diff_filespec *filespec)
+{
+	if (filespec && filespec->oid_valid &&
+	    oid_object_info_extended(the_repository, &filespec->oid, NULL,
+				     OBJECT_INFO_FOR_PREFETCH))
+		oid_array_append(to_fetch, &filespec->oid);
+}
+
 void diffcore_std(struct diff_options *options)
 {
+	if (repository_format_partial_clone) {
+		/*
+		 * Prefetch the diff pairs that are about to be flushed.
+		 */
+		int i;
+		struct diff_queue_struct *q = &diff_queued_diff;
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+
+		for (i = 0; i < q->nr; i++) {
+			struct diff_filepair *p = q->queue[i];
+			add_if_missing(&to_fetch, p->one);
+			add_if_missing(&to_fetch, p->two);
+		}
+		if (to_fetch.nr)
+			/*
+			 * NEEDSWORK: Consider deduplicating the OIDs sent.
+			 */
+			fetch_objects(repository_format_partial_clone,
+				      to_fetch.oid, to_fetch.nr);
+		oid_array_clear(&to_fetch);
+	}
+
 	/* NOTE please keep the following in sync with diff_tree_combined() */
 	if (options->skip_stat_unmatch)
 		diffcore_skip_stat_unmatch(options);
diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
new file mode 100755
index 0000000000..349851be7d
--- /dev/null
+++ b/t/t4067-diff-partial-clone.sh
@@ -0,0 +1,103 @@ 
+#!/bin/sh
+
+test_description='behavior of diff when reading objects in a partial clone'
+
+. ./test-lib.sh
+
+test_expect_success 'git show batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+	
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo c >server/c &&
+	echo d >server/d &&
+	git -C server add c d &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff skips same-OID blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo another-a >server/a &&
+	git -C server add a &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	echo a | git hash-object --stdin >hash-old-a &&
+	echo another-a | git hash-object --stdin >hash-new-a &&
+	echo b | git hash-object --stdin >hash-b &&
+
+	# Ensure that only a and another-a are fetched.
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "want $(cat hash-old-a)" trace &&
+	grep "want $(cat hash-new-a)" trace &&
+	! grep "want $(cat hash-b)" trace
+'
+
+test_expect_success 'diff with rename detection batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	printf "b\nb\nb\nb\nb\n" >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	rm server/b &&
+	printf "b\nb\nb\nb\nbX\n" >server/c &&
+	git -C server add c &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
+	grep "similarity index" out &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_done