diff mbox series

fixup! diff: batch fetching of missing blobs

Message ID 20190405170934.20441-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series fixup! diff: batch fetching of missing blobs | expand

Commit Message

Jonathan Tan April 5, 2019, 5:09 p.m. UTC
This is a fixup on the tip of jt/batch-fetch-blobs-in-diff (571debe1d9).

I don't know if Junio has already merged this branch to next (he marked
this as "Will merge to 'next'" in the "What's Cooking" email, but when I
fetched, it hasn't been merged yet). If he has, we can use this commit
message:

diff: propagate options->repo to add_if_missing

Avoid a usage of the_repository by propagating the configured repository
to add_if_missing(). Also, prefetch only if the repository being diffed
is the_repository (because we do not support lazy fetching for any other
repository anyway).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Thanks, Duy, for noticing this.
---
 diff.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Johannes Schindelin April 5, 2019, 8:16 p.m. UTC | #1
Hi Jonathan,

On Fri, 5 Apr 2019, Jonathan Tan wrote:

> This is a fixup on the tip of jt/batch-fetch-blobs-in-diff (571debe1d9).
>
> I don't know if Junio has already merged this branch to next (he marked
> this as "Will merge to 'next'" in the "What's Cooking" email, but when I
> fetched, it hasn't been merged yet). If he has, we can use this commit
> message:
>
> diff: propagate options->repo to add_if_missing
>
> Avoid a usage of the_repository by propagating the configured repository
> to add_if_missing(). Also, prefetch only if the repository being diffed
> is the_repository (because we do not support lazy fetching for any other
> repository anyway).

True, and the introduction of `has_promisor_remotes()` should probably
help that, by introducing a parameter of type `struct repository *r`.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

The patch and the commit message look good!

Thanks,
Dscho

> diff --git a/diff.c b/diff.c
> index 1eccefb4ef..811afbdfb1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6367,18 +6367,19 @@ void diffcore_fix_diff_index(void)
>  	QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> -static void add_if_missing(struct oid_array *to_fetch,
> +static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
>  			   const struct diff_filespec *filespec)
>  {
>  	if (filespec && filespec->oid_valid &&
> -	    oid_object_info_extended(the_repository, &filespec->oid, NULL,
> +	    oid_object_info_extended(r, &filespec->oid, NULL,
>  				     OBJECT_INFO_FOR_PREFETCH))
>  		oid_array_append(to_fetch, &filespec->oid);
>  }
>
>  void diffcore_std(struct diff_options *options)
>  {
> -	if (repository_format_partial_clone) {
> +	if (options->repo == the_repository &&
> +	    repository_format_partial_clone) {
>  		/*
>  		 * Prefetch the diff pairs that are about to be flushed.
>  		 */
> @@ -6388,8 +6389,8 @@ void diffcore_std(struct diff_options *options)
>
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
> -			add_if_missing(&to_fetch, p->one);
> -			add_if_missing(&to_fetch, p->two);
> +			add_if_missing(&to_fetch, options->repo, p->one);
> +			add_if_missing(&to_fetch, options->repo, p->two);
>  		}
>  		if (to_fetch.nr)
>  			/*
> --
> 2.21.0.392.gf8f6787159e-goog
>
>
Duy Nguyen April 6, 2019, 4:17 a.m. UTC | #2
On Sat, Apr 6, 2019 at 12:09 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is a fixup on the tip of jt/batch-fetch-blobs-in-diff (571debe1d9).
>
> I don't know if Junio has already merged this branch to next (he marked
> this as "Will merge to 'next'" in the "What's Cooking" email, but when I
> fetched, it hasn't been merged yet). If he has, we can use this commit
> message:
>
> diff: propagate options->repo to add_if_missing
>
> Avoid a usage of the_repository by propagating the configured repository
> to add_if_missing(). Also, prefetch only if the repository being diffed
> is the_repository (because we do not support lazy fetching for any other
> repository anyway).
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Thanks, Duy, for noticing this.
> ---
>  diff.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 1eccefb4ef..811afbdfb1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6367,18 +6367,19 @@ void diffcore_fix_diff_index(void)
>         QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> -static void add_if_missing(struct oid_array *to_fetch,
> +static void add_if_missing(struct oid_array *to_fetch, struct repository *r,

One last nit. "struct repository *r" is often the first argument. See
https://public-inbox.org/git/xmqqsh2p6l43.fsf@gitster-ct.c.googlers.com/

>                            const struct diff_filespec *filespec)
>  {
>         if (filespec && filespec->oid_valid &&
> -           oid_object_info_extended(the_repository, &filespec->oid, NULL,
> +           oid_object_info_extended(r, &filespec->oid, NULL,
>                                      OBJECT_INFO_FOR_PREFETCH))
>                 oid_array_append(to_fetch, &filespec->oid);
>  }
>
>  void diffcore_std(struct diff_options *options)
>  {
> -       if (repository_format_partial_clone) {
> +       if (options->repo == the_repository &&
> +           repository_format_partial_clone) {
>                 /*
>                  * Prefetch the diff pairs that are about to be flushed.
>                  */
> @@ -6388,8 +6389,8 @@ void diffcore_std(struct diff_options *options)
>
>                 for (i = 0; i < q->nr; i++) {
>                         struct diff_filepair *p = q->queue[i];
> -                       add_if_missing(&to_fetch, p->one);
> -                       add_if_missing(&to_fetch, p->two);
> +                       add_if_missing(&to_fetch, options->repo, p->one);
> +                       add_if_missing(&to_fetch, options->repo, p->two);
>                 }
>                 if (to_fetch.nr)
>                         /*
> --
> 2.21.0.392.gf8f6787159e-goog
>
Junio C Hamano April 8, 2019, 3:46 a.m. UTC | #3
Duy Nguyen <pclouds@gmail.com> writes:

>> -static void add_if_missing(struct oid_array *to_fetch,
>> +static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
>
> One last nit. "struct repository *r" is often the first argument. See

Yes, and that's easy enough to tweak locally ;-)

Thanks.
Junio C Hamano April 8, 2019, 4:06 a.m. UTC | #4
Duy Nguyen <pclouds@gmail.com> writes:

>> Avoid a usage of the_repository by propagating the configured repository
>> to add_if_missing(). Also, prefetch only if the repository being diffed
>> is the_repository (because we do not support lazy fetching for any other
>> repository anyway).

If we are willing to stay limited to the default repository anyway,
allowing add_if_missing() to take an arbitrary repository does not
really matter, but before the caller of add_if_missing() befcomes
ready to work on an arbitrary repository, this change has to happen.

To update the caller, it seems to me that fetch_objects() must learn
to take an arbitrary repository, but is that the only thing needed?
After that, the function that the caller resides in and callchain
upwards can learn to take a repository instance if we want to be
able to diff inside an arbitrary repository.

But.  Such a change still would not allow us to compare a tree in
one repository against a tree in another repository.  It is likely
that a caller with such a need would simply make sure that objects
in both repositories are available by using the in-core alternate
object store mechanism, making it a more-or-less moot point to be
able to pass a repository instance through the callchain X-<.  We
probably should make it, and spell it out somewhere in a long term
vision shared among the developers, an explicit goal to get rid of
the internal (ab)use of the alternate object store mechanism.

With squashing the fix-up commit in, the 2/2 patch has become like
so.

Thanks, both.

-- >8 --
From: Jonathan Tan <jonathantanmy@google.com>
Date: Fri, 5 Apr 2019 10:09:34 -0700
Subject: [PATCH] diff: batch fetching of missing blobs

When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.

This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c                        |  33 +++++++++++
 t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100755 t/t4067-diff-partial-clone.sh

diff --git a/diff.c b/diff.c
index ec5c095199..811afbdfb1 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@
 #include "packfile.h"
 #include "parse-options.h"
 #include "help.h"
+#include "fetch-object.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -6366,8 +6367,40 @@ void diffcore_fix_diff_index(void)
 	QSORT(q->queue, q->nr, diffnamecmp);
 }
 
+static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
+			   const struct diff_filespec *filespec)
+{
+	if (filespec && filespec->oid_valid &&
+	    oid_object_info_extended(r, &filespec->oid, NULL,
+				     OBJECT_INFO_FOR_PREFETCH))
+		oid_array_append(to_fetch, &filespec->oid);
+}
+
 void diffcore_std(struct diff_options *options)
 {
+	if (options->repo == the_repository &&
+	    repository_format_partial_clone) {
+		/*
+		 * Prefetch the diff pairs that are about to be flushed.
+		 */
+		int i;
+		struct diff_queue_struct *q = &diff_queued_diff;
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+
+		for (i = 0; i < q->nr; i++) {
+			struct diff_filepair *p = q->queue[i];
+			add_if_missing(&to_fetch, options->repo, p->one);
+			add_if_missing(&to_fetch, options->repo, p->two);
+		}
+		if (to_fetch.nr)
+			/*
+			 * NEEDSWORK: Consider deduplicating the OIDs sent.
+			 */
+			fetch_objects(repository_format_partial_clone,
+				      to_fetch.oid, to_fetch.nr);
+		oid_array_clear(&to_fetch);
+	}
+
 	/* NOTE please keep the following in sync with diff_tree_combined() */
 	if (options->skip_stat_unmatch)
 		diffcore_skip_stat_unmatch(options);
diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
new file mode 100755
index 0000000000..90c8fb2901
--- /dev/null
+++ b/t/t4067-diff-partial-clone.sh
@@ -0,0 +1,103 @@
+#!/bin/sh
+
+test_description='behavior of diff when reading objects in a partial clone'
+
+. ./test-lib.sh
+
+test_expect_success 'git show batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo c >server/c &&
+	echo d >server/d &&
+	git -C server add c d &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff skips same-OID blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo another-a >server/a &&
+	git -C server add a &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	echo a | git hash-object --stdin >hash-old-a &&
+	echo another-a | git hash-object --stdin >hash-new-a &&
+	echo b | git hash-object --stdin >hash-b &&
+
+	# Ensure that only a and another-a are fetched.
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "want $(cat hash-old-a)" trace &&
+	grep "want $(cat hash-new-a)" trace &&
+	! grep "want $(cat hash-b)" trace
+'
+
+test_expect_success 'diff with rename detection batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	printf "b\nb\nb\nb\nb\n" >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	rm server/b &&
+	printf "b\nb\nb\nb\nbX\n" >server/c &&
+	git -C server add c &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
+	grep "similarity index" out &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_done
Duy Nguyen April 8, 2019, 9:58 a.m. UTC | #5
On Mon, Apr 8, 2019 at 11:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> >> Avoid a usage of the_repository by propagating the configured repository
> >> to add_if_missing(). Also, prefetch only if the repository being diffed
> >> is the_repository (because we do not support lazy fetching for any other
> >> repository anyway).
>
> If we are willing to stay limited to the default repository anyway,
> allowing add_if_missing() to take an arbitrary repository does not
> really matter, but before the caller of add_if_missing() befcomes
> ready to work on an arbitrary repository, this change has to happen.
>
> To update the caller, it seems to me that fetch_objects() must learn
> to take an arbitrary repository, but is that the only thing needed?
> After that, the function that the caller resides in and callchain
> upwards can learn to take a repository instance if we want to be
> able to diff inside an arbitrary repository.
>
> But.  Such a change still would not allow us to compare a tree in
> one repository against a tree in another repository.

I feel lost (and the answer "go read partial clone code!" is perfectly
acceptable) but why would we need to diff trees of two different
repositories?

> It is likely
> that a caller with such a need would simply make sure that objects
> in both repositories are available by using the in-core alternate
> object store mechanism, making it a more-or-less moot point to be
> able to pass a repository instance through the callchain X-<.  We
> probably should make it, and spell it out somewhere in a long term
> vision shared among the developers, an explicit goal to get rid of
> the internal (ab)use of the alternate object store mechanism.

I think submodule code so far is doing this way. Though I don't see
any reason we need it for submodule code. Objects are not supposed to
be shared between the super- and the sub-repo.

>
> With squashing the fix-up commit in, the 2/2 patch has become like
> so.
>
> Thanks, both.
>
> -- >8 --
> From: Jonathan Tan <jonathantanmy@google.com>
> Date: Fri, 5 Apr 2019 10:09:34 -0700
> Subject: [PATCH] diff: batch fetching of missing blobs
>
> When running a command like "git show" or "git diff" in a partial clone,
> batch all missing blobs to be fetched as one request.
>
> This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
> blobs", 2017-12-08), but for another command.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  diff.c                        |  33 +++++++++++
>  t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
>  2 files changed, 136 insertions(+)
>  create mode 100755 t/t4067-diff-partial-clone.sh
>
> diff --git a/diff.c b/diff.c
> index ec5c095199..811afbdfb1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,6 +25,7 @@
>  #include "packfile.h"
>  #include "parse-options.h"
>  #include "help.h"
> +#include "fetch-object.h"
>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -6366,8 +6367,40 @@ void diffcore_fix_diff_index(void)
>         QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> +static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
> +                          const struct diff_filespec *filespec)
> +{
> +       if (filespec && filespec->oid_valid &&
> +           oid_object_info_extended(r, &filespec->oid, NULL,
> +                                    OBJECT_INFO_FOR_PREFETCH))
> +               oid_array_append(to_fetch, &filespec->oid);
> +}
> +
>  void diffcore_std(struct diff_options *options)
>  {
> +       if (options->repo == the_repository &&
> +           repository_format_partial_clone) {
> +               /*
> +                * Prefetch the diff pairs that are about to be flushed.
> +                */
> +               int i;
> +               struct diff_queue_struct *q = &diff_queued_diff;
> +               struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +               for (i = 0; i < q->nr; i++) {
> +                       struct diff_filepair *p = q->queue[i];
> +                       add_if_missing(&to_fetch, options->repo, p->one);
> +                       add_if_missing(&to_fetch, options->repo, p->two);
> +               }
> +               if (to_fetch.nr)
> +                       /*
> +                        * NEEDSWORK: Consider deduplicating the OIDs sent.
> +                        */
> +                       fetch_objects(repository_format_partial_clone,
> +                                     to_fetch.oid, to_fetch.nr);
> +               oid_array_clear(&to_fetch);
> +       }
> +
>         /* NOTE please keep the following in sync with diff_tree_combined() */
>         if (options->skip_stat_unmatch)
>                 diffcore_skip_stat_unmatch(options);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755
> index 0000000000..90c8fb2901
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo c >server/c &&
> +       echo d >server/d &&
> +       git -C server add c d &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo another-a >server/a &&
> +       git -C server add a &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       echo a | git hash-object --stdin >hash-old-a &&
> +       echo another-a | git hash-object --stdin >hash-new-a &&
> +       echo b | git hash-object --stdin >hash-b &&
> +
> +       # Ensure that only a and another-a are fetched.
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "want $(cat hash-old-a)" trace &&
> +       grep "want $(cat hash-new-a)" trace &&
> +       ! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       printf "b\nb\nb\nb\nb\n" >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       rm server/b &&
> +       printf "b\nb\nb\nb\nbX\n" >server/c &&
> +       git -C server add c &&
> +       git -C server commit -a -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> +       grep "similarity index" out &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_done
> --
> 2.21.0-196-g041f5ea1cf
>
Junio C Hamano April 9, 2019, 6:36 a.m. UTC | #6
Duy Nguyen <pclouds@gmail.com> writes:

> I feel lost (and the answer "go read partial clone code!" is perfectly
> acceptable) but why would we need to diff trees of two different
> repositories?

It's just a speculation into far future to open possibilities of
what we cannot do right now.  There is no *NEED*, just like there is
no need for a single process of git to be working in multiple
repositories at the same time (you can just fork another one into
the other repository).


>> It is likely
>> that a caller with such a need would simply make sure that objects
>> in both repositories are available by using the in-core alternate
>> object store mechanism, making it a more-or-less moot point to be
>> able to pass a repository instance through the callchain X-<.  We
>> probably should make it, and spell it out somewhere in a long term
>> vision shared among the developers, an explicit goal to get rid of
>> the internal (ab)use of the alternate object store mechanism.
>
> I think submodule code so far is doing this way. Though I don't see
> any reason we need it for submodule code. Objects are not supposed to
> be shared between the super- and the sub-repo.

Yup, that is why I made that comment---if we were to pass repository
pointer throughout all the codepaths, one of the goal (eh, rather, a
useful yardstick to measure how close we are to the goal) should be
that we use less and less of that pattern to access objects from
random repositories using the in-core alternate mechanism.
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index 1eccefb4ef..811afbdfb1 100644
--- a/diff.c
+++ b/diff.c
@@ -6367,18 +6367,19 @@  void diffcore_fix_diff_index(void)
 	QSORT(q->queue, q->nr, diffnamecmp);
 }
 
-static void add_if_missing(struct oid_array *to_fetch,
+static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
 			   const struct diff_filespec *filespec)
 {
 	if (filespec && filespec->oid_valid &&
-	    oid_object_info_extended(the_repository, &filespec->oid, NULL,
+	    oid_object_info_extended(r, &filespec->oid, NULL,
 				     OBJECT_INFO_FOR_PREFETCH))
 		oid_array_append(to_fetch, &filespec->oid);
 }
 
 void diffcore_std(struct diff_options *options)
 {
-	if (repository_format_partial_clone) {
+	if (options->repo == the_repository &&
+	    repository_format_partial_clone) {
 		/*
 		 * Prefetch the diff pairs that are about to be flushed.
 		 */
@@ -6388,8 +6389,8 @@  void diffcore_std(struct diff_options *options)
 
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
-			add_if_missing(&to_fetch, p->one);
-			add_if_missing(&to_fetch, p->two);
+			add_if_missing(&to_fetch, options->repo, p->one);
+			add_if_missing(&to_fetch, options->repo, p->two);
 		}
 		if (to_fetch.nr)
 			/*