Message ID | 44de02e584f449481e6fb00cf35d74adf0192e9d.1553895166.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Batch fetching of missing blobs in diff and show | expand |
On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote: > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh > new file mode 100755 > index 0000000000..349851be7d > --- /dev/null > +++ b/t/t4067-diff-partial-clone.sh > @@ -0,0 +1,103 @@ > +#!/bin/sh > + > +test_description='behavior of diff when reading objects in a partial clone' > + > +. ./test-lib.sh > + > +test_expect_success 'git show batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines These patches and 'cc/multi-promisor' don't seem to work well together, and all tests checking 'test_line_count = 1 done_lines' in this test script fail in current 'pu', because there are two "git> done" lines. > +' > + > +test_expect_success 'diff batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo c >server/c && > + echo d >server/d && > + git -C server add c d && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_expect_success 'diff skips same-OID blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo another-a >server/a && > + git -C server add a && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + echo a | git hash-object --stdin >hash-old-a && > + echo another-a | git hash-object --stdin >hash-new-a && > + echo b | git hash-object --stdin >hash-b && > + > + # Ensure that only a and another-a are fetched. > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "want $(cat hash-old-a)" trace && > + grep "want $(cat hash-new-a)" trace && > + ! grep "want $(cat hash-b)" trace > +' > + > +test_expect_success 'diff with rename detection batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + printf "b\nb\nb\nb\nb\n" >server/b && > + git -C server add a b && > + git -C server commit -m x && > + rm server/b && > + printf "b\nb\nb\nb\nbX\n" >server/c && > + git -C server add c && > + git -C server commit -a -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out && > + grep "similarity index" out && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_done > -- > 2.21.0.197.gd478713db0 >
On Sat, Mar 30, 2019 at 4:40 AM Jonathan Tan <jonathantanmy@google.com> wrote: > > When running a command like "git show" or "git diff" in a partial clone, > batch all missing blobs to be fetched as one request. > > This is similar to c0c578b33c ("unpack-trees: batch fetching of missing > blobs", 2017-12-08), but for another command. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > diff.c | 32 +++++++++++ > t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++ > 2 files changed, 135 insertions(+) > create mode 100755 t/t4067-diff-partial-clone.sh > > diff --git a/diff.c b/diff.c > index ec5c095199..1eccefb4ef 100644 > --- a/diff.c > +++ b/diff.c > @@ -25,6 +25,7 @@ > #include "packfile.h" > #include "parse-options.h" > #include "help.h" > +#include "fetch-object.h" > > #ifdef NO_FAST_WORKING_DIRECTORY > #define FAST_WORKING_DIRECTORY 0 > @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void) > QSORT(q->queue, q->nr, diffnamecmp); > } > > +static void add_if_missing(struct oid_array *to_fetch, > + const struct diff_filespec *filespec) > +{ > + if (filespec && filespec->oid_valid && > + oid_object_info_extended(the_repository, &filespec->oid, NULL, I'm quite sure we can pass 'struct repository *' around in diff code now. I think it's the "repo" field in "struct diff_options". Please use it and avoid more references to the_repository. > + OBJECT_INFO_FOR_PREFETCH)) > + oid_array_append(to_fetch, &filespec->oid); > +} > + > void diffcore_std(struct diff_options *options) > { > + if (repository_format_partial_clone) { > + /* > + * Prefetch the diff pairs that are about to be flushed. > + */ > + int i; > + struct diff_queue_struct *q = &diff_queued_diff; > + struct oid_array to_fetch = OID_ARRAY_INIT; > + > + for (i = 0; i < q->nr; i++) { > + struct diff_filepair *p = q->queue[i]; > + add_if_missing(&to_fetch, p->one); > + add_if_missing(&to_fetch, p->two); > + } > + if (to_fetch.nr) > + /* > + * NEEDSWORK: Consider deduplicating the OIDs sent. > + */ > + fetch_objects(repository_format_partial_clone, > + to_fetch.oid, to_fetch.nr); > + oid_array_clear(&to_fetch); > + } > + > /* NOTE please keep the following in sync with diff_tree_combined() */ > if (options->skip_stat_unmatch) > diffcore_skip_stat_unmatch(options); > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh > new file mode 100755 > index 0000000000..349851be7d > --- /dev/null > +++ b/t/t4067-diff-partial-clone.sh > @@ -0,0 +1,103 @@ > +#!/bin/sh > + > +test_description='behavior of diff when reading objects in a partial clone' > + > +. ./test-lib.sh > + > +test_expect_success 'git show batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_expect_success 'diff batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo c >server/c && > + echo d >server/d && > + git -C server add c d && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_expect_success 'diff skips same-OID blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo another-a >server/a && > + git -C server add a && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + echo a | git hash-object --stdin >hash-old-a && > + echo another-a | git hash-object --stdin >hash-new-a && > + echo b | git hash-object --stdin >hash-b && > + > + # Ensure that only a and another-a are fetched. > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "want $(cat hash-old-a)" trace && > + grep "want $(cat hash-new-a)" trace && > + ! grep "want $(cat hash-b)" trace > +' > + > +test_expect_success 'diff with rename detection batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + printf "b\nb\nb\nb\nb\n" >server/b && > + git -C server add a b && > + git -C server commit -m x && > + rm server/b && > + printf "b\nb\nb\nb\nbX\n" >server/c && > + git -C server add c && > + git -C server commit -a -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out && > + grep "similarity index" out && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_done > -- > 2.21.0.197.gd478713db0 >
Hi, On Thu, 4 Apr 2019, SZEDER Gábor wrote: > On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote: > > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh > > new file mode 100755 > > index 0000000000..349851be7d > > --- /dev/null > > +++ b/t/t4067-diff-partial-clone.sh > > @@ -0,0 +1,103 @@ > > +#!/bin/sh > > + > > +test_description='behavior of diff when reading objects in a partial clone' > > + > > +. ./test-lib.sh > > + > > +test_expect_success 'git show batches blobs' ' > > + test_when_finished "rm -rf server client trace" && > > + > > + test_create_repo server && > > + echo a >server/a && > > + echo b >server/b && > > + git -C server add a b && > > + git -C server commit -m x && > > + > > + test_config -C server uploadpack.allowfilter 1 && > > + test_config -C server uploadpack.allowanysha1inwant 1 && > > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > > + > > + # Ensure that there is exactly 1 negotiation by checking that there is > > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && > > + grep "git> done" trace >done_lines && > > + test_line_count = 1 done_lines > > These patches and 'cc/multi-promisor' don't seem to work well > together, and all tests checking 'test_line_count = 1 done_lines' in > this test script fail in current 'pu', because there are two > "git> done" lines. I investigated a little further, and it would seem that it is neither this patch nor the cc/multi-promisor patches that introduce the problem, but the merge between the two... The latter tries to get away from using the global variable `repository_format_partial_clone` while this patch introduces another user. So that merge between these two branches needs to become an "evil merge" (or should I say "blessed merge", or even a "merge with a blessing"?). I have this patch, tentatively, for the `shears/pu` branch [*1*], which seems to fix the test here: -- snip -- Subject: [PATCH] fixup??? Merge branch 'cc/multi-promisor' into pu The `cc/multi-promisor` patch series and the `jt/batch-fetch-blobs-in-diff` patch series have a semantic conflict: the former replaces checks for `repository_format_partial_clone` with checks for `has_promisor_remote()`, while the latter introduces such a check. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- diff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/diff.c b/diff.c index fa12b5d04a..86278ce676 100644 --- a/diff.c +++ b/diff.c @@ -26,6 +26,7 @@ #include "parse-options.h" #include "help.h" #include "fetch-object.h" +#include "promisor-remote.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 @@ -6493,7 +6494,7 @@ static void add_if_missing(struct oid_array *to_fetch, void diffcore_std(struct diff_options *options) { - if (repository_format_partial_clone) { + if (has_promisor_remote()) { /* * Prefetch the diff pairs that are about to be flushed. */ -- snap -- Junio, would you terribly mind adopting this? Ciao, Dscho Footnote *1*: I started again to maintain `shears/*` branches in https://github.com/git-for-windows/git which essentially are Git for Windows' patch thicket rebased to all four integration branches of upstream (or "core") Git. I try my best at keeping the CI green on those, meaning that I liberally add commits on top that are neither in Git for Windows' `master` nor in core Git's branches.
Hi Jonathan, On Fri, 29 Mar 2019, Jonathan Tan wrote: > When running a command like "git show" or "git diff" in a partial clone, > batch all missing blobs to be fetched as one request. > > This is similar to c0c578b33c ("unpack-trees: batch fetching of missing > blobs", 2017-12-08), but for another command. Still makes sense ;-) > diff --git a/diff.c b/diff.c > index ec5c095199..1eccefb4ef 100644 > --- a/diff.c > +++ b/diff.c > @@ -25,6 +25,7 @@ > #include "packfile.h" > #include "parse-options.h" > #include "help.h" > +#include "fetch-object.h" > > #ifdef NO_FAST_WORKING_DIRECTORY > #define FAST_WORKING_DIRECTORY 0 > @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void) > QSORT(q->queue, q->nr, diffnamecmp); > } > > +static void add_if_missing(struct oid_array *to_fetch, > + const struct diff_filespec *filespec) > +{ > + if (filespec && filespec->oid_valid && > + oid_object_info_extended(the_repository, &filespec->oid, NULL, > + OBJECT_INFO_FOR_PREFETCH)) > + oid_array_append(to_fetch, &filespec->oid); > +} Thank you for introducing this, in looks more elegant to my eyes than the previous iteration. > + > void diffcore_std(struct diff_options *options) > { > + if (repository_format_partial_clone) { > + /* > + * Prefetch the diff pairs that are about to be flushed. > + */ > + int i; > + struct diff_queue_struct *q = &diff_queued_diff; > + struct oid_array to_fetch = OID_ARRAY_INIT; > + > + for (i = 0; i < q->nr; i++) { > + struct diff_filepair *p = q->queue[i]; > + add_if_missing(&to_fetch, p->one); > + add_if_missing(&to_fetch, p->two); > + } > + if (to_fetch.nr) > + /* > + * NEEDSWORK: Consider deduplicating the OIDs sent. > + */ > + fetch_objects(repository_format_partial_clone, > + to_fetch.oid, to_fetch.nr); > + oid_array_clear(&to_fetch); > + } > + > /* NOTE please keep the following in sync with diff_tree_combined() */ > if (options->skip_stat_unmatch) > diffcore_skip_stat_unmatch(options); > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh > new file mode 100755 Also: thank you very much for introducing this test script, to make sure that things work as expected. Without it, we would not have detected any regression wrt multi-promisor. This iteration looks good to me, thank you so much! Dscho > index 0000000000..349851be7d > --- /dev/null > +++ b/t/t4067-diff-partial-clone.sh > @@ -0,0 +1,103 @@ > +#!/bin/sh > + > +test_description='behavior of diff when reading objects in a partial clone' > + > +. ./test-lib.sh > + > +test_expect_success 'git show batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_expect_success 'diff batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo c >server/c && > + echo d >server/d && > + git -C server add c d && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_expect_success 'diff skips same-OID blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + echo b >server/b && > + git -C server add a b && > + git -C server commit -m x && > + echo another-a >server/a && > + git -C server add a && > + git -C server commit -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + echo a | git hash-object --stdin >hash-old-a && > + echo another-a | git hash-object --stdin >hash-new-a && > + echo b | git hash-object --stdin >hash-b && > + > + # Ensure that only a and another-a are fetched. > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && > + grep "want $(cat hash-old-a)" trace && > + grep "want $(cat hash-new-a)" trace && > + ! grep "want $(cat hash-b)" trace > +' > + > +test_expect_success 'diff with rename detection batches blobs' ' > + test_when_finished "rm -rf server client trace" && > + > + test_create_repo server && > + echo a >server/a && > + printf "b\nb\nb\nb\nb\n" >server/b && > + git -C server add a b && > + git -C server commit -m x && > + rm server/b && > + printf "b\nb\nb\nb\nbX\n" >server/c && > + git -C server add c && > + git -C server commit -a -m x && > + > + test_config -C server uploadpack.allowfilter 1 && > + test_config -C server uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > + > + # Ensure that there is exactly 1 negotiation by checking that there is > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out && > + grep "similarity index" out && > + grep "git> done" trace >done_lines && > + test_line_count = 1 done_lines > +' > + > +test_done > -- > 2.21.0.197.gd478713db0 > >
Hi, On Fri, Apr 5, 2019 at 3:38 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > On Thu, 4 Apr 2019, SZEDER Gábor wrote: > > > On Fri, Mar 29, 2019 at 02:39:28PM -0700, Jonathan Tan wrote: > > > diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh > > > new file mode 100755 > > > index 0000000000..349851be7d > > > --- /dev/null > > > +++ b/t/t4067-diff-partial-clone.sh > > > @@ -0,0 +1,103 @@ > > > +#!/bin/sh > > > + > > > +test_description='behavior of diff when reading objects in a partial clone' > > > + > > > +. ./test-lib.sh > > > + > > > +test_expect_success 'git show batches blobs' ' > > > + test_when_finished "rm -rf server client trace" && > > > + > > > + test_create_repo server && > > > + echo a >server/a && > > > + echo b >server/b && > > > + git -C server add a b && > > > + git -C server commit -m x && > > > + > > > + test_config -C server uploadpack.allowfilter 1 && > > > + test_config -C server uploadpack.allowanysha1inwant 1 && > > > + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && > > > + > > > + # Ensure that there is exactly 1 negotiation by checking that there is > > > + # only 1 "done" line sent. ("done" marks the end of negotiation.) > > > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && > > > + grep "git> done" trace >done_lines && > > > + test_line_count = 1 done_lines > > > > These patches and 'cc/multi-promisor' don't seem to work well > > together, and all tests checking 'test_line_count = 1 done_lines' in > > this test script fail in current 'pu', because there are two > > "git> done" lines. > > I investigated a little further, and it would seem that it is neither this > patch nor the cc/multi-promisor patches that introduce the problem, but > the merge between the two... The latter tries to get away from using the > global variable `repository_format_partial_clone` while this patch > introduces another user. Thanks for investigating! Yeah, that's part of the problem. The fix I would suggest is: diff --git a/diff.c b/diff.c index f685ab10b5..a2b1241f83 100644 --- a/diff.c +++ b/diff.c @@ -26,6 +26,7 @@ #include "parse-options.h" #include "help.h" #include "fetch-object.h" +#include "promisor-remote.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 @@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch, void diffcore_std(struct diff_options *options) { - if (repository_format_partial_clone) { + if (has_promisor_remote()) { /* * Prefetch the diff pairs that are about to be flushed. */ @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options) /* * NEEDSWORK: Consider deduplicating the OIDs sent. */ - fetch_objects(repository_format_partial_clone, - to_fetch.oid, to_fetch.nr); + promisor_remote_get_direct(to_fetch.oid, to_fetch.nr); oid_array_clear(&to_fetch); } I will send a new version with the above soon based on top of jt/batch-fetch-blobs-in-diff in pu.
Christian Couder <christian.couder@gmail.com> writes: > Thanks for investigating! Yeah, that's part of the problem. > > The fix I would suggest is: > > diff --git a/diff.c b/diff.c > index f685ab10b5..a2b1241f83 100644 > --- a/diff.c > +++ b/diff.c > @@ -26,6 +26,7 @@ > #include "parse-options.h" > #include "help.h" > #include "fetch-object.h" > +#include "promisor-remote.h" Thanks. > > #ifdef NO_FAST_WORKING_DIRECTORY > #define FAST_WORKING_DIRECTORY 0 > @@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch, > > void diffcore_std(struct diff_options *options) > { > - if (repository_format_partial_clone) { > + if (has_promisor_remote()) { Hmph, I see quite a few references to the variable disappears between next and pu. Is it that in the new world order, nobody outside the low-level object-access layer should look at the variable directly, but instead ask the has_promisor_remote() function? If so, can we at least document that? Making it static (or at least renaming it) would have helped the compiler to notice this semantic merge conflict better. > @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options) > /* > * NEEDSWORK: Consider deduplicating the OIDs sent. > */ > - fetch_objects(repository_format_partial_clone, > - to_fetch.oid, to_fetch.nr); > + promisor_remote_get_direct(to_fetch.oid, to_fetch.nr); Likewise between fetch_objects() and promisor_remote_get_direct(). Shouldn't the underlying fetch_objects be hidden from general callers? > oid_array_clear(&to_fetch); > } > > I will send a new version with the above soon based on top of > jt/batch-fetch-blobs-in-diff in pu.
Junio C Hamano <gitster@pobox.com> writes: > Christian Couder <christian.couder@gmail.com> writes: > >> Thanks for investigating! Yeah, that's part of the problem. >> >> The fix I would suggest is: >> >> diff --git a/diff.c b/diff.c >> index f685ab10b5..a2b1241f83 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -26,6 +26,7 @@ >> #include "parse-options.h" >> #include "help.h" >> #include "fetch-object.h" >> +#include "promisor-remote.h" > > Thanks. Together with "if we are forbidding the direct access to the repository_format_partial_clone variable and the fetch_objects() funciton, make it clear that is what is being done in the multi-promisor topic", I think a patch that adds this header should also remove inclusion of "fetch-object.h".
Junio C Hamano <gitster@pobox.com> writes: >>> #include "fetch-object.h" >>> +#include "promisor-remote.h" >> >> Thanks. > > Together with "if we are forbidding the direct access to the > repository_format_partial_clone variable and the fetch_objects() > funciton, make it clear that is what is being done in the > multi-promisor topic", I think a patch that adds this header should > also remove inclusion of "fetch-object.h". In fact, your topic itself has the same issue. I'll queue the following at the tip of the topic tentatively before merging it to 'pu' with the fix we have been discussing around this thread. Thanks. -- >8 -- In multi-promisor world, fetch_objects() should not be used directly; promisor_remote_get_direct() call replaces the helper. sha1-file.c | 1 - unpack-trees.c | 1 - 2 files changed, 2 deletions(-) diff --git a/sha1-file.c b/sha1-file.c index 715a2b882a..87ea8606f4 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -30,7 +30,6 @@ #include "mergesort.h" #include "quote.h" #include "packfile.h" -#include "fetch-object.h" #include "object-store.h" #include "promisor-remote.h" diff --git a/unpack-trees.c b/unpack-trees.c index 47353d85c3..ca0ab68c32 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -16,7 +16,6 @@ #include "submodule-config.h" #include "fsmonitor.h" #include "object-store.h" -#include "fetch-object.h" #include "promisor-remote.h" /*
On Mon, Apr 8, 2019 at 4:36 AM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > #ifdef NO_FAST_WORKING_DIRECTORY > > #define FAST_WORKING_DIRECTORY 0 > > @@ -6489,7 +6490,7 @@ static void add_if_missing(struct oid_array *to_fetch, > > > > void diffcore_std(struct diff_options *options) > > { > > - if (repository_format_partial_clone) { > > + if (has_promisor_remote()) { > > Hmph, I see quite a few references to the variable disappears > between next and pu. Is it that in the new world order, nobody > outside the low-level object-access layer should look at the > variable directly, but instead ask the has_promisor_remote() > function? Yeah, repository_format_partial_clone contains the remote configured in extensions.partialclone, while in the new world order there can be other promisor remotes than this one, and most of the code is only interested in asking if we use any promisor remote. > If so, can we at least document that? Making it static > (or at least renaming it) would have helped the compiler to notice > this semantic merge conflict better. Ok, I will see if I can make it static (or maybe rename it). Patch 3f82acbca2 (Use promisor_remote_get_direct() and has_promisor_remote(), 2019-04-01) in cc/multi-promisor starts with: Use promisor_remote_get_direct() and has_promisor_remote() Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. I will at least add something telling that in most cases "repository_format_partial_clone" and fetch_objects() shouldn't be used directly anymore. > > @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options) > > /* > > * NEEDSWORK: Consider deduplicating the OIDs sent. > > */ > > - fetch_objects(repository_format_partial_clone, > > - to_fetch.oid, to_fetch.nr); > > + promisor_remote_get_direct(to_fetch.oid, to_fetch.nr); > > Likewise between fetch_objects() and promisor_remote_get_direct(). > Shouldn't the underlying fetch_objects be hidden from general > callers? I will see if I can do that, though it has to be used in promisor-remote.c.
On Mon, Apr 8, 2019 at 8:03 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > >>> #include "fetch-object.h" > >>> +#include "promisor-remote.h" > >> > >> Thanks. > > > > Together with "if we are forbidding the direct access to the > > repository_format_partial_clone variable and the fetch_objects() > > funciton, make it clear that is what is being done in the > > multi-promisor topic", I think a patch that adds this header should > > also remove inclusion of "fetch-object.h". > > In fact, your topic itself has the same issue. Yeah sorry, this is the kind of things I easily forget. > I'll queue the > following at the tip of the topic tentatively before merging it to > 'pu' with the fix we have been discussing around this thread. Thanks!
Christian Couder <christian.couder@gmail.com> writes: > Patch 3f82acbca2 (Use promisor_remote_get_direct() and > has_promisor_remote(), 2019-04-01) in cc/multi-promisor starts with: > > Use promisor_remote_get_direct() and has_promisor_remote() > > Instead of using the repository_format_partial_clone global > and fetch_objects() directly, let's use has_promisor_remote() > and promisor_remote_get_direct(). > > This way all the configured promisor remotes will be taken > into account, not only the one specified by > extensions.partialClone. > > I will at least add something telling that in most cases > "repository_format_partial_clone" and fetch_objects() shouldn't be > used directly anymore. Yes, that would be necessary not in the log, but in-code comments next to now "by-exception-only" API functions and variables. >> > @@ -6506,8 +6507,7 @@ void diffcore_std(struct diff_options *options) >> > /* >> > * NEEDSWORK: Consider deduplicating the OIDs sent. >> > */ >> > - fetch_objects(repository_format_partial_clone, >> > - to_fetch.oid, to_fetch.nr); >> > + promisor_remote_get_direct(to_fetch.oid, to_fetch.nr); >> >> Likewise between fetch_objects() and promisor_remote_get_direct(). >> Shouldn't the underlying fetch_objects be hidden from general >> callers? > > I will see if I can do that, though it has to be used in promisor-remote.c. Does fetch-objects.[ch] even need to exist after multi-promisor world as an external interface? Wouldn't it become an implementation detail of promisor-remote.[ch] API?
On Mon, Apr 8, 2019 at 9:59 AM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > I will at least add something telling that in most cases > > "repository_format_partial_clone" and fetch_objects() shouldn't be > > used directly anymore. > > Yes, that would be necessary not in the log, but in-code comments > next to now "by-exception-only" API functions and variables. Ok, I will add in-code comments then, or ... > >> Likewise between fetch_objects() and promisor_remote_get_direct(). > >> Shouldn't the underlying fetch_objects be hidden from general > >> callers? > > > > I will see if I can do that, though it has to be used in promisor-remote.c. > > Does fetch-objects.[ch] even need to exist after multi-promisor > world as an external interface? > > Wouldn't it become an implementation detail of promisor-remote.[ch] > API? ... yeah, I will try this.
diff --git a/diff.c b/diff.c index ec5c095199..1eccefb4ef 100644 --- a/diff.c +++ b/diff.c @@ -25,6 +25,7 @@ #include "packfile.h" #include "parse-options.h" #include "help.h" +#include "fetch-object.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void) QSORT(q->queue, q->nr, diffnamecmp); } +static void add_if_missing(struct oid_array *to_fetch, + const struct diff_filespec *filespec) +{ + if (filespec && filespec->oid_valid && + oid_object_info_extended(the_repository, &filespec->oid, NULL, + OBJECT_INFO_FOR_PREFETCH)) + oid_array_append(to_fetch, &filespec->oid); +} + void diffcore_std(struct diff_options *options) { + if (repository_format_partial_clone) { + /* + * Prefetch the diff pairs that are about to be flushed. + */ + int i; + struct diff_queue_struct *q = &diff_queued_diff; + struct oid_array to_fetch = OID_ARRAY_INIT; + + for (i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + add_if_missing(&to_fetch, p->one); + add_if_missing(&to_fetch, p->two); + } + if (to_fetch.nr) + /* + * NEEDSWORK: Consider deduplicating the OIDs sent. + */ + fetch_objects(repository_format_partial_clone, + to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); + } + /* NOTE please keep the following in sync with diff_tree_combined() */ if (options->skip_stat_unmatch) diffcore_skip_stat_unmatch(options); diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh new file mode 100755 index 0000000000..349851be7d --- /dev/null +++ b/t/t4067-diff-partial-clone.sh @@ -0,0 +1,103 @@ +#!/bin/sh + +test_description='behavior of diff when reading objects in a partial clone' + +. ./test-lib.sh + +test_expect_success 'git show batches blobs' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + echo b >server/b && + git -C server add a b && + git -C server commit -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure that there is exactly 1 negotiation by checking that there is + # only 1 "done" line sent. ("done" marks the end of negotiation.) + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD && + grep "git> done" trace >done_lines && + test_line_count = 1 done_lines +' + +test_expect_success 'diff batches blobs' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + echo b >server/b && + git -C server add a b && + git -C server commit -m x && + echo c >server/c && + echo d >server/d && + git -C server add c d && + git -C server commit -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure that there is exactly 1 negotiation by checking that there is + # only 1 "done" line sent. ("done" marks the end of negotiation.) + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && + grep "git> done" trace >done_lines && + test_line_count = 1 done_lines +' + +test_expect_success 'diff skips same-OID blobs' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + echo b >server/b && + git -C server add a b && + git -C server commit -m x && + echo another-a >server/a && + git -C server add a && + git -C server commit -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + echo a | git hash-object --stdin >hash-old-a && + echo another-a | git hash-object --stdin >hash-new-a && + echo b | git hash-object --stdin >hash-b && + + # Ensure that only a and another-a are fetched. + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD && + grep "want $(cat hash-old-a)" trace && + grep "want $(cat hash-new-a)" trace && + ! grep "want $(cat hash-b)" trace +' + +test_expect_success 'diff with rename detection batches blobs' ' + test_when_finished "rm -rf server client trace" && + + test_create_repo server && + echo a >server/a && + printf "b\nb\nb\nb\nb\n" >server/b && + git -C server add a b && + git -C server commit -m x && + rm server/b && + printf "b\nb\nb\nb\nbX\n" >server/c && + git -C server add c && + git -C server commit -a -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure that there is exactly 1 negotiation by checking that there is + # only 1 "done" line sent. ("done" marks the end of negotiation.) + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out && + grep "similarity index" out && + grep "git> done" trace >done_lines && + test_line_count = 1 done_lines +' + +test_done
When running a command like "git show" or "git diff" in a partial clone, batch all missing blobs to be fetched as one request. This is similar to c0c578b33c ("unpack-trees: batch fetching of missing blobs", 2017-12-08), but for another command. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 32 +++++++++++ t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++ 2 files changed, 135 insertions(+) create mode 100755 t/t4067-diff-partial-clone.sh