Message ID | f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | d3da223f2214ebc1527ccf66428aa975de916682 |
Headers | show |
Series | Another partial clone prefetch | expand |
> "git read-tree" checks the existence of the blobs referenced by the > given tree, but does not bulk prefetch them. Add a bulk prefetch. > > The lack of prefetch here was noticed at $DAYJOB during a merge > involving some specific commits, but I couldn't find a minimal merge > that didn't also trigger the prefetch in check_updates() in > unpack-trees.c (and in all these cases, the lack of prefetch in > cache-tree.c didn't matter because all the relevant blobs would have > already been prefetched by then). This is why I used read-tree here to > exercise this code path. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Forgot to mention: the $DAYJOB case is the same case as in [1]. In [1] I noticed that the object wasn't actually being used, so I disabled the object existence check. But that's probably the wrong approach - if the caller really didn't want the object's existence to be checked, they could have used WRITE_TREE_MISSING_OK when calling cache_tree_update(). [1] https://lore.kernel.org/git/cover.1627066238.git.jonathantanmy@google.com/
Jonathan Tan <jonathantanmy@google.com> writes: > @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags) > if (!istate->cache_tree) > istate->cache_tree = cache_tree(); > > + if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote()) > + prefetch_cache_entries(istate, must_check_existence); > + It's so nice when a "fix" to an issue can be this simple. Thanks; will queue. > trace_performance_enter(); > trace2_region_enter("cache_tree", "update", the_repository); > i = update_one(istate->cache_tree, istate->cache, istate->cache_nr, > diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh > new file mode 100755 > index 0000000000..a763e27c7d > --- /dev/null > +++ b/t/t1022-read-tree-partial-clone.sh > @@ -0,0 +1,33 @@ > +#!/bin/sh > + > +test_description='git read-tree in partial clones' > + > +TEST_NO_CREATE_REPO=1 > + > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME > + > +. ./test-lib.sh > + > +test_expect_success 'read-tree in partial clone prefetches in one batch' ' > + test_when_finished "rm -rf server client trace" && > + > + git init server && > + echo foo >server/one && > + echo bar >server/two && > + git -C server add one two && > + git -C server commit -m "initial commit" && > + TREE=$(git -C server rev-parse HEAD^{tree}) && > + > + git -C server config uploadpack.allowfilter 1 && > + git -C server config uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:none "file://$(pwd)/server" client && > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE && > + > + # "done" marks the end of negotiation (once per fetch). Expect that > + # only one fetch occurs. > + grep "fetch> done" trace >donelines && > + test_line_count = 1 donelines > +' > + > +test_done
On Fri, Jul 23, 2021 at 11:55 AM Jonathan Tan <jonathantanmy@google.com> wrote: > > "git read-tree" checks the existence of the blobs referenced by the > given tree, but does not bulk prefetch them. Add a bulk prefetch. > > The lack of prefetch here was noticed at $DAYJOB during a merge > involving some specific commits, but I couldn't find a minimal merge > that didn't also trigger the prefetch in check_updates() in > unpack-trees.c (and in all these cases, the lack of prefetch in > cache-tree.c didn't matter because all the relevant blobs would have > already been prefetched by then). This is why I used read-tree here to > exercise this code path. Okay, you have me stumped, I can't figure out what kind of a merge would bypass the check_updates() in unpack-trees.c either. I was curious about octopus or merge.autostash, but I just can't trigger it. Using read-tree to trigger the case makes perfect sense, though. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > cache-tree.c | 11 ++++++++-- > t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+), 2 deletions(-) > create mode 100755 t/t1022-read-tree-partial-clone.sh > > diff --git a/cache-tree.c b/cache-tree.c > index 45e58666af..9ba2c7c6b2 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it) > return 1; > } > > +static int must_check_existence(const struct cache_entry *ce) > +{ > + return !(has_promisor_remote() && ce_skip_worktree(ce)); > +} > + > static int update_one(struct cache_tree *it, > struct cache_entry **cache, > int entries, > @@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it, > } > > ce_missing_ok = mode == S_IFGITLINK || missing_ok || > - (has_promisor_remote() && > - ce_skip_worktree(ce)); > + !must_check_existence(ce); > if (is_null_oid(oid) || > (!ce_missing_ok && !has_object_file(oid))) { > strbuf_release(&buffer); > @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags) > if (!istate->cache_tree) > istate->cache_tree = cache_tree(); > > + if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote()) > + prefetch_cache_entries(istate, must_check_existence); > + Nice that the fix is so simple. > trace_performance_enter(); > trace2_region_enter("cache_tree", "update", the_repository); > i = update_one(istate->cache_tree, istate->cache, istate->cache_nr, > diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh > new file mode 100755 > index 0000000000..a763e27c7d > --- /dev/null > +++ b/t/t1022-read-tree-partial-clone.sh > @@ -0,0 +1,33 @@ > +#!/bin/sh > + > +test_description='git read-tree in partial clones' > + > +TEST_NO_CREATE_REPO=1 > + > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME > + > +. ./test-lib.sh > + > +test_expect_success 'read-tree in partial clone prefetches in one batch' ' > + test_when_finished "rm -rf server client trace" && > + > + git init server && > + echo foo >server/one && > + echo bar >server/two && > + git -C server add one two && > + git -C server commit -m "initial commit" && > + TREE=$(git -C server rev-parse HEAD^{tree}) && > + > + git -C server config uploadpack.allowfilter 1 && > + git -C server config uploadpack.allowanysha1inwant 1 && > + git clone --bare --filter=blob:none "file://$(pwd)/server" client && > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE && > + > + # "done" marks the end of negotiation (once per fetch). Expect that > + # only one fetch occurs. > + grep "fetch> done" trace >donelines && > + test_line_count = 1 donelines > +' > + > +test_done > -- > 2.32.0.432.gabb21c7263-goog Any reason for preferring GIT_TRACE_PACKET over GIT_TRACE2_PERF and looking for the reported fetch_count (or even the number of fetch_count lines)? Just curious. Anyway, looks good to me.
diff --git a/cache-tree.c b/cache-tree.c index 45e58666af..9ba2c7c6b2 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it) return 1; } +static int must_check_existence(const struct cache_entry *ce) +{ + return !(has_promisor_remote() && ce_skip_worktree(ce)); +} + static int update_one(struct cache_tree *it, struct cache_entry **cache, int entries, @@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it, } ce_missing_ok = mode == S_IFGITLINK || missing_ok || - (has_promisor_remote() && - ce_skip_worktree(ce)); + !must_check_existence(ce); if (is_null_oid(oid) || (!ce_missing_ok && !has_object_file(oid))) { strbuf_release(&buffer); @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags) if (!istate->cache_tree) istate->cache_tree = cache_tree(); + if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote()) + prefetch_cache_entries(istate, must_check_existence); + trace_performance_enter(); trace2_region_enter("cache_tree", "update", the_repository); i = update_one(istate->cache_tree, istate->cache, istate->cache_nr, diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh new file mode 100755 index 0000000000..a763e27c7d --- /dev/null +++ b/t/t1022-read-tree-partial-clone.sh @@ -0,0 +1,33 @@ +#!/bin/sh + +test_description='git read-tree in partial clones' + +TEST_NO_CREATE_REPO=1 + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +test_expect_success 'read-tree in partial clone prefetches in one batch' ' + test_when_finished "rm -rf server client trace" && + + git init server && + echo foo >server/one && + echo bar >server/two && + git -C server add one two && + git -C server commit -m "initial commit" && + TREE=$(git -C server rev-parse HEAD^{tree}) && + + git -C server config uploadpack.allowfilter 1 && + git -C server config uploadpack.allowanysha1inwant 1 && + git clone --bare --filter=blob:none "file://$(pwd)/server" client && + GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE && + + # "done" marks the end of negotiation (once per fetch). Expect that + # only one fetch occurs. + grep "fetch> done" trace >donelines && + test_line_count = 1 donelines +' + +test_done
"git read-tree" checks the existence of the blobs referenced by the given tree, but does not bulk prefetch them. Add a bulk prefetch. The lack of prefetch here was noticed at $DAYJOB during a merge involving some specific commits, but I couldn't find a minimal merge that didn't also trigger the prefetch in check_updates() in unpack-trees.c (and in all these cases, the lack of prefetch in cache-tree.c didn't matter because all the relevant blobs would have already been prefetched by then). This is why I used read-tree here to exercise this code path. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- cache-tree.c | 11 ++++++++-- t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+), 2 deletions(-) create mode 100755 t/t1022-read-tree-partial-clone.sh