diff mbox series

[2/2] cache-tree: prefetch in partial clone read-tree

Message ID f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com (mailing list archive)
State Accepted
Commit d3da223f2214ebc1527ccf66428aa975de916682
Headers show
Series Another partial clone prefetch | expand

Commit Message

Jonathan Tan July 23, 2021, 6:52 p.m. UTC
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.

The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 cache-tree.c                       | 11 ++++++++--
 t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100755 t/t1022-read-tree-partial-clone.sh

Comments

Jonathan Tan July 23, 2021, 6:55 p.m. UTC | #1
> "git read-tree" checks the existence of the blobs referenced by the
> given tree, but does not bulk prefetch them. Add a bulk prefetch.
> 
> The lack of prefetch here was noticed at $DAYJOB during a merge
> involving some specific commits, but I couldn't find a minimal merge
> that didn't also trigger the prefetch in check_updates() in
> unpack-trees.c (and in all these cases, the lack of prefetch in
> cache-tree.c didn't matter because all the relevant blobs would have
> already been prefetched by then). This is why I used read-tree here to
> exercise this code path.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

Forgot to mention: the $DAYJOB case is the same case as in [1]. In [1] I
noticed that the object wasn't actually being used, so I disabled the
object existence check. But that's probably the wrong approach - if the
caller really didn't want the object's existence to be checked, they
could have used WRITE_TREE_MISSING_OK when calling cache_tree_update().

[1] https://lore.kernel.org/git/cover.1627066238.git.jonathantanmy@google.com/
Junio C Hamano July 23, 2021, 9:20 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

> @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags)
>  	if (!istate->cache_tree)
>  		istate->cache_tree = cache_tree();
>  
> +	if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
> +		prefetch_cache_entries(istate, must_check_existence);
> +

It's so nice when a "fix" to an issue can be this simple.

Thanks; will queue.

>  	trace_performance_enter();
>  	trace2_region_enter("cache_tree", "update", the_repository);
>  	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
> new file mode 100755
> index 0000000000..a763e27c7d
> --- /dev/null
> +++ b/t/t1022-read-tree-partial-clone.sh
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +
> +test_description='git read-tree in partial clones'
> +
> +TEST_NO_CREATE_REPO=1
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'read-tree in partial clone prefetches in one batch' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	git init server &&
> +	echo foo >server/one &&
> +	echo bar >server/two &&
> +	git -C server add one two &&
> +	git -C server commit -m "initial commit" &&
> +	TREE=$(git -C server rev-parse HEAD^{tree}) &&
> +
> +	git -C server config uploadpack.allowfilter 1 &&
> +	git -C server config uploadpack.allowanysha1inwant 1 &&
> +	git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
> +
> +	# "done" marks the end of negotiation (once per fetch). Expect that
> +	# only one fetch occurs.
> +	grep "fetch> done" trace >donelines &&
> +	test_line_count = 1 donelines
> +'
> +
> +test_done
Elijah Newren July 23, 2021, 9:34 p.m. UTC | #3
On Fri, Jul 23, 2021 at 11:55 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> "git read-tree" checks the existence of the blobs referenced by the
> given tree, but does not bulk prefetch them. Add a bulk prefetch.
>
> The lack of prefetch here was noticed at $DAYJOB during a merge
> involving some specific commits, but I couldn't find a minimal merge
> that didn't also trigger the prefetch in check_updates() in
> unpack-trees.c (and in all these cases, the lack of prefetch in
> cache-tree.c didn't matter because all the relevant blobs would have
> already been prefetched by then). This is why I used read-tree here to
> exercise this code path.

Okay, you have me stumped, I can't figure out what kind of a merge
would bypass the check_updates() in unpack-trees.c either.  I was
curious about octopus or merge.autostash, but I just can't trigger it.

Using read-tree to trigger the case makes perfect sense, though.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  cache-tree.c                       | 11 ++++++++--
>  t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 2 deletions(-)
>  create mode 100755 t/t1022-read-tree-partial-clone.sh
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 45e58666af..9ba2c7c6b2 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it)
>         return 1;
>  }
>
> +static int must_check_existence(const struct cache_entry *ce)
> +{
> +       return !(has_promisor_remote() && ce_skip_worktree(ce));
> +}
> +
>  static int update_one(struct cache_tree *it,
>                       struct cache_entry **cache,
>                       int entries,
> @@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it,
>                 }
>
>                 ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> -                       (has_promisor_remote() &&
> -                        ce_skip_worktree(ce));
> +                       !must_check_existence(ce);
>                 if (is_null_oid(oid) ||
>                     (!ce_missing_ok && !has_object_file(oid))) {
>                         strbuf_release(&buffer);
> @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags)
>         if (!istate->cache_tree)
>                 istate->cache_tree = cache_tree();
>
> +       if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
> +               prefetch_cache_entries(istate, must_check_existence);
> +

Nice that the fix is so simple.

>         trace_performance_enter();
>         trace2_region_enter("cache_tree", "update", the_repository);
>         i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
> new file mode 100755
> index 0000000000..a763e27c7d
> --- /dev/null
> +++ b/t/t1022-read-tree-partial-clone.sh
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +
> +test_description='git read-tree in partial clones'
> +
> +TEST_NO_CREATE_REPO=1
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'read-tree in partial clone prefetches in one batch' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       git init server &&
> +       echo foo >server/one &&
> +       echo bar >server/two &&
> +       git -C server add one two &&
> +       git -C server commit -m "initial commit" &&
> +       TREE=$(git -C server rev-parse HEAD^{tree}) &&
> +
> +       git -C server config uploadpack.allowfilter 1 &&
> +       git -C server config uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
> +
> +       # "done" marks the end of negotiation (once per fetch). Expect that
> +       # only one fetch occurs.
> +       grep "fetch> done" trace >donelines &&
> +       test_line_count = 1 donelines
> +'
> +
> +test_done
> --
> 2.32.0.432.gabb21c7263-goog

Any reason for preferring GIT_TRACE_PACKET over GIT_TRACE2_PERF and
looking for the reported fetch_count (or even the number of
fetch_count lines)?  Just curious.

Anyway, looks good to me.
diff mbox series

Patch

diff --git a/cache-tree.c b/cache-tree.c
index 45e58666af..9ba2c7c6b2 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -237,6 +237,11 @@  int cache_tree_fully_valid(struct cache_tree *it)
 	return 1;
 }
 
+static int must_check_existence(const struct cache_entry *ce)
+{
+	return !(has_promisor_remote() && ce_skip_worktree(ce));
+}
+
 static int update_one(struct cache_tree *it,
 		      struct cache_entry **cache,
 		      int entries,
@@ -378,8 +383,7 @@  static int update_one(struct cache_tree *it,
 		}
 
 		ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
-			(has_promisor_remote() &&
-			 ce_skip_worktree(ce));
+			!must_check_existence(ce);
 		if (is_null_oid(oid) ||
 		    (!ce_missing_ok && !has_object_file(oid))) {
 			strbuf_release(&buffer);
@@ -466,6 +470,9 @@  int cache_tree_update(struct index_state *istate, int flags)
 	if (!istate->cache_tree)
 		istate->cache_tree = cache_tree();
 
+	if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
+		prefetch_cache_entries(istate, must_check_existence);
+
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", the_repository);
 	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
new file mode 100755
index 0000000000..a763e27c7d
--- /dev/null
+++ b/t/t1022-read-tree-partial-clone.sh
@@ -0,0 +1,33 @@ 
+#!/bin/sh
+
+test_description='git read-tree in partial clones'
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'read-tree in partial clone prefetches in one batch' '
+	test_when_finished "rm -rf server client trace" &&
+
+	git init server &&
+	echo foo >server/one &&
+	echo bar >server/two &&
+	git -C server add one two &&
+	git -C server commit -m "initial commit" &&
+	TREE=$(git -C server rev-parse HEAD^{tree}) &&
+
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
+
+	# "done" marks the end of negotiation (once per fetch). Expect that
+	# only one fetch occurs.
+	grep "fetch> done" trace >donelines &&
+	test_line_count = 1 donelines
+'
+
+test_done