[v2] cache-tree: skip some blob checks in partial clone

Message ID 20181009184037.246871-1-jonathantanmy@google.com
State New

Commit Message

Jonathan Tan Oct. 9, 2018, 6:40 p.m. UTC
In a partial clone, whenever a sparse checkout occurs, the existence of
all blobs in the index is verified, whether they are included or
excluded by the .git/info/sparse-checkout specification. This
significantly degrades performance because a lazy fetch occurs whenever
the existence of a missing blob is checked.

This is because cache_tree_update() checks the existence of all objects
in the index, whether or not CE_SKIP_WORKTREE is set on them. Teach
cache_tree_update() to skip checking CE_SKIP_WORKTREE objects when the
repository is a partial clone. This improves performance for sparse
checkout and also other operations that use cache_tree_update().
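
As a rough illustration of why the skip matters, here is a toy model in C.
It is not git code: `toy_entry`, `toy_has_object()` and
`count_lazy_fetches()` are hypothetical names. The model assumes, as
described above, that each existence check on a locally missing blob costs
one lazy fetch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct toy_entry {
	bool skip_worktree;   /* models CE_SKIP_WORKTREE */
	bool present_locally; /* blob already in the local object store */
};

/*
 * Hypothetical stand-in for has_object_file(). Assumption: in a partial
 * clone, checking a blob that is not present locally costs one lazy
 * fetch, and that fetch is assumed to succeed.
 */
static bool toy_has_object(const struct toy_entry *e, int *lazy_fetches)
{
	if (!e->present_locally)
		(*lazy_fetches)++;
	return true;
}

/*
 * Walk the entries the way an existence-checking loop would and return
 * the number of lazy fetches triggered. With skip_excluded set, entries
 * marked skip-worktree are not checked at all when partial_clone is true.
 */
static int count_lazy_fetches(const struct toy_entry *entries, size_t n,
			      bool partial_clone, bool skip_excluded)
{
	int lazy_fetches = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (skip_excluded && partial_clone && entries[i].skip_worktree)
			continue;
		toy_has_object(&entries[i], &lazy_fetches);
	}
	return lazy_fetches;
}
```

In this model, three locally missing blobs of which two are marked
skip-worktree cost three fetches without the skip and one with it.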

An argument could be made that, instead of being removed outright, the
check should be replaced by one that verifies that the blob is promised,
but for performance reasons I decided not to do this. A user who needs
to verify the repository can run fsck, which reports any tree that
points to a missing and non-promised blob, whether the blob is included
or excluded by the sparse-checkout specification.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1:

After feedback, I restricted this to partial clone. Once restricted, I
agree with Ben that this can be done for all users of
cache_tree_update(), not just unpack-trees, so I have removed the
ability to control the behavior using a flag.

I also took the opportunity to simplify the missing check by using a
variable.
---
 cache-tree.c                     |  6 +++++-
 t/t1090-sparse-checkout-scope.sh | 33 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Oct. 10, 2018, 1:19 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> After feedback, I restricted this to partial clone. Once restricted, I
> agree with Ben that this can be done for all users of
> cache_tree_update(), not just unpack-trees, so I have removed the
> ability to control the behavior using a flag.

Makes sense.  Great.

> I also took the opportunity to simplify the missing check by using a
> variable.
>  
> +		ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> +			(repository_format_partial_clone &&
> +			 ce_skip_worktree(ce));
>  		if (is_null_oid(oid) ||
> -		    (mode != S_IFGITLINK && !missing_ok && !has_object_file(oid))) {
> +		    (!ce_missing_ok && !has_object_file(oid))) {

OK.  "An attempt to check out a null object is bad, and otherwise,
unless we determined that it is OK to lack the object recorded in
ce, it is bad too.  By the way, the way we determine if it is OK to
be missing the object is given above".  Easier to read than the
original.
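
The reading above can be checked with a small standalone sketch that
compares the old and new conditions over every combination of inputs.
This is not git code: `entry_state` and the three functions are
hypothetical, and each boolean field merely stands in for the expression
named in its comment. In the real code has_object_file() is only called
when the preceding terms do not already decide the result; the model
ignores that short-circuiting and compares outcomes only.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the check in update_one(). */
struct entry_state {
	bool null_oid;       /* is_null_oid(oid) */
	bool gitlink;        /* mode == S_IFGITLINK */
	bool missing_ok;     /* the missing_ok flag */
	bool partial_clone;  /* repository_format_partial_clone is set */
	bool skip_worktree;  /* ce_skip_worktree(ce) */
	bool object_present; /* has_object_file(oid) */
};

/* The condition as it read before the patch. */
static bool entry_invalid_old(const struct entry_state *s)
{
	return s->null_oid ||
		(!s->gitlink && !s->missing_ok && !s->object_present);
}

/* The condition as it reads after the patch. */
static bool entry_invalid_new(const struct entry_state *s)
{
	bool ce_missing_ok = s->gitlink || s->missing_ok ||
		(s->partial_clone && s->skip_worktree);

	return s->null_oid || (!ce_missing_ok && !s->object_present);
}

/* Enumerate all 64 input combinations and count where the two differ. */
static int count_differences(void)
{
	int diffs = 0;
	unsigned bits;

	for (bits = 0; bits < 64; bits++) {
		struct entry_state s = {
			.null_oid       = (bits >> 0) & 1,
			.gitlink        = (bits >> 1) & 1,
			.missing_ok     = (bits >> 2) & 1,
			.partial_clone  = (bits >> 3) & 1,
			.skip_worktree  = (bits >> 4) & 1,
			.object_present = (bits >> 5) & 1,
		};

		if (entry_invalid_old(&s) != entry_invalid_new(&s))
			diffs++;
	}
	return diffs;
}
```

Under this model the two conditions disagree in exactly one case: a
non-null, non-gitlink entry with missing_ok unset whose object is absent,
in a partial clone, with skip-worktree set. The old condition rejects it
and the new one accepts it; every other combination is unchanged.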

Patch

diff --git a/cache-tree.c b/cache-tree.c
index 5ce51468f0..f210481f9b 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -326,6 +326,7 @@  static int update_one(struct cache_tree *it,
 		unsigned mode;
 		int expected_missing = 0;
 		int contains_ita = 0;
+		int ce_missing_ok;
 
 		path = ce->name;
 		pathlen = ce_namelen(ce);
@@ -355,8 +356,11 @@  static int update_one(struct cache_tree *it,
 			i++;
 		}
 
+		ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
+			(repository_format_partial_clone &&
+			 ce_skip_worktree(ce));
 		if (is_null_oid(oid) ||
-		    (mode != S_IFGITLINK && !missing_ok && !has_object_file(oid))) {
+		    (!ce_missing_ok && !has_object_file(oid))) {
 			strbuf_release(&buffer);
 			if (expected_missing)
 				return -1;
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 25d7c700f6..090b7fc3d3 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -63,4 +63,37 @@  test_expect_success 'return to full checkout of master' '
 	test "$(cat b)" = "modified"
 '
 
+test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
+	test_create_repo server &&
+	git clone "file://$(pwd)/server" client &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	echo a >server/a &&
+	echo bb >server/b &&
+	mkdir server/c &&
+	echo ccc >server/c/c &&
+	git -C server add a b c/c &&
+	git -C server commit -m message &&
+
+	test_config -C client core.sparsecheckout 1 &&
+	test_config -C client extensions.partialclone origin &&
+	echo "!/*" >client/.git/info/sparse-checkout &&
+	echo "/a" >>client/.git/info/sparse-checkout &&
+	git -C client fetch --filter=blob:none origin &&
+	git -C client checkout FETCH_HEAD &&
+
+	git -C client rev-list HEAD \
+		--quiet --objects --missing=print >unsorted_actual &&
+	(
+		printf "?" &&
+		git hash-object server/b &&
+		printf "?" &&
+		git hash-object server/c/c
+	) >unsorted_expect &&
+	sort unsorted_actual >actual &&
+	sort unsorted_expect >expect &&
+	test_cmp expect actual
+'
+
 test_done