diff mbox series

clone: use "quick" lookup while following tags

Message ID 20200401121537.GA1916590@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series clone: use "quick" lookup while following tags | expand

Commit Message

Jeff King April 1, 2020, 12:15 p.m. UTC
On Sat, Mar 28, 2020 at 10:40:23AM -0400, Jeff King wrote:

> So I guess the problem is not with shallow clones specifically, but they
> lead us to not having fetched the commits pointed to by tags, which
> leads to us trying to fault in those commits (and their trees) rather
> than realizing that we weren't meant to have them. And the size of the
> local repo balloons because you're fetching all those commits one by
> one, and not getting the benefit of the deltas you would when you do a
> single --filter=blob:none fetch.
> 
> I guess we need something like this:

The issue is actually with --single-branch, which is implied by --depth.
But the fix is the same either way.

Here it is with a commit message and test.

-- >8 --
Subject: [PATCH] clone: use "quick" lookup while following tags

When cloning with --single-branch, we implement git-fetch's usual
tag-following behavior, grabbing any tag objects that point to objects
we have locally.

When we're a partial clone, though, our has_object_file() check will
actually lazy-fetch each tag. That not only defeats the purpose of
--single-branch, but it does it incredibly slowly, potentially kicking
off a new fetch for each tag. This is even worse for a shallow clone,
which implies --single-branch, because even tags which are supersets of
each other will be fetched individually.

We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call,
which is what git-fetch does in this case.

Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch
does. The rationale is discussed in 5827a03545 (fetch: use "quick"
has_sha1_file for tag following, 2016-10-13), but here the tradeoff
would apply even more so because clone is very unlikely to be racing
with another process repacking our newly-created repository.

This may provide a very small speedup even in the non-partial case case,
as we'd avoid calling reprepare_packed_git() for each tag (though in
practice, we'd only have a single packfile, so that reprepare should be
quite cheap).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/clone.c          | 4 +++-
 t/t5616-partial-clone.sh | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Konstantin Tokarev April 1, 2020, 7:12 p.m. UTC | #1
01.04.2020, 15:15, "Jeff King" <peff@peff.net>:
> On Sat, Mar 28, 2020 at 10:40:23AM -0400, Jeff King wrote:
>
>>  So I guess the problem is not with shallow clones specifically, but they
>>  lead us to not having fetched the commits pointed to by tags, which
>>  leads to us trying to fault in those commits (and their trees) rather
>>  than realizing that we weren't meant to have them. And the size of the
>>  local repo balloons because you're fetching all those commits one by
>>  one, and not getting the benefit of the deltas you would when you do a
>>  single --filter=blob:none fetch.
>>
>>  I guess we need something like this:
>
> The issue is actually with --single-branch, which is implied by --depth.
> But the fix is the same either way.
>
> Here it is with a commit message and test.
>
> -- >8 --
> Subject: [PATCH] clone: use "quick" lookup while following tags
>
> When cloning with --single-branch, we implement git-fetch's usual
> tag-following behavior, grabbing any tag objects that point to objects
> we have locally.
>
> When we're a partial clone, though, our has_object_file() check will
> actually lazy-fetch each tag. That not only defeats the purpose of
> --single-branch, but it does it incredibly slowly, potentially kicking
> off a new fetch for each tag. This is even worse for a shallow clone,
> which implies --single-branch, because even tags which are supersets of
> each other will be fetched individually.
>
> We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call,
> which is what git-fetch does in this case.
>
> Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch
> does. The rationale is discussed in 5827a03545 (fetch: use "quick"
> has_sha1_file for tag following, 2016-10-13), but here the tradeoff
> would apply even more so because clone is very unlikely to be racing
> with another process repacking our newly-created repository.
>
> This may provide a very small speedup even in the non-partial case case,
> as we'd avoid calling reprepare_packed_git() for each tag (though in
> practice, we'd only have a single packfile, so that reprepare should be
> quite cheap).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/clone.c | 4 +++-
>  t/t5616-partial-clone.sh | 8 ++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index d8b1f413aa..9da6459f1d 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -643,7 +643,9 @@ static void write_followtags(const struct ref *refs, const char *msg)
>                          continue;
>                  if (ends_with(ref->name, "^{}"))
>                          continue;
> - if (!has_object_file(&ref->old_oid))
> + if (!has_object_file_with_flags(&ref->old_oid,
> + OBJECT_INFO_QUICK |
> + OBJECT_INFO_SKIP_FETCH_OBJECT))
>                          continue;
>                  update_ref(msg, ref->name, &ref->old_oid, NULL, 0,
>                             UPDATE_REFS_DIE_ON_ERR);
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 77bb91e976..8f0d81a27e 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -415,6 +415,14 @@ test_expect_success 'verify fetch downloads only one pack when updating refs' '
>          test_line_count = 3 pack-list
>  '
>
> +test_expect_success 'single-branch tag following respects partial clone' '
> + git clone --single-branch -b B --filter=blob:none \
> + "file://$(pwd)/srv.bare" single &&
> + git -C single rev-parse --verify refs/tags/B &&
> + git -C single rev-parse --verify refs/tags/A &&
> + test_must_fail git -C single rev-parse --verify refs/tags/C
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
>
> --
> 2.26.0.408.gebd8a4413c

Would you recommend using this patch in production, or is it better to wait for reviews?
Jeff King April 1, 2020, 7:25 p.m. UTC | #2
On Wed, Apr 01, 2020 at 10:12:30PM +0300, Konstantin Tokarev wrote:

> Would you recommend using this patch in production, or is it better to wait for reviews?

Only you know the risk tolerance of your production systems, so I make
no guarantees.  But I would personally think it is quite safe.

-Peff
diff mbox series

Patch

diff --git a/builtin/clone.c b/builtin/clone.c
index d8b1f413aa..9da6459f1d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -643,7 +643,9 @@  static void write_followtags(const struct ref *refs, const char *msg)
 			continue;
 		if (ends_with(ref->name, "^{}"))
 			continue;
-		if (!has_object_file(&ref->old_oid))
+		if (!has_object_file_with_flags(&ref->old_oid,
+						OBJECT_INFO_QUICK |
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		update_ref(msg, ref->name, &ref->old_oid, NULL, 0,
 			   UPDATE_REFS_DIE_ON_ERR);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e976..8f0d81a27e 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -415,6 +415,14 @@  test_expect_success 'verify fetch downloads only one pack when updating refs' '
 	test_line_count = 3 pack-list
 '
 
+test_expect_success 'single-branch tag following respects partial clone' '
+	git clone --single-branch -b B --filter=blob:none \
+		"file://$(pwd)/srv.bare" single &&
+	git -C single rev-parse --verify refs/tags/B &&
+	git -C single rev-parse --verify refs/tags/A &&
+	test_must_fail git -C single rev-parse --verify refs/tags/C
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd