[2/6] fetch: avoid unpacking headers in object existence check

Message ID d3dac607f2235c5913621813c443aa10b99c8fe8.1629452412.git.ps@pks.im (mailing list archive)
State Superseded
Series Speed up mirror-fetches with many refs

Commit Message

Patrick Steinhardt Aug. 20, 2021, 10:08 a.m. UTC
When updating local refs after the fetch has transferred all objects, we
do an object existence test as a safety guard to avoid updating a ref to
an object which we don't have. We do so via `oid_object_info()`: if it
returns an error, then we know the object does not exist.

One side effect of `oid_object_info()` is that it parses the object's
type, and to do so it must unpack the object header. This is completely
pointless: we don't care for the type, but only want to assert that the
object exists.

Refactor the code to use `repo_has_object_file()`, which both makes the
code's intent clearer and is also faster because it does not unpack
object headers. In a real-world repo with 2.3M refs, this results in a
small speedup when doing a mirror-fetch:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
      Range (min … max):   33.512 s … 33.944 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
      Range (min … max):   30.948 s … 31.472 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.08 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 25, 2021, 11:44 p.m. UTC | #1
On Fri, Aug 20 2021, Patrick Steinhardt wrote:

> When updating local refs after the fetch has transferred all objects, we
> do an object existence test as a safety guard to avoid updating a ref to
> an object which we don't have. We do so via `oid_object_info()`: if it
> returns an error, then we know the object does not exist.
>
> One side effect of `oid_object_info()` is that it parses the object's
> type, and to do so it must unpack the object header. This is completely
> pointless: we don't care for the type, but only want to assert that the
> object exists.
>
> Refactor the code to use `repo_has_object_file()`, which both makes the
> code's intent clearer and is also faster because it does not unpack
> object headers. In a real-world repo with 2.3M refs, this results in a
> small speedup when doing a mirror-fetch:
>
>     Benchmark #1: HEAD~: git-fetch
>       Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
>       Range (min … max):   33.512 s … 33.944 s    5 runs
>
>     Benchmark #2: HEAD: git-fetch
>       Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
>       Range (min … max):   30.948 s … 31.472 s    5 runs
>
>     Summary
>       'HEAD: git-fetch' ran
>         1.08 ± 0.01 times faster than 'HEAD~: git-fetch'
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 73f5b286d5..5fd0f7c791 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -846,13 +846,11 @@ static int update_local_ref(struct ref *ref,
>  			    int summary_width)
>  {
>  	struct commit *current = NULL, *updated;
> -	enum object_type type;
>  	struct branch *current_branch = branch_get(NULL);
>  	const char *pretty_ref = prettify_refname(ref->name);
>  	int fast_forward = 0;
>  
> -	type = oid_object_info(the_repository, &ref->new_oid, NULL);
> -	if (type < 0)
> +	if (!repo_has_object_file(the_repository, &ref->new_oid))
>  		die(_("object %s not found"), oid_to_hex(&ref->new_oid));
>  
>  	if (oideq(&ref->old_oid, &ref->new_oid)) {

I tried grepping the source for any other candidates for a migration to
repo_has_object_file(), but this is the only "type = oid_object_info" I
could find that didn't care about the type. Perhaps there are some callers
of *_extended() that could be moved over, but that's less likely, and I
didn't check...
Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 73f5b286d5..5fd0f7c791 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -846,13 +846,11 @@  static int update_local_ref(struct ref *ref,
 			    int summary_width)
 {
 	struct commit *current = NULL, *updated;
-	enum object_type type;
 	struct branch *current_branch = branch_get(NULL);
 	const char *pretty_ref = prettify_refname(ref->name);
 	int fast_forward = 0;
 
-	type = oid_object_info(the_repository, &ref->new_oid, NULL);
-	if (type < 0)
+	if (!repo_has_object_file(the_repository, &ref->new_oid))
 		die(_("object %s not found"), oid_to_hex(&ref->new_oid));
 
 	if (oideq(&ref->old_oid, &ref->new_oid)) {
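
The distinction the commit message draws, between asking for an object's type and merely asking whether the object exists, can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_store`, `toy_has_object()` and `toy_object_type()` are hypothetical names invented here, and none of this is Git's actual object database code. The `headers_parsed` counter stands in for the header-unpacking cost that the patch avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy object store; NOT Git's real data structures. */
enum toy_type { TOY_BAD = -1, TOY_COMMIT = 1, TOY_TREE, TOY_BLOB };

struct toy_object {
	const char *id;     /* stand-in for an object ID */
	const char *header; /* simplified "<type> <size>" header */
};

struct toy_store {
	const struct toy_object *objects;
	size_t nr;
	unsigned long headers_parsed; /* how often a header was decoded */
};

/* Locate an object by ID without looking at its header. */
static const struct toy_object *toy_find(const struct toy_store *s,
					 const char *id)
{
	assert(s != NULL && id != NULL);
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->objects[i].id, id))
			return &s->objects[i];
	return NULL;
}

/* Existence check: a lookup only, the header is never touched. */
int toy_has_object(const struct toy_store *s, const char *id)
{
	return toy_find(s, id) != NULL;
}

/*
 * Type query: same lookup, plus decoding the header.
 * Returns TOY_BAD for a missing object or an unknown type.
 */
enum toy_type toy_object_type(struct toy_store *s, const char *id)
{
	const struct toy_object *o = toy_find(s, id);

	if (!o)
		return TOY_BAD;
	s->headers_parsed++; /* the extra work an existence check skips */
	if (!strncmp(o->header, "commit ", 7))
		return TOY_COMMIT;
	if (!strncmp(o->header, "tree ", 5))
		return TOY_TREE;
	if (!strncmp(o->header, "blob ", 5))
		return TOY_BLOB;
	return TOY_BAD;
}
```

A caller that only needs a yes/no answer can use `toy_has_object()` and leave `headers_parsed` untouched; in this toy model, that is the same trade the patch makes in `update_local_ref()`.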