
[4/6] fetch-pack: optimize loading of refs via commit graph

Message ID 67917af7ceeefe41ae0f6edf69cd61e2ee8c0ea3.1629452412.git.ps@pks.im (mailing list archive)
State Superseded
Series Speed up mirror-fetches with many refs

Commit Message

Patrick Steinhardt Aug. 20, 2021, 10:08 a.m. UTC
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.

In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight
speedup of about 2% in a huge repository with about 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
      Range (min … max):   31.280 s … 31.896 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
      Range (min … max):   30.172 s … 31.479 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.02 times faster than 'HEAD~: git-fetch'

If the commit-graph lookup fails, we fall back to the old code, which
peels the object to a commit.
---
 fetch-pack.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
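
For readers unfamiliar with the code, the control flow described in the
commit message can be sketched with a toy model. This is only an
illustration under stated assumptions: `toy_object`, `toy_lookup_in_graph()`,
`toy_peel()`, `toy_deref()` and the `type_lookups` counter are hypothetical
stand-ins and are not Git's data structures or API. The sketch shows the
fast path returning early for a commit found in a (simulated) commit-graph,
and the fallback that looks up the type and peels tags otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not Git's real types. */
enum toy_type { TOY_COMMIT, TOY_TAG, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	int in_graph;                    /* 1 if the simulated commit-graph lists it */
	const struct toy_object *target; /* for tags: the tagged object */
};

/* Counts simulated object-type lookups (the cost the fast path avoids). */
static int type_lookups;

/* Fast path: succeeds only for commits present in the simulated graph. */
static const struct toy_object *toy_lookup_in_graph(const struct toy_object *obj)
{
	return (obj->type == TOY_COMMIT && obj->in_graph) ? obj : NULL;
}

/* Slow path: look up the type, peeling tags until a non-tag is reached. */
static const struct toy_object *toy_peel(const struct toy_object *obj)
{
	while (obj) {
		type_lookups++;
		if (obj->type != TOY_TAG)
			break;
		obj = obj->target;
	}
	return (obj && obj->type == TOY_COMMIT) ? obj : NULL;
}

/* Try the graph first; fall back to peeling if that fails. */
static const struct toy_object *toy_deref(const struct toy_object *obj)
{
	const struct toy_object *commit;

	assert(obj != NULL);
	commit = toy_lookup_in_graph(obj);
	if (commit)
		return commit;
	return toy_peel(obj);
}
```

In this model, a ref pointing directly at a commit in the graph costs no
type lookup at all, while a tag or a commit missing from the graph still
goes through the peeling loop, so behavior is unchanged in those cases.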

Comments

Derrick Stolee Aug. 20, 2021, 2:37 p.m. UTC | #1
On 8/20/2021 6:08 AM, Patrick Steinhardt wrote:
> In order to negotiate a packfile, we need to dereference refs to see
> which commits we have in common with the remote. To do so, we first look
> up the object's type -- if it's a tag, we peel until we hit a non-tag
> object. If we hit a commit eventually, then we return that commit.
> 
> In case the object ID points to a commit directly, we can avoid the
> initial lookup of the object type by opportunistically looking up the
> commit via the commit-graph, if available, which gives us a slight
> speedup of about 2% in a huge repository with about 2.3M refs:
> 
>     Benchmark #1: HEAD~: git-fetch
>       Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
>       Range (min … max):   31.280 s … 31.896 s    5 runs
> 
>     Benchmark #2: HEAD: git-fetch
>       Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
>       Range (min … max):   30.172 s … 31.479 s    5 runs
> 
>     Summary
>       'HEAD: git-fetch' ran
>         1.02 ± 0.02 times faster than 'HEAD~: git-fetch'

This 2% gain is nice, especially because you are measuring the
end-to-end scenario. If you use GIT_TRACE2_PERF=1 on a few runs,
then you could likely isolate some of the regions from
mark_complete_and_common_ref() and demonstrate a larger improvement
in that focused area.

> @@ -119,6 +119,11 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
> +	struct commit *commit;
> +
> +	commit = lookup_commit_in_graph(the_repository, oid);
> +	if (commit)
> +		return commit;

Obviously a correct thing to do.

>  	if (type == OBJ_COMMIT) {
> -		struct commit *commit = lookup_commit(the_repository, oid);
> +		commit = lookup_commit(the_repository, oid);

Re-using the local simplifies this. Good.

Thanks,
-Stolee

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index 1a6242cd71..c57faf278f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -119,6 +119,11 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
+	struct commit *commit;
+
+	commit = lookup_commit_in_graph(the_repository, oid);
+	if (commit)
+		return commit;
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -139,7 +144,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 	}
 
 	if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(the_repository, oid);
+		commit = lookup_commit(the_repository, oid);
 		if (!commit || repo_parse_commit(the_repository, commit))
 			return NULL;
 		return commit;