mbox series

[v3,0/7] Speed up mirror-fetches with many refs

Message ID cover.1630501732.git.ps@pks.im (mailing list archive)
Headers show
Series Speed up mirror-fetches with many refs | expand

Message

Patrick Steinhardt Sept. 1, 2021, 1:09 p.m. UTC
Hi,

this is the third version of my patch series to speed up mirror-fetches
with many refs. This patch series applies on top of master with
ps/connectivity-optim merged into it.

There's only some smallish changes based on Stolee's feedback (thanks
for that!):

    - A small typo in 1/7.

    - A confict fix in 4/7 required now because it's based on master
      instead of directly on my merged topic.

    - I've adjusted patch 5/7 such that I don't have to re-touch the
      logic in 6/7.

Patrick

Patrick Steinhardt (7):
  fetch: speed up lookup of want refs via commit-graph
  fetch: avoid unpacking headers in object existence check
  connected: refactor iterator to return next object ID directly
  fetch-pack: optimize loading of refs via commit graph
  fetch: refactor fetch refs to be more extendable
  fetch: merge fetching and consuming refs
  fetch: avoid second connectivity check if we already have all objects

 builtin/clone.c        |  8 ++---
 builtin/fetch.c        | 74 +++++++++++++++++++++++-------------------
 builtin/receive-pack.c | 17 ++++------
 connected.c            | 15 +++++----
 connected.h            |  2 +-
 fetch-pack.c           | 12 ++++---
 6 files changed, 67 insertions(+), 61 deletions(-)

Range-diff against v2:
1:  4a819a6830 ! 1:  8214f04971 fetch: speed up lookup of want refs via commit-graph
    @@ Commit message
         that we repeatedly need to unpack object headers for each of the
         referenced objects.
     
    -    Speed this up by opportunistcally trying to resolve object IDs via the
    +    Speed this up by opportunistically trying to resolve object IDs via the
         commit graph. We only do so for any refs which are not in "refs/tags":
         more likely than not, these are going to be a commit anyway, and this
         lets us avoid having to unpack object headers completely in case the
2:  81ebadabe8 = 2:  991a27cb82 fetch: avoid unpacking headers in object existence check
3:  98e981ced9 = 3:  ba834803ab connected: refactor iterator to return next object ID directly
4:  6311203f08 ! 4:  99d3316d48 fetch-pack: optimize loading of refs via commit graph
    @@ fetch-pack.c: static struct commit *deref_without_lazy_fetch(const struct object
      
      	while (1) {
      		if (oid_object_info_extended(the_repository, oid, &info,
    -@@ fetch-pack.c: static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
    - 	}
    - 
    - 	if (type == OBJ_COMMIT) {
    --		struct commit *commit = lookup_commit(the_repository, oid);
    -+		commit = lookup_commit(the_repository, oid);
    - 		if (!commit || repo_parse_commit(the_repository, commit))
    - 			return NULL;
    - 		return commit;
5:  56a9158ac3 ! 5:  d64888e072 fetch: refactor fetch refs to be more extendable
    @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
      		trace2_region_enter("fetch", "fetch_refs", the_repository);
      		ret = transport_fetch_refs(transport, ref_map);
      		trace2_region_leave("fetch", "fetch_refs", the_repository);
    -+		if (ret) {
    -+			transport_unlock_pack(transport);
    -+			return ret;
    -+		}
    ++		if (ret)
    ++			goto out;
      	}
     -	if (!ret)
     -		/*
    @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
     -		 * time to update refs to reference the new objects.
     -		 */
     -		return 0;
    --	transport_unlock_pack(transport);
    --	return ret;
     +
     +	/*
     +	 * Keep the new pack's ".keep" file around to allow the caller
     +	 * time to update refs to reference the new objects.
     +	 */
    -+	return 0;
    ++	return ret;
    ++
    ++out:
    + 	transport_unlock_pack(transport);
    + 	return ret;
      }
    - 
    - /* Update local refs based on the ref values fetched from a remote */
6:  31d9f72edf ! 6:  56ecbfc9c3 fetch: merge fetching and consuming refs
    @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
      
      	/*
     @@ builtin/fetch.c: static int fetch_refs(struct transport *transport, struct ref *ref_map)
    - 		trace2_region_enter("fetch", "fetch_refs", the_repository);
    - 		ret = transport_fetch_refs(transport, ref_map);
    - 		trace2_region_leave("fetch", "fetch_refs", the_repository);
    --		if (ret) {
    --			transport_unlock_pack(transport);
    --			return ret;
    --		}
    -+		if (ret)
    -+			goto out;
    + 			goto out;
      	}
      
     -	/*
     -	 * Keep the new pack's ".keep" file around to allow the caller
     -	 * time to update refs to reference the new objects.
     -	 */
    --	return 0;
    +-	return ret;
    +-
    +-out:
    +-	transport_unlock_pack(transport);
    +-	return ret;
     -}
     -
     -/* Update local refs based on the ref values fetched from a remote */
7:  84e39c847f = 7:  c342fc0c69 fetch: avoid second connectivity check if we already have all objects

Comments

Junio C Hamano Sept. 1, 2021, 7:58 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> There's only some smallish changes based on Stolee's feedback (thanks
> for that!):
>
>     - A small typo in 1/7.
>
>     - A confict fix in 4/7 required now because it's based on master
>       instead of directly on my merged topic.
>
>     - I've adjusted patch 5/7 such that I don't have to re-touch the
>       logic in 6/7.

OK, as a sanity check on my end, I've tried to apply this on master
and then to merge the result with 'ps/connectivity-optim' topic
(i.e. expected endgame for this and the other topic if both of them
graduated today without any other topics), and then tried a merge on
master to the previous round (i.e. has the other topic already in),
and did not see any difference in the end result.

Will replace.
Junio C Hamano Sept. 8, 2021, 12:08 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> There's only some smallish changes based on Stolee's feedback (thanks
>> for that!):
>>
>>     - A small typo in 1/7.
>>
>>     - A confict fix in 4/7 required now because it's based on master
>>       instead of directly on my merged topic.
>>
>>     - I've adjusted patch 5/7 such that I don't have to re-touch the
>>       logic in 6/7.
>
> OK, as a sanity check on my end, I've tried to apply this on master
> and then to merge the result with 'ps/connectivity-optim' topic
> (i.e. expected endgame for this and the other topic if both of them
> graduated today without any other topics), and then tried a merge on
> master to the previous round (i.e. has the other topic already in),
> and did not see any difference in the end result.
>
> Will replace.

OK.  Is everybody happy with this version?  I am about to mark it
for 'next' in the next issue of "What's cooking" report, so please
holler if I should wait.