diff mbox series

[2/2] fetch: forgo full connectivity check if --filter

Message ID be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series Skip a connectivity check during fetch --filter | expand

Commit Message

Jonathan Tan Jan. 12, 2020, 4:15 a.m. UTC
If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.

This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
the remaining time is spent in yet another connectivity check
(fetch_refs -> check_exist_and_connected) prior to the fetch - that will
hopefully be done in a subsequent patch.
---
 builtin/fetch.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Jonathan Nieder Jan. 29, 2020, 8:43 p.m. UTC | #1
Jonathan Tan wrote:

> If a filter is specified, we do not need a full connectivity check on
> the contents of the packfile we just fetched; we only need to check that
> the objects referenced are promisor objects.
>
> This significantly speeds up fetches into repositories that have many
> promisor objects, because during the connectivity check, all promisor
> objects are enumerated (to mark them UNINTERESTING), and that takes a
> significant amount of time.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
> the remaining time is spent in yet another connectivity check
> (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
> hopefully be done in a subsequent patch.

Can this information (at least the speedup) be included in the comment
message?

Or even better, can we demonstrate the impact using a perf test?

> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  		url = xstrdup("foreign");
>  
>  	if (!connectivity_checked) {
> +		struct check_connected_options opt = CHECK_CONNECTED_INIT;
> +
> +		if (filter_options.choice)
> +			/*
> +			 * Since a filter is specified, objects indirectly
> +			 * referenced by refs are allowed to be absent.
> +			 */
> +			opt.check_refs_are_promisor_objects_only = 1;
> +
>  		rm = ref_map;
> -		if (check_connected(iterate_ref_map, &rm, NULL)) {
> +		if (check_connected(iterate_ref_map, &rm, &opt)) {
>  			rc = error(_("%s did not send all necessary objects\n"), url);
>  			goto abort;
>  		}

Simple and sensible.  With or without a change like described above,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.
Junio C Hamano Jan. 30, 2020, 6:57 p.m. UTC | #2
Jonathan Nieder <jrnieder@gmail.com> writes:

> Jonathan Tan wrote:
>
>> If a filter is specified, we do not need a full connectivity check on
>> the contents of the packfile we just fetched; we only need to check that
>> the objects referenced are promisor objects.
>>
>> This significantly speeds up fetches into repositories that have many
>> promisor objects, because during the connectivity check, all promisor
>> objects are enumerated (to mark them UNINTERESTING), and that takes a
>> significant amount of time.
>>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>> For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
>> the remaining time is spent in yet another connectivity check
>> (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
>> hopefully be done in a subsequent patch.
>
> Can this information (at least the speedup) be included in the comment
> message?
>
> Or even better, can we demonstrate the impact using a perf test?

It does make sense, but let's queue these two first and then add it
as a follow-up patch on top.

Thanks for writing and reviewing.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b4c6d921d0..6fb50320eb 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -906,8 +906,17 @@  static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	if (!connectivity_checked) {
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+		if (filter_options.choice)
+			/*
+			 * Since a filter is specified, objects indirectly
+			 * referenced by refs are allowed to be absent.
+			 */
+			opt.check_refs_are_promisor_objects_only = 1;
+
 		rm = ref_map;
-		if (check_connected(iterate_ref_map, &rm, NULL)) {
+		if (check_connected(iterate_ref_map, &rm, &opt)) {
 			rc = error(_("%s did not send all necessary objects\n"), url);
 			goto abort;
 		}