Message ID | be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Skip a connectivity check during fetch --filter | expand |
Jonathan Tan wrote: > If a filter is specified, we do not need a full connectivity check on > the contents of the packfile we just fetched; we only need to check that > the objects referenced are promisor objects. > > This significantly speeds up fetches into repositories that have many > promisor objects, because during the connectivity check, all promisor > objects are enumerated (to mark them UNINTERESTING), and that takes a > significant amount of time. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of > the remaining time is spent in yet another connectivity check > (fetch_refs -> check_exist_and_connected) prior to the fetch - that will > hopefully be done in a subsequent patch. Can this information (at least the speedup) be included in the comment message? Or even better, can we demonstrate the impact using a perf test? > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, > url = xstrdup("foreign"); > > if (!connectivity_checked) { > + struct check_connected_options opt = CHECK_CONNECTED_INIT; > + > + if (filter_options.choice) > + /* > + * Since a filter is specified, objects indirectly > + * referenced by refs are allowed to be absent. > + */ > + opt.check_refs_are_promisor_objects_only = 1; > + > rm = ref_map; > - if (check_connected(iterate_ref_map, &rm, NULL)) { > + if (check_connected(iterate_ref_map, &rm, &opt)) { > rc = error(_("%s did not send all necessary objects\n"), url); > goto abort; > } Simple and sensible. With or without a change like described above, Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Thanks.
Jonathan Nieder <jrnieder@gmail.com> writes: > Jonathan Tan wrote: > >> If a filter is specified, we do not need a full connectivity check on >> the contents of the packfile we just fetched; we only need to check that >> the objects referenced are promisor objects. >> >> This significantly speeds up fetches into repositories that have many >> promisor objects, because during the connectivity check, all promisor >> objects are enumerated (to mark them UNINTERESTING), and that takes a >> significant amount of time. >> >> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> >> --- >> For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of >> the remaining time is spent in yet another connectivity check >> (fetch_refs -> check_exist_and_connected) prior to the fetch - that will >> hopefully be done in a subsequent patch. > > Can this information (at least the speedup) be included in the comment > message? > > Or even better, can we demonstrate the impact using a perf test? It does make sense, but let's queue these two first and then add it as a follow-up patch on top. Thanks for writing and reviewing.
diff --git a/builtin/fetch.c b/builtin/fetch.c index b4c6d921d0..6fb50320eb 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, url = xstrdup("foreign"); if (!connectivity_checked) { + struct check_connected_options opt = CHECK_CONNECTED_INIT; + + if (filter_options.choice) + /* + * Since a filter is specified, objects indirectly + * referenced by refs are allowed to be absent. + */ + opt.check_refs_are_promisor_objects_only = 1; + rm = ref_map; - if (check_connected(iterate_ref_map, &rm, NULL)) { + if (check_connected(iterate_ref_map, &rm, &opt)) { rc = error(_("%s did not send all necessary objects\n"), url); goto abort; }
If a filter is specified, we do not need a full connectivity check on the contents of the packfile we just fetched; we only need to check that the objects referenced are promisor objects. This significantly speeds up fetches into repositories that have many promisor objects, because during the connectivity check, all promisor objects are enumerated (to mark them UNINTERESTING), and that takes a significant amount of time. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of the remaining time is spent in yet another connectivity check (fetch_refs -> check_exist_and_connected) prior to the fetch - that will hopefully be done in a subsequent patch. --- builtin/fetch.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)