diff mbox series

fetch: skip formatting updated refs with `--quiet`

Message ID 40c385048a023dbd447c5f0b4c95ff32485e1e23.1629906005.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series fetch: skip formatting updated refs with `--quiet` | expand

Commit Message

Patrick Steinhardt Aug. 25, 2021, 3:45 p.m. UTC
When fetching, Git will by default print a list of all updated refs in a
nicely formatted table. In order to come up with this table, Git needs
to iterate refs twice: first to determine the maximum column width, and
a second time to actually format these changed refs.

While this table will not be printed in case the user passes `--quiet`,
we still go out of our way and do all these steps. In fact, we even do
more work compared to not passing `--quiet`: without the flag, we will
skip all references in the column width computation which have not been
updated, but if it is set we will now compute widths for all refs.

Fix this issue by completely skipping both preparation of the format and
formatting data for display in case the user passes `--quiet`, improving
performance especially with many refs. The following benchmark shows a
nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     26.929 s ±  0.145 s    [User: 24.194 s, System: 4.656 s]
      Range (min … max):   26.692 s … 27.068 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.189 s ±  0.094 s    [User: 22.556 s, System: 4.606 s]
      Range (min … max):   25.070 s … 25.314 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.07 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 25, 2021, 4:57 p.m. UTC | #1
On Wed, Aug 25 2021, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> When fetching, Git will by default print a list of all updated refs in a
> nicely formatted table. In order to come up with this table, Git needs
> to iterate refs twice: first to determine the maximum column width, and
> a second time to actually format these changed refs.
>
> While this table will not be printed in case the user passes `--quiet`,
> we still go out of our way and do all these steps. In fact, we even do
> more work compared to not passing `--quiet`: without the flag, we will
> skip all references in the column width computation which have not been
> updated, but if it is set we will now compute widths for all refs.
>
> Fix this issue by completely skipping both preparation of the format and
> formatting data for display in case the user passes `--quiet`, improving
> performance especially with many refs. The following benchmark shows a
> nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:
>
>     Benchmark #1: HEAD~: git-fetch
>       Time (mean ± σ):     26.929 s ±  0.145 s    [User: 24.194 s, System: 4.656 s]
>       Range (min … max):   26.692 s … 27.068 s    5 runs
>
>     Benchmark #2: HEAD: git-fetch
>       Time (mean ± σ):     25.189 s ±  0.094 s    [User: 22.556 s, System: 4.606 s]
>       Range (min … max):   25.070 s … 25.314 s    5 runs
>
>     Summary
>       'HEAD: git-fetch' ran
>         1.07 ± 0.01 times faster than 'HEAD~: git-fetch'
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e064687dbd..d064b66512 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -757,6 +757,9 @@ static void prepare_format_display(struct ref *ref_map)
>  		die(_("configuration fetch.output contains invalid value %s"),
>  		    format);
>  
> +	if (verbosity < 0)
> +		return;
> +
>  	for (rm = ref_map; rm; rm = rm->next) {
>  		if (rm->status == REF_STATUS_REJECT_SHALLOW ||
>  		    !rm->peer_ref ||
> @@ -827,7 +830,12 @@ static void format_display(struct strbuf *display, char code,
>  			   const char *remote, const char *local,
>  			   int summary_width)
>  {
> -	int width = (summary_width + strlen(summary) - gettext_width(summary));
> +	int width;
> +
> +	if (verbosity < 0)
> +		return;
> +
> +	width = (summary_width + strlen(summary) - gettext_width(summary));
>  
>  	strbuf_addf(display, "%c %-*s ", code, width, summary);
>  	if (!compact_format)

I wonder what you think of the below, which is a bit more of a verbose
search/replacement change, but I think ultimately cleaner. We just pass
the "ref_map" down to all the formatting functions, and the first
function that needs to know the "compact_format" or "refcol_width"
lazily computes it (stored in a static variable). So we avoid the tricky
action at a distance of not preparing certain things because we know
we're in the quiet mode.

But if I understand your change correctly we're on the one hand not
preparing the format change, but then also not printing this at all now
under --quiet, correct? I think it would be good to split up that
proposed functional change from the optimization of the output, it also
seems like if that were done the git-fetch docs on --quiet need to be
changed.

But I wonder if once we have the change below, whether the small change
on top of it to just add a fetch.outputRefWidth=123 wouldn't also
largely solve the optimization problem you have, i.e. adding a config
variable to adjust the width, instead of us auto-discovering it by
looping over all refs.

Or maybe not, but I think it would be interesting to see how much of the
slowdown you have is that v.s. that we end up emitting this output to
stderr. I.e. is it the I/O of the output or the extra loop that's the
expensive part?

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e064687dbdc..88a8b3660d4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -707,6 +707,24 @@ static int s_update_ref(const char *action,
 static int refcol_width = 10;
 static int compact_format;
 
+static int prepared_compact_format;
+static void prepare_compact_format(void)
+{
+	const char *format = "full";
+
+	if (prepared_compact_format++)
+		return;
+
+	git_config_get_string_tmp("fetch.output", &format);
+	if (!strcasecmp(format, "full"))
+		compact_format = 0;
+	else if (!strcasecmp(format, "compact"))
+		compact_format = 1;
+	else
+		die(_("configuration fetch.output contains invalid value %s"),
+		    format);
+}
+
 static void adjust_refcol_width(const struct ref *ref)
 {
 	int max, rlen, llen, len;
@@ -726,6 +744,7 @@ static void adjust_refcol_width(const struct ref *ref)
 	 * anyway because we don't know if the error explanation part
 	 * will be printed in update_local_ref)
 	 */
+	prepare_compact_format();
 	if (compact_format) {
 		llen = 0;
 		max = max * 2 / 3;
@@ -743,19 +762,13 @@ static void adjust_refcol_width(const struct ref *ref)
 		refcol_width = rlen;
 }
 
-static void prepare_format_display(struct ref *ref_map)
+static int prepared_refcol_width;
+static void prepare_refcol_width(struct ref *ref_map)
 {
 	struct ref *rm;
-	const char *format = "full";
 
-	git_config_get_string_tmp("fetch.output", &format);
-	if (!strcasecmp(format, "full"))
-		compact_format = 0;
-	else if (!strcasecmp(format, "compact"))
-		compact_format = 1;
-	else
-		die(_("configuration fetch.output contains invalid value %s"),
-		    format);
+	if (prepared_refcol_width++)
+		return;
 
 	for (rm = ref_map; rm; rm = rm->next) {
 		if (rm->status == REF_STATUS_REJECT_SHALLOW ||
@@ -767,9 +780,10 @@ static void prepare_format_display(struct ref *ref_map)
 	}
 }
 
-static void print_remote_to_local(struct strbuf *display,
+static void print_remote_to_local(struct ref *ref_map, struct strbuf *display,
 				  const char *remote, const char *local)
 {
+	prepare_refcol_width(ref_map);
 	strbuf_addf(display, "%-*s -> %s", refcol_width, remote, local);
 }
 
@@ -800,7 +814,7 @@ static int find_and_replace(struct strbuf *haystack,
 	return 1;
 }
 
-static void print_compact(struct strbuf *display,
+static void print_compact(struct ref *ref_map, struct strbuf *display,
 			  const char *remote, const char *local)
 {
 	struct strbuf r = STRBUF_INIT;
@@ -816,13 +830,14 @@ static void print_compact(struct strbuf *display,
 
 	if (!find_and_replace(&r, local, "*"))
 		find_and_replace(&l, remote, "*");
-	print_remote_to_local(display, r.buf, l.buf);
+	print_remote_to_local(ref_map, display, r.buf, l.buf);
 
 	strbuf_release(&r);
 	strbuf_release(&l);
 }
 
-static void format_display(struct strbuf *display, char code,
+static void format_display(struct ref *ref_map,
+			   struct strbuf *display, char code,
 			   const char *summary, const char *error,
 			   const char *remote, const char *local,
 			   int summary_width)
@@ -830,15 +845,17 @@ static void format_display(struct strbuf *display, char code,
 	int width = (summary_width + strlen(summary) - gettext_width(summary));
 
 	strbuf_addf(display, "%c %-*s ", code, width, summary);
+	prepare_compact_format();
 	if (!compact_format)
-		print_remote_to_local(display, remote, local);
+		print_remote_to_local(ref_map, display, remote, local);
 	else
-		print_compact(display, remote, local);
+		print_compact(ref_map, display, remote, local);
 	if (error)
 		strbuf_addf(display, "  (%s)", error);
 }
 
-static int update_local_ref(struct ref *ref,
+static int update_local_ref(struct ref *ref_map,
+			    struct ref *ref,
 			    struct ref_transaction *transaction,
 			    const char *remote,
 			    const struct ref *remote_ref,
@@ -857,7 +874,7 @@ static int update_local_ref(struct ref *ref,
 
 	if (oideq(&ref->old_oid, &ref->new_oid)) {
 		if (verbosity > 0)
-			format_display(display, '=', _("[up to date]"), NULL,
+			format_display(ref_map, display, '=', _("[up to date]"), NULL,
 				       remote, pretty_ref, summary_width);
 		return 0;
 	}
@@ -870,7 +887,7 @@ static int update_local_ref(struct ref *ref,
 		 * If this is the head, and it's not okay to update
 		 * the head, and the old value of the head isn't empty...
 		 */
-		format_display(display, '!', _("[rejected]"),
+		format_display(ref_map, display, '!', _("[rejected]"),
 			       _("can't fetch in current branch"),
 			       remote, pretty_ref, summary_width);
 		return 1;
@@ -881,12 +898,12 @@ static int update_local_ref(struct ref *ref,
 		if (force || ref->force) {
 			int r;
 			r = s_update_ref("updating tag", ref, transaction, 0);
-			format_display(display, r ? '!' : 't', _("[tag update]"),
+			format_display(ref_map, display, r ? '!' : 't', _("[tag update]"),
 				       r ? _("unable to update local ref") : NULL,
 				       remote, pretty_ref, summary_width);
 			return r;
 		} else {
-			format_display(display, '!', _("[rejected]"), _("would clobber existing tag"),
+			format_display(ref_map, display, '!', _("[rejected]"), _("would clobber existing tag"),
 				       remote, pretty_ref, summary_width);
 			return 1;
 		}
@@ -918,7 +935,7 @@ static int update_local_ref(struct ref *ref,
 		}
 
 		r = s_update_ref(msg, ref, transaction, 0);
-		format_display(display, r ? '!' : '*', what,
+		format_display(ref_map, display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
 		return r;
@@ -940,7 +957,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
 		r = s_update_ref("fast-forward", ref, transaction, 1);
-		format_display(display, r ? '!' : ' ', quickref.buf,
+		format_display(ref_map, display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
 		strbuf_release(&quickref);
@@ -952,13 +969,13 @@ static int update_local_ref(struct ref *ref,
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
 		r = s_update_ref("forced-update", ref, transaction, 1);
-		format_display(display, r ? '!' : '+', quickref.buf,
+		format_display(ref_map, display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
 			       remote, pretty_ref, summary_width);
 		strbuf_release(&quickref);
 		return r;
 	} else {
-		format_display(display, '!', _("[rejected]"), _("non-fast-forward"),
+		format_display(ref_map, display, '!', _("[rejected]"), _("non-fast-forward"),
 			       remote, pretty_ref, summary_width);
 		return 1;
 	}
@@ -1111,8 +1128,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	prepare_format_display(ref_map);
-
 	/*
 	 * We do a pass for each fetch_head_status type in their enum order, so
 	 * merged entries are written before not-for-merge. That lets readers
@@ -1187,8 +1202,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, transaction, what,
-						       rm, &note, summary_width);
+				rc |= update_local_ref(ref_map, ref,
+						       transaction, what, rm,
+						       &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
 				/*
@@ -1196,7 +1212,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				 * would be written to FETCH_HEAD, if --dry-run
 				 * is set).
 				 */
-				format_display(&note, '*',
+				format_display(ref_map, &note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
@@ -1357,7 +1373,7 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
 				fprintf(stderr, _("From %.*s\n"), url_len, url);
 				shown_url = 1;
 			}
-			format_display(&sb, '-', _("[deleted]"), NULL,
+			format_display(ref_map, &sb, '-', _("[deleted]"), NULL,
 				       _("(none)"), prettify_refname(ref->name),
 				       summary_width);
 			fprintf(stderr, " %s\n",sb.buf);
Junio C Hamano Aug. 25, 2021, 5:51 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> When fetching, Git will by default print a list of all updated refs in a
> nicely formatted table. In order to come up with this table, Git needs
> to iterate refs twice: first to determine the maximum column width, and
> a second time to actually format these changed refs.
>
> While this table will not be printed in case the user passes `--quiet`,
> we still go out of our way and do all these steps. In fact, we even do
> more work compared to not passing `--quiet`: without the flag, we will
> skip all references in the column width computation which have not been
> updated, but if it is set we will now compute widths for all refs.

Interesting.  This line

	/* uptodate lines are only shown on high verbosity level */
	if (!verbosity && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
		return;

at the beginning of the adjust_refcol_width() function indeed does
not skip if verbosity is negative, so the comment is wrong---it is
not only computed on high verbosity level.  Why doesn't this patch
include a change like this then?

	if (verbosity <= 0 || oideq(&ref->peer_ref->old_oid, &ref->old_oid))
		return;

Another thing I notice is this part from store_updated_refs():

			if (note.len) {
				if (verbosity >= 0 && !shown_url) {
					fprintf(stderr, _("From %.*s\n"),
							url_len, url);
					shown_url = 1;
				}
				if (verbosity >= 0)
					fprintf(stderr, " %s\n", note.buf);
			}

We no longer need to check for verbosity, right?

Other than these two, I like the approach a lot.
Patrick Steinhardt Aug. 25, 2021, 6:03 p.m. UTC | #3
On Wed, Aug 25, 2021 at 10:51:55AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When fetching, Git will by default print a list of all updated refs in a
> > nicely formatted table. In order to come up with this table, Git needs
> > to iterate refs twice: first to determine the maximum column width, and
> > a second time to actually format these changed refs.
> >
> > While this table will not be printed in case the user passes `--quiet`,
> > we still go out of our way and do all these steps. In fact, we even do
> > more work compared to not passing `--quiet`: without the flag, we will
> > skip all references in the column width computation which have not been
> > updated, but if it is set we will now compute widths for all refs.
> 
> Interesting.  This line
> 
> 	/* uptodate lines are only shown on high verbosity level */
> 	if (!verbosity && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
> 		return;
> 
> at the beginning of the adjust_refcol_width() function indeed does
> not skip if verbosity is negative, so the comment is wrong---it is
> not only computed on high verbosity level.  Why doesn't this patch
> include a change like this then?
> 
> 	if (verbosity <= 0 || oideq(&ref->peer_ref->old_oid, &ref->old_oid))
> 		return;

This was indeed my first iteration. But if we just fix the condition
like you do here, then we still iterate over all refs even though we
know that we're not going to do anything with them. So my version just
skips over the iteration completely.

> Another thing I notice is this part from store_updated_refs():
> 
> 			if (note.len) {
> 				if (verbosity >= 0 && !shown_url) {
> 					fprintf(stderr, _("From %.*s\n"),
> 							url_len, url);
> 					shown_url = 1;
> 				}
> 				if (verbosity >= 0)
> 					fprintf(stderr, " %s\n", note.buf);
> 			}
> 
> We no longer need to check for verbosity, right?

Right. It would be less obvious though that we indeed never print to the
buffer if `verbosity < 0`, which is why I bailed on that refactoring.

I wonder whether we can just refactor this such that the buffer is
caller-provided. If `--quiet`, the caller just passes a `NULL` pointer
so it's explicit that it cannot be written to. This would also address
Ævar's feedback about the "tricky action at a distance", which is valid
criticism in my eyes.

Patrick
Junio C Hamano Aug. 25, 2021, 6:37 p.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

>> Interesting.  This line
>> 
>> 	/* uptodate lines are only shown on high verbosity level */
>> 	if (!verbosity && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
>> 		return;
>> 
>> at the beginning of the adjust_refcol_width() function indeed does
>> not skip if verbosity is negative, so the comment is wrong---it is
>> not only computed on high verbosity level.  Why doesn't this patch
>> include a change like this then?
>> 
>> 	if (verbosity <= 0 || oideq(&ref->peer_ref->old_oid, &ref->old_oid))
>> 		return;
>
> This was indeed my first iteration. But if we just fix the condition
> like you do here, then we still iterate over all refs even though we
> know that we're not going to do anything with them. So my version just
> skips over the iteration completely.

I didn't ask "why isn't this patch the above change and nothing else?"

With or without the "don't bother counting columns under --quiet"
fix, the condition used in the above if statement is wrong.  With
the fix, only because the function is never called with negative
verbosity, "if (!verbosity)" means the same thing as "only shown on
high verbosity level", but the reader must have followed the flow of
logic to realize it.  If you fix the condition to exclude all
non-verbose cases (including --quiet), the readers do not have to do
so to realize that the code is doing what the comment claims that it
is doing.

>> Another thing I notice is this part from store_updated_refs():
>> 
>> 			if (note.len) {
>> 				if (verbosity >= 0 && !shown_url) {
>> 					fprintf(stderr, _("From %.*s\n"),
>> 							url_len, url);
>> 					shown_url = 1;
>> 				}
>> 				if (verbosity >= 0)
>> 					fprintf(stderr, " %s\n", note.buf);
>> 			}
>> 
>> We no longer need to check for verbosity, right?
>
> Right. It would be less obvious though that we indeed never print to the
> buffer if `verbosity < 0`, which is why I bailed on that refactoring.

I actually think that the resulting code will still be obvious, even
with the (apparently unrelated) URL display, and actually even be
better, if we drop the condition on verbosity from this code.  

The code that led to this part would have stuffed bytes in the note
strbuf only when we wanted to show something based on the verbosity
setting, and we show what is in note.len only when we have something
to say (the URL thing is to give the header for the message).
Decision to give what contents to show (or not to show at all) is
made elsewhere---it happens to involve verbosity and perhaps in the
future it may use some other input, but this part of the code does
not have to know.

Thanks.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e064687dbd..d064b66512 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -757,6 +757,9 @@  static void prepare_format_display(struct ref *ref_map)
 		die(_("configuration fetch.output contains invalid value %s"),
 		    format);
 
+	if (verbosity < 0)
+		return;
+
 	for (rm = ref_map; rm; rm = rm->next) {
 		if (rm->status == REF_STATUS_REJECT_SHALLOW ||
 		    !rm->peer_ref ||
@@ -827,7 +830,12 @@  static void format_display(struct strbuf *display, char code,
 			   const char *remote, const char *local,
 			   int summary_width)
 {
-	int width = (summary_width + strlen(summary) - gettext_width(summary));
+	int width;
+
+	if (verbosity < 0)
+		return;
+
+	width = (summary_width + strlen(summary) - gettext_width(summary));
 
 	strbuf_addf(display, "%c %-*s ", code, width, summary);
 	if (!compact_format)