[2/2,GSOC] ref-filter: reuse output buffer

Message ID	1c7a69ba072ac740273ef06972122f74cf3fa684.1618831726.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers
Return-Path: <git-owner@kernel.org>
Message-Id: <1c7a69ba072ac740273ef06972122f74cf3fa684.1618831726.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.935.git.1618831726.gitgitgadget@gmail.com>
References: <pull.935.git.1618831726.gitgitgadget@gmail.com>
Date: Mon, 19 Apr 2021 11:28:45 +0000
Subject: [PATCH 2/2] [GSOC] ref-filter: reuse output buffer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Fcc: Sent
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
 Christian Couder <chriscool@tuxfamily.org>,
 Hariom Verma <hariom18599@gmail.com>,
 Eric Sunshine <sunshine@sunshineco.com>, Derrick Stolee <stolee@gmail.com>,
 =?utf-8?b?UmVuw6k=?= Scharfe <l.s.r@web.de>,
 ZheNing Hu <adlternative@gmail.com>, ZheNing Hu <adlternative@gmail.com>
Precedence: bulk
From: ZheNing Hu <adlternative@gmail.com>
Series	ref-filter: reuse output buffer
[0/2,GSOC] ref-filter: reuse output buffer
[1/2,GSOC] ref-filter: get rid of show_ref_array_item
[2/2,GSOC] ref-filter: reuse output buffer

Commit Message

ZheNing Hu April 19, 2021, 11:28 a.m. UTC

From: ZheNing Hu <adlternative@gmail.com>

When we use `git for-each-ref`, a new output strbuf is allocated
for every ref. Instead, we can allocate a single strbuf once and
reuse it for each ref's output.

The run time of `git for-each-ref` on the Git repository itself,
measured with the performance testing tool `hyperfine`, changes
from 23.7 ms ± 0.9 ms to 22.2 ms ± 1.0 ms. The optimization is
relatively minor.

At the same time, we apply this optimization to `git tag -l`
and `git branch -l`.

This approach is similar to the one used by 79ed0a5
(cat-file: use a single strbuf for all output, 2018-08-14)
to speed up the cat-file builtin.

Helped-by: Jeff King <peff@peff.net>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/branch.c       |  9 +++++----
 builtin/for-each-ref.c | 12 ++++++------
 builtin/tag.c          | 12 ++++++------
 3 files changed, 17 insertions(+), 16 deletions(-)
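
The allocation pattern this patch changes can be illustrated outside of Git. The following is a minimal sketch, assuming a hypothetical `struct buf` that stands in for Git's `strbuf`; the `buf_*` helpers and the two loop functions are illustrative names, not Git API. It counts how often a buffer has to grow when one is created per item versus when a single buffer is reset and reused.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's strbuf; not the real API. */
struct buf {
	char *data;
	size_t len, alloc;
};
#define BUF_INIT { NULL, 0, 0 }

static int grow_count; /* number of times any buffer had to (re)allocate */

static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t new_alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->data, new_alloc);

		assert(p); /* sketch only: abort on allocation failure */
		b->data = p;
		b->alloc = new_alloc;
		grow_count++;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Keep the allocation, forget the contents (the role of strbuf_reset()). */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Free the allocation (the role of strbuf_release()). */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Old shape: one buffer per item, so every item allocates. */
static int grows_with_fresh_buffers(const char **items, int nr)
{
	grow_count = 0;
	for (int i = 0; i < nr; i++) {
		struct buf out = BUF_INIT;

		buf_addstr(&out, items[i]);
		buf_release(&out);
	}
	return grow_count;
}

/* New shape: one buffer for the whole loop, reset before each item. */
static int grows_with_reused_buffer(const char **items, int nr)
{
	struct buf out = BUF_INIT;

	grow_count = 0;
	for (int i = 0; i < nr; i++) {
		buf_reset(&out);
		buf_addstr(&out, items[i]);
	}
	buf_release(&out);
	return grow_count;
}
```

For a handful of short ref names, the per-item version allocates once per item, while the reused buffer typically allocates only when an item is longer than anything seen so far. The saving is one allocation and one free per ref, which is consistent with the small improvement reported above.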

Comments

Junio C Hamano April 19, 2021, 9:04 p.m. UTC | #1

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/branch.c b/builtin/branch.c
> index bcc00bcf182d..00081de1aed8 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -411,6 +411,8 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
>  {
>  	int i;
>  	struct ref_array array;
> +	struct strbuf out = STRBUF_INIT;
> +	struct strbuf err = STRBUF_INIT;
>  	int maxwidth = 0;
>  	const char *remote_prefix = "";
>  	char *to_free = NULL;
> @@ -440,8 +442,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
>  	ref_array_sort(sorting, &array);
>  
>  	for (i = 0; i < array.nr; i++) {
> -		struct strbuf out = STRBUF_INIT;
> -		struct strbuf err = STRBUF_INIT;
> +		strbuf_reset(&out);
>  		if (format_ref_array_item(array.items[i], format, &out, &err))
>  			die("%s", err.buf);

This change relies on the fact that format_ref_array_item() will
never touch err when it returns 0 (success); otherwise, we'd end
up accumulating err from multiple calls to it in the loop until it
returns non-zero (failure), at which point we emit a single "fatal:"
prefix to show multiple error messages.  Which leads me ...

> @@ -452,10 +453,10 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
>  			fwrite(out.buf, 1, out.len, stdout);
>  			putchar('\n');
>  		}
> -		strbuf_release(&err);
> -		strbuf_release(&out);
>  	}
>  
> +	strbuf_release(&err);
> +	strbuf_release(&out);

... to suspect that the _release() of err will always be a no-op.

It may be easier to follow if err is _reset() always where out is
_reset(), from code cleanliness's perspective.  Then nobody has to
wonder why we do not reset err inside loop even though we release
at the end.

It also is OK to document more clearly that we assume that the loop
will not exit without calling die() when err is not empty.  If we
take that route, we may want to drop _release(&err) at the end.

I do not know which of the two is better, but the code presented
which is halfway between these two does not quite look easy to
reason about.

Thanks.
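
The concern raised in this review can be made concrete with a small sketch. It assumes a hypothetical formatter, `format_item()`, which (unlike the behavior the patch relies on) appends a note to `err` even when it succeeds. The `struct msg` type and all helper names are made up for illustration and are not Git's API. With a shared `err` that is never reset, the eventual failure message carries text left over from earlier iterations; resetting `err` alongside `out` avoids that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size message buffer; stands in for a strbuf. */
struct msg {
	char text[256];
	size_t len;
};

static void msg_reset(struct msg *m)
{
	m->len = 0;
	m->text[0] = '\0';
}

static void msg_add(struct msg *m, const char *s)
{
	size_t n = strlen(s);

	assert(m->len + n < sizeof(m->text)); /* sketch: no growth logic */
	memcpy(m->text + m->len, s, n + 1);
	m->len += n;
}

/*
 * Hypothetical formatter that breaks the assumption discussed above:
 * it writes to err on success too. Returns non-zero for the item "bad".
 */
static int format_item(const char *item, struct msg *out, struct msg *err)
{
	msg_add(out, item);
	if (!strcmp(item, "bad")) {
		msg_add(err, "cannot format 'bad'");
		return -1;
	}
	msg_add(err, "note: ok; ");
	return 0;
}

/*
 * Run the loop over items and copy what would be passed to die() into
 * `fatal`. If reset_err is zero, err is shared and never cleared.
 */
static void run_loop(const char **items, int nr, int reset_err,
		     char *fatal, size_t fatal_size)
{
	struct msg out, err;

	msg_reset(&out);
	msg_reset(&err);
	fatal[0] = '\0';
	for (int i = 0; i < nr; i++) {
		msg_reset(&out);
		if (reset_err)
			msg_reset(&err);
		if (format_item(items[i], &out, &err)) {
			snprintf(fatal, fatal_size, "%s", err.text);
			return;
		}
	}
}
```

Calling `run_loop()` on the items "a", "b", "bad" with `reset_err` set to 0 yields a fatal message that starts with two stale notes, while `reset_err` set to 1 yields only the message for the failing item. Resetting both buffers at the top of the loop means a reader no longer has to know whether the formatter touches `err` on success.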

ZheNing Hu April 20, 2021, 6:05 a.m. UTC | #2

Junio C Hamano <gitster@pobox.com> wrote on Tue, Apr 20, 2021 at 5:04 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index bcc00bcf182d..00081de1aed8 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -411,6 +411,8 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >  {
> >       int i;
> >       struct ref_array array;
> > +     struct strbuf out = STRBUF_INIT;
> > +     struct strbuf err = STRBUF_INIT;
> >       int maxwidth = 0;
> >       const char *remote_prefix = "";
> >       char *to_free = NULL;
> > @@ -440,8 +442,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >       ref_array_sort(sorting, &array);
> >
> >       for (i = 0; i < array.nr; i++) {
> > -             struct strbuf out = STRBUF_INIT;
> > -             struct strbuf err = STRBUF_INIT;
> > +             strbuf_reset(&out);
> >               if (format_ref_array_item(array.items[i], format, &out, &err))
> >                       die("%s", err.buf);
>
> This change relies on the fact that format_ref_array_item() will
> never touch err when it returns 0 (success); otherwise, we'd end
> up accumulating err from multiple calls to it in the loop until it
> returns non-zero (failure), at which point we emit a single "fatal:"
> prefix to show multiple error messages.  Which leads me ...
>
> > @@ -452,10 +453,10 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >                       fwrite(out.buf, 1, out.len, stdout);
> >                       putchar('\n');
> >               }
> > -             strbuf_release(&err);
> > -             strbuf_release(&out);
> >       }
> >
> > +     strbuf_release(&err);
> > +     strbuf_release(&out);
>
> ... to suspect that the _release() of err will always be a no-op.
>

Yes, _release(&err) is a no-op in the present situation.

> It may be easier to follow if err is _reset() always where out is
> _reset(), from code cleanliness's perspective.  Then nobody has to
> wonder why we do not reset err inside loop even though we release
> at the end.
>
> It also is OK to document more clearly that we assume that the loop
> will not exit without calling die() when err is not empty.  If we
> take that route, we may want to drop _release(&err) at the end.
>
> I do not know which of the two is better, but the code presented
> which is halfway between these two does not quite look easy to
> reason about.
>

René Scharfe mentioned that leaving err unreleased would make leak
checking harder. So on balance, adding a _reset() of err in the loop seems
like a viable option. The performance impact should also be minimal.

In v1 we called _release() in the loop, and Peff then suggested that
we don't need to _release() err, but code cleanliness wasn't a concern
at the time.

So I'll add _reset() of err to the loop in the next iteration.

> Thanks.
>

Thanks.
--
ZheNing Hu

diff --git a/builtin/branch.c b/builtin/branch.c
index bcc00bcf182d..00081de1aed8 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -411,6 +411,8 @@  static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 {
 	int i;
 	struct ref_array array;
+	struct strbuf out = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 	int maxwidth = 0;
 	const char *remote_prefix = "";
 	char *to_free = NULL;
@@ -440,8 +442,7 @@  static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
-		struct strbuf out = STRBUF_INIT;
-		struct strbuf err = STRBUF_INIT;
+		strbuf_reset(&out);
 		if (format_ref_array_item(array.items[i], format, &out, &err))
 			die("%s", err.buf);
 		if (column_active(colopts)) {
@@ -452,10 +453,10 @@  static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 			fwrite(out.buf, 1, out.len, stdout);
 			putchar('\n');
 		}
-		strbuf_release(&err);
-		strbuf_release(&out);
 	}
 
+	strbuf_release(&err);
+	strbuf_release(&out);
 	ref_array_clear(&array);
 	free(to_free);
 }
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 8520008604e3..bf24c595c526 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -22,6 +22,8 @@  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_array array;
 	struct ref_filter filter;
 	struct ref_format format = REF_FORMAT_INIT;
+	struct strbuf output = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -81,17 +83,15 @@  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	if (!maxcount || array.nr < maxcount)
 		maxcount = array.nr;
 	for (i = 0; i < maxcount; i++) {
-		struct strbuf output = STRBUF_INIT;
-		struct strbuf err = STRBUF_INIT;
-
+		strbuf_reset(&output);
 		if (format_ref_array_item(array.items[i], &format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
 		putchar('\n');
-
-		strbuf_release(&err);
-		strbuf_release(&output);
 	}
+
+	strbuf_release(&err);
+	strbuf_release(&output);
 	ref_array_clear(&array);
 	return 0;
 }
diff --git a/builtin/tag.c b/builtin/tag.c
index d92d8e110b4d..592af1d154ea 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -39,6 +39,8 @@  static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
 {
 	struct ref_array array;
+	struct strbuf output = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 	char *to_free = NULL;
 	int i;
 
@@ -64,17 +66,15 @@  static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
-		struct strbuf output = STRBUF_INIT;
-		struct strbuf err = STRBUF_INIT;
-
+		strbuf_reset(&output);
 		if (format_ref_array_item(array.items[i], format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
 		putchar('\n');
-
-		strbuf_release(&err);
-		strbuf_release(&output);
 	}
+
+	strbuf_release(&err);
+	strbuf_release(&output);
 	ref_array_clear(&array);
 	free(to_free);
