[12/15,GSOC] cat-file: reuse ref-filter logic

Message ID	e04b970ccb0cad8c0b651ab11f5f52063bd84606.1625155693.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <e04b970ccb0cad8c0b651ab11f5f52063bd84606.1625155693.git.gitgitgadget@gmail.com> In-Reply-To: <pull.989.git.1625155693.gitgitgadget@gmail.com> References: <pull.989.git.1625155693.gitgitgadget@gmail.com> Date: Thu, 01 Jul 2021 16:08:10 +0000 Subject: [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fcc: Sent To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <christian.couder@gmail.com>, Hariom Verma <hariom18599@gmail.com>, Bagas Sanjaya <bagasdotme@gmail.com>, Jeff King <peff@peff.net>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>, ZheNing Hu <adlternative@gmail.com>, ZheNing Hu <adlternative@gmail.com> Precedence: bulk From: ZheNing Hu <adlternative@gmail.com>
Series	cat-file: reuse ref-filter logic \| expand [00/15,GSOC] cat-file: reuse ref-filter logic [01/15,GSOC] ref-filter: add obj-type check in grab contents [02/15,GSOC] ref-filter: add %(raw) atom [03/15,GSOC] ref-filter: --format=%(raw) re-support --perl [04/15,GSOC] ref-filter: use non-const ref_format in *_atom_parser() [05/15,GSOC] ref-filter: add %(rest) atom [06/15,GSOC] ref-filter: pass get_object() return value to their callers [07/15,GSOC] ref-filter: introduce free_ref_array_item_value() function [08/15,GSOC] ref-filter: add cat_file_mode in struct ref_format [09/15,GSOC] ref-filter: modify the error message and value in get_object [10/15,GSOC] cat-file: add has_object_file() check [11/15,GSOC] cat-file: change batch_objects parameter name [12/15,GSOC] cat-file: reuse ref-filter logic [13/15,GSOC] cat-file: reuse err buf in batch_object_write() [14/15,GSOC] cat-file: re-implement --textconv, --filters options [15/15,GSOC] ref-filter: remove grab_oid() function

ZheNing Hu July 1, 2021, 4:08 p.m. UTC

From: ZheNing Hu <adlternative@gmail.com>

In order to let cat-file use ref-filter logic, let's do the
following:

1. Change the type of member `format` in struct `batch_options`
to `ref_format`, we will pass it to ref-filter later.
2. Let `batch_objects()` add atoms to format, and use
`verify_ref_format()` to check atoms.
3. Use `format_ref_array_item()` in `batch_object_write()` to
get the formatted data corresponding to the object. If the
return value of `format_ref_array_item()` is equals to zero,
use `batch_write()` to print object data; else if the return
value is less than zero, use `die()` to print the error message
and exit; else if return value is greater than zero, only print
the error message, but don't exit.
4. Use free_ref_array_item_value() to free ref_array_item's
value.

Most of the atoms in `for-each-ref --format` are now supported,
such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
`%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
that pointed to by a ref, "for-each-ref"'s family can naturally use
these atoms, but not all objects are pointed to be a ref, so "cat-file"
will not be able to use them.

The performance for `git cat-file --batch-all-objects
--batch-check` on the Git repository itself with performance
testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
1.134 s ±  0.063 s.

The performance for `git cat-file --batch-all-objects --batch
>/dev/null` on the Git repository itself with performance testing
tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
total" to "33.69s user 1.54s system 87% cpu 40.258 total".

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-cat-file.txt |   6 +
 builtin/cat-file.c             | 242 ++++++-------------------------
 t/t1006-cat-file.sh            | 251 +++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+), 195 deletions(-)

Ævar Arnfjörð Bjarmason July 2, 2021, 1:36 p.m. UTC | #1

On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to let cat-file use ref-filter logic, let's do the
> following:
>
> 1. Change the type of member `format` in struct `batch_options`
> to `ref_format`, we will pass it to ref-filter later.
> 2. Let `batch_objects()` add atoms to format, and use
> `verify_ref_format()` to check atoms.
> 3. Use `format_ref_array_item()` in `batch_object_write()` to
> get the formatted data corresponding to the object. If the
> return value of `format_ref_array_item()` is equals to zero,
> use `batch_write()` to print object data; else if the return
> value is less than zero, use `die()` to print the error message
> and exit; else if return value is greater than zero, only print
> the error message, but don't exit.
> 4. Use free_ref_array_item_value() to free ref_array_item's
> value.
>
> Most of the atoms in `for-each-ref --format` are now supported,
> such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
> that pointed to by a ref, "for-each-ref"'s family can naturally use
> these atoms, but not all objects are pointed to be a ref, so "cat-file"
> will not be able to use them.
>
> The performance for `git cat-file --batch-all-objects
> --batch-check` on the Git repository itself with performance
> testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> 1.134 s ±  0.063 s.
>
> The performance for `git cat-file --batch-all-objects --batch
>>/dev/null` on the Git repository itself with performance testing
> tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
> total" to "33.69s user 1.54s system 87% cpu 40.258 total".

This new feature is really nice, but that's a really bad performance
regression. A lot of software in the wild relies on "cat-file --batch"
to be *the* performant interface to git for mass-extrction of object
data.

That's in increase of ~70% and ~20%, respectively. Have you dug into
(e.g. with a profiler) where we're now spending all this time?

Christian Couder July 2, 2021, 1:45 p.m. UTC | #2

On Fri, Jul 2, 2021 at 3:39 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:

> > The performance for `git cat-file --batch-all-objects
> > --batch-check` on the Git repository itself with performance
> > testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> > 1.134 s ±  0.063 s.
> >
> > The performance for `git cat-file --batch-all-objects --batch
> >>/dev/null` on the Git repository itself with performance testing
> > tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
> > total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>
> This new feature is really nice, but that's a really bad performance
> regression. A lot of software in the wild relies on "cat-file --batch"
> to be *the* performant interface to git for mass-extrction of object
> data.
>
> That's in increase of ~70% and ~20%, respectively. Have you dug into
> (e.g. with a profiler) where we're now spending all this time?

Yeah, I think it's better to discuss this performance issue as part of
the discussion of this series, instead of as part of the discussion of
the blog posts.

ZheNing Hu July 3, 2021, 11:45 a.m. UTC | #3

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月2日周五 下午9:39写道：
>
>
> On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to let cat-file use ref-filter logic, let's do the
> > following:
> >
> > 1. Change the type of member `format` in struct `batch_options`
> > to `ref_format`, we will pass it to ref-filter later.
> > 2. Let `batch_objects()` add atoms to format, and use
> > `verify_ref_format()` to check atoms.
> > 3. Use `format_ref_array_item()` in `batch_object_write()` to
> > get the formatted data corresponding to the object. If the
> > return value of `format_ref_array_item()` is equals to zero,
> > use `batch_write()` to print object data; else if the return
> > value is less than zero, use `die()` to print the error message
> > and exit; else if return value is greater than zero, only print
> > the error message, but don't exit.
> > 4. Use free_ref_array_item_value() to free ref_array_item's
> > value.
> >
> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
> > that pointed to by a ref, "for-each-ref"'s family can naturally use
> > these atoms, but not all objects are pointed to be a ref, so "cat-file"
> > will not be able to use them.
> >
> > The performance for `git cat-file --batch-all-objects
> > --batch-check` on the Git repository itself with performance
> > testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> > 1.134 s ±  0.063 s.
> >
> > The performance for `git cat-file --batch-all-objects --batch
> >>/dev/null` on the Git repository itself with performance testing
> > tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
> > total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>
> This new feature is really nice, but that's a really bad performance
> regression. A lot of software in the wild relies on "cat-file --batch"
> to be *the* performant interface to git for mass-extrction of object
> data.
>

Thanks, this performance is indeed worrying.

> That's in increase of ~70% and ~20%, respectively. Have you dug into
> (e.g. with a profiler) where we're now spending all this time?

See this two attachment about performance flame graph,
oid_object_info_extended() in get_object() is the key to performance
limitations.

--
ZheNing Hu

Ævar Arnfjörð Bjarmason July 3, 2021, 1:37 p.m. UTC | #4

On Sat, Jul 03 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月2日周五 下午9:39写道：
>>
>>
>> On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:
>>
>> > From: ZheNing Hu <adlternative@gmail.com>
>> >
>> > In order to let cat-file use ref-filter logic, let's do the
>> > following:
>> >
>> > 1. Change the type of member `format` in struct `batch_options`
>> > to `ref_format`, we will pass it to ref-filter later.
>> > 2. Let `batch_objects()` add atoms to format, and use
>> > `verify_ref_format()` to check atoms.
>> > 3. Use `format_ref_array_item()` in `batch_object_write()` to
>> > get the formatted data corresponding to the object. If the
>> > return value of `format_ref_array_item()` is equals to zero,
>> > use `batch_write()` to print object data; else if the return
>> > value is less than zero, use `die()` to print the error message
>> > and exit; else if return value is greater than zero, only print
>> > the error message, but don't exit.
>> > 4. Use free_ref_array_item_value() to free ref_array_item's
>> > value.
>> >
>> > Most of the atoms in `for-each-ref --format` are now supported,
>> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
>> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
>> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
>> > `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
>> > that pointed to by a ref, "for-each-ref"'s family can naturally use
>> > these atoms, but not all objects are pointed to be a ref, so "cat-file"
>> > will not be able to use them.
>> >
>> > The performance for `git cat-file --batch-all-objects
>> > --batch-check` on the Git repository itself with performance
>> > testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
>> > 1.134 s ±  0.063 s.
>> >
>> > The performance for `git cat-file --batch-all-objects --batch
>> >>/dev/null` on the Git repository itself with performance testing
>> > tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
>> > total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>>
>> This new feature is really nice, but that's a really bad performance
>> regression. A lot of software in the wild relies on "cat-file --batch"
>> to be *the* performant interface to git for mass-extrction of object
>> data.
>>
>
> Thanks, this performance is indeed worrying.
>
>> That's in increase of ~70% and ~20%, respectively. Have you dug into
>> (e.g. with a profiler) where we're now spending all this time?
>
> See this two attachment about performance flame graph,
> oid_object_info_extended() in get_object() is the key to performance
> limitations.

Most of the problem, although this may not entirely fix the performance
regression, is that you're either looking up everything twice now, or
taking a much more expensive path.

I think using gprof is probably much more handy here. See [1. I did a
`git rev-list --all >rla` and ran that piped into 'git cat-file --batch'
with/without your pathces. Results:
    
    $ gprof ./git-master ./gmon-master.out | head -n 10
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
     14.29      0.02     0.02   475186     0.00     0.00  nth_packed_object_offset
     14.29      0.04     0.02   237835     0.00     0.00  hash_to_hex_algop_r
      7.14      0.05     0.01  5220425     0.00     0.00  hashcmp_algop
      7.14      0.06     0.01  4757120     0.00     0.00  hex2chr
      7.14      0.07     0.01  1732023     0.00     0.00  find_entry_ptr

And:
    
    $ gprof ./git-new ./gmon-new.out |head -n 10
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
      7.32      0.06     0.06   764570     0.00     0.00  lookup_object
      7.32      0.12     0.06   237835     0.00     0.00  parse_commit_date
      4.88      0.16     0.04   712779     0.00     0.00  nth_packed_object_offset
      3.66      0.19     0.03   964574     0.00     0.00  bsearch_hash
      3.66      0.22     0.03   237835     0.00     0.00  grab_sub_body_contents

If you e.g. make lookup_object() simply die when it's called you'll see
that before we don't call it at all, after your patch it's our #1
function.

Before when we have the simplest case of writing out an object this is
our callstack:
    
    (gdb) bt
    #0  batch_write (opt=0x7fffffffde50, data=0x555555ab9470, len=52) at builtin/cat-file.c:298
    #1  0x000055555558b160 in batch_object_write (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0,
        opt=0x7fffffffde50, data=0x7fffffffd7f0) at builtin/cat-file.c:375
    #2  0x000055555558b36e in batch_one_object (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0, opt=0x7fffffffde50,
        data=0x7fffffffd7f0) at builtin/cat-file.c:431
    #3  0x000055555558b8ed in batch_objects (opt=0x7fffffffde50) at builtin/cat-file.c:588
    #4  0x000055555558c0d3 in cmd_cat_file (argc=0, argv=0x7fffffffe1e0, prefix=0x0) at builtin/cat-file.c:716
    #5  0x0000555555573adb in run_builtin (p=0x555555941870 <commands+240>, argc=2, argv=0x7fffffffe1e0) at git.c:461
    #6  0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1e0) at git.c:714
    #7  0x0000555555574182 in run_argv (argcp=0x7fffffffe08c, argv=0x7fffffffe080) at git.c:781
    #8  0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1e0) at git.c:912
    #9  0x000055555565b508 in main (argc=3, argv=0x7fffffffe1d8) at common-main.c:52

After (well, here we're not even to writing it, just looking it up), the
BUG() is my addition:
    
    (gdb) bt
    #0  BUG_fl (file=0x5555558ade71 "object.c", line=91, fmt=0x5555558ade6e "yo") at usage.c:290
    #1  0x00005555557441ca in lookup_object (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at object.c:91
    #2  0x000055555569dfc8 in lookup_commit (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at commit.c:62
    #3  0x00005555557445f5 in parse_object_buffer (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>, type=OBJ_COMMIT, size=342, buffer=0x555555ab48e0,
        eaten_p=0x7fffffffd36c) at object.c:215
    #4  0x0000555555785094 in get_object (ref=0x7fffffffd6b0, deref=0, obj=0x7fffffffd520, oi=0x555555975160 <oi>, err=0x7fffffffd860) at ref-filter.c:1803
    #5  0x0000555555785c99 in populate_value (ref=0x7fffffffd6b0, err=0x7fffffffd860) at ref-filter.c:2030
    #6  0x0000555555785d7b in get_ref_atom_value (ref=0x7fffffffd6b0, atom=0, v=0x7fffffffd628, err=0x7fffffffd860) at ref-filter.c:2064
    #7  0x000055555578742f in format_ref_array_item (info=0x7fffffffd6b0, format=0x7fffffffde30, final_buf=0x7fffffffd880, error_buf=0x7fffffffd860)
        at ref-filter.c:2659
    #8  0x000055555558ab1c in batch_object_write (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880,
        err=0x7fffffffd860, opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:225
    #9  0x000055555558ade5 in batch_one_object (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880, err=0x7fffffffd860,
        opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:298
    #10 0x000055555558b394 in batch_objects (batch=0x7fffffffde10, options=0x7fffffffd900) at builtin/cat-file.c:458
    #11 0x000055555558bbd5 in cmd_cat_file (argc=0, argv=0x7fffffffe1d0, prefix=0x0) at builtin/cat-file.c:585
    #12 0x0000555555573adb in run_builtin (p=0x555555942850 <commands+240>, argc=2, argv=0x7fffffffe1d0) at git.c:461
    #13 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1d0) at git.c:714
    #14 0x0000555555574182 in run_argv (argcp=0x7fffffffe07c, argv=0x7fffffffe070) at git.c:781
    #15 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1d0) at git.c:912
    #16 0x000055555565afc1 in main (argc=3, argv=0x7fffffffe1c8) at common-main.c:52

I.e. before in batch_object_write() we could use a cheap path of doing
oid_object_info_extended() and directly emitting the content. With your
version we're all the way down to parse_object_buffer(). Meaning that
we're going to be creating a "struct commit" or whatever if we're
looking at a commit, just to print out the raw contents.

I think the best next step here is to add a t/perf/t1006-cat-file.sh
test to stress these various cases, i.e. a plain --batch without a
format, with format, with --batch-all-objects etc. Try to then run that
on each of your commits against the preceding one and see if/when you
have regressions.

Aside from any double-lookups etc, the problem is also that you're
trying to handle a really general case (e.g. with textconv) in a
codepath that needs to be really fast. If anything we should be
inserting some more more optimization shortcuts for common cases into
it. E.g. I was able to trivially speed up 'cat-file --batch-check' on
"master" by hardcoding a path for our default format (patch at the end
of this mail):
    
    # passed all 2 test(s)
    1..2
    Test                             origin/master     HEAD
    -------------------------------------------------------------------------
    1006.2: cat-file --batch-check   0.60(0.37+0.23)   0.35(0.33+0.02) -41.7%

Anything that needs to handle general format patching is going to be
slower. I think /some/ performance regression if we're using something
that's not just the current light strbuf_expand() probably can't be
avoided, but we could/should try to make up the difference at least for
the common case of --batch or --batch-check without --textconv and
perhaps hardcode (and document that it's faster) a path for the default
formats).

1. https://sourceware.org/binutils/docs/gprof/Output.html

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..775b7dd1b01 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -360,6 +360,11 @@ static void batch_object_write(const char *obj_name,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
+	int default_format = !strcmp(opt->format, "%(objectname) %(objecttype) %(objectsize)");
+	struct strbuf type_name = STRBUF_INIT;
+	if (default_format)
+		data->info.type_name = &type_name;
+		
 	if (!data->skip_object_info &&
 	    oid_object_info_extended(the_repository, &data->oid, &data->info,
 				     OBJECT_INFO_LOOKUP_REPLACE) < 0) {
@@ -369,14 +374,20 @@ static void batch_object_write(const char *obj_name,
 		return;
 	}
 
-	strbuf_reset(scratch);
-	strbuf_expand(scratch, opt->format, expand_format, data);
-	strbuf_addch(scratch, '\n');
-	batch_write(opt, scratch->buf, scratch->len);
-
-	if (opt->print_contents) {
-		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+	if (default_format && !opt->print_contents) {
+		fprintf(stdout, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+			data->info.type_name->buf, 
+			(uintmax_t)*data->info.sizep);
+	} else {
+		strbuf_reset(scratch);
+		strbuf_expand(scratch, opt->format, expand_format, data);
+		strbuf_addch(scratch, '\n');
+		batch_write(opt, scratch->buf, scratch->len);
+
+		if (opt->print_contents) {
+			print_object_or_die(opt, data);
+			batch_write(opt, "\n", 1);
+		}
 	}
 }
 
diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
new file mode 100755
index 00000000000..a295d334715
--- /dev/null
+++ b/t/perf/p1006-cat-file.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+test_description='Basic sort performance tests'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'setup' '
+	git rev-list --all >rla
+'
+
+test_perf 'cat-file --batch-check' '
+	git cat-file --batch-check <rla
+'
+
+test_done

ZheNing Hu July 3, 2021, 2:17 p.m. UTC | #5

ZheNing Hu <adlternative@gmail.com> 于2021年7月3日周六 下午7:45写道：
>
> See this two attachment about performance flame graph,
> oid_object_info_extended() in get_object() is the key to performance
> limitations.
>

Well, from the current point of view, The time of oid_object_info_extended()
before and after reconstruction is similar. But after refactoring,
there are a lot
of additional detections in ref-filter.

E.g. append_literal() 1.68%  append_atom 1.63% ...

--
ZheNing Hu

ZheNing Hu July 4, 2021, 11:10 a.m. UTC | #6

Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月3日周六 下午10:18写道：

> Most of the problem, although this may not entirely fix the performance
> regression, is that you're either looking up everything twice now, or
> taking a much more expensive path.
>

Yeah, In get_object(), oid_object_info_extended() is called twice when we
want print object's contents. The original intention is to reduce the call of
oid_object_info_extended() once when using --textconv. Should we make
--textconv and --filters have the same logic? In this way, without using
--textconv and --filters, we can call oid_object_info_extended() only once.

> I think using gprof is probably much more handy here. See [1. I did a
> `git rev-list --all >rla` and ran that piped into 'git cat-file --batch'
> with/without your pathces. Results:
>
>     $ gprof ./git-master ./gmon-master.out | head -n 10
>     Flat profile:
>
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls  ms/call  ms/call  name
>      14.29      0.02     0.02   475186     0.00     0.00  nth_packed_object_offset
>      14.29      0.04     0.02   237835     0.00     0.00  hash_to_hex_algop_r
>       7.14      0.05     0.01  5220425     0.00     0.00  hashcmp_algop
>       7.14      0.06     0.01  4757120     0.00     0.00  hex2chr
>       7.14      0.07     0.01  1732023     0.00     0.00  find_entry_ptr
>
> And:
>
>     $ gprof ./git-new ./gmon-new.out |head -n 10
>     Flat profile:
>
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls  ms/call  ms/call  name
>       7.32      0.06     0.06   764570     0.00     0.00  lookup_object
>       7.32      0.12     0.06   237835     0.00     0.00  parse_commit_date
>       4.88      0.16     0.04   712779     0.00     0.00  nth_packed_object_offset
>       3.66      0.19     0.03   964574     0.00     0.00  bsearch_hash
>       3.66      0.22     0.03   237835     0.00     0.00  grab_sub_body_contents
>

It seems that lookup_object() took a lot of time with the patch of my version .

> If you e.g. make lookup_object() simply die when it's called you'll see
> that before we don't call it at all, after your patch it's our #1
> function.
>
> Before when we have the simplest case of writing out an object this is
> our callstack:
>
>     (gdb) bt
>     #0  batch_write (opt=0x7fffffffde50, data=0x555555ab9470, len=52) at builtin/cat-file.c:298
>     #1  0x000055555558b160 in batch_object_write (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0,
>         opt=0x7fffffffde50, data=0x7fffffffd7f0) at builtin/cat-file.c:375
>     #2  0x000055555558b36e in batch_one_object (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0, opt=0x7fffffffde50,
>         data=0x7fffffffd7f0) at builtin/cat-file.c:431
>     #3  0x000055555558b8ed in batch_objects (opt=0x7fffffffde50) at builtin/cat-file.c:588
>     #4  0x000055555558c0d3 in cmd_cat_file (argc=0, argv=0x7fffffffe1e0, prefix=0x0) at builtin/cat-file.c:716
>     #5  0x0000555555573adb in run_builtin (p=0x555555941870 <commands+240>, argc=2, argv=0x7fffffffe1e0) at git.c:461
>     #6  0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1e0) at git.c:714
>     #7  0x0000555555574182 in run_argv (argcp=0x7fffffffe08c, argv=0x7fffffffe080) at git.c:781
>     #8  0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1e0) at git.c:912
>     #9  0x000055555565b508 in main (argc=3, argv=0x7fffffffe1d8) at common-main.c:52
>
> After (well, here we're not even to writing it, just looking it up), the
> BUG() is my addition:
>
>     (gdb) bt
>     #0  BUG_fl (file=0x5555558ade71 "object.c", line=91, fmt=0x5555558ade6e "yo") at usage.c:290
>     #1  0x00005555557441ca in lookup_object (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at object.c:91
>     #2  0x000055555569dfc8 in lookup_commit (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at commit.c:62
>     #3  0x00005555557445f5 in parse_object_buffer (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>, type=OBJ_COMMIT, size=342, buffer=0x555555ab48e0,
>         eaten_p=0x7fffffffd36c) at object.c:215
>     #4  0x0000555555785094 in get_object (ref=0x7fffffffd6b0, deref=0, obj=0x7fffffffd520, oi=0x555555975160 <oi>, err=0x7fffffffd860) at ref-filter.c:1803
>     #5  0x0000555555785c99 in populate_value (ref=0x7fffffffd6b0, err=0x7fffffffd860) at ref-filter.c:2030
>     #6  0x0000555555785d7b in get_ref_atom_value (ref=0x7fffffffd6b0, atom=0, v=0x7fffffffd628, err=0x7fffffffd860) at ref-filter.c:2064
>     #7  0x000055555578742f in format_ref_array_item (info=0x7fffffffd6b0, format=0x7fffffffde30, final_buf=0x7fffffffd880, error_buf=0x7fffffffd860)
>         at ref-filter.c:2659
>     #8  0x000055555558ab1c in batch_object_write (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880,
>         err=0x7fffffffd860, opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:225
>     #9  0x000055555558ade5 in batch_one_object (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880, err=0x7fffffffd860,
>         opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:298
>     #10 0x000055555558b394 in batch_objects (batch=0x7fffffffde10, options=0x7fffffffd900) at builtin/cat-file.c:458
>     #11 0x000055555558bbd5 in cmd_cat_file (argc=0, argv=0x7fffffffe1d0, prefix=0x0) at builtin/cat-file.c:585
>     #12 0x0000555555573adb in run_builtin (p=0x555555942850 <commands+240>, argc=2, argv=0x7fffffffe1d0) at git.c:461
>     #13 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1d0) at git.c:714
>     #14 0x0000555555574182 in run_argv (argcp=0x7fffffffe07c, argv=0x7fffffffe070) at git.c:781
>     #15 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1d0) at git.c:912
>     #16 0x000055555565afc1 in main (argc=3, argv=0x7fffffffe1c8) at common-main.c:52
>

It seems that the call stack is very deep after using my version.

> I.e. before in batch_object_write() we could use a cheap path of doing
> oid_object_info_extended() and directly emitting the content. With your
> version we're all the way down to parse_object_buffer(). Meaning that
> we're going to be creating a "struct commit" or whatever if we're
> looking at a commit, just to print out the raw contents.
>

I agree that the logic in ref-filter is too complicated in order to be
able to print
the structured object data.

> I think the best next step here is to add a t/perf/t1006-cat-file.sh
> test to stress these various cases, i.e. a plain --batch without a
> format, with format, with --batch-all-objects etc. Try to then run that
> on each of your commits against the preceding one and see if/when you
> have regressions.
>

Make sence.

> Aside from any double-lookups etc, the problem is also that you're
> trying to handle a really general case (e.g. with textconv) in a

Well, "--textconv" is a common situation? Here I may not be sure which
scenarios where the upper application calls "git cat-file --batch" are the
most common.

> codepath that needs to be really fast. If anything we should be
> inserting some more more optimization shortcuts for common cases into
> it. E.g. I was able to trivially speed up 'cat-file --batch-check' on
> "master" by hardcoding a path for our default format (patch at the end
> of this mail):
>
>     # passed all 2 test(s)
>     1..2
>     Test                             origin/master     HEAD
>     -------------------------------------------------------------------------
>     1006.2: cat-file --batch-check   0.60(0.37+0.23)   0.35(0.33+0.02) -41.7%
>
> Anything that needs to handle general format patching is going to be
> slower. I think /some/ performance regression if we're using something
> that's not just the current light strbuf_expand() probably can't be
> avoided, but we could/should try to make up the difference at least for
> the common case of --batch or --batch-check without --textconv and
> perhaps hardcode (and document that it's faster) a path for the default
> formats).
>

Yeah, Like the hardcode in your patch may be a solution to the performance
degradation. This will indeed help those upper-level applications that use the
 common case.

> 1. https://sourceware.org/binutils/docs/gprof/Output.html
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ebf13359e8..775b7dd1b01 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -360,6 +360,11 @@ static void batch_object_write(const char *obj_name,
>                                struct batch_options *opt,
>                                struct expand_data *data)
>  {
> +       int default_format = !strcmp(opt->format, "%(objectname) %(objecttype) %(objectsize)");
> +       struct strbuf type_name = STRBUF_INIT;
> +       if (default_format)
> +               data->info.type_name = &type_name;
> +
>         if (!data->skip_object_info &&
>             oid_object_info_extended(the_repository, &data->oid, &data->info,
>                                      OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> @@ -369,14 +374,20 @@ static void batch_object_write(const char *obj_name,
>                 return;
>         }
>
> -       strbuf_reset(scratch);
> -       strbuf_expand(scratch, opt->format, expand_format, data);
> -       strbuf_addch(scratch, '\n');
> -       batch_write(opt, scratch->buf, scratch->len);
> -
> -       if (opt->print_contents) {
> -               print_object_or_die(opt, data);
> -               batch_write(opt, "\n", 1);
> +       if (default_format && !opt->print_contents) {
> +               fprintf(stdout, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
> +                       data->info.type_name->buf,
> +                       (uintmax_t)*data->info.sizep);
> +       } else {
> +               strbuf_reset(scratch);
> +               strbuf_expand(scratch, opt->format, expand_format, data);
> +               strbuf_addch(scratch, '\n');
> +               batch_write(opt, scratch->buf, scratch->len);
> +
> +               if (opt->print_contents) {
> +                       print_object_or_die(opt, data);
> +                       batch_write(opt, "\n", 1);
> +               }
>         }
>  }
>
> diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
> new file mode 100755
> index 00000000000..a295d334715
> --- /dev/null
> +++ b/t/perf/p1006-cat-file.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +
> +test_description='Basic sort performance tests'
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +test_expect_success 'setup' '
> +       git rev-list --all >rla
> +'
> +
> +test_perf 'cat-file --batch-check' '
> +       git cat-file --batch-check <rla
> +'
> +
> +test_done

Thanks.
--
ZheNing Hu

Ævar Arnfjörð Bjarmason July 5, 2021, 7:18 a.m. UTC | #7

On Sun, Jul 04 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月3日周六 下午10:18写道：
>
>> Most of the problem, although this may not entirely fix the performance
>> regression, is that you're either looking up everything twice now, or
>> taking a much more expensive path.
>>
>
> Yeah, In get_object(), oid_object_info_extended() is called twice when we
> want print object's contents. The original intention is to reduce the call of
> oid_object_info_extended() once when using --textconv. Should we make
> --textconv and --filters have the same logic? In this way, without using
> --textconv and --filters, we can call oid_object_info_extended() only once.
>
>> I think using gprof is probably much more handy here. See [1. I did a
>> `git rev-list --all >rla` and ran that piped into 'git cat-file --batch'
>> with/without your pathces. Results:
>>
>>     $ gprof ./git-master ./gmon-master.out | head -n 10
>>     Flat profile:
>>
>>     Each sample counts as 0.01 seconds.
>>       %   cumulative   self              self     total
>>      time   seconds   seconds    calls  ms/call  ms/call  name
>>      14.29      0.02     0.02   475186     0.00     0.00  nth_packed_object_offset
>>      14.29      0.04     0.02   237835     0.00     0.00  hash_to_hex_algop_r
>>       7.14      0.05     0.01  5220425     0.00     0.00  hashcmp_algop
>>       7.14      0.06     0.01  4757120     0.00     0.00  hex2chr
>>       7.14      0.07     0.01  1732023     0.00     0.00  find_entry_ptr
>>
>> And:
>>
>>     $ gprof ./git-new ./gmon-new.out |head -n 10
>>     Flat profile:
>>
>>     Each sample counts as 0.01 seconds.
>>       %   cumulative   self              self     total
>>      time   seconds   seconds    calls  ms/call  ms/call  name
>>       7.32      0.06     0.06   764570     0.00     0.00  lookup_object
>>       7.32      0.12     0.06   237835     0.00     0.00  parse_commit_date
>>       4.88      0.16     0.04   712779     0.00     0.00  nth_packed_object_offset
>>       3.66      0.19     0.03   964574     0.00     0.00  bsearch_hash
>>       3.66      0.22     0.03   237835     0.00     0.00  grab_sub_body_contents
>>
>
> It seems that lookup_object() took a lot of time with the patch of my version .
>
>> If you e.g. make lookup_object() simply die when it's called you'll see
>> that before we don't call it at all, after your patch it's our #1
>> function.
>>
>> Before when we have the simplest case of writing out an object this is
>> our callstack:
>>
>>     (gdb) bt
>>     #0  batch_write (opt=0x7fffffffde50, data=0x555555ab9470, len=52) at builtin/cat-file.c:298
>>     #1  0x000055555558b160 in batch_object_write (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0,
>>         opt=0x7fffffffde50, data=0x7fffffffd7f0) at builtin/cat-file.c:375
>>     #2  0x000055555558b36e in batch_one_object (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0, opt=0x7fffffffde50,
>>         data=0x7fffffffd7f0) at builtin/cat-file.c:431
>>     #3  0x000055555558b8ed in batch_objects (opt=0x7fffffffde50) at builtin/cat-file.c:588
>>     #4  0x000055555558c0d3 in cmd_cat_file (argc=0, argv=0x7fffffffe1e0, prefix=0x0) at builtin/cat-file.c:716
>>     #5  0x0000555555573adb in run_builtin (p=0x555555941870 <commands+240>, argc=2, argv=0x7fffffffe1e0) at git.c:461
>>     #6  0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1e0) at git.c:714
>>     #7  0x0000555555574182 in run_argv (argcp=0x7fffffffe08c, argv=0x7fffffffe080) at git.c:781
>>     #8  0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1e0) at git.c:912
>>     #9  0x000055555565b508 in main (argc=3, argv=0x7fffffffe1d8) at common-main.c:52
>>
>> After (well, here we're not even to writing it, just looking it up), the
>> BUG() is my addition:
>>
>>     (gdb) bt
>>     #0  BUG_fl (file=0x5555558ade71 "object.c", line=91, fmt=0x5555558ade6e "yo") at usage.c:290
>>     #1  0x00005555557441ca in lookup_object (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at object.c:91
>>     #2  0x000055555569dfc8 in lookup_commit (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at commit.c:62
>>     #3  0x00005555557445f5 in parse_object_buffer (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>, type=OBJ_COMMIT, size=342, buffer=0x555555ab48e0,
>>         eaten_p=0x7fffffffd36c) at object.c:215
>>     #4  0x0000555555785094 in get_object (ref=0x7fffffffd6b0, deref=0, obj=0x7fffffffd520, oi=0x555555975160 <oi>, err=0x7fffffffd860) at ref-filter.c:1803
>>     #5  0x0000555555785c99 in populate_value (ref=0x7fffffffd6b0, err=0x7fffffffd860) at ref-filter.c:2030
>>     #6  0x0000555555785d7b in get_ref_atom_value (ref=0x7fffffffd6b0, atom=0, v=0x7fffffffd628, err=0x7fffffffd860) at ref-filter.c:2064
>>     #7  0x000055555578742f in format_ref_array_item (info=0x7fffffffd6b0, format=0x7fffffffde30, final_buf=0x7fffffffd880, error_buf=0x7fffffffd860)
>>         at ref-filter.c:2659
>>     #8  0x000055555558ab1c in batch_object_write (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880,
>>         err=0x7fffffffd860, opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:225
>>     #9  0x000055555558ade5 in batch_one_object (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880, err=0x7fffffffd860,
>>         opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:298
>>     #10 0x000055555558b394 in batch_objects (batch=0x7fffffffde10, options=0x7fffffffd900) at builtin/cat-file.c:458
>>     #11 0x000055555558bbd5 in cmd_cat_file (argc=0, argv=0x7fffffffe1d0, prefix=0x0) at builtin/cat-file.c:585
>>     #12 0x0000555555573adb in run_builtin (p=0x555555942850 <commands+240>, argc=2, argv=0x7fffffffe1d0) at git.c:461
>>     #13 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1d0) at git.c:714
>>     #14 0x0000555555574182 in run_argv (argcp=0x7fffffffe07c, argv=0x7fffffffe070) at git.c:781
>>     #15 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1d0) at git.c:912
>>     #16 0x000055555565afc1 in main (argc=3, argv=0x7fffffffe1c8) at common-main.c:52
>>
>
> It seems that the call stack is very deep after using my version.
>
>> I.e. before in batch_object_write() we could use a cheap path of doing
>> oid_object_info_extended() and directly emitting the content. With your
>> version we're all the way down to parse_object_buffer(). Meaning that
>> we're going to be creating a "struct commit" or whatever if we're
>> looking at a commit, just to print out the raw contents.
>>
>
> I agree that the logic in ref-filter is too complicated in order to be
> able to print
> the structured object data.

Well, maybe it isn't. I think the main problem with your performance is
just looking things up again from the object store needlessly, and
possibly redundantly copying things around. E.g. if you have a buffer
with a commit object and you need to xstrdup() that, and xstrdup() the
answer etc, it adds up.

Maybe if all that's gone the formatting will still slow things down, but
I bet it's going to be to a much smaller extent.

>> I think the best next step here is to add a t/perf/t1006-cat-file.sh
>> test to stress these various cases, i.e. a plain --batch without a
>> format, with format, with --batch-all-objects etc. Try to then run that
>> on each of your commits against the preceding one and see if/when you
>> have regressions.
>>
>
> Make sence.
>
>> Aside from any double-lookups etc, the problem is also that you're
>> trying to handle a really general case (e.g. with textconv) in a
>
> Well, "--textconv" is a common situation? Here I may not be sure which
> scenarios where the upper application calls "git cat-file --batch" are the
> most common.

I think --textconv is much rarer. I think it's used for API purposes
e.g. for people who are doing \n translation, or maybe for the likes of
git-lfs (or was that --filters)? I'm not very familiar with it.

For what it's worth 'git cat-file --batch' is very performance sensitive
in backend use-cases like e.g. gitlab.com's gitaly (which executes all
git commands for you on that site). Anything that looks up .. any object
pretty much .. is prining that object line to a persistent "cat-file
--batch" process.

There was a case where some very small (unrelated to cat-file per-se)
slowdown in that codepath ended up slowing down overall API requests for
some calls by >100%, because it's executed in a tight loop. E.g. if
you're showing N commits per page that might be N cat-file --batch
calls.

[12/15,GSOC] cat-file: reuse ref-filter logic

Commit Message

Comments

Patch