
[00/27,GSOC,RFC] cat-file: reuse ref-filter logic

Message ID: pull.1016.git.1628842990.gitgitgadget@gmail.com

John Passaro via GitGitGadget Aug. 13, 2021, 8:22 a.m. UTC
This patch series makes cat-file reuse the ref-filter logic. Along the way,
some performance optimizations have also been made. Its last version is
here:
https://lore.kernel.org/git/pull.993.v2.git.1626363626.gitgitgadget@gmail.com/#t

It seems that zh/ref-filter-raw-data is still waiting in the 'next' branch
(because Git is at rc2), so for now I want to show some recent performance
optimizations first.

Changes since the last version:

 1.  Use free_global_resource() to avoid memory leaks.
 2.  Skip parse_object_buffer(), which brings a 12.5% performance
     improvement.
 3.  Merge two for loops in grab_person(), which brings a 2% performance
     improvement.
 4.  Remove strlen() from find_subpos().
 5.  Introduce xstrvfmt_len() and xstrfmt_len().
 6.  Remove the second parsing in format_ref_array_item(), which brings a
     1.9% performance improvement.
 7.  Introduce ref_filter_slopbuf to replace xstrdup("").
 8.  Add a deref member to struct used_atom to simplify the program logic.
 9.  Introduce symref_atom_parser() to make the program logic more concise.
 10. Use switch/case instead of if/else to improve the readability of the
     code.
 11. Reuse the final buffer, which brings a 2% performance improvement.
 12. Add a need_get_object_info flag to reduce memory comparisons.
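Items 4 and 5 both aim at not recomputing string lengths that are already known. Below is a minimal sketch of that idea in plain C, assuming a helper that formats a string and hands back its length in the same call. `sketch_xstrvfmt_len()` and `sketch_xstrfmt_len()` are hypothetical names used only for illustration; the real xstrvfmt_len()/xstrfmt_len() in this series are built on git's strbuf and may have different signatures.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: format a string like xstrvfmt(), but also report
 * its length through *len, so callers never need strlen() on the result.
 * Returns NULL on formatting or allocation failure.
 * The signature is an assumption, not the API from this patch series.
 */
static char *sketch_xstrvfmt_len(size_t *len, const char *fmt, va_list ap)
{
	va_list cp;
	int n;
	char *buf;

	assert(len != NULL);

	/* First pass: measure the output without writing it. */
	va_copy(cp, ap);
	n = vsnprintf(NULL, 0, fmt, cp);
	va_end(cp);
	if (n < 0)
		return NULL;

	buf = malloc((size_t)n + 1);
	if (!buf)
		return NULL;

	/* Second pass: write the output; its length is already known. */
	vsnprintf(buf, (size_t)n + 1, fmt, ap);
	*len = (size_t)n;
	return buf;
}

/* Variadic wrapper, mirroring the xstrfmt()/xstrvfmt() pairing. */
static char *sketch_xstrfmt_len(size_t *len, const char *fmt, ...)
{
	va_list ap;
	char *ret;

	va_start(ap, fmt);
	ret = sketch_xstrvfmt_len(len, fmt, ap);
	va_end(ap);
	return ret;
}
```

For example, `sketch_xstrfmt_len(&len, "%s %d", "blob", 42)` returns a newly allocated `"blob 42"` and sets `len` to 7; the caller owns the buffer and must free() it. Returning the length from the formatting call is cheap because vsnprintf() already computes it.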

These are the performance test results after the optimizations above:

Test                                        upstream/master   this tree
------------------------------------------------------------------------------------
1006.2: cat-file --batch-check              0.08(0.07+0.00)   0.09(0.08+0.01) +12.5%
1006.3: cat-file --batch-check with atoms   0.06(0.04+0.02)   0.08(0.06+0.02) +33.3%
1006.4: cat-file --batch                    0.49(0.46+0.02)   0.50(0.47+0.03) +2.0%
1006.5: cat-file --batch with atoms         0.47(0.45+0.01)   0.49(0.47+0.02) +4.3%


We can see that, with this series, the performance of git cat-file --batch
is very close to upstream/master. For git cat-file --batch-check the effect
of the optimizations is hard to see, because the measurement is dominated
by noise: between runs, the difference may range from +12.5% to +50.0%.
Since git cat-file --batch-check itself runs for less than 0.1 seconds
here, a difference of a few hundredths of a second already shows up as a
large percentage, so the improvement is naturally not obvious.
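The percentage column above is consistent with computing (this tree - upstream) / upstream, which reproduces all four rows; positive values mean this tree is slower. The small C sketch below, under that assumption, also shows why short runs look noisy: the same 0.01 s difference is +12.5% against 0.08 s but only +2.0% against 0.49 s. The helper names are illustrative only and are not part of git's perf framework.

```c
#include <assert.h>

/*
 * Relative change of the patched timing versus the baseline, in percent.
 * Positive means the patched tree is slower. Assumes base > 0.
 */
static double relative_change_pct(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}

/* Round a non-negative value to one decimal place, as the table prints. */
static double round1(double x)
{
	assert(x >= 0.0);
	return (double)(long)(x * 10.0 + 0.5) / 10.0;
}
```

For instance, `round1(relative_change_pct(0.08, 0.09))` gives 12.5 and `round1(relative_change_pct(0.49, 0.50))` gives 2.0: the absolute difference is identical, only the baseline differs.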

As GSoC is about to end and this patch series will probably need adjusting
for some time yet, I can only hope it will be accepted in the future.

Note: The first part of this patch series duplicates content that belongs
to zh/ref-filter-raw-data.

ZheNing Hu (27):
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom
  [GSOC] ref-filter: --format=%(raw) support --perl
  [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  [GSOC] ref-filter: add %(rest) atom
  [GSOC] ref-filter: pass get_object() return value to their callers
  [GSOC] ref-filter: introduce free_ref_array_item_value() function
  [GSOC] ref-filter: add cat_file_mode to ref_format
  [GSOC] ref-filter: modify the error message and value in get_object
  [GSOC] cat-file: add has_object_file() check
  [GSOC] cat-file: change batch_objects parameter name
  [GSOC] cat-file: create p1006-cat-file.sh
  [GSOC] cat-file: reuse ref-filter logic
  [GSOC] cat-file: reuse err buf in batch_object_write()
  [GSOC] cat-file: re-implement --textconv, --filters options
  [GSOC] ref-filter: remove grab_oid() function
  [GSOC] ref-filter: performance optimization by skip
    parse_object_buffer
  [GSOC] ref-filter: use atom_type and merge two for loop in grab_person
  [GSOC] ref-filter: remove strlen from find_subpos
  [GSOC] ref-filter: introducing xstrvfmt_len() and xstrfmt_len()
  [GSOC] ref-filter: remove second parsing in format_ref_array_item
  [GSOC] ref-filter: introduction ref_filter_slopbuf
  [GSOC] ref-filter: add deref member to struct used_atom
  [GSOC] ref-filter: introduce symref_atom_parser()
  [GSOC] ref-filter: use switch case instread of if else
  [GSOC] ref-filter: reuse finnal buffer if no stack need
  [GSOC] ref-filter: add need_get_object_info flag to struct expand_data

 Documentation/git-cat-file.txt     |   6 +
 Documentation/git-for-each-ref.txt |   9 +
 builtin/branch.c                   |   2 +
 builtin/cat-file.c                 | 275 +++------
 builtin/for-each-ref.c             |   3 +-
 builtin/tag.c                      |   4 +-
 builtin/verify-tag.c               |   2 +
 quote.c                            |  17 +
 quote.h                            |   1 +
 ref-filter.c                       | 902 +++++++++++++++++++----------
 ref-filter.h                       |  30 +-
 strbuf.c                           |  21 +
 strbuf.h                           |   6 +
 t/perf/p1006-cat-file.sh           |  28 +
 t/t1006-cat-file.sh                | 239 ++++++++
 t/t3203-branch-output.sh           |   4 +
 t/t6300-for-each-ref.sh            | 235 ++++++++
 t/t6301-for-each-ref-errors.sh     |   2 +-
 t/t7004-tag.sh                     |   4 +
 t/t7030-verify-tag.sh              |   4 +
 20 files changed, 1283 insertions(+), 511 deletions(-)
 create mode 100755 t/perf/p1006-cat-file.sh


base-commit: 5d213e46bb7b880238ff5ea3914e940a50ae9369
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1016%2Fadlternative%2Fcat-file-reuse-ref-filter-logic-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1016/adlternative/cat-file-reuse-ref-filter-logic-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1016