[0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after

Message ID	pull.1411.git.1667947465.gitgitgadget@gmail.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.1411.git.1667947465.gitgitgadget@gmail.com> From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> Date: Tue, 08 Nov 2022 22:44:20 +0000 Subject: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, phillip.wood123@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, Victoria Dye <vdye@github.com> Precedence: bulk
Series	Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after \| expand [0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after [1/5] cache-tree: add perf test comparing update and prime [2/5] unpack-trees: add 'skip_cache_tree_update' option [3/5] reset: use 'skip_cache_tree_update' option [4/5] read-tree: use 'skip_cache_tree_update' option [5/5] rebase: use 'skip_cache_tree_update' option

Message ID

pull.1411.git.1667947465.gitgitgadget@gmail.com (mailing list archive)

Headers

Message-Id: <pull.1411.git.1667947465.gitgitgadget@gmail.com>
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
Date: Tue, 08 Nov 2022 22:44:20 +0000
Subject: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is
 called immediate
 after
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, phillip.wood123@gmail.com,
        derrickstolee@github.com, jonathantanmy@google.com,
        Victoria Dye <vdye@github.com>
Precedence: bulk

Series

Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after | expand

Message

Philippe Blain via GitGitGadget Nov. 8, 2022, 10:44 p.m. UTC

Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a
fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
(although both are so fast that the total execution time of 100 invocations
is needed to compare the results in the default perf repo).

The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
details can be found in the commit messages of the patch associated with the
affected code path.

Thanks!

 * Victoria

[1] https://lore.kernel.org/git/xmqqlf30edvf.fsf@gitster.g/ [2]
https://lore.kernel.org/git/f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com/

Victoria Dye (5):
  cache-tree: add perf test comparing update and prime
  unpack-trees: add 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  rebase: use 'skip_cache_tree_update' option

 Makefile                           |  1 +
 builtin/read-tree.c                |  4 +++
 builtin/reset.c                    |  2 ++
 reset.c                            |  1 +
 sequencer.c                        |  1 +
 t/helper/test-cache-tree.c         | 52 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c               |  1 +
 t/helper/test-tool.h               |  1 +
 t/perf/p0006-read-tree-checkout.sh |  8 +++++
 t/perf/p0090-cache-tree.sh         | 27 ++++++++++++++++
 t/perf/p7102-reset.sh              | 21 ++++++++++++
 t/t1022-read-tree-partial-clone.sh |  2 +-
 unpack-trees.c                     |  3 +-
 unpack-trees.h                     |  3 +-
 14 files changed, 124 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-cache-tree.c
 create mode 100755 t/perf/p0090-cache-tree.sh
 create mode 100755 t/perf/p7102-reset.sh


base-commit: 3b08839926fcc7cc48cf4c759737c1a71af430c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1411%2Fvdye%2Ffeature%2Fcache-tree-optimization-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1411/vdye/feature/cache-tree-optimization-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1411

Comments

Derrick Stolee Nov. 9, 2022, 3:23 p.m. UTC | #1

On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
> Following up on a discussion [1] around cache tree refreshes in 'git reset',
> this series updates callers of 'unpack_trees()' to skip its internal
> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
> expensive operation, and it is redundant when 'prime_cache_tree()' clears
> and rebuilds the cache tree from scratch immediately after.
> 
> The first patch adds a test directly comparing the execution time of
> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
> that on a fully-valid cache tree, they perform the same, but on a
> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
> (although both are so fast that the total execution time of 100 invocations
> is needed to compare the results in the default perf repo).

One thing I found interesting is how you needed 200 iterations to show
a meaningful change in this test script, but in the case of 'git reset'
we can see sizeable improvements even with a single iteration.

Is there something about this test that is artificially speeding up
these iterations? Perhaps the index has up-to-date filesystem information
that allows these methods to avoid filesystem interactions that are
necessary in the 'git reset' case?
 
> The second patch introduces the 'skip_cache_tree_update' option for
> 'unpack_trees()', but does not use it yet.
> 
> The remaining three patches update callers that make the aforementioned
> redundant cache tree updates. The performance impact on these callers ranges
> from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
> details can be found in the commit messages of the patch associated with the
> affected code path.

I found these patches well motivated and the code change to be so
unobtrusive that the benefits are well worth the new options.

Thanks,
-Stolee

Victoria Dye Nov. 9, 2022, 10:18 p.m. UTC | #2

Derrick Stolee wrote:
> On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
>> Following up on a discussion [1] around cache tree refreshes in 'git reset',
>> this series updates callers of 'unpack_trees()' to skip its internal
>> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
>> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
>> expensive operation, and it is redundant when 'prime_cache_tree()' clears
>> and rebuilds the cache tree from scratch immediately after.
>>
>> The first patch adds a test directly comparing the execution time of
>> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
>> that on a fully-valid cache tree, they perform the same, but on a
>> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
>> (although both are so fast that the total execution time of 100 invocations
>> is needed to compare the results in the default perf repo).
> 
> One thing I found interesting is how you needed 200 iterations to show
> a meaningful change in this test script, but in the case of 'git reset'
> we can see sizeable improvements even with a single iteration.

All of the new performance tests run with multiple iterations: 20 for reset
(10 iterations of two resets each), 100 for read-tree, 200 for the
comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
were picked mostly by trial-and-error, to strike a balance of "the test
doesn't take too long to run" and "the change in execution time is clearly
visible in the results."

> 
> Is there something about this test that is artificially speeding up
> these iterations? Perhaps the index has up-to-date filesystem information
> that allows these methods to avoid filesystem interactions that are
> necessary in the 'git reset' case?

I would expect the "cache_tree_update, invalid" test's execution time, when
scaled to the iterations of 'read-tree' and 'reset', to match the change in
timing of those commands, but the command tests are reporting *much* larger
improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
the results are *consistently* >=0.9s).

Per trace2 logs, a single invocation of 'read-tree' matching the one added
in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
the total time would be ~1.01s, which lines up with the 'p0006' test
results. However, the trace2 results for "test-tool cache-tree --count 3
--fresh --update" show the first iteration taking 0.013060s (looks good),
then the next taking 0.003755s, then 0.004026s (_much_ faster than
expected).

To be honest, I can't figure out what's going on there. It might be some
kind of runtime/memory optimization with repeatedly rebuilding the same
cache tree (doesn't seem to be compiler optimization, since the speedup
still happens with '-O0'). The only sure-fire way to avoid it seems to be
moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
Unfortunately, the command initialization overhead *really* slows things
down, but I can add a "control" test (with no cache tree refresh) to
quantify how long that initialization takes.

While looking into this, I found a few other things I'd like to add to/fix
in that test (add a "partially-invalidated" cache tree case, use the
original cache tree OID in 'prime_cache_tree()' rather than the OID at
HEAD), so I'll re-roll with those + the updated iteration logic.

Thanks for bringing this up!

>  
>> The second patch introduces the 'skip_cache_tree_update' option for
>> 'unpack_trees()', but does not use it yet.
>>
>> The remaining three patches update callers that make the aforementioned
>> redundant cache tree updates. The performance impact on these callers ranges
>> from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
>> details can be found in the commit messages of the patch associated with the
>> affected code path.
> 
> I found these patches well motivated and the code change to be so
> unobtrusive that the benefits are well worth the new options.

Thanks!

> 
> Thanks,
> -Stolee

Taylor Blau Nov. 9, 2022, 11:01 p.m. UTC | #3

On Tue, Nov 08, 2022 at 10:44:20PM +0000, Victoria Dye via GitGitGadget wrote:
> Victoria Dye (5):
>   cache-tree: add perf test comparing update and prime
>   unpack-trees: add 'skip_cache_tree_update' option
>   reset: use 'skip_cache_tree_update' option
>   read-tree: use 'skip_cache_tree_update' option
>   rebase: use 'skip_cache_tree_update' option

Very cleanly done and demonstrated. I'll keep an eye out for the reroll
that you and Stolee discussed below.


Thanks,
Taylor

Derrick Stolee Nov. 10, 2022, 2:44 p.m. UTC | #4

On 11/9/2022 5:18 PM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
>>> Following up on a discussion [1] around cache tree refreshes in 'git reset',
>>> this series updates callers of 'unpack_trees()' to skip its internal
>>> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
>>> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
>>> expensive operation, and it is redundant when 'prime_cache_tree()' clears
>>> and rebuilds the cache tree from scratch immediately after.
>>>
>>> The first patch adds a test directly comparing the execution time of
>>> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
>>> that on a fully-valid cache tree, they perform the same, but on a
>>> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
>>> (although both are so fast that the total execution time of 100 invocations
>>> is needed to compare the results in the default perf repo).
>>
>> One thing I found interesting is how you needed 200 iterations to show
>> a meaningful change in this test script, but in the case of 'git reset'
>> we can see sizeable improvements even with a single iteration.
> 
> All of the new performance tests run with multiple iterations: 20 for reset
> (10 iterations of two resets each), 100 for read-tree, 200 for the
> comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
> were picked mostly by trial-and-error, to strike a balance of "the test
> doesn't take too long to run" and "the change in execution time is clearly
> visible in the results."

Thanks for pointing out my misunderstanding. I missed the repeat counts
because 2-3 seconds "seemed right" based on performance tests of large
monorepos, but clearly that's not right when using the Git repository for
performance tests.
>> Is there something about this test that is artificially speeding up
>> these iterations? Perhaps the index has up-to-date filesystem information
>> that allows these methods to avoid filesystem interactions that are
>> necessary in the 'git reset' case?
> 
> I would expect the "cache_tree_update, invalid" test's execution time, when
> scaled to the iterations of 'read-tree' and 'reset', to match the change in
> timing of those commands, but the command tests are reporting *much* larger
> improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
> the results are *consistently* >=0.9s).
> 
> Per trace2 logs, a single invocation of 'read-tree' matching the one added
> in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
> the total time would be ~1.01s, which lines up with the 'p0006' test
> results. However, the trace2 results for "test-tool cache-tree --count 3
> --fresh --update" show the first iteration taking 0.013060s (looks good),
> then the next taking 0.003755s, then 0.004026s (_much_ faster than
> expected).
> 
> To be honest, I can't figure out what's going on there. It might be some
> kind of runtime/memory optimization with repeatedly rebuilding the same
> cache tree (doesn't seem to be compiler optimization, since the speedup
> still happens with '-O0'). The only sure-fire way to avoid it seems to be
> moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
> Unfortunately, the command initialization overhead *really* slows things
> down, but I can add a "control" test (with no cache tree refresh) to
> quantify how long that initialization takes.

Getting unit-level performance tests is always tricky. Sometimes the best
way to do it is to collect a sample using GIT_TRACE2_PERF and then manually
collect the region times. It could be a fun project to integrate region
measurements into the performance test suite instead of only end-to-end
timings.
 
> While looking into this, I found a few other things I'd like to add to/fix
> in that test (add a "partially-invalidated" cache tree case, use the
> original cache tree OID in 'prime_cache_tree()' rather than the OID at
> HEAD), so I'll re-roll with those + the updated iteration logic.

Taking a look now. Thanks!

-Stolee