mbox series

[v3,0/6] Sparse Index: Integrate with merge, cherry-pick, rebase, and revert

Message ID pull.1019.v3.git.1631100241.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse Index: Integrate with merge, cherry-pick, rebase, and revert | expand

Message

John Cai via GitGitGadget Sept. 8, 2021, 11:23 a.m. UTC
This series integrates the sparse index with commands that perform merges
such as 'git merge', 'git cherry-pick', 'git revert' (free with
cherry-pick), and 'git rebase'.

When the ORT merge strategy is enabled, this allows most merges to succeed
without expanding the sparse index, leading to significant performance
gains. I tested these changes against an internal monorepo with over 2
million paths at HEAD but with a sparse-checkout that only has ~60,000 files
within the sparse-checkout cone. 'git merge' commands went from 5-6 seconds
to 0.750-1.250s.

In the case of the recursive merge strategy, the sparse index is expanded
before the recursive algorithm proceeds. We expect that this is as good as
we can get with that strategy. When the strategy shifts to ORT as the
default, then this will not be a problem except for users who decide to
change the behavior.

Most of the hard work was done by previous series, such as
ds/sparse-index-ignored-files (which this series is based on).


Updates in V3
=============

 * Fixed a typo in patch 2 (it is then moved in patch 3, affecting the
   range-diff)

 * There was a recommendation to use test_config over git config, but that
   is not possible in a subshell. So, it got moved outside of the subshell
   and that works just fine.

 * The other comments were about the use of GIT_TEST_MERGE_ALGORITHM in the
   test script, but the tests that isolate that environment variable are
   only for the 'ensure_not_expanded' tests, not the rest of the tests that
   already exist and are beneficial to cover the 'recursive' mode.


Updates in V2
=============

 * The tests no longer specify GIT_TEST_MERGE_ALGORITHM or directly
   reference "-s ort". By relaxing this condition, I found an issue with
   'git cherry-pick' and 'git rebase' when using the 'recursive' algorithm
   which is fixed in a new patch.

 * Use the pull.twohead config to specify the ORT merge algorithm to avoid
   expanding the sparse index when that is what we are testing.

 * Corrected some misstatements in my commit messages.

Thanks, -Stolee

Derrick Stolee (6):
  diff: ignore sparse paths in diffstat
  merge: make sparse-aware with ORT
  merge-ort: expand only for out-of-cone conflicts
  t1092: add cherry-pick, rebase tests
  sequencer: ensure full index if not ORT strategy
  sparse-index: integrate with cherry-pick and rebase

 builtin/merge.c                          |  3 +
 builtin/rebase.c                         |  6 ++
 builtin/revert.c                         |  3 +
 diff.c                                   |  8 +++
 merge-ort.c                              | 15 ++++
 merge-recursive.c                        |  3 +
 sequencer.c                              |  9 +++
 t/t1092-sparse-checkout-compatibility.sh | 92 +++++++++++++++++++++---
 8 files changed, 129 insertions(+), 10 deletions(-)


base-commit: 91b53f20109fe55635b1815f87afd5d5da68a182
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1019%2Fderrickstolee%2Fsparse-index%2Fmerge-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1019/derrickstolee/sparse-index/merge-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1019

Range-diff vs v2:

 1:  c5ae705648c = 1:  a6963182fe0 diff: ignore sparse paths in diffstat
 2:  bb150483bcf ! 2:  141f7fb26d6 merge: make sparse-aware with ORT
     @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
       
      +	/*
      +	 * We are in a conflicted state. These conflicts might be inside
     -+	 * sparse-directory entries, so expand the index preemtively.
     ++	 * sparse-directory entries, so expand the index preemptively.
      +	 * Also, we set original_cache_nr below, but that might change if
      +	 * index_name_pos() calls ask for paths within sparse directories.
      +	 */
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	ensure_not_expanded add . &&
      +
      +	ensure_not_expanded checkout -f update-deep &&
     ++	test_config -C sparse-index pull.twohead ort &&
      +	(
      +		sane_unset GIT_TEST_MERGE_ALGORITHM &&
     -+		git -C sparse-index config pull.twohead ort &&
      +		ensure_not_expanded merge -m merge update-folder1 &&
      +		ensure_not_expanded merge -m merge update-folder2
      +	)
 3:  815b1b1cfbf ! 3:  c3c9ffd855c merge-ort: expand only for out-of-cone conflicts
     @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
       
       	/*
       	 * We are in a conflicted state. These conflicts might be inside
     --	 * sparse-directory entries, so expand the index preemtively.
     +-	 * sparse-directory entries, so expand the index preemptively.
      -	 * Also, we set original_cache_nr below, but that might change if
      +	 * sparse-directory entries, so check if any entries are outside
      +	 * of the sparse-checkout cone preemptively.
 4:  8032154bc8a = 4:  7aae5727fb7 t1092: add cherry-pick, rebase tests
 5:  90ac85500b8 = 5:  20f5bbae546 sequencer: ensure full index if not ORT strategy
 6:  df4bbec744f ! 6:  36cecb22330 sparse-index: integrate with cherry-pick and rebase
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge with confli
       	init_repos &&
       
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     + 	test_config -C sparse-index pull.twohead ort &&
       	(
       		sane_unset GIT_TEST_MERGE_ALGORITHM &&
     - 		git -C sparse-index config pull.twohead ort &&
      -		ensure_not_expanded merge -m merge update-folder1 &&
      -		ensure_not_expanded merge -m merge update-folder2
      +		for OPERATION in "merge -m merge" cherry-pick rebase

Comments

Elijah Newren Sept. 9, 2021, 2:16 p.m. UTC | #1
On Wed, Sep 8, 2021 at 4:24 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series integrates the sparse index with commands that perform merges
> such as 'git merge', 'git cherry-pick', 'git revert' (free with
> cherry-pick), and 'git rebase'.
>
> When the ORT merge strategy is enabled, this allows most merges to succeed
> without expanding the sparse index, leading to significant performance
> gains. I tested these changes against an internal monorepo with over 2
> million paths at HEAD but with a sparse-checkout that only has ~60,000 files
> within the sparse-checkout cone. 'git merge' commands went from 5-6 seconds
> to 0.750-1.250s.
>
> In the case of the recursive merge strategy, the sparse index is expanded
> before the recursive algorithm proceeds. We expect that this is as good as
> we can get with that strategy. When the strategy shifts to ORT as the
> default, then this will not be a problem except for users who decide to
> change the behavior.
>
> Most of the hard work was done by previous series, such as
> ds/sparse-index-ignored-files (which this series is based on).
>
>
> Updates in V3
> =============
>
>  * Fixed a typo in patch 2 (it is then moved in patch 3, affecting the
>    range-diff)
>
>  * There was a recommendation to use test_config over git config, but that
>    is not possible in a subshell. So, it got moved outside of the subshell
>    and that works just fine.
>
>  * The other comments were about the use of GIT_TEST_MERGE_ALGORITHM in the
>    test script, but the tests that isolate that environment variable are
>    only for the 'ensure_not_expanded' tests, not the rest of the tests that
>    already exist and are beneficial to cover the 'recursive' mode.

This round addresses all my feedback and looks good to me:

Reviewed-by: Elijah Newren <newren@gmail.com>

>
>
> Updates in V2
> =============
>
>  * The tests no longer specify GIT_TEST_MERGE_ALGORITHM or directly
>    reference "-s ort". By relaxing this condition, I found an issue with
>    'git cherry-pick' and 'git rebase' when using the 'recursive' algorithm
>    which is fixed in a new patch.
>
>  * Use the pull.twohead config to specify the ORT merge algorithm to avoid
>    expanding the sparse index when that is what we are testing.
>
>  * Corrected some misstatements in my commit messages.
>
> Thanks, -Stolee
>
> Derrick Stolee (6):
>   diff: ignore sparse paths in diffstat
>   merge: make sparse-aware with ORT
>   merge-ort: expand only for out-of-cone conflicts
>   t1092: add cherry-pick, rebase tests
>   sequencer: ensure full index if not ORT strategy
>   sparse-index: integrate with cherry-pick and rebase
>
>  builtin/merge.c                          |  3 +
>  builtin/rebase.c                         |  6 ++
>  builtin/revert.c                         |  3 +
>  diff.c                                   |  8 +++
>  merge-ort.c                              | 15 ++++
>  merge-recursive.c                        |  3 +
>  sequencer.c                              |  9 +++
>  t/t1092-sparse-checkout-compatibility.sh | 92 +++++++++++++++++++++---
>  8 files changed, 129 insertions(+), 10 deletions(-)
>
>
> base-commit: 91b53f20109fe55635b1815f87afd5d5da68a182
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1019%2Fderrickstolee%2Fsparse-index%2Fmerge-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1019/derrickstolee/sparse-index/merge-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1019
>
> Range-diff vs v2:
>
>  1:  c5ae705648c = 1:  a6963182fe0 diff: ignore sparse paths in diffstat
>  2:  bb150483bcf ! 2:  141f7fb26d6 merge: make sparse-aware with ORT
>      @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
>
>       + /*
>       +  * We are in a conflicted state. These conflicts might be inside
>      -+  * sparse-directory entries, so expand the index preemtively.
>      ++  * sparse-directory entries, so expand the index preemptively.
>       +  * Also, we set original_cache_nr below, but that might change if
>       +  * index_name_pos() calls ask for paths within sparse directories.
>       +  */
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
>       + ensure_not_expanded add . &&
>       +
>       + ensure_not_expanded checkout -f update-deep &&
>      ++ test_config -C sparse-index pull.twohead ort &&
>       + (
>       +         sane_unset GIT_TEST_MERGE_ALGORITHM &&
>      -+         git -C sparse-index config pull.twohead ort &&
>       +         ensure_not_expanded merge -m merge update-folder1 &&
>       +         ensure_not_expanded merge -m merge update-folder2
>       + )
>  3:  815b1b1cfbf ! 3:  c3c9ffd855c merge-ort: expand only for out-of-cone conflicts
>      @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
>
>         /*
>          * We are in a conflicted state. These conflicts might be inside
>      --  * sparse-directory entries, so expand the index preemtively.
>      +-  * sparse-directory entries, so expand the index preemptively.
>       -  * Also, we set original_cache_nr below, but that might change if
>       +  * sparse-directory entries, so check if any entries are outside
>       +  * of the sparse-checkout cone preemptively.
>  4:  8032154bc8a = 4:  7aae5727fb7 t1092: add cherry-pick, rebase tests
>  5:  90ac85500b8 = 5:  20f5bbae546 sequencer: ensure full index if not ORT strategy
>  6:  df4bbec744f ! 6:  36cecb22330 sparse-index: integrate with cherry-pick and rebase
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge with confli
>         init_repos &&
>
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
>      +  test_config -C sparse-index pull.twohead ort &&
>         (
>                 sane_unset GIT_TEST_MERGE_ALGORITHM &&
>      -          git -C sparse-index config pull.twohead ort &&
>       -         ensure_not_expanded merge -m merge update-folder1 &&
>       -         ensure_not_expanded merge -m merge update-folder2
>       +         for OPERATION in "merge -m merge" cherry-pick rebase
>
> --
> gitgitgadget