[v3,0/5] sparse-index: improve clear_skip_worktree_from_present_files()

Message ID: pull.1754.v3.git.1719578605.gitgitgadget@gmail.com

Message

Derrick Stolee via GitGitGadget June 28, 2024, 12:43 p.m. UTC
While doing some investigation in a private monorepo with sparse-checkout
and a sparse index, I accidentally left a modified file outside of my
sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
reran with GIT_TRACE2_PERF=1.

While I was able to identify clear_skip_worktree_from_present_files() as the
culprit, it took longer than desired to figure out what was going on. This
series intends to both fix the performance issue (as much as possible) and
do some refactoring to make it easier to understand what is happening.

In the end, I was able to reduce the number of lstat() calls in my case from
over 1.1 million to about 4,400, improving the time from 13.4s to 81ms on a
warm disk cache. (These numbers are from a test after v2, which somehow hit
the old caching algorithm even worse than my test in v1.)


Updates in v3
=============

 * Removed the incorrect paragraph in the commit message of patch 1.
 * Replaced "largest" with "longest" in the final patch.

Thanks, Stolee

Derrick Stolee (5):
  sparse-checkout: refactor skip worktree retry logic
  sparse-index: refactor path_found()
  sparse-index: use strbuf in path_found()
  sparse-index: count lstat() calls
  sparse-index: improve lstat caching of sparse paths

 sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 164 insertions(+), 52 deletions(-)


base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1754

Range-diff vs v2:

 1:  93d0baed0b0 ! 1:  0844cda94cf sparse-checkout: refactor skip worktree retry logic
     @@ Commit message
          stored in the index, so caching was introduced in d79d299352 (Accelerate
          clear_skip_worktree_from_present_files() by caching, 2022-01-14).
      
     -    If users are having trouble with the performance of this operation and
     -    don't care about paths outside of the sparse-checkout, they can disable
     -    them using the sparse.expectFilesOutsideOfPatterns config option
     -    introduced in ecc7c8841d (repo_read_index: add config to expect files
     -    outside sparse patterns, 2022-02-25).
     -
          This check is particularly confusing in the presence of a sparse index,
          as a sparse tree entry corresponding to an existing directory must first
          be expanded to a full index before examining the paths within. This is
 2:  69c3beaabf7 = 2:  c242e2c9168 sparse-index: refactor path_found()
 3:  0a82e6b4183 = 3:  ad63bf746ca sparse-index: use strbuf in path_found()
 4:  9549f5b8062 = 4:  db6ded0df0d sparse-index: count lstat() calls
 5:  0cb344ac14f ! 5:  1f58e19691f sparse-index: improve lstat caching of sparse paths
     @@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
       }
       
      +/**
     -+ * Return the length of the largest common substring that ends in a
     -+ * slash ('/') to indicate the largest common parent directory. Returns
     ++ * Return the length of the longest common substring that ends in a
     ++ * slash ('/') to indicate the longest common parent directory. Returns
      + * zero if no common directory exists.
      + */
      +static size_t max_common_dir_prefix(const char *path1, const char *path2)
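
For readers following the range-diff, here is a minimal sketch of what a
helper with the signature above could look like. It is written only from the
doc comment quoted in the range-diff and is not the code from patch 5; the
body, the local variable names, and the choice to treat "common substring" as
a common leading prefix are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Illustrative sketch (assumed body, not taken from the series):
 * return the length of the longest common prefix of path1 and path2
 * that ends in a slash ('/'), or zero if no common directory exists.
 */
static size_t max_common_dir_prefix(const char *path1, const char *path2)
{
	size_t i = 0;      /* current byte offset into both paths */
	size_t common = 0; /* length up to and including the last shared '/' */

	/* Walk both strings while they agree and neither has ended. */
	while (path1[i] && path1[i] == path2[i]) {
		if (path1[i] == '/')
			common = i + 1;
		i++;
	}
	return common;
}
```

Under this sketch, "a/b/c.txt" and "a/b/d.txt" share the four-byte prefix
"a/b/", while "foo" and "bar" share no directory and yield zero. A caller
could presumably use such a length to reuse what it already learned about a
parent directory instead of calling lstat() on it again, but how the series
actually applies the helper is not shown in this thread.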

Comments

Elijah Newren June 28, 2024, 3:07 p.m. UTC | #1
This version covers the last two outstanding items.

Reviewed-by: Elijah Newren <newren@gmail.com>

Junio C Hamano June 28, 2024, 7:34 p.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> This version covers the last two outstanding items.
>
> Reviewed-by: Elijah Newren <newren@gmail.com>

Thanks both.  Will mark it for 'next'.