Message ID | d7d4cad8be0b2a27a332a2796ba0dce92783355f.1618322497.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sparse-index: integrate with status and add | expand |
On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > By testing 'git -c core.fsmonitor= status -uno', we can check for the > simplest index operations that can be made sparse-aware. The necessary > implementation details are already integrated with sparse-checkout, so > modify command_requires_full_index to be zero for cmd_status(). > > By running the debugger for 'git status -uno' after that change, we find > two instances of ensure_full_index() that were added for extra safety, > but can be removed without issue. > > In refresh_index(), we loop through the index entries. The > refresh_cache_ent() method copies the sparse directories into the > refreshed index without issue. I do see the removal of a call to ensure_full_index() in refresh_index() that you mention in this paragraph in the patch below. I'm confused, though; I would have thought we wanted to avoid a refresh_cache_ent() call. Also, one of your previous patches added a if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode)) continue; check before the code ever gets to the refresh_cache_ent() call, so as far as I can tell, that function won't be called from refresh_entry() for sparse entries. Maybe your commit message here is out-of-date? Or am I confused somehow? > The loop within run_diff_files() skips things that are in stage 0 and > have skip-worktree enabled, so seems safe to disable ensure_full_index() > here. Unlike the above, I don't see a removal of a ensure_full_index() call in run_diff_files() as claimed by this paragraph. Has the commit message gotten out of date with refactorings you did while developing this series? > This allows some cases of 'git status' to no longer expand a sparse > index to a full one, giving the following performance improvements for > p2000-sparse-checkout-operations.sh: > > Test HEAD~1 HEAD > ----------------------------------------------------------------------------- > 2000.2: git status (full-index-v3) 0.38(0.36+0.07) 0.37(0.31+0.10) -2.6% > 2000.3: git status (full-index-v4) 0.38(0.29+0.12) 0.37(0.30+0.11) -2.6% > 2000.4: git status (sparse-index-v3) 2.43(2.33+0.14) 0.04(0.05+0.04) -98.4% > 2000.5: git status (sparse-index-v4) 2.44(2.35+0.13) 0.05(0.04+0.07) -98.0% > > Note that since HEAD~1 was expanding the sparse index by parsing trees, > it was artificially slower than the full index case. Thus, the 98% > improvement is misleading, and instead we should celebrate the 0.37s to > 0.05s improvement of 82%. This is more indicative of the peformance > gains we are expecting by using a sparse index. 82%, very nice. Was this with git.git as the test repository, or some other repo? If it's git.git, then we'd actually expect a much bigger speedup for other repositories, as git.git is pretty small. > Note: we are dropping the assignment of core.fsmonitor here. This is not > necessary for the test script as we are not altering the config any > other way. Correct integration with FS Monitor will be validated in > later changes. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > builtin/commit.c | 3 +++ > read-cache.c | 2 -- > t/t1092-sparse-checkout-compatibility.sh | 12 ++++++++---- > 3 files changed, 11 insertions(+), 6 deletions(-) > > diff --git a/builtin/commit.c b/builtin/commit.c > index cf0c36d1dcb2..e529da7beadd 100644 > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix) > if (argc == 2 && !strcmp(argv[1], "-h")) > usage_with_options(builtin_status_usage, builtin_status_options); > > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + > status_init_config(&s, git_status_config); > argc = parse_options(argc, argv, prefix, > builtin_status_options, > diff --git a/read-cache.c b/read-cache.c > index 6308234b4838..83e6bdef7604 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags, > */ > preload_index(istate, pathspec, 0); > trace2_region_enter("index", "refresh", NULL); > - /* TODO: audit for interaction with sparse-index. */ > - ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce, *new_entry; > int cache_errno = 0; > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index e488ef9bd941..380a085f8ec4 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' ' > GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > git -C sparse-index -c core.fsmonitor="" reset --hard && > test_region index convert_to_sparse trace2.txt && > - test_region index ensure_full_index trace2.txt && > + test_region index ensure_full_index trace2.txt > +' > > - rm trace2.txt && > +test_expect_success 'sparse-index is not expanded' ' > + init_repos && > + > + rm -f trace2.txt && > GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > - git -C sparse-index -c core.fsmonitor="" status -uno && > - test_region index ensure_full_index trace2.txt > + git -C sparse-index status -uno && > + test_region ! index ensure_full_index trace2.txt > ' > > test_done > -- > gitgitgadget Other than what looks like a couple issues in the commit message, the change looks good to me.
On 4/20/2021 8:44 PM, Elijah Newren wrote: > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> By testing 'git -c core.fsmonitor= status -uno', we can check for the >> simplest index operations that can be made sparse-aware. The necessary >> implementation details are already integrated with sparse-checkout, so >> modify command_requires_full_index to be zero for cmd_status(). >> >> By running the debugger for 'git status -uno' after that change, we find >> two instances of ensure_full_index() that were added for extra safety, >> but can be removed without issue. >> >> In refresh_index(), we loop through the index entries. The >> refresh_cache_ent() method copies the sparse directories into the >> refreshed index without issue. > > I do see the removal of a call to ensure_full_index() in > refresh_index() that you mention in this paragraph in the patch below. > > I'm confused, though; I would have thought we wanted to avoid a > refresh_cache_ent() call. Also, one of your previous patches added a > > if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode)) > continue; > > check before the code ever gets to the refresh_cache_ent() call, so as > far as I can tell, that function won't be called from refresh_entry() > for sparse entries. Maybe your commit message here is out-of-date? > Or am I confused somehow? > >> The loop within run_diff_files() skips things that are in stage 0 and >> have skip-worktree enabled, so seems safe to disable ensure_full_index() >> here. > > Unlike the above, I don't see a removal of a ensure_full_index() call > in run_diff_files() as claimed by this paragraph. Has the commit > message gotten out of date with refactorings you did while developing > this series? I greatly reduced the number of ensure_full_index() calls in the previous topic (ds/sparse-index-protections) since first writing this patch, so it is very likely to be out-of-date. Thanks for calling it out. >> This allows some cases of 'git status' to no longer expand a sparse >> index to a full one, giving the following performance improvements for >> p2000-sparse-checkout-operations.sh: >> >> Test HEAD~1 HEAD >> ----------------------------------------------------------------------------- >> 2000.2: git status (full-index-v3) 0.38(0.36+0.07) 0.37(0.31+0.10) -2.6% >> 2000.3: git status (full-index-v4) 0.38(0.29+0.12) 0.37(0.30+0.11) -2.6% >> 2000.4: git status (sparse-index-v3) 2.43(2.33+0.14) 0.04(0.05+0.04) -98.4% >> 2000.5: git status (sparse-index-v4) 2.44(2.35+0.13) 0.05(0.04+0.07) -98.0% >> >> Note that since HEAD~1 was expanding the sparse index by parsing trees, >> it was artificially slower than the full index case. Thus, the 98% >> improvement is misleading, and instead we should celebrate the 0.37s to >> 0.05s improvement of 82%. This is more indicative of the peformance >> gains we are expecting by using a sparse index. > > 82%, very nice. Was this with git.git as the test repository, or some > other repo? If it's git.git, then we'd actually expect a much bigger > speedup for other repositories, as git.git is pretty small. This test script takes the input repository (git.git in this case) and creates a tree that contains that repository many times over, but only four copies remain in the sparse-checkout definition. This creates the big speedup, because of the enormous difference in index size. As I am exploring commands such as 'merge' and 'rebase' I am finding that this test setup is too expensive to cover those commands. I will need to reduce the size of the test repository (by a factor of 4) and that will reduce how impressive these results are while making the more complicated commands testable in a reasonable amount of time. Thanks, -Stolee
diff --git a/builtin/commit.c b/builtin/commit.c index cf0c36d1dcb2..e529da7beadd 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1404,6 +1404,9 @@ int cmd_status(int argc, const char **argv, const char *prefix) if (argc == 2 && !strcmp(argv[1], "-h")) usage_with_options(builtin_status_usage, builtin_status_options); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + status_init_config(&s, git_status_config); argc = parse_options(argc, argv, prefix, builtin_status_options, diff --git a/read-cache.c b/read-cache.c index 6308234b4838..83e6bdef7604 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1578,8 +1578,6 @@ int refresh_index(struct index_state *istate, unsigned int flags, */ preload_index(istate, pathspec, 0); trace2_region_enter("index", "refresh", NULL); - /* TODO: audit for interaction with sparse-index. */ - ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce, *new_entry; int cache_errno = 0; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index e488ef9bd941..380a085f8ec4 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -449,12 +449,16 @@ test_expect_success 'sparse-index is expanded and converted back' ' GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ git -C sparse-index -c core.fsmonitor="" reset --hard && test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && + test_region index ensure_full_index trace2.txt +' - rm trace2.txt && +test_expect_success 'sparse-index is not expanded' ' + init_repos && + + rm -f trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt + git -C sparse-index status -uno && + test_region ! index ensure_full_index trace2.txt ' test_done