Message ID | ddd8a9a90cea10be47eba4775bb90f01a9b80443.1718899877.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | sparse-index: improve clear_skip_worktree_from_present_files() | expand |
On Thu, Jun 20, 2024 at 10:11 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <stolee@gmail.com> > > The clear_skip_worktree_from_present_files() method was introduced in > af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present > in worktree, 2022-01-14) to help cases where the sparse index is enabled s/index/checkout/; the code in af6a51875a made no assumptions about sparse index being enabled. > but some paths outside of the sparse-checkout cone also exist on disk. s/cone//; it wasn't specific to cone mode. > This operation can be slow as it needs to check path existence in a way > that is not stored in the collapsed index, so caching was introduced in > d79d299352 (Accelerate clear_skip_worktree_from_present_files() by > caching, 2022-01-14). This is worded in a way that seems to imply the code is only relevant with a sparse index. Part of the code was written to be faster in general with a sparse-index, but having a sparse-index was not assumed. Perhaps, This operation could be slow since it needs to check path existence for the paths outside the sparse checkout, which may vastly outnumber the number of paths inside the sparse checkout. To address this, caching was introduced in d79d299352 (Accelerate clear_skip_worktree_from_present_files() by caching, 2022-01-14). > If users are having trouble with the performance of this operation and > don't care about paths outside of the sparse-checkout cone, they can > disable them using the sparse.expectFilesOutsideOfPatterns config option > introduced in ecc7c8841d (repo_read_index: add config to expect files > outside sparse patterns, 2022-02-25). > > Even with that caching, it was noticed that this could take a long time > to execute. 89aaab11a3 (index: add trace2 region for clear skip > worktree, 2022-11-03) introduced trace2 regions to measure this time. > Further, the way the loop repeats itself was slightly confusing and > prone to breakage, so a BUG() statement was added in 8c7abdc596 (index: > raise a bug if the index is materialised more than once, 2022-11-03) to > be sure that the second run of the loop does not hit any sparse trees. > > One thing that can be confusing about the current setup is that the > trace2 regions nest and it is not clear that a second loop is running > after the index is expanded. Here is an example of what the regions look > like in a typical case: > > | region_enter | ... | label:clear_skip_worktree_from_present_files > | region_enter | ... | ..label:update > | region_leave | ... | ..label:update > | region_enter | ... | ..label:ensure_full_index > | region_enter | ... | ....label:update > | region_leave | ... | ....label:update > | region_leave | ... | ..label:ensure_full_index > | data | ... | ..sparse_path_count:1 > | data | ... | ..sparse_path_count_full:269538 > | region_leave | ... | label:clear_skip_worktree_from_present_files > > One thing that is particularly difficult to understand about these > regions is that most of the time is spent between the close of the > ensure_full_index region and the reporting of the end data. This is > because of the restart of the loop being within the same region as the > first iteration of the loop. > > This change refactors the method into two separate methods that are > traced separately. This will be more important later when we change > other features of the methods, but for now the only functional change is > the difference in the structure of the trace regions. > > After this change, the same telemetry section is split into three > distinct chunks: > > | region_enter | ... | label:clear_skip_worktree_from_present_files_sparse > | data | ... | ..sparse_path_count:1 > | region_leave | ... | label:clear_skip_worktree_from_present_files_sparse > | region_enter | ... | label:update > | region_leave | ... | label:update > | region_enter | ... | label:ensure_full_index > | region_enter | ... | ..label:update > | region_leave | ... | ..label:update > | region_leave | ... | label:ensure_full_index > | region_enter | ... | label:clear_skip_worktree_from_present_files_full > | data | ... | ..full_path_count:269538 > | region_leave | ... | label:clear_skip_worktree_from_present_files_full > > Here, we see the sparse loop terminating early with its first sparse > path being a sparse directory containing a file. Then, that loop's > region terminates before ensure_full_index begins (in this case, the > cache-tree must also be computed). Then, _after_ the index is expanded, > the full loop begins with its own region. > > Signed-off-by: Derrick Stolee <stolee@gmail.com> > --- > sparse-index.c | 77 ++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 53 insertions(+), 24 deletions(-) > > diff --git a/sparse-index.c b/sparse-index.c > index e48e40cae71..e0457c87fff 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -486,49 +486,78 @@ static int path_found(const char *path, const char **dirname, size_t *dir_len, > return 0; > } > > -void clear_skip_worktree_from_present_files(struct index_state *istate) > +static int clear_skip_worktree_from_present_files_sparse(struct index_state *istate) > { > const char *last_dirname = NULL; > size_t dir_len = 0; > int dir_found = 1; > > - int i; > - int path_count[2] = {0, 0}; > - int restarted = 0; > + int path_count = 0; > + int to_restart = 0; > > - if (!core_apply_sparse_checkout || > - sparse_expect_files_outside_of_patterns) > - return; > - > - trace2_region_enter("index", "clear_skip_worktree_from_present_files", > + trace2_region_enter("index", "clear_skip_worktree_from_present_files_sparse", > istate->repo); > -restart: > - for (i = 0; i < istate->cache_nr; i++) { > + for (int i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > > if (ce_skip_worktree(ce)) { > - path_count[restarted]++; > + path_count++; > if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { > if (S_ISSPARSEDIR(ce->ce_mode)) { > - if (restarted) > - BUG("ensure-full-index did not fully flatten?"); > - ensure_full_index(istate); > - restarted = 1; > - goto restart; > + to_restart = 1; > + break; > } > ce->ce_flags &= ~CE_SKIP_WORKTREE; > } > } > } > > - if (path_count[0]) > - trace2_data_intmax("index", istate->repo, > - "sparse_path_count", path_count[0]); > - if (restarted) > - trace2_data_intmax("index", istate->repo, > - "sparse_path_count_full", path_count[1]); > - trace2_region_leave("index", "clear_skip_worktree_from_present_files", > + trace2_data_intmax("index", istate->repo, > + "sparse_path_count", path_count); > + trace2_region_leave("index", "clear_skip_worktree_from_present_files_sparse", > + istate->repo); > + return to_restart; > +} > + > +static void clear_skip_worktree_from_present_files_full(struct index_state *istate) > +{ > + const char *last_dirname = NULL; > + size_t dir_len = 0; > + int dir_found = 1; > + > + int path_count = 0; > + > + trace2_region_enter("index", "clear_skip_worktree_from_present_files_full", > istate->repo); > + for (int i = 0; i < istate->cache_nr; i++) { > + struct cache_entry *ce = istate->cache[i]; > + > + if (S_ISSPARSEDIR(ce->ce_mode)) > + BUG("ensure-full-index did not fully flatten?"); > + > + if (ce_skip_worktree(ce)) { > + path_count++; > + if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) > + ce->ce_flags &= ~CE_SKIP_WORKTREE; > + } > + } > + > + trace2_data_intmax("index", istate->repo, > + "full_path_count", path_count); > + trace2_region_leave("index", "clear_skip_worktree_from_present_files_full", > + istate->repo); > +} > + > +void clear_skip_worktree_from_present_files(struct index_state *istate) > +{ > + if (!core_apply_sparse_checkout || > + sparse_expect_files_outside_of_patterns) > + return; > + > + if (clear_skip_worktree_from_present_files_sparse(istate)) { > + ensure_full_index(istate); > + clear_skip_worktree_from_present_files_full(istate); > + } > } > > /* > -- > gitgitgadget Although I had a few wording quibbles with the commit message, overall the commit message was clear about what you were trying to achieve. Also, the change in this patch is a straightforward splitting of the old function into three new functions (one of which is the overall driver calling the other two).
On 6/24/24 6:12 PM, Elijah Newren wrote: > On Thu, Jun 20, 2024 at 10:11 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <stolee@gmail.com> >> >> The clear_skip_worktree_from_present_files() method was introduced in >> af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present >> in worktree, 2022-01-14) to help cases where the sparse index is enabled > > s/index/checkout/; the code in af6a51875a made no assumptions about > sparse index being enabled. > >> but some paths outside of the sparse-checkout cone also exist on disk. > > s/cone//; it wasn't specific to cone mode. Thanks for the context that this is a generic sparse-checkout feature, and is not limited to the sparse index (which is limited to cone mode). I made assumptions based on the code's location in sparse-index.c. > Although I had a few wording quibbles with the commit message, overall > the commit message was clear about what you were trying to achieve. > Also, the change in this patch is a straightforward splitting of the > old function into three new functions (one of which is the overall > driver calling the other two). I have reworded locally and will have a v2 later today with a better message. Thanks for your attention to detail. Thanks, -Stolee
diff --git a/sparse-index.c b/sparse-index.c index e48e40cae71..e0457c87fff 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -486,49 +486,78 @@ static int path_found(const char *path, const char **dirname, size_t *dir_len, return 0; } -void clear_skip_worktree_from_present_files(struct index_state *istate) +static int clear_skip_worktree_from_present_files_sparse(struct index_state *istate) { const char *last_dirname = NULL; size_t dir_len = 0; int dir_found = 1; - int i; - int path_count[2] = {0, 0}; - int restarted = 0; + int path_count = 0; + int to_restart = 0; - if (!core_apply_sparse_checkout || - sparse_expect_files_outside_of_patterns) - return; - - trace2_region_enter("index", "clear_skip_worktree_from_present_files", + trace2_region_enter("index", "clear_skip_worktree_from_present_files_sparse", istate->repo); -restart: - for (i = 0; i < istate->cache_nr; i++) { + for (int i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; if (ce_skip_worktree(ce)) { - path_count[restarted]++; + path_count++; if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { if (S_ISSPARSEDIR(ce->ce_mode)) { - if (restarted) - BUG("ensure-full-index did not fully flatten?"); - ensure_full_index(istate); - restarted = 1; - goto restart; + to_restart = 1; + break; } ce->ce_flags &= ~CE_SKIP_WORKTREE; } } } - if (path_count[0]) - trace2_data_intmax("index", istate->repo, - "sparse_path_count", path_count[0]); - if (restarted) - trace2_data_intmax("index", istate->repo, - "sparse_path_count_full", path_count[1]); - trace2_region_leave("index", "clear_skip_worktree_from_present_files", + trace2_data_intmax("index", istate->repo, + "sparse_path_count", path_count); + trace2_region_leave("index", "clear_skip_worktree_from_present_files_sparse", + istate->repo); + return to_restart; +} + +static void clear_skip_worktree_from_present_files_full(struct index_state *istate) +{ + const char *last_dirname = NULL; + size_t dir_len = 0; + int dir_found = 1; + + int path_count = 0; + + trace2_region_enter("index", "clear_skip_worktree_from_present_files_full", istate->repo); + for (int i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (S_ISSPARSEDIR(ce->ce_mode)) + BUG("ensure-full-index did not fully flatten?"); + + if (ce_skip_worktree(ce)) { + path_count++; + if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) + ce->ce_flags &= ~CE_SKIP_WORKTREE; + } + } + + trace2_data_intmax("index", istate->repo, + "full_path_count", path_count); + trace2_region_leave("index", "clear_skip_worktree_from_present_files_full", + istate->repo); +} + +void clear_skip_worktree_from_present_files(struct index_state *istate) +{ + if (!core_apply_sparse_checkout || + sparse_expect_files_outside_of_patterns) + return; + + if (clear_skip_worktree_from_present_files_sparse(istate)) { + ensure_full_index(istate); + clear_skip_worktree_from_present_files_full(istate); + } } /*