Message ID | d52c72b4a7b9a3ae10bd2daae85c1bd1bafcc2f3.1617241803.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sparse Index: API protections | expand |
On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Some users of the index API have a specific path they are looking for, > but choose to use index_file_exists() to rely on the name-hash hashtable > instead of doing binary search with index_name_pos(). These users only > need to know a yes/no answer, not a position within the cache array. > > When the index is sparse, the name-hash hash table does not contain the > full list of paths within sparse directories. It _does_ contain the > directory names for the sparse-directory entries. > > Create a helper function, expand_to_path(), for intended use with the > name-hash hashtable functions. The integration with name-hash.c will > follow in a later change. > > The solution here is to use ensure_full_index() when we determine that > the requested path is within a sparse directory entry. This will > populate the name-hash hashtable as the index is recomputed from > scratch. > > There may be cases where the caller is trying to find an untracked path > that is not in the index but also is not within a sparse directory > entry. We want to minimize the overhead for these requests. If we used > index_name_pos() to find the insertion order of the path, then we could > determine from that position if a sparse-directory exists. (In fact, > just calling index_name_pos() in that case would lead to expanding the > index to a full index.) However, this takes O(log N) time where N is the > number of cache entries. > > To keep the performance of this call based mostly on the input string, > use index_file_exists() to look for the ancestors of the path. Using the > heuristic that a sparse directory is likely to have a small number of > parent directories, we start from the bottom and build up. Use a string > buffer to allow mutating the path name to terminate after each slash for > each hashset test. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++ > sparse-index.h | 13 +++++++++ > 2 files changed, 85 insertions(+) > > diff --git a/sparse-index.c b/sparse-index.c > index 95ea17174da3..8a1223041296 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate) > > trace2_region_leave("index", "ensure_full_index", istate->repo); > } > + > +/* > + * This static global helps avoid infinite recursion between > + * expand_to_path() and index_file_exists(). > + */ > +static int in_expand_to_path = 0; > + > +void expand_to_path(struct index_state *istate, > + const char *path, size_t pathlen, int icase) > +{ > + struct strbuf path_mutable = STRBUF_INIT; > + size_t substr_len; > + > + /* prevent extra recursion */ > + if (in_expand_to_path) > + return; > + > + if (!istate || !istate->sparse_index) > + return; > + > + if (!istate->repo) > + istate->repo = the_repository; > + > + in_expand_to_path = 1; > + > + /* > + * We only need to actually expand a region if the > + * following are both true: > + * > + * 1. 'path' is not already in the index. > + * 2. Some parent directory of 'path' is a sparse directory. > + */ > + > + if (index_file_exists(istate, path, pathlen, icase)) > + goto cleanup; > + > + strbuf_add(&path_mutable, path, pathlen); > + strbuf_addch(&path_mutable, '/'); > + > + /* Check the name hash for all parent directories */ > + substr_len = 0; > + while (substr_len < pathlen) { > + char temp; > + char *replace = strchr(path_mutable.buf + substr_len, '/'); > + > + if (!replace) > + break; > + > + /* replace the character _after_ the slash */ > + replace++; > + temp = *replace; > + *replace = '\0'; > + if (index_file_exists(istate, path_mutable.buf, > + path_mutable.len, icase)) { > + /* > + * We found a parent directory in the name-hash > + * hashtable, which means that this entry could > + * exist within a sparse-directory entry. Expand > + * accordingly. "this entry" is a bit unclear; it might be worth referring to the "path" variable instead. I think it'd also help to explain the reasoning for using the '/' character, because while it's clear when looking at the series, just running across it in the code it might be easy to forget or miss. Perhaps: * We found a parent directory in the name-hash * hashtable, because only sparse directory entries * have a trailing '/' character. Since "path" wasn't * in the index, perhaps it exists within this * sparse-directory. Expand accordingly. > + */ > + ensure_full_index(istate); > + break; > + } > + > + *replace = temp; > + substr_len = replace - path_mutable.buf; > + } > + > +cleanup: > + strbuf_release(&path_mutable); > + in_expand_to_path = 0; > +} > diff --git a/sparse-index.h b/sparse-index.h > index 0268f38753c0..1115a0d7dd98 100644 > --- a/sparse-index.h > +++ b/sparse-index.h > @@ -4,6 +4,19 @@ > struct index_state; > int convert_to_sparse(struct index_state *istate); > > +/* > + * Some places in the codebase expect to search for a specific path. > + * This path might be outside of the sparse-checkout definition, in > + * which case a sparse-index may not contain a path for that index. > + * > + * Given an index and a path, check to see if a leading directory for > + * 'path' exists in the index as a sparse directory. In that case, > + * expand that sparse directory to a full range of cache entries and > + * populate the index accordingly. > + */ > +void expand_to_path(struct index_state *istate, > + const char *path, size_t pathlen, int icase); > + > struct repository; > int set_sparse_index_config(struct repository *repo, int enable); > > -- > gitgitgadget
On 4/5/2021 3:32 PM, Elijah Newren wrote: > On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com>>> + if (index_file_exists(istate, path_mutable.buf, >> + path_mutable.len, icase)) { >> + /* >> + * We found a parent directory in the name-hash >> + * hashtable, which means that this entry could >> + * exist within a sparse-directory entry. Expand >> + * accordingly. > > "this entry" is a bit unclear; it might be worth referring to the > "path" variable instead. I think it'd also help to explain the > reasoning for using the '/' character, because while it's clear when > looking at the series, just running across it in the code it might be > easy to forget or miss. Perhaps: > > * We found a parent directory in the name-hash > * hashtable, because only sparse directory entries > * have a trailing '/' character. Since "path" wasn't > * in the index, perhaps it exists within this > * sparse-directory. Expand accordingly. A good recommendation. Thanks. -Stolee
diff --git a/sparse-index.c b/sparse-index.c index 95ea17174da3..8a1223041296 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } + +/* + * This static global helps avoid infinite recursion between + * expand_to_path() and index_file_exists(). + */ +static int in_expand_to_path = 0; + +void expand_to_path(struct index_state *istate, + const char *path, size_t pathlen, int icase) +{ + struct strbuf path_mutable = STRBUF_INIT; + size_t substr_len; + + /* prevent extra recursion */ + if (in_expand_to_path) + return; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + in_expand_to_path = 1; + + /* + * We only need to actually expand a region if the + * following are both true: + * + * 1. 'path' is not already in the index. + * 2. Some parent directory of 'path' is a sparse directory. + */ + + if (index_file_exists(istate, path, pathlen, icase)) + goto cleanup; + + strbuf_add(&path_mutable, path, pathlen); + strbuf_addch(&path_mutable, '/'); + + /* Check the name hash for all parent directories */ + substr_len = 0; + while (substr_len < pathlen) { + char temp; + char *replace = strchr(path_mutable.buf + substr_len, '/'); + + if (!replace) + break; + + /* replace the character _after_ the slash */ + replace++; + temp = *replace; + *replace = '\0'; + if (index_file_exists(istate, path_mutable.buf, + path_mutable.len, icase)) { + /* + * We found a parent directory in the name-hash + * hashtable, which means that this entry could + * exist within a sparse-directory entry. Expand + * accordingly. + */ + ensure_full_index(istate); + break; + } + + *replace = temp; + substr_len = replace - path_mutable.buf; + } + +cleanup: + strbuf_release(&path_mutable); + in_expand_to_path = 0; +} diff --git a/sparse-index.h b/sparse-index.h index 0268f38753c0..1115a0d7dd98 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -4,6 +4,19 @@ struct index_state; int convert_to_sparse(struct index_state *istate); +/* + * Some places in the codebase expect to search for a specific path. + * This path might be outside of the sparse-checkout definition, in + * which case a sparse-index may not contain a path for that index. + * + * Given an index and a path, check to see if a leading directory for + * 'path' exists in the index as a sparse directory. In that case, + * expand that sparse directory to a full range of cache entries and + * populate the index accordingly. + */ +void expand_to_path(struct index_state *istate, + const char *path, size_t pathlen, int icase); + struct repository; int set_sparse_index_config(struct repository *repo, int enable);