Message ID | 723de4e42582afbe841ed96470fc02db44b24b5e.1631453010.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior | expand |
On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote: > + /* > + * Use 'alloc' as an indicator that the string has not been > + * initialized, in case the parent is the root directory. > + */ > + if (!path_parent->alloc) { This isn't wrong, but seems to be way too cozy with the internal implementation details of strbuf. For what it's worth I renamed it to "alloc2" and found that this would be only the 3rd bit of code out of strbuf.[ch] that cares about that member. > + char *slash; > + strbuf_addstr(path_parent, pathname); So is "pathname" ever the empty string? If not we could check the length? Or probably better: ... > @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname > { > struct path_pattern *res = NULL; /* undecided */ > int i; > + struct strbuf path_parent = STRBUF_INIT; Just malloc + strbuf_init() this in the above function and have a "struct strbuf *" initialized to NULL here? Then we can use a much more idiomatic "is it NULL?" to check if it's initialized.
On 9/12/2021 6:21 PM, Ævar Arnfjörð Bjarmason wrote: > > On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote: > >> + /* >> + * Use 'alloc' as an indicator that the string has not been >> + * initialized, in case the parent is the root directory. >> + */ >> + if (!path_parent->alloc) { > > This isn't wrong, but seems to be way too cozy with the internal > implementation details of strbuf. For what it's worth I renamed it to > "alloc2" and found that this would be only the 3rd bit of code out of > strbuf.[ch] that cares about that member. I can understand not wanting to poke into the internals. >> + char *slash; >> + strbuf_addstr(path_parent, pathname); > > So is "pathname" ever the empty string? If not we could check the > length? We are given 'pathlen' as a parameter, so this should just use strbuf_add() instead. > Or probably better: ... > >> @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname >> { >> struct path_pattern *res = NULL; /* undecided */ >> int i; >> + struct strbuf path_parent = STRBUF_INIT; > > Just malloc + strbuf_init() this in the above function and have a > "struct strbuf *" initialized to NULL here? Then we can use a much more > idiomatic "is it NULL?" to check if it's initialized. That makes sense. Can do. Thanks, -Stolee
On Sun, Sep 12, 2021 at 6:23 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > When matching a path against a list of patterns, the ones that require a > directory match previously did not work when a filename is specified. > This was fine when all pattern-matching was done within methods such as > unpack_trees() that check a directory before recursing into the > contained files. However, other commands will start matching individual > files against pattern lists without that recursive approach. > > We modify path_matches_dir_pattern() to take a strbuf 'path_parent' that > is used to store the parent directory of 'pathname' between multiple > pattern matching tests. This is loaded lazily, only on the first pattern > it finds that has the PATTERN_FLAG_MUSTBEDIR flag. > > If we find that a path has a parent directory, we start by checking to > see if that parent directory matches the pattern. If so, then we do not > need to query the index for the type (which can be expensive). If we > find that the parent does not match, then we still must check the type > from the index for the given pathname. > > Note that this does not affect cone mode pattern matching, but instead > the more general -- and slower -- full pattern set. Thus, this does not > affect the sparse index. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > dir.c | 34 ++++++++++++++++++++++++++++++++-- > 1 file changed, 32 insertions(+), 2 deletions(-) > > diff --git a/dir.c b/dir.c > index 652135df896..fe5ee87bb5f 100644 > --- a/dir.c > +++ b/dir.c > @@ -1305,10 +1305,38 @@ int match_pathname(const char *pathname, int pathlen, > > static int path_matches_dir_pattern(const char *pathname, > int pathlen, > + struct strbuf *path_parent, > int *dtype, > struct path_pattern *pattern, > struct index_state *istate) > { > + /* > + * Use 'alloc' as an indicator that the string has not been > + * initialized, in case the parent is the root directory. > + */ > + if (!path_parent->alloc) { > + char *slash; > + strbuf_addstr(path_parent, pathname); > + slash = find_last_dir_sep(path_parent->buf); > + > + if (slash) > + *slash = '\0'; Are you breaking strbuf invariants here? path_parent->len will not be corrected by this string manipulation. Perhaps replace this if-else block with strbuf_setlen(path_parent, slash ? slash - path_parent->buf : 0) > + else > + strbuf_setlen(path_parent, 0); > + } > + > + /* > + * If the parent directory matches the pattern, then we do not > + * need to check for dtype. > + */ > + if (path_parent->len && > + match_pathname(path_parent->buf, path_parent->len, > + pattern->base, > + pattern->baselen ? pattern->baselen - 1 : 0, > + pattern->pattern, pattern->nowildcardlen, > + pattern->patternlen, pattern->flags)) > + return 1; > + > *dtype = resolve_dtype(*dtype, istate, pathname, pathlen); > if (*dtype != DT_DIR) > return 0; > @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname > { > struct path_pattern *res = NULL; /* undecided */ > int i; > + struct strbuf path_parent = STRBUF_INIT; > > if (!pl->nr) > return NULL; /* undefined */ > @@ -1340,8 +1369,8 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname > const char *exclude = pattern->pattern; > int prefix = pattern->nowildcardlen; > > - if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) && > - !path_matches_dir_pattern(pathname, pathlen, > + if (pattern->flags & PATTERN_FLAG_MUSTBEDIR && > + !path_matches_dir_pattern(pathname, pathlen, &path_parent, > dtype, pattern, istate)) > continue; > > @@ -1367,6 +1396,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname > break; > } > } > + strbuf_release(&path_parent); > return res; > } > > -- > gitgitgadget
On 9/15/2021 10:54 AM, Elijah Newren wrote: >> + /* >> + * Use 'alloc' as an indicator that the string has not been >> + * initialized, in case the parent is the root directory. >> + */ >> + if (!path_parent->alloc) { >> + char *slash; >> + strbuf_addstr(path_parent, pathname); >> + slash = find_last_dir_sep(path_parent->buf); >> + >> + if (slash) >> + *slash = '\0'; > > Are you breaking strbuf invariants here? path_parent->len will not be > corrected by this string manipulation. Perhaps replace this if-else > block with > > strbuf_setlen(path_parent, slash ? slash - path_parent->buf : 0) Yes, I am. I noticed and fixed this when I was rewriting this patch for Ævar's feedback. Thanks for pointing it out. Thanks, -Stolee
diff --git a/dir.c b/dir.c index 652135df896..fe5ee87bb5f 100644 --- a/dir.c +++ b/dir.c @@ -1305,10 +1305,38 @@ int match_pathname(const char *pathname, int pathlen, static int path_matches_dir_pattern(const char *pathname, int pathlen, + struct strbuf *path_parent, int *dtype, struct path_pattern *pattern, struct index_state *istate) { + /* + * Use 'alloc' as an indicator that the string has not been + * initialized, in case the parent is the root directory. + */ + if (!path_parent->alloc) { + char *slash; + strbuf_addstr(path_parent, pathname); + slash = find_last_dir_sep(path_parent->buf); + + if (slash) + *slash = '\0'; + else + strbuf_setlen(path_parent, 0); + } + + /* + * If the parent directory matches the pattern, then we do not + * need to check for dtype. + */ + if (path_parent->len && + match_pathname(path_parent->buf, path_parent->len, + pattern->base, + pattern->baselen ? pattern->baselen - 1 : 0, + pattern->pattern, pattern->nowildcardlen, + pattern->patternlen, pattern->flags)) + return 1; + *dtype = resolve_dtype(*dtype, istate, pathname, pathlen); if (*dtype != DT_DIR) return 0; @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname { struct path_pattern *res = NULL; /* undecided */ int i; + struct strbuf path_parent = STRBUF_INIT; if (!pl->nr) return NULL; /* undefined */ @@ -1340,8 +1369,8 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname const char *exclude = pattern->pattern; int prefix = pattern->nowildcardlen; - if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) && - !path_matches_dir_pattern(pathname, pathlen, + if (pattern->flags & PATTERN_FLAG_MUSTBEDIR && + !path_matches_dir_pattern(pathname, pathlen, &path_parent, dtype, pattern, istate)) continue; @@ -1367,6 +1396,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname break; } } + strbuf_release(&path_parent); return res; }