diff mbox series

[v3,03/14] dir: extract directory-matching logic

Message ID b1f6468f9cdb7d16f6317c71b21f4459af158e87.1632159937.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior | expand

Commit Message

Derrick Stolee Sept. 20, 2021, 5:45 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The last_matching_pattern_from_list() logic performs some checks on the
filetype of a path within the index when the PATTERN_FLAG_MUSTBEDIR flag
is set. This works great when setting SKIP_WORKTREE bits within
unpack_trees(), but doesn't work well when passing an arbitrary path
such as a file within a matching directory.

This change only rearranges the logic but does not change its
functionality.

We will expand the path_matches_dir_pattern() method in a following
change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Sept. 22, 2021, 11:13 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int path_matches_dir_pattern(const char *pathname,
> +				    int pathlen,
> +				    int *dtype,
> +				    struct path_pattern *pattern,
> +				    struct index_state *istate)
> +{
> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
> +	if (*dtype != DT_DIR)
> +		return 0;
> +
> +	return 1;
> +}

The function name and parameter list have "pattern" but as far as I
can see any "matches" or "pattern" comes into the picture.  The code
in the caller after calling this function may be doing pattern
matching, but not this one.

What this helper is doing is "signal if the pathname in the working
tree is supposed to be a directory with the return value, while
filling *dtype with what kind of thing it is."

path_must_be_dir_in_working_tree() or something, perhaps?

> @@ -1327,11 +1340,10 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>  		const char *exclude = pattern->pattern;
>  		int prefix = pattern->nowildcardlen;
>  
> -		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
> -			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
> -			if (*dtype != DT_DIR)
> -				continue;
> -		}
> +		if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) &&
> +		    !path_matches_dir_pattern(pathname, pathlen,
> +					      dtype, pattern, istate))
> +			continue;
>  
>  		if (pattern->flags & PATTERN_FLAG_NODIR) {
>  			if (match_basename(basename,
Derrick Stolee Sept. 23, 2021, 1:39 p.m. UTC | #2
On 9/22/2021 7:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +static int path_matches_dir_pattern(const char *pathname,
>> +				    int pathlen,
>> +				    int *dtype,
>> +				    struct path_pattern *pattern,
>> +				    struct index_state *istate)
>> +{
>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>> +	if (*dtype != DT_DIR)
>> +		return 0;
>> +
>> +	return 1;
>> +}
> 
> The function name and parameter list have "pattern" but as far as I
> can see any "matches" or "pattern" comes into the picture.  The code
> in the caller after calling this function may be doing pattern
> matching, but not this one.
> 
> What this helper is doing is "signal if the pathname in the working
> tree is supposed to be a directory with the return value, while
> filling *dtype with what kind of thing it is."
> 
> path_must_be_dir_in_working_tree() or something, perhaps?

Yes, a rename would be prudent here. Thanks.

-Stolee
Derrick Stolee Sept. 23, 2021, 1:42 p.m. UTC | #3
On 9/23/2021 9:39 AM, Derrick Stolee wrote:
> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> +static int path_matches_dir_pattern(const char *pathname,
>>> +				    int pathlen,
>>> +				    int *dtype,
>>> +				    struct path_pattern *pattern,
>>> +				    struct index_state *istate)
>>> +{
>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>> +	if (*dtype != DT_DIR)
>>> +		return 0;
>>> +
>>> +	return 1;
>>> +}
>>
>> The function name and parameter list have "pattern" but as far as I
>> can see any "matches" or "pattern" comes into the picture.  The code
>> in the caller after calling this function may be doing pattern
>> matching, but not this one.
>>
>> What this helper is doing is "signal if the pathname in the working
>> tree is supposed to be a directory with the return value, while
>> filling *dtype with what kind of thing it is."
>>
>> path_must_be_dir_in_working_tree() or something, perhaps?
> 
> Yes, a rename would be prudent here. Thanks.

Of course, when I go to amend the commit, the commit message says

	We will expand the path_matches_dir_pattern() method in a following
	change.

which means that more will follow that will actually care about the
pattern and matching as a directory.

After looking at the extension in the next patch, do you still think a
rename is necessary? Specifically, this diff hunk:

diff --git a/dir.c b/dir.c
index 652135df896..9ea6cfe61cb 100644
--- a/dir.c
+++ b/dir.c
@@ -1305,10 +1305,35 @@ int match_pathname(const char *pathname, int pathlen,
 
 static int path_matches_dir_pattern(const char *pathname,
 				    int pathlen,
+				    struct strbuf **path_parent,
 				    int *dtype,
 				    struct path_pattern *pattern,
 				    struct index_state *istate)
 {
+	if (!*path_parent) {
+		char *slash;
+		CALLOC_ARRAY(*path_parent, 1);
+		strbuf_add(*path_parent, pathname, pathlen);
+		slash = find_last_dir_sep((*path_parent)->buf);
+
+		if (slash)
+			strbuf_setlen(*path_parent, slash - (*path_parent)->buf);
+		else
+			strbuf_setlen(*path_parent, 0);
+	}
+
+	/*
+	 * If the parent directory matches the pattern, then we do not
+	 * need to check for dtype.
+	 */
+	if ((*path_parent)->len &&
+	    match_pathname((*path_parent)->buf, (*path_parent)->len,
+			   pattern->base,
+			   pattern->baselen ? pattern->baselen - 1 : 0,
+			   pattern->pattern, pattern->nowildcardlen,
+			   pattern->patternlen, pattern->flags))
+		return 1;
+
 	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
 	if (*dtype != DT_DIR)
 		return 0;

Thanks,
-Stolee
Junio C Hamano Sept. 23, 2021, 6:23 p.m. UTC | #4
Derrick Stolee <stolee@gmail.com> writes:

> On 9/23/2021 9:39 AM, Derrick Stolee wrote:
>> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> +static int path_matches_dir_pattern(const char *pathname,
>>>> +				    int pathlen,
>>>> +				    int *dtype,
>>>> +				    struct path_pattern *pattern,
>>>> +				    struct index_state *istate)
>>>> +{
>>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>>> +	if (*dtype != DT_DIR)
>>>> +		return 0;
>>>> +
>>>> +	return 1;
>>>> +}
>>>
>>> The function name and parameter list have "pattern" but as far as I
>>> can see any "matches" or "pattern" comes into the picture.  The code
>>> in the caller after calling this function may be doing pattern
>>> matching, but not this one.
>>>
>>> What this helper is doing is "signal if the pathname in the working
>>> tree is supposed to be a directory with the return value, while
>>> filling *dtype with what kind of thing it is."
>>>
>>> path_must_be_dir_in_working_tree() or something, perhaps?
>> 
>> Yes, a rename would be prudent here. Thanks.
>
> Of course, when I go to amend the commit, the commit message says
>
> 	We will expand the path_matches_dir_pattern() method in a following
> 	change.
>
> which means that more will follow that will actually care about the
> pattern and matching as a directory.
>
> After looking at the extension in the next patch, do you still think a
> rename is necessary?

When the focus and purpose of the function changes, it may warrant a
rename to include "matching" or "pattern", but not before.

Or we might be seeing a premature refactoring with these two steps.
Are we gaining multiple callers of this function before it gets
extended to care about pattern and matching?  If not, perhaps
teaching the inlined codepath about the pattern and matching in
place first before extracting the code to a helper function for
readability and reusability may help make the resulting series
easier to follow, and we do not have to see a function with a
misleading name.
Derrick Stolee Sept. 24, 2021, 1:29 p.m. UTC | #5
On 9/23/2021 2:23 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> On 9/23/2021 9:39 AM, Derrick Stolee wrote:
>>> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>>
>>>>> +static int path_matches_dir_pattern(const char *pathname,
>>>>> +				    int pathlen,
>>>>> +				    int *dtype,
>>>>> +				    struct path_pattern *pattern,
>>>>> +				    struct index_state *istate)
>>>>> +{
>>>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>>>> +	if (*dtype != DT_DIR)
>>>>> +		return 0;
>>>>> +
>>>>> +	return 1;
>>>>> +}
>>>>
>>>> The function name and parameter list have "pattern" but as far as I
>>>> can see any "matches" or "pattern" comes into the picture.  The code
>>>> in the caller after calling this function may be doing pattern
>>>> matching, but not this one.
>>>>
>>>> What this helper is doing is "signal if the pathname in the working
>>>> tree is supposed to be a directory with the return value, while
>>>> filling *dtype with what kind of thing it is."
>>>>
>>>> path_must_be_dir_in_working_tree() or something, perhaps?
>>>
>>> Yes, a rename would be prudent here. Thanks.
>>
>> Of course, when I go to amend the commit, the commit message says
>>
>> 	We will expand the path_matches_dir_pattern() method in a following
>> 	change.
>>
>> which means that more will follow that will actually care about the
>> pattern and matching as a directory.
>>
>> After looking at the extension in the next patch, do you still think a
>> rename is necessary?
> 
> When the focus and purpose of the function changes, it may warrant a
> rename to include "matching" or "pattern", but not before.
> 
> Or we might be seeing a premature refactoring with these two steps.
> Are we gaining multiple callers of this function before it gets
> extended to care about pattern and matching?  If not, perhaps
> teaching the inlined codepath about the pattern and matching in
> place first before extracting the code to a helper function for
> readability and reusability may help make the resulting series
> easier to follow, and we do not have to see a function with a
> misleading name.
 
Squashing these two patches together has the same effect, but
takes a little bit extra work to see that the re-used code is
the same. It's small enough that I don't see that as a huge
hurdle.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index 86afa2eae00..652135df896 100644
--- a/dir.c
+++ b/dir.c
@@ -1303,6 +1303,19 @@  int match_pathname(const char *pathname, int pathlen,
 				 WM_PATHNAME) == 0;
 }
 
+static int path_matches_dir_pattern(const char *pathname,
+				    int pathlen,
+				    int *dtype,
+				    struct path_pattern *pattern,
+				    struct index_state *istate)
+{
+	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
+	if (*dtype != DT_DIR)
+		return 0;
+
+	return 1;
+}
+
 /*
  * Scan the given exclude list in reverse to see whether pathname
  * should be ignored.  The first match (i.e. the last on the list), if
@@ -1327,11 +1340,10 @@  static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 		const char *exclude = pattern->pattern;
 		int prefix = pattern->nowildcardlen;
 
-		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
-			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
-			if (*dtype != DT_DIR)
-				continue;
-		}
+		if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) &&
+		    !path_matches_dir_pattern(pathname, pathlen,
+					      dtype, pattern, istate))
+			continue;
 
 		if (pattern->flags & PATTERN_FLAG_NODIR) {
 			if (match_basename(basename,