diff mbox series

[03/10] dir.c: accept a directory as part of cone-mode patterns

Message ID 28ca717e6526f0b66df696f3237d72b9bee2ffc3.1618322497.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse-index: integrate with status and add | expand

Commit Message

Derrick Stolee April 13, 2021, 2:01 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Elijah Newren April 20, 2021, 11:21 p.m. UTC | #1
On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When we have sparse directory entries in the index, we want to compare
> that directory against sparse-checkout patterns. Those pattern matching
> algorithms are built expecting a file path, not a directory path. This
> is especially important in the "cone mode" patterns which will match
> files that exist within the "parent directories" as well as the
> recursive directory matches.
>
> If path_matches_pattern_list() is given a directory, we can add a fake
> filename ("-") to the directory and get the same results as before,
> assuming we are in cone mode. Since sparse index requires cone mode
> patterns, this is an acceptable assumption.

Makes sense; thanks for the good description.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index 166238e79f52..57e22e605cec 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>         strbuf_addch(&parent_pathname, '/');
>         strbuf_add(&parent_pathname, pathname, pathlen);
>
> +       /* Directory requests should be added as if they are a file */

"added" or "matched"?  Also, the description seems a bit brief and
likely to surprise; I'd at least want to expand "file" to "file within
their given directory" but it might be nice to get some summarized
version of the commit message or at least state that "-" is just a
random simple name within the given directory.

> +       if (parent_pathname.len > 1 &&

Is this line...

> +           parent_pathname.buf[parent_pathname.len - 1] == '/')

to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
above ensure the condition is always met?

Or are we trying to avoid adding the "-" when we parent_pathname is
just a plain "/"?

> +               strbuf_add(&parent_pathname, "-", 1);
> +

Sorry for all the questions on such a tiny change.  It makes sense to
me, I'm just curious whether it'll confuse future code readers.


>         if (hashmap_contains_path(&pl->recursive_hashmap,
>                                   &parent_pathname)) {
>                 result = MATCHED_RECURSIVE;
> --
Derrick Stolee April 21, 2021, 1:47 p.m. UTC | #2
On 4/20/2021 7:21 PM, Elijah Newren wrote:
> On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When we have sparse directory entries in the index, we want to compare
>> that directory against sparse-checkout patterns. Those pattern matching
>> algorithms are built expecting a file path, not a directory path. This
>> is especially important in the "cone mode" patterns which will match
>> files that exist within the "parent directories" as well as the
>> recursive directory matches.
>>
>> If path_matches_pattern_list() is given a directory, we can add a fake
>> filename ("-") to the directory and get the same results as before,
>> assuming we are in cone mode. Since sparse index requires cone mode
>> patterns, this is an acceptable assumption.
> 
> Makes sense; thanks for the good description.
> 
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  dir.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/dir.c b/dir.c
>> index 166238e79f52..57e22e605cec 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1378,6 +1378,11 @@ enum pattern_match_result path_matches_pattern_list(
>>         strbuf_addch(&parent_pathname, '/');
>>         strbuf_add(&parent_pathname, pathname, pathlen);
>>
>> +       /* Directory requests should be added as if they are a file */
> 
> "added" or "matched"?  Also, the description seems a bit brief and
> likely to surprise; I'd at least want to expand "file" to "file within
> their given directory" but it might be nice to get some summarized
> version of the commit message or at least state that "-" is just a
> random simple name within the given directory.

I can improve this comment.

>> +       if (parent_pathname.len > 1 &&
> 
> Is this line...
> 
>> +           parent_pathname.buf[parent_pathname.len - 1] == '/')
> 
> to prevent an out-of-bounds indexing?  If so, shouldn't it be "> 0" or
> ">= 1" rather than "> 1"?  And if so, doesn't the strbuf_addch() call
> above ensure the condition is always met?
> 
> Or are we trying to avoid adding the "-" when we parent_pathname is
> just a plain "/"?

I believe plain "/" is impossible. There needs to be a valid tree entry
before that first slash ("a/", for example). But that isn't super
important to the logic here and just adds confusion.

> 
>> +               strbuf_add(&parent_pathname, "-", 1);
>> +
> 
> Sorry for all the questions on such a tiny change.  It makes sense to
> me, I'm just curious whether it'll confuse future code readers.

Yes, let's avoid confusion by doing the simple thing and use "> 0".

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index 166238e79f52..57e22e605cec 100644
--- a/dir.c
+++ b/dir.c
@@ -1378,6 +1378,11 @@  enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/* Directory requests should be added as if they are a file */
+	if (parent_pathname.len > 1 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;