diff mbox series

[17/27] dir.c: accept a directory as part of cone-mode patterns

Message ID 036653cac368c6c04b439f5352d70a5dcc3c5feb.1611596534.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series Sparse Index | expand

Commit Message

Derrick Stolee Jan. 25, 2021, 5:42 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.

If path_matches_pattern_list() is given a directory, we can add a bogus
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Elijah Newren Feb. 1, 2021, 10:12 p.m. UTC | #1
On Mon, Jan 25, 2021 at 9:42 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When we have sparse directory entries in the index, we want to compare
> that directory against sparse-checkout patterns. Those pattern matching
> algorithms are built expecting a file path, not a directory path. This
> is especially important in the "cone mode" patterns which will match
> files that exist within the "parent directories" as well as the
> recursive directory matches.
>
> If path_matches_pattern_list() is given a directory, we can add a bogus
> filename ("-") to the directory and get the same results as before,
> assuming we are in cone mode. Since sparse index requires cone mode
> patterns, this is an acceptable assumption.

Why is "-" a bogus filename?  Is that only on certain operating
systems, or are you just not expecting a user to name their file with
such a bad name?  What if there is a file with that name in that
directory in the repository; do you need the pathname to be bogus?

What do you mean by "get the same results as before"?  The first
paragraph suggests the code wouldn't handle a directory path, and that
not handling it was problematic, so it seems unlikely you want the
same results as that.  But it's not clear what the "before" refers to
here.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index ad6eb033cb1..c786fa98d0e 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1384,6 +1384,11 @@ enum pattern_match_result path_matches_pattern_list(
>         strbuf_addch(&parent_pathname, '/');
>         strbuf_add(&parent_pathname, pathname, pathlen);
>
> +       /* Directory requests should be added as if they are a file */
> +       if (parent_pathname.len > 1 &&
> +           parent_pathname.buf[parent_pathname.len - 1] == '/')

Ah, this looks like a case where the trailing slash is helpful;
without it, you might have to feed extra data in through the call
hierarchy to signify that this is a directory entry.

> +               strbuf_add(&parent_pathname, "-", 1);
> +
>         if (hashmap_contains_path(&pl->recursive_hashmap,
>                                   &parent_pathname)) {
>                 result = MATCHED_RECURSIVE;

hashmap_contains_path?  Don't we already know (modulo special cases of
our bogus value not quite being bogus enough) that this is false since
we were adding a bogus path?  How could the hashmap have a bogus value
in it?  Won't this particular call fail with or without our adding "-"
to the end of the path?

After this hashmap_contains_path() call, the subsequent code looks for
the parent of the path by stripping off everything after the last
'/'...which seems like the relevant code anyway.  Is the problem that
the hashmap_contains_path() call was returning true when we didn't add
"-" to the end?  If so, can we use and if or a goto instead to make
the code skip this first check and move on to where we want it to go?

Or am I misunderstanding something about this code?
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index ad6eb033cb1..c786fa98d0e 100644
--- a/dir.c
+++ b/dir.c
@@ -1384,6 +1384,11 @@  enum pattern_match_result path_matches_pattern_list(
 	strbuf_addch(&parent_pathname, '/');
 	strbuf_add(&parent_pathname, pathname, pathlen);
 
+	/* Directory requests should be added as if they are a file */
+	if (parent_pathname.len > 1 &&
+	    parent_pathname.buf[parent_pathname.len - 1] == '/')
+		strbuf_add(&parent_pathname, "-", 1);
+
 	if (hashmap_contains_path(&pl->recursive_hashmap,
 				  &parent_pathname)) {
 		result = MATCHED_RECURSIVE;