[1/1] sparse-checkout: respect core.ignoreCase in cone mode
diff mbox series

Message ID 23705845ce73992bf7ab645d28febebe0a698d49.1575920580.git.gitgitgadget@gmail.com
State New
Headers show
Series
  • sparse-checkout: respect core.ignoreCase in cone mode
Related show

Commit Message

Hariom Verma via GitGitGadget Dec. 9, 2019, 7:43 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.

The sanitize_cone_input() method is named to imply that other checks
may be added. In fact, such checks are planned including looking for
wildcards that make the paths invalid cone patterns or must be
escaped.

Specifically, if the path has a match in the index, then use that
path instead. If there is no match, still add that path to the
patterns, as the user may expect the directory to appear after a
checkout to another ref. However, we have no matching path to
correct for a case conflict, and must assume that the user provided
the correct case.

Another option would be to do case-insensitive checks while
updating the skip-worktree bits during unpack_trees(). Outside of
the potential performance loss on a more expensive code path, that
also breaks compatibility with older versions of Git as using the
same sparse-checkout file would change the paths that are included.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt |  4 ++++
 builtin/sparse-checkout.c             | 19 +++++++++++++++++--
 cache.h                               |  1 +
 name-hash.c                           | 10 ++++++++++
 t/t1091-sparse-checkout-builtin.sh    | 13 +++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Dec. 11, 2019, 6:44 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Another option would be to do case-insensitive checks while
> updating the skip-worktree bits during unpack_trees(). Outside of
> the potential performance loss on a more expensive code path, that
> also breaks compatibility with older versions of Git as using the
> same sparse-checkout file would change the paths that are included.

I haven't thought things through, but that sounds as if saying that
we cannot fix old bugs.  We use the entries in the index as a hint
for "correcting" a path to the right case on a case-insensitive
filesystem to avoid corrupting the case in the index, which is case
sensitive and is a way to convey the intended case (by writing them
out to a tree object) to those who use systems that can reliably
reproduce cases in pathnames.  But that still does not mean on a
case-insensitive filesystem, "hello.c" in the "right" case recorded
in the index will always be spelled like so---somebody can make
readdir() or equivalent to yield "Hello.c" for it, and if the user
tells us to say they want to do X (e.g. ignore, skip checkout, etc.)
to "hello.c", we should do the same to "Hello.c" on such a system, I
would think.

If the runtime needs to deal with the case insensitive filesystems
anyway (and I suspect it does), there isn't much point matching and
forcing the case to the paths in the index like this patch does, I
think.  So...


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |  4 ++++
>  builtin/sparse-checkout.c             | 19 +++++++++++++++++--
>  cache.h                               |  1 +
>  name-hash.c                           | 10 ++++++++++
>  t/t1091-sparse-checkout-builtin.sh    | 13 +++++++++++++
>  5 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index b975285673..849efa0f0b 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -150,6 +150,10 @@ expecting patterns of these types. Git will warn if the patterns do not match.
>  If the patterns do match the expected format, then Git will use faster hash-
>  based algorithms to compute inclusion in the sparse-checkout.
>  
> +If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
> +correct for incorrect case when assigning patterns to the sparse-checkout
> +file.
> +
>  SEE ALSO
>  --------
>  
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index a542d617a5..0de426384e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -336,6 +336,22 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
>  	}
>  }
>  
> +static void sanitize_cone_input(struct strbuf *line)
> +{
> +	if (ignore_case) {
> +		struct index_state *istate = the_repository->index;
> +		const char *name = index_dir_matching_name(istate, line->buf, line->len);
> +
> +		if (name) {
> +			strbuf_setlen(line, 0);
> +			strbuf_addstr(line, name);
> +		}
> +	}
> +
> +	if (line->buf[0] != '/')
> +		strbuf_insert(line, 0, "/", 1);
> +}
> +
>  static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
>  {
>  	strbuf_trim(line);
> @@ -345,8 +361,7 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
>  	if (!line->len)
>  		return;
>  
> -	if (line->buf[0] != '/')
> -		strbuf_insert(line, 0, "/", 1);
> +	sanitize_cone_input(line);
>  
>  	insert_recursive_pattern(pl, line);
>  }
> diff --git a/cache.h b/cache.h
> index d3c89e7a53..a2d9d437f0 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -728,6 +728,7 @@ int repo_index_has_changes(struct repository *repo,
>  int verify_path(const char *path, unsigned mode);
>  int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
>  
> diff --git a/name-hash.c b/name-hash.c
> index ceb1d7bd6f..46898b6571 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -681,6 +681,16 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	return dir && dir->nr;
>  }
>  
> +const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
> +{
> +	struct dir_entry *dir;
> +
> +	lazy_init_name_hash(istate);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	return dir ? dir->name : NULL;
> +}
> +
>  void adjust_dirname_case(struct index_state *istate, char *name)
>  {
>  	const char *startPtr = name;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index d5e2892526..d0ce48869f 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -304,4 +304,17 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
>  	git -C dirty sparse-checkout disable
>  '
>  
> +test_expect_success 'cone mode: set with core.ignoreCase=true' '
> +	test_when_finished git -C repo config --unset core.ignoreCase &&
> +	git -C repo sparse-checkout init --cone &&
> +	git -C repo config core.ignoreCase true &&
> +	git -C repo sparse-checkout set Folder1 &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +		/folder1/
> +	EOF
> +	test_cmp expect repo/.git/info/sparse-checkout
> +'
> +
>  test_done
Derrick Stolee Dec. 11, 2019, 7:11 p.m. UTC | #2
On 12/11/2019 1:44 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Another option would be to do case-insensitive checks while
>> updating the skip-worktree bits during unpack_trees(). Outside of
>> the potential performance loss on a more expensive code path, that
>> also breaks compatibility with older versions of Git as using the
>> same sparse-checkout file would change the paths that are included.
> 
> I haven't thought things through, but that sounds as if saying that
> we cannot fix old bugs.  We use the entries in the index as a hint
> for "correcting" a path to the right case on a case-insensitive
> filesystem to avoid corrupting the case in the index, which is case
> sensitive and is a way to convey the intended case (by writing them
> out to a tree object) to those who use systems that can reliably
> reproduce cases in pathnames.  But that still does not mean on a
> case-insensitive filesystem, "hello.c" in the "right" case recorded
> in the index will always be spelled like so---somebody can make
> readdir() or equivalent to yield "Hello.c" for it, and if the user
> tells us to say they want to do X (e.g. ignore, skip checkout, etc.)
> to "hello.c", we should do the same to "Hello.c" on such a system, I
> would think.
> 
> If the runtime needs to deal with the case insensitive filesystems
> anyway (and I suspect it does), there isn't much point matching and
> forcing the case to the paths in the index like this patch does, I
> think.  So...

I'm trying to find a way around these two ideas:

1. The index is case-sensitive, and the sparse-checkout patterns are
   case-sensitive.

2. If a filesystem is case-insensitive, most index-change operations
   already succeed with incorrect case, especially with core.ignoreCase
   enabled.

The approach I have is to allow a user to provide a case that does not
match the index, and then we store the pattern in the sparse-checkout
that matches the case in the index.

Another approach would be to make the sparse-checkout patterns be
case-insensitive when core.ignoreCase is enabled. I chose against
this due to the compatibility and the performance cost. We would
likely pay that performance penalty even if all the patterns have
the correct case. It violates the existing "contract" with the
sparse-checkout file, and that is probably what you are talking about
with "we cannot fix old bugs".

It sounds like you are preferring this second option, despite the
performance costs. It is possible to use a case-insensitive hashing
algorithm when in the cone mode, but I'm not sure about how to do
a similar concept for the normal mode.

Thanks,
-Stolee
Junio C Hamano Dec. 11, 2019, 8 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

> I'm trying to find a way around these two ideas:
>
> 1. The index is case-sensitive, and the sparse-checkout patterns are
>    case-sensitive.

OK.  The latter is local to the repository and not shared to the
world where people with case sensitive systems would live, right?

> 2. If a filesystem is case-insensitive, most index-change operations
>    already succeed with incorrect case, especially with core.ignoreCase
>    enabled.

I am not sure about "incorrect", though.  

My understanding is that modern case-insensitive systems are not
information-destroying [*1*].  A new file you create as "ReadMe.txt"
on your filesystem would be seen in that mixed case spelling via
readdir() or equivalent, so adding it to the index as-is would
likely be in the "correct" case, no?  If, after adding that path to
the index, you did "rm ReadMe.txt" and created "README.TXT", surely
we won't have both ReadMe.txt and README.TXT in the index with
ignoreCase set, and keep using ReadMe.txt that matches what you
added to the index.  I am not sure which one is the "incorrect" case
in such a scenario.

> The approach I have is to allow a user to provide a case that does not
> match the index, and then we store the pattern in the sparse-checkout
> that matches the case in the index.

Yes, I understood that from your proposed log message and cover
letter.  They were very clearly written.

But imagine that your user writes ABC in the sparse pattern file,
and there is no abc anything in the index in any case permutations.

When you check out a branch that has Abc, shouldn't the pattern ABC
affect the operation just like a pattern Abc would on a case
insensitive system?

Or are end users perfectly aware that the thing in that branch is
spelled "Abc" and not "ABC" (the data in Git does---it comes from a
tree object that is case sensitive)?  If so, then the pattern ABC
should not affect the subtree whose name is "Abc" even on a case
insensitive system.

I am not sure what the design of this part of the system expects out
of the end user.  Perhaps keeping the patterns case insensitive and
tell the end users to spell them correctly is the right way to go, I
guess, if it is just the filesystem that cannot represente the cases
correctly at times and the users are perfectly capable of telling
the right and wrong cases apart.

But then I am not sure why correcting misspelled names by comparing
them with the index entries is a good idea, either.

> It sounds like you are preferring this second option, despite the
> performance costs. It is possible to use a case-insensitive hashing
> algorithm when in the cone mode, but I'm not sure about how to do
> a similar concept for the normal mode.

I have no strong preference, other than that I prefer to see things
being done consistently.  Don't we already use case folding hash
function to ensure that we won't add duplicate to the index on
case-insensitive system?  I suspect that we would need to do the
same, whether in cone mode or just a normal sparse-checkout mode
consistently.

Thanks.


[Footnote]

*1* ... unlike HFS+ where everything is forced to NKD and a bit of
information---whether the original was in NKC or NKD---is discarded
forever.
Derrick Stolee Dec. 11, 2019, 8:29 p.m. UTC | #4
On 12/11/2019 3:00 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> I'm trying to find a way around these two ideas:
>>
>> 1. The index is case-sensitive, and the sparse-checkout patterns are
>>    case-sensitive.
> 
> OK.  The latter is local to the repository and not shared to the
> world where people with case sensitive systems would live, right?

Yes, this information is local to a specific local copy. Even worktrees
have distinct sparse-checkout files.

>> 2. If a filesystem is case-insensitive, most index-change operations
>>    already succeed with incorrect case, especially with core.ignoreCase
>>    enabled.
> 
> I am not sure about "incorrect", though.  
> 
> My understanding is that modern case-insensitive systems are not
> information-destroying [*1*].  A new file you create as "ReadMe.txt"
> on your filesystem would be seen in that mixed case spelling via
> readdir() or equivalent, so adding it to the index as-is would
> likely be in the "correct" case, no?  If, after adding that path to
> the index, you did "rm ReadMe.txt" and created "README.TXT", surely
> we won't have both ReadMe.txt and README.TXT in the index with
> ignoreCase set, and keep using ReadMe.txt that matches what you
> added to the index.  I am not sure which one is the "incorrect" case
> in such a scenario.

The specific example I have in my mind is that the filesystem can have
"README.TXT" as what it would report by readdir(), but a user can type

	git add readme.txt

Without core.ignoreCase, the index will store the path as "readme.txt",
possibly adding a new entry in the index next to "README.TXT". If
core.ignoreCase is enabled, then the case reported by readdir() is used.
(This works both for a tracked and untracked path.)

>> The approach I have is to allow a user to provide a case that does not
>> match the index, and then we store the pattern in the sparse-checkout
>> that matches the case in the index.
> 
> Yes, I understood that from your proposed log message and cover
> letter.  They were very clearly written.
> 
> But imagine that your user writes ABC in the sparse pattern file,
> and there is no abc anything in the index in any case permutations.
> 
> When you check out a branch that has Abc, shouldn't the pattern ABC
> affect the operation just like a pattern Abc would on a case
> insensitive system?
> 
> Or are end users perfectly aware that the thing in that branch is
> spelled "Abc" and not "ABC" (the data in Git does---it comes from a
> tree object that is case sensitive)?  If so, then the pattern ABC
> should not affect the subtree whose name is "Abc" even on a case
> insensitive system.

This is a case that is difficult to control for. We have no source
of truth (filesystem or index) at the time of the 'git sparse-checkout
set' command. I cannot think of a solution to this specific issue
without doing the more-costly approach on every unpack_trees() call.

I believe this case may be narrow enough to "document and don't-fix",
but please speak up if anyone thinks this _must_ be fixed.

The other thing I can say is that my current approach _could_ be
replaced in the future by the more invasive check in unpack_trees().
The behavior change would be invisible to users who don't inspect
their sparse-checkout files other than this narrow case.

Specifically, our internal customer is planning to update the
sparse-checkout cone based on files in the workdir, which means that
the paths are expected to be in the index at the time of the 'set'
call.

> I am not sure what the design of this part of the system expects out
> of the end user.  Perhaps keeping the patterns case insensitive and
> tell the end users to spell them correctly is the right way to go, I
> guess, if it is just the filesystem that cannot represente the cases
> correctly at times and the users are perfectly capable of telling
> the right and wrong cases apart.

My first reaction to this internal request was "just clean up your
data." The issue is keeping it clean as users are changing the data
often and the only current checks are whether the build passes (and
the build "sees" the files because the filesystem accepts the wrong
case).

The "git add" behavior made me think there was sufficient precedent
in Git to try this change.

I'm happy to follow the community's opinion in this matter, including
a hard "we will not correct for case in this feature" or "if you want
this then you'll pay for it in performance." In the latter case I'd
prefer a new config option to toggle the sparse-checkout case
insensitivity. That way users could have core.ignoreCase behavior without
the perf hit if they use clean input to the sparse-checkout feature.

> But then I am not sure why correcting misspelled names by comparing
> them with the index entries is a good idea, either.

Right, that seems like a line too far.

>> It sounds like you are preferring this second option, despite the
>> performance costs. It is possible to use a case-insensitive hashing
>> algorithm when in the cone mode, but I'm not sure about how to do
>> a similar concept for the normal mode.
> 
> I have no strong preference, other than that I prefer to see things
> being done consistently.  Don't we already use case folding hash
> function to ensure that we won't add duplicate to the index on
> case-insensitive system?  I suspect that we would need to do the
> same, whether in cone mode or just a normal sparse-checkout mode
> consistently.

Since the cone mode uses a hashset to match the paths to the patterns,
that conversion is possible. I'm not sure how to start making arbitrary
regexes be case-insensitive in the non-cone-mode case. Suggestions?

Thank you for the discussion. I was hoping to get feedback on this
approach, which is why this patch is isolated to only this issue.

Thanks,
-Stolee
Junio C Hamano Dec. 11, 2019, 9:37 p.m. UTC | #5
Derrick Stolee <stolee@gmail.com> writes:

> The specific example I have in my mind is that the filesystem can have
> "README.TXT" as what it would report by readdir(), but a user can type
>
> 	git add readme.txt
>
> Without core.ignoreCase, the index will store the path as "readme.txt",
> possibly adding a new entry in the index next to "README.TXT". If
> core.ignoreCase is enabled, then the case reported by readdir() is used.
> (This works both for a tracked and untracked path.)

Yes, but which one is "correct"?  readme.txt read from the index or
README.TXT read from the filesystem?  Presumably, when the paths got
first checked out of the index, it would have been in "readme.txt"
on the filesystem, so the user must have done on purpose something
to cause the file to be named in all uppercase since then.

> This is a case that is difficult to control for. We have no source
> of truth (filesystem or index) at the time of the 'git sparse-checkout
> set' command. I cannot think of a solution to this specific issue
> without doing the more-costly approach on every unpack_trees() call.
>
> I believe this case may be narrow enough to "document and don't-fix",
> but please speak up if anyone thinks this _must_ be fixed.

I do not think it "must be" fixed in the sense that "if it hurts,
don't do so" is a sufficient remedy.  But then for exactly the same
reason, I do not think it is worth doing what you do in this patch.

On the other hand, I think runtime case folding, just like what we
do when "git add" is told to handle a path in different case, would
be the only "right" way to match end-user expectations on a case
insensitive system, even though that is a "nice to do so" and
certainly not a "must do so" thing.

> The "git add" behavior made me think there was sufficient precedent
> in Git to try this change.

Since I view that the "git add" behaviour is a "nice to do so"
runtime conversion, I would actually think the approach you
discarded would be more in line with the precedent.

> I'm happy to follow the community's opinion in this matter, including
> a hard "we will not correct for case in this feature" or "if you want
> this then you'll pay for it in performance." In the latter case I'd
> prefer a new config option to toggle the sparse-checkout case
> insensitivity. That way users could have core.ignoreCase behavior without
> the perf hit if they use clean input to the sparse-checkout feature.

I value correctness more---a system that does incorrect thing
quickly is not all that interesting.

Assuming that your users are sensible and feed good data, how much
"performance hit" are we talking about?  Wouldn't this be something
we can make into a "if you have the paths in the correct case, we'll
see the match just as quickly as in the case sensitive system, so
most of the time there is no penalty, but for correctness we would
also fall back to try different cases"?
Derrick Stolee Dec. 12, 2019, 2:51 a.m. UTC | #6
On 12/11/2019 4:37 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> The specific example I have in my mind is that the filesystem can have
>> "README.TXT" as what it would report by readdir(), but a user can type
>>
>> 	git add readme.txt
>>
>> Without core.ignoreCase, the index will store the path as "readme.txt",
>> possibly adding a new entry in the index next to "README.TXT". If
>> core.ignoreCase is enabled, then the case reported by readdir() is used.
>> (This works both for a tracked and untracked path.)
> 
> Yes, but which one is "correct"?  readme.txt read from the index or
> README.TXT read from the filesystem?  Presumably, when the paths got
> first checked out of the index, it would have been in "readme.txt"
> on the filesystem, so the user must have done on purpose something
> to cause the file to be named in all uppercase since then.

Ah. The issue here is that with core.ignoreCase=true, the index and
filesystem agree. It's just the user input that differs.

>> This is a case that is difficult to control for. We have no source
>> of truth (filesystem or index) at the time of the 'git sparse-checkout
>> set' command. I cannot think of a solution to this specific issue
>> without doing the more-costly approach on every unpack_trees() call.
>>
>> I believe this case may be narrow enough to "document and don't-fix",
>> but please speak up if anyone thinks this _must_ be fixed.
> 
> I do not think it "must be" fixed in the sense that "if it hurts,
> don't do so" is a sufficient remedy.  But then for exactly the same
> reason, I do not think it is worth doing what you do in this patch.
> 
> On the other hand, I think runtime case folding, just like what we
> do when "git add" is told to handle a path in different case, would
> be the only "right" way to match end-user expectations on a case
> insensitive system, even though that is a "nice to do so" and
> certainly not a "must do so" thing.
> 
>> The "git add" behavior made me think there was sufficient precedent
>> in Git to try this change.
> 
> Since I view that the "git add" behaviour is a "nice to do so"
> runtime conversion, I would actually think the approach you
> discarded would be more in line with the precedent.
> 
>> I'm happy to follow the community's opinion in this matter, including
>> a hard "we will not correct for case in this feature" or "if you want
>> this then you'll pay for it in performance." In the latter case I'd
>> prefer a new config option to toggle the sparse-checkout case
>> insensitivity. That way users could have core.ignoreCase behavior without
>> the perf hit if they use clean input to the sparse-checkout feature.
> 
> I value correctness more---a system that does incorrect thing
> quickly is not all that interesting.
> 
> Assuming that your users are sensible and feed good data, how much
> "performance hit" are we talking about?

I'll need to build a prototype to test the performance hit. Maybe
I'm overestimating the effect of using a case-insensitive hash.

> Wouldn't this be something
> we can make into a "if you have the paths in the correct case, we'll
> see the match just as quickly as in the case sensitive system, so
> most of the time there is no penalty, but for correctness we would
> also fall back to try different cases"?

I think we would want the config option present to say "my data may
not be in the proper case, please do extra work for me". I can't
think of a way to do the fallback in real-time.

I'll try again with the case-insensitive hash algorithm and see
how that shakes out.

Thanks,
-Stolee
Derrick Stolee Dec. 12, 2019, 8:59 p.m. UTC | #7
On 12/9/2019 2:43 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When a user uses the sparse-checkout feature in cone mode, they
> add patterns using "git sparse-checkout set <dir1> <dir2> ..."
> or by using "--stdin" to provide the directories line-by-line over
> stdin. This behaviour naturally looks a lot like the way a user
> would type "git add <dir1> <dir2> ..."

Junio:

While I was updating my GGG PR [1] with new logic, I see that you
added this commit to ds/sparse-cone. I didn't intend to these
to be in the same topic, and I was hoping this discussion doesn't
delay the other commits in ds/sparse-cone from merging.

If you have a plan for that branch and the merge status of those
commits, then I'm happy to re-target my PR against 'next' or
an equivalent branch.

Thanks,
-Stolee

[1] https://github.com/gitgitgadget/git/pull/488
Junio C Hamano Dec. 13, 2019, 7:05 p.m. UTC | #8
Derrick Stolee <stolee@gmail.com> writes:

> If you have a plan for that branch and the merge status of those
> commits, then I'm happy to re-target my PR against 'next' or
> an equivalent branch.

This change does not make much sense without the cone-mode topic,
no?
Derrick Stolee Dec. 13, 2019, 7:40 p.m. UTC | #9
On 12/13/2019 2:05 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> If you have a plan for that branch and the merge status of those
>> commits, then I'm happy to re-target my PR against 'next' or
>> an equivalent branch.
> 
> This change does not make much sense without the cone-mode topic,
> no?

I'm saying it should go on top, as a new branch that depends on
ds/sparse-cone. Sorry for not making that clear.

Thanks,
-Stolee
Junio C Hamano Dec. 13, 2019, 10:02 p.m. UTC | #10
Derrick Stolee <stolee@gmail.com> writes:

> On 12/13/2019 2:05 PM, Junio C Hamano wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>> 
>>> If you have a plan for that branch and the merge status of those
>>> commits, then I'm happy to re-target my PR against 'next' or
>>> an equivalent branch.
>> 
>> This change does not make much sense without the cone-mode topic,
>> no?
>
> I'm saying it should go on top, as a new branch that depends on
> ds/sparse-cone. Sorry for not making that clear.

I thought that it was a *bug* that ds/sparse-cone topic does not
match the pattern case insensitively on case insensitive systems,
and from that point of view, letting the earlier part of the topic
graduate without the fix would not make sense, I thought.

Thanks.

Patch
diff mbox series

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b975285673..849efa0f0b 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -150,6 +150,10 @@  expecting patterns of these types. Git will warn if the patterns do not match.
 If the patterns do match the expected format, then Git will use faster hash-
 based algorithms to compute inclusion in the sparse-checkout.
 
+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
+correct for incorrect case when assigning patterns to the sparse-checkout
+file.
+
 SEE ALSO
 --------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index a542d617a5..0de426384e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -336,6 +336,22 @@  static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 	}
 }
 
+static void sanitize_cone_input(struct strbuf *line)
+{
+	if (ignore_case) {
+		struct index_state *istate = the_repository->index;
+		const char *name = index_dir_matching_name(istate, line->buf, line->len);
+
+		if (name) {
+			strbuf_setlen(line, 0);
+			strbuf_addstr(line, name);
+		}
+	}
+
+	if (line->buf[0] != '/')
+		strbuf_insert(line, 0, "/", 1);
+}
+
 static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 {
 	strbuf_trim(line);
@@ -345,8 +361,7 @@  static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 	if (!line->len)
 		return;
 
-	if (line->buf[0] != '/')
-		strbuf_insert(line, 0, "/", 1);
+	sanitize_cone_input(line);
 
 	insert_recursive_pattern(pl, line);
 }
diff --git a/cache.h b/cache.h
index d3c89e7a53..a2d9d437f0 100644
--- a/cache.h
+++ b/cache.h
@@ -728,6 +728,7 @@  int repo_index_has_changes(struct repository *repo,
 int verify_path(const char *path, unsigned mode);
 int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
 int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 
diff --git a/name-hash.c b/name-hash.c
index ceb1d7bd6f..46898b6571 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -681,6 +681,16 @@  int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	return dir && dir->nr;
 }
 
+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
+{
+	struct dir_entry *dir;
+
+	lazy_init_name_hash(istate);
+	dir = find_dir_entry(istate, name, namelen);
+
+	return dir ? dir->name : NULL;
+}
+
 void adjust_dirname_case(struct index_state *istate, char *name)
 {
 	const char *startPtr = name;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index d5e2892526..d0ce48869f 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -304,4 +304,17 @@  test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
 	git -C dirty sparse-checkout disable
 '
 
+test_expect_success 'cone mode: set with core.ignoreCase=true' '
+	test_when_finished git -C repo config --unset core.ignoreCase &&
+	git -C repo sparse-checkout init --cone &&
+	git -C repo config core.ignoreCase true &&
+	git -C repo sparse-checkout set Folder1 &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/
+		/folder1/
+	EOF
+	test_cmp expect repo/.git/info/sparse-checkout
+'
+
 test_done