diff mbox series

[v2,1/1] macOS: ls-files path fails if path of workdir is NFD

Message ID 20240509161110.12121-1-tboegi@web.de (mailing list archive)
State Superseded
Headers show
Series [v2,1/1] macOS: ls-files path fails if path of workdir is NFD | expand

Commit Message

Torsten Bögershausen May 9, 2024, 4:11 p.m. UTC
From: Torsten Bögershausen <tboegi@web.de>

Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:

$ cd somewhere          # some safe place, /tmp or ~/tmp etc.
$ mkdir $'u\xcc\x88'    # ü in NFD
$ cd ü                  # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
$ git init
$ git ls-files $'/somewhere/u\xcc\x88'   # NFD
  fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
$ git ls-files $'/somewhere/\xc3\xbc'    # NFC
(the same error as above)

In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.

This commit adds a test case that follows the bug report,
with the simplification that the 'ü' is replaced by an 'ä',
which is already used as NFD and NFC in t0050.

The solution is to precompose the result of getcwd(), if needed.

One possible implementation would be to re-define getcwd() similar
to opendir(), readdir() and closedir().
Since there is already a strbuf wrapper around getcwd(), and only this
wrapper is used inside the whole codebase, equip strbuf_getcwd() with
a call to the newly created function precompose_strbuf_if_needed().
Note that precompose_strbuf_if_needed() is a function under macOS,
and is a "no-op" on all other systems.

Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 compat/precompose_utf8.c | 10 ++++++++++
 compat/precompose_utf8.h |  1 +
 git-compat-util.h        |  1 +
 strbuf.c                 |  1 +
 t/t0050-filesystem.sh    | 11 +++++++++++
 5 files changed, 24 insertions(+)

Thanks everybody for the review, which makes V2 much better.


--
2.41.0.394.ge43f4fd0bd

Comments

Junio C Hamano May 9, 2024, 4:37 p.m. UTC | #1
tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> Under macOS, `git ls-files path` does not work (gives an error)
> if the absolute 'path' contains characters in NFD (decomposed).
> ...
>
> Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---

Looks good.  I've queued with a slight rewording to the proposed log
message, and a bit of extra quoting in the test.  Any string that
contains "$aumlcdiar" are enclosed in a pair of double-quotes in the
script, so not just the one given to ls-files, other two references
to it are also quoted now.

Thanks.

1:  a00cec23cf ! 1:  ee6ba4053d macOS: ls-files path fails if path of workdir is NFD
    @@ Commit message
         In the 'fatal:' error message, there are three ü;
         the 1st and 2nd are in NFC, the 3rd is in NFD.
     
    -    This commit adds a test case that follows the bug report,
    -    with the simplification that the 'ü' is replaced by an 'ä',
    -    which is already used as NFD and NFC in t0050.
    +    Add a test case that follows the bug report, with the simplification
    +    that the 'ü' is replaced by an 'ä', which is already used as NFD and
    +    NFC in t0050.
     
    -    The solution is to precompose the result of getcwd(), if needed.
    +    Precompose the result of getcwd(), if needed, just like all other
    +    paths we use internally.  That way, paths comparisons are all done
    +    in NFC and we would correctly notice that the early part of the
    +    path given as an absolute path matches the current directory.
     
         One possible implementation would be to re-define getcwd() similar
    -    to opendir(), readdir() and closedir().
    -    Since there is already a strbuf wrapper around getcwd(), and only this
    -    wrapper is used inside the whole codebase, equip strbuf_getcwd() with
    -    a call to the newly created function precompose_strbuf_if_needed().
    +    to opendir(), readdir() and closedir(), but since there is already a
    +    strbuf wrapper around getcwd(), and only this wrapper is used inside
    +    the whole codebase, equip strbuf_getcwd() with a call to the newly
    +    created function precompose_strbuf_if_needed().
    +
         Note that precompose_strbuf_if_needed() is a function under macOS,
         and is a "no-op" on all other systems.
     
    @@ t/t0050-filesystem.sh: test_expect_success CASE_INSENSITIVE_FS 'checkout with no
      
     +test_expect_success 'git ls-files under NFD' '
     +	(
    -+		mkdir -p somewhere/$aumlcdiar &&
    ++		mkdir -p "somewhere/$aumlcdiar" &&
     +		mypwd=$PWD &&
    -+		cd somewhere/$aumlcdiar &&
    ++		cd "somewhere/$aumlcdiar" &&
     +		git init &&
    -+		git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar"  2>err &&
    ++		git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err &&
     +		>expected &&
     +		test_cmp expected err
     +	)
Jun. T May 19, 2024, 7:03 a.m. UTC | #2
Sorry for not responding quickly.

Thank you for the patch, but it seems the problem still remains.

Although
% git ls-files NFD
(apparently) works,
% git ls-files NFC
still gives the error
(if core.precomposeunicode is not set in global config).

The following is some info I got (hope it is correct and useful),
but I have no idea how to fix the problem.

precompose_string_if_needed() works only if:
  precomposed_unicode is already set to 1, or
  git_config_get_bool("core.precomposeunicode") sets it to 1.

But git_config_get_bool() reads the file .git/config only if:
  the_repository->commondir is already set to ".git".

Back trace when the strbuf_getcwd() is called for the
3rd time is (frame #4 is set_git_work_tree()):

  * frame #0: git`strbuf_getcwd(sb=0x00007ff7bfeff0a8) at strbuf.c:588:20
    frame #1: git`strbuf_realpath_1(resolved=0x00007ff7bfeff0a8, path=".", flags=2) at abspath.c:101:7
    frame #2: git`strbuf_realpath(resolved=0x00007ff7bfeff0a8, path=".", die_on_error=1) at abspath.c:219:9
    frame #3: git`real_pathdup(path=".", die_on_error=1) at abspath.c:240:6
    frame #4: git`repo_set_worktree(repo=0x000000010044eb98, path=".") at repository.c:145:19
    frame #5: git`set_git_work_tree(new_work_tree=".") at environment.c:278:2
    frame #6: git`setup_discovered_git_dir(gitdir=".git", cwd=0x0000000100435238, offset=16, repo_fmt=0x00007ff7bfeff1d8, nongit_ok=0x0000000000000000) at setup.c:1119:2
    frame #7: git`setup_git_directory_gently(nongit_ok=0x0000000000000000) at setup.c:1606:12
    frame #8: git`setup_git_directory at setup.c:1815:9
    frame #9: git`run_builtin(p=0x0000000100424d58, argc=2, argv=0x00007ff7bfeff6d8) at git.c:448:12
    frame #10: git`handle_builtin(argc=2, argv=0x00007ff7bfeff6d8) at git.c:729:3        
    frame #11: git`run_argv(argcp=0x00007ff7bfeff54c, argv=0x00007ff7bfeff540) at git.c:793:4                                            
    frame #12: git`cmd_main(argc=2, argv=0x00007ff7bfeff6d8) at git.c:928:19                                   
    frame #13: git`main(argc=3, argv=0x00007ff7bfeff6d0) at common-main.c:62:11

At this point, precomposed_unicode is still -1 and
the_repository->commondir is still NULL.
This means strbuf_getcwd() retuns NFD, and                                      the_repository->worktree is set to NFD.
                
Moreover, precompose_string_if_needed() calls    
git_config_get_bool("core.precomposeunicode"), and
this function indirecly sets  
the_repository->config->hash_initialized = 1

Later setup_git_directory_gently() (frame #7) calls
setup_git_env() --> repo_set_gitdir() --> repo_set_commondir()
and the_repository->commondir is now set to ".git".

Then run_builtin() (frame #10) calls precompose_argv_prefix()
 --> precompose_string_if_needed(). Here we have
  precomposed_unicode = -1
  the_repository->config->hash_initialized = 1
This means git_config_check_init() does not read
.git/config (does not call repo_read_config()) even if
the_repository->commondir is set to ".git",
and precomposed_unicode is not set to 1.
So the NFD in argv is not converted to NFC,
and
% git ls-files NFD
apparently works.
Torsten Bögershausen May 20, 2024, 4:06 p.m. UTC | #3
On Sun, May 19, 2024 at 04:03:03PM +0900, Jun. T wrote:
> Sorry for not responding quickly.
>
> Thank you for the patch, but it seems the problem still remains.
>
> Although
> % git ls-files NFD
> (apparently) works,
> % git ls-files NFC
> still gives the error
> (if core.precomposeunicode is not set in global config).
>
> The following is some info I got (hope it is correct and useful),
> but I have no idea how to fix the problem.
>
> precompose_string_if_needed() works only if:
>   precomposed_unicode is already set to 1, or
>   git_config_get_bool("core.precomposeunicode") sets it to 1.
>
> But git_config_get_bool() reads the file .git/config only if:
>   the_repository->commondir is already set to ".git".
>
> Back trace when the strbuf_getcwd() is called for the
> 3rd time is (frame #4 is set_git_work_tree()):
>
>   * frame #0: git`strbuf_getcwd(sb=0x00007ff7bfeff0a8) at strbuf.c:588:20
>     frame #1: git`strbuf_realpath_1(resolved=0x00007ff7bfeff0a8, path=".", flags=2) at abspath.c:101:7
>     frame #2: git`strbuf_realpath(resolved=0x00007ff7bfeff0a8, path=".", die_on_error=1) at abspath.c:219:9
>     frame #3: git`real_pathdup(path=".", die_on_error=1) at abspath.c:240:6
>     frame #4: git`repo_set_worktree(repo=0x000000010044eb98, path=".") at repository.c:145:19
>     frame #5: git`set_git_work_tree(new_work_tree=".") at environment.c:278:2
>     frame #6: git`setup_discovered_git_dir(gitdir=".git", cwd=0x0000000100435238, offset=16, repo_fmt=0x00007ff7bfeff1d8, nongit_ok=0x0000000000000000) at setup.c:1119:2
>     frame #7: git`setup_git_directory_gently(nongit_ok=0x0000000000000000) at setup.c:1606:12
>     frame #8: git`setup_git_directory at setup.c:1815:9
>     frame #9: git`run_builtin(p=0x0000000100424d58, argc=2, argv=0x00007ff7bfeff6d8) at git.c:448:12
>     frame #10: git`handle_builtin(argc=2, argv=0x00007ff7bfeff6d8) at git.c:729:3
>     frame #11: git`run_argv(argcp=0x00007ff7bfeff54c, argv=0x00007ff7bfeff540) at git.c:793:4
>     frame #12: git`cmd_main(argc=2, argv=0x00007ff7bfeff6d8) at git.c:928:19
>     frame #13: git`main(argc=3, argv=0x00007ff7bfeff6d0) at common-main.c:62:11
>
> At this point, precomposed_unicode is still -1 and
> the_repository->commondir is still NULL.
> This means strbuf_getcwd() retuns NFD, and                                      the_repository->worktree is set to NFD.
>
> Moreover, precompose_string_if_needed() calls
> git_config_get_bool("core.precomposeunicode"), and
> this function indirecly sets
> the_repository->config->hash_initialized = 1
>
> Later setup_git_directory_gently() (frame #7) calls
> setup_git_env() --> repo_set_gitdir() --> repo_set_commondir()
> and the_repository->commondir is now set to ".git".
>
> Then run_builtin() (frame #10) calls precompose_argv_prefix()
>  --> precompose_string_if_needed(). Here we have
>   precomposed_unicode = -1
>   the_repository->config->hash_initialized = 1
> This means git_config_check_init() does not read
> .git/config (does not call repo_read_config()) even if
> the_repository->commondir is set to ".git",
> and precomposed_unicode is not set to 1.
> So the NFD in argv is not converted to NFC,
> and
> % git ls-files NFD
> apparently works.
>

Thanks so much for the detailed analysis, that is appreciated.
To be honest, I have set core.precomposeunicode true globally,
core.quotepath=false, together with settings for
pull.rebase and init.defaultbranch

Because of that, the new testcase passed, and the patch was in
improvement.
However, it would be nice to have the case fixed as well,
where core.precomposeunicode is not set at a global level.

I am happy to provide a patch (a new testcase is already there),
but for a change in the codebase I would need some help from an expert,
to get the config-reading right both for hash_initialized
(that is may be not about the hash-algorithn at all ?)
and precompose.
Junio C Hamano May 20, 2024, 6:08 p.m. UTC | #4
Torsten Bögershausen <tboegi@web.de> writes:

> Thanks so much for the detailed analysis, that is appreciated.
> To be honest, I have set core.precomposeunicode true globally,
> ...
> ...
> I am happy to provide a patch (a new testcase is already there),
> but for a change in the codebase I would need some help from an expert,
> to get the config-reading right both for hash_initialized
> (that is may be not about the hash-algorithn at all ?)
> and precompose.

It does not sound like an issue with the hash algorithm.

Why isn't the local config (presumably set with auto-probing when
the repository was initialized) being read?  Are we reading the
core.precomposeunicode in some funny ways?  Is per-worktree config
involved that is trying to read from one but the auto-probing code
is setting it to another, or something silly like that?

Thanks for working well together, both of you.
Torsten Bögershausen May 20, 2024, 7:21 p.m. UTC | #5
On Mon, May 20, 2024 at 11:08:43AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
> > Thanks so much for the detailed analysis, that is appreciated.
> > To be honest, I have set core.precomposeunicode true globally,
> > ...
> > ...
> > I am happy to provide a patch (a new testcase is already there),
> > but for a change in the codebase I would need some help from an expert,
> > to get the config-reading right both for hash_initialized
> > (that is may be not about the hash-algorithn at all ?)
> > and precompose.
>
> It does not sound like an issue with the hash algorithm.
>
> Why isn't the local config (presumably set with auto-probing when
> the repository was initialized) being read?

I have the same question, kind of. The callstack provided does give some
hints, but I was lost...


> Are we reading the core.precomposeunicode in some funny ways?
Not what I am aware of. But the order of initialization, when a git command
is executed, some need a worktree or .git directory, some not, is somewhat
beyond my yet expertise.

>Is per-worktree config  involved that is trying to read from one but the auto-probing code
> is setting it to another, or something silly like that?
No, not to my understanding.
We have a "local" config (per repo), that is there and has the right value.
However, for some reason we miss to read the repo-config here,
when argv[] needs to be precomposed.
Reading the global config does work, but if core.precomposeunicode is
not set here, but set in the repo-config, we miss that.

>
> Thanks for working well together, both of you.
Yes, please let someone join the force, reading the callstack(s) and
try to find what is wrong here. I will try to do the same.
diff mbox series

Patch

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 0bd5c24250..5a7c90c90d 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -94,6 +94,16 @@  const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+void precompose_strbuf_if_needed(struct strbuf *sb)
+{
+	char *buf_prec = (char *)precompose_string_if_needed(sb->buf);
+	if (buf_prec != sb->buf) {
+		size_t buf_prec_len = strlen(buf_prec);
+		free(strbuf_detach(sb, NULL));
+		strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1);
+	}
+}
+
 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
 	int i = 0;
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index fea06cf28a..7c3cfcadb0 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -30,6 +30,7 @@  typedef struct {

 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 const char *precompose_string_if_needed(const char *in);
+void precompose_strbuf_if_needed(struct strbuf *sb);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index ca7678a379..892e1f9067 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -344,6 +344,7 @@  static inline const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+#define precompose_strbuf_if_needed(a)
 #define probe_utf8_pathname_composition()
 #endif

diff --git a/strbuf.c b/strbuf.c
index 0d929e4e19..d5b4b3903a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -592,6 +592,7 @@  int strbuf_getcwd(struct strbuf *sb)
 		strbuf_grow(sb, guessed_len);
 		if (getcwd(sb->buf, sb->alloc)) {
 			strbuf_setlen(sb, strlen(sb->buf));
+			precompose_strbuf_if_needed(sb);
 			return 0;
 		}

diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 325eb1c3cd..a24ec866d1 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -156,4 +156,15 @@  test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case in
 	)
 '

+test_expect_success 'git ls-files under NFD' '
+	(
+		mkdir -p somewhere/$aumlcdiar &&
+		mypwd=$PWD &&
+		cd somewhere/$aumlcdiar &&
+		git init &&
+		git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar"  2>err &&
+		>expected &&
+		test_cmp expected err
+	)
+'
 test_done