Message ID | 20240509161110.12121-1-tboegi@web.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2,1/1] macOS: ls-files path fails if path of workdir is NFD | expand |
tboegi@web.de writes: > From: Torsten Bögershausen <tboegi@web.de> > > Under macOS, `git ls-files path` does not work (gives an error) > if the absolute 'path' contains characters in NFD (decomposed). > ... > > Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp> > Signed-off-by: Torsten Bögershausen <tboegi@web.de> > --- Looks good. I've queued with a slight rewording to the proposed log message, and a bit of extra quoting in the test. Any string that contains "$aumlcdiar" are enclosed in a pair of double-quotes in the script, so not just the one given to ls-files, other two references to it are also quoted now. Thanks. 1: a00cec23cf ! 1: ee6ba4053d macOS: ls-files path fails if path of workdir is NFD @@ Commit message In the 'fatal:' error message, there are three ü; the 1st and 2nd are in NFC, the 3rd is in NFD. - This commit adds a test case that follows the bug report, - with the simplification that the 'ü' is replaced by an 'ä', - which is already used as NFD and NFC in t0050. + Add a test case that follows the bug report, with the simplification + that the 'ü' is replaced by an 'ä', which is already used as NFD and + NFC in t0050. - The solution is to precompose the result of getcwd(), if needed. + Precompose the result of getcwd(), if needed, just like all other + paths we use internally. That way, paths comparisons are all done + in NFC and we would correctly notice that the early part of the + path given as an absolute path matches the current directory. One possible implementation would be to re-define getcwd() similar - to opendir(), readdir() and closedir(). - Since there is already a strbuf wrapper around getcwd(), and only this - wrapper is used inside the whole codebase, equip strbuf_getcwd() with - a call to the newly created function precompose_strbuf_if_needed(). + to opendir(), readdir() and closedir(), but since there is already a + strbuf wrapper around getcwd(), and only this wrapper is used inside + the whole codebase, equip strbuf_getcwd() with a call to the newly + created function precompose_strbuf_if_needed(). + Note that precompose_strbuf_if_needed() is a function under macOS, and is a "no-op" on all other systems. @@ t/t0050-filesystem.sh: test_expect_success CASE_INSENSITIVE_FS 'checkout with no +test_expect_success 'git ls-files under NFD' ' + ( -+ mkdir -p somewhere/$aumlcdiar && ++ mkdir -p "somewhere/$aumlcdiar" && + mypwd=$PWD && -+ cd somewhere/$aumlcdiar && ++ cd "somewhere/$aumlcdiar" && + git init && -+ git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err && ++ git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err && + >expected && + test_cmp expected err + )
Sorry for not responding quickly. Thank you for the patch, but it seems the problem still remains. Although % git ls-files NFD (apparently) works, % git ls-files NFC still gives the error (if core.precomposeunicode is not set in global config). The following is some info I got (hope it is correct and useful), but I have no idea how to fix the problem. precompose_string_if_needed() works only if: precomposed_unicode is already set to 1, or git_config_get_bool("core.precomposeunicode") sets it to 1. But git_config_get_bool() reads the file .git/config only if: the_repository->commondir is already set to ".git". Back trace when the strbuf_getcwd() is called for the 3rd time is (frame #4 is set_git_work_tree()): * frame #0: git`strbuf_getcwd(sb=0x00007ff7bfeff0a8) at strbuf.c:588:20 frame #1: git`strbuf_realpath_1(resolved=0x00007ff7bfeff0a8, path=".", flags=2) at abspath.c:101:7 frame #2: git`strbuf_realpath(resolved=0x00007ff7bfeff0a8, path=".", die_on_error=1) at abspath.c:219:9 frame #3: git`real_pathdup(path=".", die_on_error=1) at abspath.c:240:6 frame #4: git`repo_set_worktree(repo=0x000000010044eb98, path=".") at repository.c:145:19 frame #5: git`set_git_work_tree(new_work_tree=".") at environment.c:278:2 frame #6: git`setup_discovered_git_dir(gitdir=".git", cwd=0x0000000100435238, offset=16, repo_fmt=0x00007ff7bfeff1d8, nongit_ok=0x0000000000000000) at setup.c:1119:2 frame #7: git`setup_git_directory_gently(nongit_ok=0x0000000000000000) at setup.c:1606:12 frame #8: git`setup_git_directory at setup.c:1815:9 frame #9: git`run_builtin(p=0x0000000100424d58, argc=2, argv=0x00007ff7bfeff6d8) at git.c:448:12 frame #10: git`handle_builtin(argc=2, argv=0x00007ff7bfeff6d8) at git.c:729:3 frame #11: git`run_argv(argcp=0x00007ff7bfeff54c, argv=0x00007ff7bfeff540) at git.c:793:4 frame #12: git`cmd_main(argc=2, argv=0x00007ff7bfeff6d8) at git.c:928:19 frame #13: git`main(argc=3, argv=0x00007ff7bfeff6d0) at common-main.c:62:11 At this point, precomposed_unicode is still -1 and the_repository->commondir is still NULL. This means strbuf_getcwd() retuns NFD, and the_repository->worktree is set to NFD. Moreover, precompose_string_if_needed() calls git_config_get_bool("core.precomposeunicode"), and this function indirecly sets the_repository->config->hash_initialized = 1 Later setup_git_directory_gently() (frame #7) calls setup_git_env() --> repo_set_gitdir() --> repo_set_commondir() and the_repository->commondir is now set to ".git". Then run_builtin() (frame #10) calls precompose_argv_prefix() --> precompose_string_if_needed(). Here we have precomposed_unicode = -1 the_repository->config->hash_initialized = 1 This means git_config_check_init() does not read .git/config (does not call repo_read_config()) even if the_repository->commondir is set to ".git", and precomposed_unicode is not set to 1. So the NFD in argv is not converted to NFC, and % git ls-files NFD apparently works.
On Sun, May 19, 2024 at 04:03:03PM +0900, Jun. T wrote: > Sorry for not responding quickly. > > Thank you for the patch, but it seems the problem still remains. > > Although > % git ls-files NFD > (apparently) works, > % git ls-files NFC > still gives the error > (if core.precomposeunicode is not set in global config). > > The following is some info I got (hope it is correct and useful), > but I have no idea how to fix the problem. > > precompose_string_if_needed() works only if: > precomposed_unicode is already set to 1, or > git_config_get_bool("core.precomposeunicode") sets it to 1. > > But git_config_get_bool() reads the file .git/config only if: > the_repository->commondir is already set to ".git". > > Back trace when the strbuf_getcwd() is called for the > 3rd time is (frame #4 is set_git_work_tree()): > > * frame #0: git`strbuf_getcwd(sb=0x00007ff7bfeff0a8) at strbuf.c:588:20 > frame #1: git`strbuf_realpath_1(resolved=0x00007ff7bfeff0a8, path=".", flags=2) at abspath.c:101:7 > frame #2: git`strbuf_realpath(resolved=0x00007ff7bfeff0a8, path=".", die_on_error=1) at abspath.c:219:9 > frame #3: git`real_pathdup(path=".", die_on_error=1) at abspath.c:240:6 > frame #4: git`repo_set_worktree(repo=0x000000010044eb98, path=".") at repository.c:145:19 > frame #5: git`set_git_work_tree(new_work_tree=".") at environment.c:278:2 > frame #6: git`setup_discovered_git_dir(gitdir=".git", cwd=0x0000000100435238, offset=16, repo_fmt=0x00007ff7bfeff1d8, nongit_ok=0x0000000000000000) at setup.c:1119:2 > frame #7: git`setup_git_directory_gently(nongit_ok=0x0000000000000000) at setup.c:1606:12 > frame #8: git`setup_git_directory at setup.c:1815:9 > frame #9: git`run_builtin(p=0x0000000100424d58, argc=2, argv=0x00007ff7bfeff6d8) at git.c:448:12 > frame #10: git`handle_builtin(argc=2, argv=0x00007ff7bfeff6d8) at git.c:729:3 > frame #11: git`run_argv(argcp=0x00007ff7bfeff54c, argv=0x00007ff7bfeff540) at git.c:793:4 > frame #12: git`cmd_main(argc=2, argv=0x00007ff7bfeff6d8) at git.c:928:19 > frame #13: git`main(argc=3, argv=0x00007ff7bfeff6d0) at common-main.c:62:11 > > At this point, precomposed_unicode is still -1 and > the_repository->commondir is still NULL. > This means strbuf_getcwd() retuns NFD, and the_repository->worktree is set to NFD. > > Moreover, precompose_string_if_needed() calls > git_config_get_bool("core.precomposeunicode"), and > this function indirecly sets > the_repository->config->hash_initialized = 1 > > Later setup_git_directory_gently() (frame #7) calls > setup_git_env() --> repo_set_gitdir() --> repo_set_commondir() > and the_repository->commondir is now set to ".git". > > Then run_builtin() (frame #10) calls precompose_argv_prefix() > --> precompose_string_if_needed(). Here we have > precomposed_unicode = -1 > the_repository->config->hash_initialized = 1 > This means git_config_check_init() does not read > .git/config (does not call repo_read_config()) even if > the_repository->commondir is set to ".git", > and precomposed_unicode is not set to 1. > So the NFD in argv is not converted to NFC, > and > % git ls-files NFD > apparently works. > Thanks so much for the detailed analysis, that is appreciated. To be honest, I have set core.precomposeunicode true globally, core.quotepath=false, together with settings for pull.rebase and init.defaultbranch Because of that, the new testcase passed, and the patch was in improvement. However, it would be nice to have the case fixed as well, where core.precomposeunicode is not set at a global level. I am happy to provide a patch (a new testcase is already there), but for a change in the codebase I would need some help from an expert, to get the config-reading right both for hash_initialized (that is may be not about the hash-algorithn at all ?) and precompose.
Torsten Bögershausen <tboegi@web.de> writes: > Thanks so much for the detailed analysis, that is appreciated. > To be honest, I have set core.precomposeunicode true globally, > ... > ... > I am happy to provide a patch (a new testcase is already there), > but for a change in the codebase I would need some help from an expert, > to get the config-reading right both for hash_initialized > (that is may be not about the hash-algorithn at all ?) > and precompose. It does not sound like an issue with the hash algorithm. Why isn't the local config (presumably set with auto-probing when the repository was initialized) being read? Are we reading the core.precomposeunicode in some funny ways? Is per-worktree config involved that is trying to read from one but the auto-probing code is setting it to another, or something silly like that? Thanks for working well together, both of you.
On Mon, May 20, 2024 at 11:08:43AM -0700, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@web.de> writes: > > > Thanks so much for the detailed analysis, that is appreciated. > > To be honest, I have set core.precomposeunicode true globally, > > ... > > ... > > I am happy to provide a patch (a new testcase is already there), > > but for a change in the codebase I would need some help from an expert, > > to get the config-reading right both for hash_initialized > > (that is may be not about the hash-algorithn at all ?) > > and precompose. > > It does not sound like an issue with the hash algorithm. > > Why isn't the local config (presumably set with auto-probing when > the repository was initialized) being read? I have the same question, kind of. The callstack provided does give some hints, but I was lost... > Are we reading the core.precomposeunicode in some funny ways? Not what I am aware of. But the order of initialization, when a git command is executed, some need a worktree or .git directory, some not, is somewhat beyond my yet expertise. >Is per-worktree config involved that is trying to read from one but the auto-probing code > is setting it to another, or something silly like that? No, not to my understanding. We have a "local" config (per repo), that is there and has the right value. However, for some reason we miss to read the repo-config here, when argv[] needs to be precomposed. Reading the global config does work, but if core.precomposeunicode is not set here, but set in the repo-config, we miss that. > > Thanks for working well together, both of you. Yes, please let someone join the force, reading the callstack(s) and try to find what is wrong here. I will try to do the same.
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 0bd5c24250..5a7c90c90d 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -94,6 +94,16 @@ const char *precompose_string_if_needed(const char *in) return in; } +void precompose_strbuf_if_needed(struct strbuf *sb) +{ + char *buf_prec = (char *)precompose_string_if_needed(sb->buf); + if (buf_prec != sb->buf) { + size_t buf_prec_len = strlen(buf_prec); + free(strbuf_detach(sb, NULL)); + strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1); + } +} + const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix) { int i = 0; diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h index fea06cf28a..7c3cfcadb0 100644 --- a/compat/precompose_utf8.h +++ b/compat/precompose_utf8.h @@ -30,6 +30,7 @@ typedef struct { const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix); const char *precompose_string_if_needed(const char *in); +void precompose_strbuf_if_needed(struct strbuf *sb); void probe_utf8_pathname_composition(void); PREC_DIR *precompose_utf8_opendir(const char *dirname); diff --git a/git-compat-util.h b/git-compat-util.h index ca7678a379..892e1f9067 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -344,6 +344,7 @@ static inline const char *precompose_string_if_needed(const char *in) return in; } +#define precompose_strbuf_if_needed(a) #define probe_utf8_pathname_composition() #endif diff --git a/strbuf.c b/strbuf.c index 0d929e4e19..d5b4b3903a 100644 --- a/strbuf.c +++ b/strbuf.c @@ -592,6 +592,7 @@ int strbuf_getcwd(struct strbuf *sb) strbuf_grow(sb, guessed_len); if (getcwd(sb->buf, sb->alloc)) { strbuf_setlen(sb, strlen(sb->buf)); + precompose_strbuf_if_needed(sb); return 0; } diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh index 325eb1c3cd..a24ec866d1 100755 --- a/t/t0050-filesystem.sh +++ b/t/t0050-filesystem.sh @@ -156,4 +156,15 @@ test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case in ) ' +test_expect_success 'git ls-files under NFD' ' + ( + mkdir -p somewhere/$aumlcdiar && + mypwd=$PWD && + cd somewhere/$aumlcdiar && + git init && + git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err && + >expected && + test_cmp expected err + ) +' test_done