diff mbox series

[v3,1/1] macOS: ls-files path fails if path of workdir is NFD

Message ID 20240521141452.26210-1-tboegi@web.de (mailing list archive)
State Superseded
Headers show
Series [v3,1/1] macOS: ls-files path fails if path of workdir is NFD | expand

Commit Message

Torsten Bögershausen May 21, 2024, 2:14 p.m. UTC
From: Torsten Bögershausen <tboegi@web.de>

Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:

$ cd somewhere          # some safe place, /tmp or ~/tmp etc.
$ mkdir $'u\xcc\x88'    # ü in NFD
$ cd ü                  # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
$ git init
$ git ls-files $'/somewhere/u\xcc\x88'   # NFD
  fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
$ git ls-files $'/somewhere/\xc3\xbc'    # NFC
(the same error as above)

In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.

Add a test case that follows the bug report, with the simplification
that the 'ü' is replaced by an 'ä', which is already used as NFD and
NFC in t0050.

Precompose the result of getcwd(), if needed, just like all other
paths we use internally.  That way, paths comparisons are all done
in NFC and we would correctly notice that the early part of the
path given as an absolute path matches the current directory.

One possible implementation would be to re-define getcwd() similar
to opendir(), readdir() and closedir(), but since there is already a
strbuf wrapper around getcwd(), and only this wrapper is used inside
the whole codebase, equip strbuf_getcwd() with a call to the newly
created function precompose_strbuf_if_needed().

Note that precompose_strbuf_if_needed() is a function under macOS,
and is a "no-op" on all other systems.

Add a missing call to precompose_string_if_needed() to this code
in setup.c :
`work_tree = precompose_string_if_needed(get_git_work_tree());`

Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 compat/precompose_utf8.c | 10 ++++++++++
 compat/precompose_utf8.h |  1 +
 git-compat-util.h        |  1 +
 setup.c                  |  2 +-
 strbuf.c                 |  1 +
 t/t0050-filesystem.sh    | 26 ++++++++++++++++++++++++++
 6 files changed, 40 insertions(+), 1 deletion(-)

--
2.41.0.394.ge43f4fd0bd

Comments

Junio C Hamano May 21, 2024, 5:50 p.m. UTC | #1
tboegi@web.de writes:

> Add a missing call to precompose_string_if_needed() to this code
> in setup.c :
> `work_tree = precompose_string_if_needed(get_git_work_tree());`

This is new in this iteration, I presume?  The old one did the
precompose only in strbuf_getcwd().  We now precompose also the
result of get_git_work_tree().

Two questions.

 * It is unclear to me why this makes a difference only when the
   precompuse configuration is set only in the local configuration.

 * As the leading part of the value placed in get_git_work_tree()
   comes from strbuf_getcwd() called by abspath.c:real_pathdup()
   that is called by repository.c:repo_set_worktree(), doesn't this
   potentially call precompse twice on the already precomposed early
   parth of the get_git_work_tree() result?

I suspect that with the arrangement in your test, the argument given
to set_git_work_tree() from setup.c:setup_discovered_git_dir() is
always ".", and that dot is passed to repository.c:repo_set_worktree()
which calls abspath.c:real_pathdup() to turn it into an absolute,
where it has a call to strbuf_getcwd().

So with the provided test, I suspect there is no difference between
the previous and this iteration in behaviour, as what is fed to
precompose should be identical?

What this iteration does differently is that inside real_pathdup(),
if the string given to repo_set_worktree() is more than the trivial
".", it is appended to the result of strbuf_getcwd(), and the new
code precomposes after such appending in real_pathdup() happens.  It
will convert the leading part twice [*] and more importantly the
appended part is now converted, unlike the previous one?

	Side note: [*] hopefully precompose is idempotent?  Relying
	on that property somewhat feels yucky, though.

Puzzled...

Will replace and queue, but I couldn't figure out what is going on
with the help by the proposed log message, so...

Thanks.
Torsten Bögershausen May 21, 2024, 8:57 p.m. UTC | #2
On Tue, May 21, 2024 at 10:50:25AM -0700, Junio C Hamano wrote:
> tboegi@web.de writes:
>
> > Add a missing call to precompose_string_if_needed() to this code
> > in setup.c :
> > `work_tree = precompose_string_if_needed(get_git_work_tree());`
>
> This is new in this iteration, I presume?  The old one did the
> precompose only in strbuf_getcwd().  We now precompose also the
> result of get_git_work_tree().
>
> Two questions.
>
>  * It is unclear to me why this makes a difference only when the
>    precompuse configuration is set only in the local configuration.
>
>  * As the leading part of the value placed in get_git_work_tree()
>    comes from strbuf_getcwd() called by abspath.c:real_pathdup()
>    that is called by repository.c:repo_set_worktree(), doesn't this
>    potentially call precompse twice on the already precomposed early
>    parth of the get_git_work_tree() result?
>
> I suspect that with the arrangement in your test, the argument given
> to set_git_work_tree() from setup.c:setup_discovered_git_dir() is
> always ".", and that dot is passed to repository.c:repo_set_worktree()
> which calls abspath.c:real_pathdup() to turn it into an absolute,
> where it has a call to strbuf_getcwd().
>
> So with the provided test, I suspect there is no difference between
> the previous and this iteration in behaviour, as what is fed to
> precompose should be identical?
>
> What this iteration does differently is that inside real_pathdup(),
> if the string given to repo_set_worktree() is more than the trivial
> ".", it is appended to the result of strbuf_getcwd(), and the new
> code precomposes after such appending in real_pathdup() happens.  It
> will convert the leading part twice [*] and more importantly the
> appended part is now converted, unlike the previous one?
>
> 	Side note: [*] hopefully precompose is idempotent?  Relying
> 	on that property somewhat feels yucky, though.
>
> Puzzled...
>
> Will replace and queue, but I couldn't figure out what is going on
> with the help by the proposed log message, so...

Acknowledge.
The commit message deserves an update, for sure.
My suggestion would be too keep it in seen, until I have managed
to write a better commit message.
At the same time, I would ask Jun-ichi Takimoto to do a re-test
of the new version.
Junio C Hamano May 21, 2024, 10:15 p.m. UTC | #3
Torsten Bögershausen <tboegi@web.de> writes:

> The commit message deserves an update, for sure.
> My suggestion would be too keep it in seen, until I have managed
> to write a better commit message.
> At the same time, I would ask Jun-ichi Takimoto to do a re-test
> of the new version.

Yup, that sounds extremely sensible.  Thanks for working on this.
Jun. T May 23, 2024, 3:33 p.m. UTC | #4
Unfortunately v3 still doesn't work.
'git ls-files NFD' works but 'git ls-files NFC' does not.

I think it better to test both "ls-config NFD" and "ls-config NFC".

The reason of the failure seems to be the same as v2, but
I describe it here in more detail (or too detailed).

(one of?) The problem is the_repository->config->hash_initialized
is set to 1 before the_repository->commondir is set to ".git".
Due to this, .git/config is never read, and precomposed_unicode
is never set to 1 (remains -1).

run_builtin() {
    setup_git_directory() {
        strbuf_getcwd() {   # setup.c:1542
            precompose_{strbuf,string}_if_needed() {
                # precomposed_unicode is still -1
                git_congig_get_bool("core.precomposeunicode") {
                    git_config_check_init() {
                        repo_read_config() {
                            git_config_init() {
                                # !!!
                                the_repository->config->hash_initialized=1
                                # !!!
                            }
                            # does not read .git/config since
                            # the_repository->commondir is still NULL
                        }
                    }
                }
                returns without converting to NFC
            }
            returns cwd in NFD
        }

        setup_discovered_git_dir() {
            set_git_work_tree(".") {
                repo_set_worktree() {
                    # this function indirectly calls strbuf_getcwd()
                    # --> precompose_{strbuf,string}_if_needed() -->
                    # {git,repo}_config_get_bool("core.precomposeunicode"),
                    # but does not try to read .git/config since
                    # the_repository->config->hash_initialized
                    # is already set to 1 above. And it will not read
                    # .git/config even if hash_initialized is 0
                    # since the_repository->commondir is still NULL.

                    the_repository->worktree = NFD
                }
            }
        }

        setup_git_env() {
            repo_setup_gitdir() {
                repo_set_commondir() {
                    # finally commondir is set here
                    the_repository->commondir = ".git"
                }
            }
        }

    } // END setup_git_directory

    precompose_argv_prefix() {
        # since the_repository->config->hash_initialized is still 1
        # .git/config is not read and precomposed_unicode remains -1,
        # and argv (if in NFD) is not converted to NFC
    }

    cmd_ls_files() {
        parse_pathspec(.., argv /* may be in NFD, see above */) {
            init_path_spec_item() {
                prefix_path_gently() {
                    abspath_part_inside_repo() {
                        work_tree = precomose_string_if_needed(
                                        get_git_work_gree())
                            # get_git_work_tree() returns NFD, and
                            # precompose_string_if_needed() does not
                            # convert it to NFC since 
                            # the_repository->config->hash_initialized is 1
                        worktree = NFD

                        returns 0 for "ls-files NFD" since both argv
                        and work_tree are in NFD, but returns -1 for
                        "ls-files NFC" since argv is in NFC.
                    }
                    returns NULL for "ls-files NFC"
                }
                die() at pathspec.c:499 for "ls-files NFC"
            }
        }
    }
} END run_builtin

I don't know how to fix the problem, but I think it better to avoid
calling precompose_{strbuf,string}_if_needed() before commondir
is set to ".git" and .git/config is successfully read.

Or reset the_repository->config->hash_initialized at some point?
Torsten Bögershausen May 25, 2024, 8:01 p.m. UTC | #5
On Fri, May 24, 2024 at 12:33:08AM +0900, Jun. T wrote:
>
> Unfortunately v3 still doesn't work.
> 'git ls-files NFD' works but 'git ls-files NFC' does not.
>
> I think it better to test both "ls-config NFD" and "ls-config NFC".
>
> The reason of the failure seems to be the same as v2, but
> I describe it here in more detail (or too detailed).

Thanks for testing - I was fully convinced that the new test case
did cover all problems - but proved to be wrong.

[snip the nice analyses]

> I don't know how to fix the problem, but I think it better to avoid
> calling precompose_{strbuf,string}_if_needed() before commondir
> is set to ".git" and .git/config is successfully read.
>
> Or reset the_repository->config->hash_initialized at some point?
>

I think that I may be able to offer a v4 patch, which has better test cases.
From my understanding, the reading of the local/repo .git/config does
not work as we need it, since the .git/ directroy is determined after
the config has been read. And so it is never read.
Having said that, a patch relying on the global .gitconfig may still
be an improvement on it's own.
diff mbox series

Patch

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 0bd5c24250..5a7c90c90d 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -94,6 +94,16 @@  const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+void precompose_strbuf_if_needed(struct strbuf *sb)
+{
+	char *buf_prec = (char *)precompose_string_if_needed(sb->buf);
+	if (buf_prec != sb->buf) {
+		size_t buf_prec_len = strlen(buf_prec);
+		free(strbuf_detach(sb, NULL));
+		strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1);
+	}
+}
+
 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
 	int i = 0;
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index fea06cf28a..7c3cfcadb0 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -30,6 +30,7 @@  typedef struct {

 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 const char *precompose_string_if_needed(const char *in);
+void precompose_strbuf_if_needed(struct strbuf *sb);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index 3e7a59b5ff..8b63108f16 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -331,6 +331,7 @@  static inline const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+#define precompose_strbuf_if_needed(a)
 #define probe_utf8_pathname_composition()
 #endif

diff --git a/setup.c b/setup.c
index 2e607632db..61f61496ec 100644
--- a/setup.c
+++ b/setup.c
@@ -48,7 +48,7 @@  static int abspath_part_inside_repo(char *path)
 	size_t wtlen;
 	char *path0;
 	int off;
-	const char *work_tree = get_git_work_tree();
+	const char *work_tree = precompose_string_if_needed(get_git_work_tree());
 	struct strbuf realpath = STRBUF_INIT;

 	if (!work_tree)
diff --git a/strbuf.c b/strbuf.c
index 4c9ac6dc5e..b05581d8e7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -569,6 +569,7 @@  int strbuf_getcwd(struct strbuf *sb)
 		strbuf_grow(sb, guessed_len);
 		if (getcwd(sb->buf, sb->alloc)) {
 			strbuf_setlen(sb, strlen(sb->buf));
+			precompose_strbuf_if_needed(sb);
 			return 0;
 		}

diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 325eb1c3cd..5a9ee5be92 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -156,4 +156,30 @@  test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case in
 	)
 '

+test_expect_success 'git ls-files under NFD' '
+	(
+		mkdir -p "somewhere/$aumlcdiar" &&
+		mypwd=$PWD &&
+		cd "somewhere/$aumlcdiar" &&
+		git init &&
+		git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err &&
+		>expected &&
+		test_cmp expected err
+	)
+'
+
+# Re-do the same test. Note: global core.precomposeunicode is changed
+test_expect_success 'git ls-files under NFD. global precompose false' '
+	test_when_finished "git config --global --unset core.precomposeunicode" &&
+	(
+		mypwd=$PWD &&
+		cd "somewhere/$aumlcdiar" &&
+		git config --global core.precomposeunicode false &&
+		git config core.precomposeunicode true &&
+		git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err &&
+		>expected &&
+		test_cmp expected err
+	)
+'
+
 test_done