diff mbox series

[v1,2/2] strbuf_getcwd() needs precompse_strbuf_if_needed()

Message ID 20240507084431.19797-1-tboegi@web.de (mailing list archive)
State New
Headers show
Series [v1,1/2] t0050: ls-files path fails if path of workdir is NFD | expand

Commit Message

Torsten Bögershausen May 7, 2024, 8:44 a.m. UTC
From: Torsten Bögershausen <tboegi@web.de>

When running under macOs a call to strbuf_getcwd() may return
decomposed unicode.
This could make `git ls-files` fail, see previous commit of t0050

The solution is to precompose the result of getcwd() if needed.
One possible implementation would be to re-define getcwd() similar
to opendir(), readdir() and closedir().
Since there is already a strbuf wrapper around getcwd(), and only this
wrapper is used inside the whole codebase, equip strbuf_getcwd() with
a call to the newly created function precompse_strbuf_if_needed().
Note that precompse_strbuf_if_needed() is a function under macOs,
and is a "nop" on all other systems.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 compat/precompose_utf8.c | 11 +++++++++++
 compat/precompose_utf8.h |  1 +
 git-compat-util.h        |  2 ++
 strbuf.c                 |  1 +
 4 files changed, 15 insertions(+)

--
2.41.0.394.ge43f4fd0bd

Comments

Junio C Hamano May 7, 2024, 5:22 p.m. UTC | #1
tboegi@web.de writes:

> +void precompse_strbuf_if_needed(struct strbuf *sb)
> +{
> +	char *buf_prec = (char *)precompose_string_if_needed(sb->buf);
> +	if (buf_prec != sb->buf) {

Cute.  This matches with the !PRECOMPSE_UNICODE case in git-compat-util.h
where we do

    static inline const char *precompose_string_if_needed(const char *in)
    {
            return in;
    }

to make it a no-op.  I was wondering how you are avoiding an
inevitable crash from trying to free an unfreeable piece of memory,
but this should do just fine.

You'd want to fix the typo in the name of the new function, I
presume?  "precompse" -> "precompose"

> +		size_t buf_prec_len = strlen(buf_prec);
> +		free(strbuf_detach(sb, NULL));
> +		strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1);
> +	}
> +
> +}

> diff --git a/strbuf.c b/strbuf.c
> index 0d929e4e19..cefea6b75f 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -591,6 +591,7 @@ int strbuf_getcwd(struct strbuf *sb)
>  	for (;; guessed_len *= 2) {
>  		strbuf_grow(sb, guessed_len);
>  		if (getcwd(sb->buf, sb->alloc)) {
> +			precompse_strbuf_if_needed(sb);
>  			strbuf_setlen(sb, strlen(sb->buf));

The need for strbuf_setlen() stems from the use of getcwd() that may
and will place a string that is much shorter than sb->alloc, so they
logically belong together.  It will make more sense to call the
precompose _after_ arranging the members of strbuf in a consistent
state with the call to strbuf_setlen().

>  			return 0;
>  		}
> --
> 2.41.0.394.ge43f4fd0bd
Junio C Hamano May 7, 2024, 5:47 p.m. UTC | #2
tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> When running under macOs a call to strbuf_getcwd() may return

You spelled it as "macOS" in [1/2].  The hits from

    $ git grep -i 'mac *os' \*.[ch]

tell me that we seem to say "macOS", "MacOS X", "Mac OSX" and "Mac
OS X" pretty much interchangeably.  We may want to eventually
consolidate them to whatever the official name Apple uses, but in
the meantime let's make sure we do not add even more.
brian m. carlson May 8, 2024, 12:32 a.m. UTC | #3
On 2024-05-07 at 17:47:41, Junio C Hamano wrote:
> tboegi@web.de writes:
> 
> > From: Torsten Bögershausen <tboegi@web.de>
> >
> > When running under macOs a call to strbuf_getcwd() may return
> 
> You spelled it as "macOS" in [1/2].  The hits from
> 
>     $ git grep -i 'mac *os' \*.[ch]
> 
> tell me that we seem to say "macOS", "MacOS X", "Mac OSX" and "Mac
> OS X" pretty much interchangeably.  We may want to eventually
> consolidate them to whatever the official name Apple uses, but in
> the meantime let's make sure we do not add even more.

I believe the current preferred form is "macOS".  That's what I see on
Apple's website, and that's consistent with my understanding as well.

It was previously "Mac OS X", which is why we probably have that in our
code base, but it's my understanding Apple has moved away from that.

Wikipedia's opening sentence states, "macOS, originally Mac OS X,
previously shortened as OS X, is an operating system developed and
marketed by Apple since 2001," so I think Wikipedia agrees, too.
Junio C Hamano May 9, 2024, 3:24 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

>> diff --git a/strbuf.c b/strbuf.c
>> index 0d929e4e19..cefea6b75f 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -591,6 +591,7 @@ int strbuf_getcwd(struct strbuf *sb)
>>  	for (;; guessed_len *= 2) {
>>  		strbuf_grow(sb, guessed_len);
>>  		if (getcwd(sb->buf, sb->alloc)) {
>> +			precompse_strbuf_if_needed(sb);
>>  			strbuf_setlen(sb, strlen(sb->buf));
>
> The need for strbuf_setlen() stems from the use of getcwd() that may
> and will place a string that is much shorter than sb->alloc, so they
> logically belong together.  It will make more sense to call the
> precompose _after_ arranging the members of strbuf in a consistent
> state with the call to strbuf_setlen().

Of course, we need to make sure precompose_string_if_needed() will
leave the strbuf in an consistent state.  I think the implementation
of that helper function in this patch already does so, so

		strbuf_grow(sb, guessed_len);
		if (getcwd(sb->buf, sb->alloc)) {
			strbuf_setlen(sb, strlen(sb->buf));
			precompse_strbuf_if_needed(sb);

would be what we would want.

Thanks.
Torsten Bögershausen May 9, 2024, 3:29 p.m. UTC | #5
On Thu, May 09, 2024 at 08:24:05AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> Of course, we need to make sure precompose_string_if_needed() will
> leave the strbuf in an consistent state.  I think the implementation
> of that helper function in this patch already does so, so
>
> 		strbuf_grow(sb, guessed_len);
> 		if (getcwd(sb->buf, sb->alloc)) {
> 			strbuf_setlen(sb, strlen(sb->buf));
> 			precompse_strbuf_if_needed(sb);
>

I think that is what I have in mind as well.
Thanks for the review, a V2 should come the next days.
diff mbox series

Patch

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 0bd5c24250..82ec2a1c5b 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -94,6 +94,17 @@  const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+void precompse_strbuf_if_needed(struct strbuf *sb)
+{
+	char *buf_prec = (char *)precompose_string_if_needed(sb->buf);
+	if (buf_prec != sb->buf) {
+		size_t buf_prec_len = strlen(buf_prec);
+		free(strbuf_detach(sb, NULL));
+		strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1);
+	}
+
+}
+
 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
 	int i = 0;
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index fea06cf28a..fb17b1bd4a 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -30,6 +30,7 @@  typedef struct {

 const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 const char *precompose_string_if_needed(const char *in);
+void precompse_strbuf_if_needed(struct strbuf *sb);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index ca7678a379..80ae463410 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -344,6 +344,8 @@  static inline const char *precompose_string_if_needed(const char *in)
 	return in;
 }

+#define precompse_strbuf_if_needed(a)
+
 #define probe_utf8_pathname_composition()
 #endif

diff --git a/strbuf.c b/strbuf.c
index 0d929e4e19..cefea6b75f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -591,6 +591,7 @@  int strbuf_getcwd(struct strbuf *sb)
 	for (;; guessed_len *= 2) {
 		strbuf_grow(sb, guessed_len);
 		if (getcwd(sb->buf, sb->alloc)) {
+			precompse_strbuf_if_needed(sb);
 			strbuf_setlen(sb, strlen(sb->buf));
 			return 0;
 		}