diff mbox series

[v3,1/4] quote: add c quote flag to ignore core.quotePath

Message ID 20250201201658.11562-2-jltobler@gmail.com (mailing list archive)
State Superseded
Headers show
Series rev-list: print additional missing object information | expand

Commit Message

Justin Tobler Feb. 1, 2025, 8:16 p.m. UTC
The output of `cq_must_quote()` is affected by `core.quotePath`. This is
undesirable for operations that want to ensure consistent output
independent of config settings.

Introduce the `CQUOTE_IGNORE_CONFIG` flag for the `quote_c_style*`
functions which when set makes `cq_must_quote()` always follow the
default behavior (core.quotePath=true) regardless of how its set in the
config.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 quote.c | 14 ++++++++------
 quote.h |  3 ++-
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

Christian Couder Feb. 3, 2025, 9:51 a.m. UTC | #1
On Sat, Feb 1, 2025 at 9:20 PM Justin Tobler <jltobler@gmail.com> wrote:


> -static inline int cq_must_quote(char c)
> +static inline int cq_must_quote(char c, int ignore_config)

I think it's a bit better to use 'unsigned int' instead of just 'int'
for such flags, but it's fine here to use an 'int' because both
`quote_path_fully` and `no_dq` below already use that type.

> -               for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++);
> +               for (len = 0;
> +                    len < maxlen && !cq_must_quote(s[len], ignore_config); len++);

Micronit: If you really want to split the line into many lines, I
think it might be better to go all the way like this:

              for (len = 0;
                   len < maxlen && !cq_must_quote(s[len], ignore_config);
                   len++);

> @@ -83,7 +83,8 @@ int sq_dequote_to_strvec(char *arg, struct strvec *);
>  int unquote_c_style(struct strbuf *, const char *quoted, const char **endp);
>
>  /* Bits in the flags parameter to quote_c_style() */
> -#define CQUOTE_NODQ 01
> +#define CQUOTE_NODQ         (1u << 0)

Nice small cleanup which could be mentioned in the commit message if
you reroll this series.

> +#define CQUOTE_IGNORE_CONFIG (1u << 1)
Junio C Hamano Feb. 3, 2025, 10:14 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> On Sat, Feb 1, 2025 at 9:20 PM Justin Tobler <jltobler@gmail.com> wrote:
>
>
>> -static inline int cq_must_quote(char c)
>> +static inline int cq_must_quote(char c, int ignore_config)
>
> I think it's a bit better to use 'unsigned int' instead of just 'int'
> for such flags, but it's fine here to use an 'int' because both
> `quote_path_fully` and `no_dq` below already use that type.

Yup, good forward thinking to suggest using unsigned as these "this
is just a single bit, so let's use platform natural int" tend to
grow into "there is this another bit that is orthogonal, so pass it
as well in the same flag word", at which point unsigned would work
better for us.  But until that happens, plain platform natural int
is fine.

>> -               for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++);
>> +               for (len = 0;
>> +                    len < maxlen && !cq_must_quote(s[len], ignore_config); len++);
>
> Micronit: If you really want to split the line into many lines, I
> think it might be better to go all the way like this:
>
>               for (len = 0;
>                    len < maxlen && !cq_must_quote(s[len], ignore_config);
>                    len++);

Good style suggestion.

Thanks.
Junio C Hamano Feb. 3, 2025, 10:33 p.m. UTC | #3
Justin Tobler <jltobler@gmail.com> writes:

> The output of `cq_must_quote()` is affected by `core.quotePath`. This is
> undesirable for operations that want to ensure consistent output
> independent of config settings.
>
> Introduce the `CQUOTE_IGNORE_CONFIG` flag for the `quote_c_style*`
> functions which when set makes `cq_must_quote()` always follow the
> default behavior (core.quotePath=true) regardless of how its set in the
> config.

Hmph.  

I was hoping that we can flip the default for 'core.quotePath' to
'no' at Git 3.0 boundary, to help folks in non-ASCII locale.  If
this is about emitting pipe-able output out of rev-list, unlike a
patch that is to be e-mailed (and being 8-bit clean was a risky
assumption to make in 2005) that core.quotePath was originally
invented for, it is more so that we would not want to force the
receiving end to unquote, no?

So regardless of what the future default value of core.quotePath
would be, I am not convinced that it is a good idea to octal quote
any and all bytes outside the ASCII range in the rev-list output.

After all, "git rev-list --objects" would show such a path without
quoting, no [*]?

    Side note: the path in the output from "git rev-list --objects"
    is a hack to allow receiving end to compute a path hash, and
    does not have to be strictly reversible, so it emits verbatim
    bytes but truncates the output at LF to preserve the one-line
    one-object output format.

We do need to quote certain bytes (e.g., LF cannot be allowed
verbatim, when the output is line-oriented, and we use C-quote,
which means literal double-quote needs to be also quoted), so we
cannot mimic paths emitted by "git rev-list --objects", but I do not
think it buys us much to quote non-ASCII bytes these days.
Junio C Hamano Feb. 4, 2025, 4:40 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> So regardless of what the future default value of core.quotePath
> would be, I am not convinced that it is a good idea to octal quote
> any and all bytes outside the ASCII range in the rev-list output.
>
> After all, "git rev-list --objects" would show such a path without
> quoting, no [*]?
>
>     Side note: the path in the output from "git rev-list --objects"
>     is a hack to allow receiving end to compute a path hash, and
>     does not have to be strictly reversible, so it emits verbatim
>     bytes but truncates the output at LF to preserve the one-line
>     one-object output format.
>
> We do need to quote certain bytes (e.g., LF cannot be allowed
> verbatim, when the output is line-oriented, and we use C-quote,
> which means literal double-quote needs to be also quoted), so we
> cannot mimic paths emitted by "git rev-list --objects", but I do not
> think it buys us much to quote non-ASCII bytes these days.

Rereading this I realize that I was not quite making sense.

A short version of what I wanted to say is:

 - The output format need to do some quoting anyway because it is
   inevitable to make the string stuffed as "value" in a space
   separated list of var=value on a single line.

 - It does not really matter if core.quotePath allows us to pass
   bytes with 8-bit set (e.g. UTF-8 outside ASCII) unquoted or
   require quoting.  The receiving end must be prepared to unquote,
   so it is dubious that a new feature to ignore core.quotePath is
   needed.

 - We do need to quote SP that cquote does not require quoting, so
   some wrapping around quote_c_style() like quote_path() does is
   needed.

Thanks.
Justin Tobler Feb. 4, 2025, 10:50 p.m. UTC | #5
On 25/02/04 08:40AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > So regardless of what the future default value of core.quotePath
> > would be, I am not convinced that it is a good idea to octal quote
> > any and all bytes outside the ASCII range in the rev-list output.
> >
> > After all, "git rev-list --objects" would show such a path without
> > quoting, no [*]?
> >
> >     Side note: the path in the output from "git rev-list --objects"
> >     is a hack to allow receiving end to compute a path hash, and
> >     does not have to be strictly reversible, so it emits verbatim
> >     bytes but truncates the output at LF to preserve the one-line
> >     one-object output format.
> >
> > We do need to quote certain bytes (e.g., LF cannot be allowed
> > verbatim, when the output is line-oriented, and we use C-quote,
> > which means literal double-quote needs to be also quoted), so we
> > cannot mimic paths emitted by "git rev-list --objects", but I do not
> > think it buys us much to quote non-ASCII bytes these days.
> 
> Rereading this I realize that I was not quite making sense.
> 
> A short version of what I wanted to say is:
> 
>  - The output format need to do some quoting anyway because it is
>    inevitable to make the string stuffed as "value" in a space
>    separated list of var=value on a single line.
> 
>  - It does not really matter if core.quotePath allows us to pass
>    bytes with 8-bit set (e.g. UTF-8 outside ASCII) unquoted or
>    require quoting.  The receiving end must be prepared to unquote,
>    so it is dubious that a new feature to ignore core.quotePath is
>    needed.

I agree, thinking about this some more, forcing the value quoting
behavior to act as if core.quotePath is always enabled is a step in the
wrong direction. If anything, we could force it disabled to be more
friendly to non-ASCII characters, but that is probably not worth it
either.

>  - We do need to quote SP that cquote does not require quoting, so
>    some wrapping around quote_c_style() like quote_path() does is
>    needed.

I think using quote_path() as-is should be sufficient. I'll drop the
first two patches from this series in the next version I send and adapt
accordingly.

Thanks
-Justin
diff mbox series

Patch

diff --git a/quote.c b/quote.c
index b9f6bdc775..d129c1de70 100644
--- a/quote.c
+++ b/quote.c
@@ -232,21 +232,22 @@  static signed char const cq_lookup[256] = {
 	/* 0x80 */ /* set to 0 */
 };
 
-static inline int cq_must_quote(char c)
+static inline int cq_must_quote(char c, int ignore_config)
 {
-	return cq_lookup[(unsigned char)c] + quote_path_fully > 0;
+	return cq_lookup[(unsigned char)c] + (quote_path_fully || ignore_config) > 0;
 }
 
 /* returns the longest prefix not needing a quote up to maxlen if positive.
    This stops at the first \0 because it's marked as a character needing an
    escape */
-static size_t next_quote_pos(const char *s, ssize_t maxlen)
+static size_t next_quote_pos(const char *s, ssize_t maxlen, int ignore_config)
 {
 	size_t len;
 	if (maxlen < 0) {
-		for (len = 0; !cq_must_quote(s[len]); len++);
+		for (len = 0; !cq_must_quote(s[len], ignore_config); len++);
 	} else {
-		for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++);
+		for (len = 0;
+		     len < maxlen && !cq_must_quote(s[len], ignore_config); len++);
 	}
 	return len;
 }
@@ -282,13 +283,14 @@  static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
 	} while (0)
 
 	int no_dq = !!(flags & CQUOTE_NODQ);
+	int ignore_config = !!(flags & CQUOTE_IGNORE_CONFIG);
 	size_t len, count = 0;
 	const char *p = name;
 
 	for (;;) {
 		int ch;
 
-		len = next_quote_pos(p, maxlen);
+		len = next_quote_pos(p, maxlen, ignore_config);
 		if (len == maxlen || (maxlen < 0 && !p[len]))
 			break;
 
diff --git a/quote.h b/quote.h
index 0300c29104..2a793fbef6 100644
--- a/quote.h
+++ b/quote.h
@@ -83,7 +83,8 @@  int sq_dequote_to_strvec(char *arg, struct strvec *);
 int unquote_c_style(struct strbuf *, const char *quoted, const char **endp);
 
 /* Bits in the flags parameter to quote_c_style() */
-#define CQUOTE_NODQ 01
+#define CQUOTE_NODQ	     (1u << 0)
+#define CQUOTE_IGNORE_CONFIG (1u << 1)
 size_t quote_c_style(const char *name, struct strbuf *, FILE *, unsigned);
 void quote_two_c_style(struct strbuf *, const char *, const char *, unsigned);