Message ID | 20250201201658.11562-2-jltobler@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | rev-list: print additional missing object information | expand |
On Sat, Feb 1, 2025 at 9:20 PM Justin Tobler <jltobler@gmail.com> wrote: > -static inline int cq_must_quote(char c) > +static inline int cq_must_quote(char c, int ignore_config) I think it's a bit better to use 'unsigned int' instead of just 'int' for such flags, but it's fine here to use an 'int' because both `quote_path_fully` and `no_dq` below already use that type. > - for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++); > + for (len = 0; > + len < maxlen && !cq_must_quote(s[len], ignore_config); len++); Micronit: If you really want to split the line into many lines, I think it might be better to go all the way like this: for (len = 0; len < maxlen && !cq_must_quote(s[len], ignore_config); len++); > @@ -83,7 +83,8 @@ int sq_dequote_to_strvec(char *arg, struct strvec *); > int unquote_c_style(struct strbuf *, const char *quoted, const char **endp); > > /* Bits in the flags parameter to quote_c_style() */ > -#define CQUOTE_NODQ 01 > +#define CQUOTE_NODQ (1u << 0) Nice small cleanup which could be mentioned in the commit message if you reroll this series. > +#define CQUOTE_IGNORE_CONFIG (1u << 1)
Christian Couder <christian.couder@gmail.com> writes: > On Sat, Feb 1, 2025 at 9:20 PM Justin Tobler <jltobler@gmail.com> wrote: > > >> -static inline int cq_must_quote(char c) >> +static inline int cq_must_quote(char c, int ignore_config) > > I think it's a bit better to use 'unsigned int' instead of just 'int' > for such flags, but it's fine here to use an 'int' because both > `quote_path_fully` and `no_dq` below already use that type. Yup, good forward thinking to suggest using unsigned as these "this is just a single bit, so let's use platform natural int" tend to grow into "there is this another bit that is orthogonal, so pass it as well in the same flag word", at which point unsigned would work better for us. But until that happens, plain platform natural int is fine. >> - for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++); >> + for (len = 0; >> + len < maxlen && !cq_must_quote(s[len], ignore_config); len++); > > Micronit: If you really want to split the line into many lines, I > think it might be better to go all the way like this: > > for (len = 0; > len < maxlen && !cq_must_quote(s[len], ignore_config); > len++); Good style suggestion. Thanks.
Justin Tobler <jltobler@gmail.com> writes: > The output of `cq_must_quote()` is affected by `core.quotePath`. This is > undesirable for operations that want to ensure consistent output > independent of config settings. > > Introduce the `CQUOTE_IGNORE_CONFIG` flag for the `quote_c_style*` > functions which when set makes `cq_must_quote()` always follow the > default behavior (core.quotePath=true) regardless of how its set in the > config. Hmph. I was hoping that we can flip the default for 'core.quotePath' to 'no' at Git 3.0 boundary, to help folks in non-ASCII locale. If this is about emitting pipe-able output out of rev-list, unlike a patch that is to be e-mailed (and being 8-bit clean was a risky assumption to make in 2005) that core.quotePath was originally invented for, it is more so that we would not want to force the receiving end to unquote, no? So regardless of what the future default value of core.quotePath would be, I am not convinced that it is a good idea to octal quote any and all bytes outside the ASCII range in the rev-list output. After all, "git rev-list --objects" would show such a path without quoting, no [*]? Side note: the path in the output from "git rev-list --objects" is a hack to allow receiving end to compute a path hash, and does not have to be strictly reversible, so it emits verbatim bytes but truncates the output at LF to preserve the one-line one-object output format. We do need to quote certain bytes (e.g., LF cannot be allowed verbatim, when the output is line-oriented, and we use C-quote, which means literal double-quote needs to be also quoted), so we cannot mimic paths emitted by "git rev-list --objects", but I do not think it buys us much to quote non-ASCII bytes these days.
Junio C Hamano <gitster@pobox.com> writes: > So regardless of what the future default value of core.quotePath > would be, I am not convinced that it is a good idea to octal quote > any and all bytes outside the ASCII range in the rev-list output. > > After all, "git rev-list --objects" would show such a path without > quoting, no [*]? > > Side note: the path in the output from "git rev-list --objects" > is a hack to allow receiving end to compute a path hash, and > does not have to be strictly reversible, so it emits verbatim > bytes but truncates the output at LF to preserve the one-line > one-object output format. > > We do need to quote certain bytes (e.g., LF cannot be allowed > verbatim, when the output is line-oriented, and we use C-quote, > which means literal double-quote needs to be also quoted), so we > cannot mimic paths emitted by "git rev-list --objects", but I do not > think it buys us much to quote non-ASCII bytes these days. Rereading this I realize that I was not quite making sense. A short version of what I wanted to say is: - The output format need to do some quoting anyway because it is inevitable to make the string stuffed as "value" in a space separated list of var=value on a single line. - It does not really matter if core.quotePath allows us to pass bytes with 8-bit set (e.g. UTF-8 outside ASCII) unquoted or require quoting. The receiving end must be prepared to unquote, so it is dubious that a new feature to ignore core.quotePath is needed. - We do need to quote SP that cquote does not require quoting, so some wrapping around quote_c_style() like quote_path() does is needed. Thanks.
On 25/02/04 08:40AM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > So regardless of what the future default value of core.quotePath > > would be, I am not convinced that it is a good idea to octal quote > > any and all bytes outside the ASCII range in the rev-list output. > > > > After all, "git rev-list --objects" would show such a path without > > quoting, no [*]? > > > > Side note: the path in the output from "git rev-list --objects" > > is a hack to allow receiving end to compute a path hash, and > > does not have to be strictly reversible, so it emits verbatim > > bytes but truncates the output at LF to preserve the one-line > > one-object output format. > > > > We do need to quote certain bytes (e.g., LF cannot be allowed > > verbatim, when the output is line-oriented, and we use C-quote, > > which means literal double-quote needs to be also quoted), so we > > cannot mimic paths emitted by "git rev-list --objects", but I do not > > think it buys us much to quote non-ASCII bytes these days. > > Rereading this I realize that I was not quite making sense. > > A short version of what I wanted to say is: > > - The output format need to do some quoting anyway because it is > inevitable to make the string stuffed as "value" in a space > separated list of var=value on a single line. > > - It does not really matter if core.quotePath allows us to pass > bytes with 8-bit set (e.g. UTF-8 outside ASCII) unquoted or > require quoting. The receiving end must be prepared to unquote, > so it is dubious that a new feature to ignore core.quotePath is > needed. I agree, thinking about this some more, forcing the value quoting behavior to act as if core.quotePath is always enabled is a step in the wrong direction. If anything, we could force it disabled to be more friendly to non-ASCII characters, but that is probably not worth it either. > - We do need to quote SP that cquote does not require quoting, so > some wrapping around quote_c_style() like quote_path() does is > needed. I think using quote_path() as-is should be sufficient. I'll drop the first two patches from this series in the next version I send and adapt accordingly. Thanks -Justin
diff --git a/quote.c b/quote.c index b9f6bdc775..d129c1de70 100644 --- a/quote.c +++ b/quote.c @@ -232,21 +232,22 @@ static signed char const cq_lookup[256] = { /* 0x80 */ /* set to 0 */ }; -static inline int cq_must_quote(char c) +static inline int cq_must_quote(char c, int ignore_config) { - return cq_lookup[(unsigned char)c] + quote_path_fully > 0; + return cq_lookup[(unsigned char)c] + (quote_path_fully || ignore_config) > 0; } /* returns the longest prefix not needing a quote up to maxlen if positive. This stops at the first \0 because it's marked as a character needing an escape */ -static size_t next_quote_pos(const char *s, ssize_t maxlen) +static size_t next_quote_pos(const char *s, ssize_t maxlen, int ignore_config) { size_t len; if (maxlen < 0) { - for (len = 0; !cq_must_quote(s[len]); len++); + for (len = 0; !cq_must_quote(s[len], ignore_config); len++); } else { - for (len = 0; len < maxlen && !cq_must_quote(s[len]); len++); + for (len = 0; + len < maxlen && !cq_must_quote(s[len], ignore_config); len++); } return len; } @@ -282,13 +283,14 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen, } while (0) int no_dq = !!(flags & CQUOTE_NODQ); + int ignore_config = !!(flags & CQUOTE_IGNORE_CONFIG); size_t len, count = 0; const char *p = name; for (;;) { int ch; - len = next_quote_pos(p, maxlen); + len = next_quote_pos(p, maxlen, ignore_config); if (len == maxlen || (maxlen < 0 && !p[len])) break; diff --git a/quote.h b/quote.h index 0300c29104..2a793fbef6 100644 --- a/quote.h +++ b/quote.h @@ -83,7 +83,8 @@ int sq_dequote_to_strvec(char *arg, struct strvec *); int unquote_c_style(struct strbuf *, const char *quoted, const char **endp); /* Bits in the flags parameter to quote_c_style() */ -#define CQUOTE_NODQ 01 +#define CQUOTE_NODQ (1u << 0) +#define CQUOTE_IGNORE_CONFIG (1u << 1) size_t quote_c_style(const char *name, struct strbuf *, FILE *, unsigned); void quote_two_c_style(struct strbuf *, const char *, const char *, unsigned);
The output of `cq_must_quote()` is affected by `core.quotePath`. This is undesirable for operations that want to ensure consistent output independent of config settings. Introduce the `CQUOTE_IGNORE_CONFIG` flag for the `quote_c_style*` functions which when set makes `cq_must_quote()` always follow the default behavior (core.quotePath=true) regardless of how its set in the config. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- quote.c | 14 ++++++++------ quote.h | 3 ++- 2 files changed, 10 insertions(+), 7 deletions(-)