[2/2] parse-options: properly align continued usage output

Message ID: patch-2.2-ab4bb70902b-20210901T110917Z-avarab@gmail.com (mailing list archive)
State: Superseded
Headers:
    Return-Path: <git-owner@kernel.org>
    From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    To: git@vger.kernel.org
    Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
        Carlo Arenas <carenas@gmail.com>, Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    Subject: [PATCH 2/2] parse-options: properly align continued usage output
    Date: Wed, 1 Sep 2021 13:12:55 +0200
    Message-Id: <patch-2.2-ab4bb70902b-20210901T110917Z-avarab@gmail.com>
    In-Reply-To: <cover-0.2-00000000000-20210901T110917Z-avarab@gmail.com>
    References: <cover-0.2-00000000000-20210901T110917Z-avarab@gmail.com>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Precedence: bulk
Series: parse-options: properly align continued usage output
    [0/2] parse-options: properly align continued usage output
    [1/2] built-ins: "properly" align continued usage output
    [2/2] parse-options: properly align continued usage output

Commit Message

Ævar Arnfjörð Bjarmason Sept. 1, 2021, 11:12 a.m. UTC

Some commands, such as "git stash", emit continued usage output with
e.g. "git stash -h". Because usage_with_options_internal() prefixes
each line with its own whitespace, the resulting output wasn't
properly aligned. Let's account for the added whitespace, which
properly aligns the output.

The "git stash" command has usage output with a N_() translation that
legitimately stretches across multiple lines:

	N_("git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]\n"
	   "          [-u|--include-untracked] [-a|--all] [-m|--message <message>]\n"

We'd like to have that output aligned with the length of the initial
"git stash " output, but since usage_with_options_internal() adds its
own whitespace prefix we fell short. Before this change we'd emit:

    $ git stash -h
    usage: git stash list [<options>]
       or: git stash show [<options>] [<stash>]
       [...]
       or: git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
              [-u|--include-untracked] [-a|--all] [-m|--message <message>]
              [...]

Now we'll properly emit aligned output.  I.e. the last four lines
above will instead be (a whitespace-only change to the above):

       [...]
       or: git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
                     [-u|--include-untracked] [-a|--all] [-m|--message <message>]
                     [...]
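
As a standalone illustration of the whitespace accounting described
above (a minimal sketch, not the code in this patch): format_continued()
is a hypothetical helper, and it assumes the untranslated "   or: %s"
prefix is passed in. The pad width is the prefix length minus its "%s"
placeholder, and "%*s" with an empty string emits that many spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): write a continuation line
 * into buf, padded by the visible width of prefix_fmt, i.e. its
 * length minus the "%s" placeholder it is assumed to contain.
 */
static int format_continued(char *buf, size_t size,
			    const char *prefix_fmt, const char *line)
{
	size_t pad = strlen(prefix_fmt) - strlen("%s");

	/* "%*s" with an empty string argument emits exactly `pad` spaces */
	return snprintf(buf, size, "%*s%s", (int)pad, "", line);
}
```

With the "   or: %s" prefix the pad is 7, so a continuation line that
already carries the 10 spaces matching "git stash " starts at column
17, directly under the first option of the line above it.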

In making this change we can fold the two for-loops over *usagestr
into one. We had two of them purely to account for the case where an
empty string in the array delimits the usage output from free-form
text output.

We could skip the string_list_split() with a strchr(str, '\n') check,
but we'd then need to duplicate our state machine for strings that do
and don't contain a "\n". It's simpler to just always split into a
"struct string_list", even though the common case is that that "struct
string_list" will contain only one element. This is not
performance-sensitive code.
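
To illustrate why always splitting avoids a second code path, here is
a minimal sketch that does not use git's string_list API.
split_lines() and "struct piece" are hypothetical stand-ins, assumed
only for this example. A string without "\n" simply yields one piece,
so the caller needs no strchr() special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one element of a split result. */
struct piece {
	const char *start;
	size_t len;
};

/*
 * Record the start and length of each "\n"-separated piece of str in
 * out[], storing at most max pieces, and return how many were stored.
 * Pieces beyond max are silently dropped in this sketch.
 */
static size_t split_lines(const char *str, struct piece *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		const char *nl = strchr(str, '\n');

		out[n].start = str;
		out[n].len = nl ? (size_t)(nl - str) : strlen(str);
		n++;
		if (!nl)
			break;
		str = nl + 1;
	}
	return n;
}
```

The same loop body then handles both the common one-piece case and
the wrapped case; only the piece index differs.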

This change is more complex than it would otherwise be, since I've made
it future-proof for RTL translation support. Later in
usage_with_options_internal() we have some existing padding code
dating back to d7a38c54a6c (parse-options: be able to generate usages
automatically, 2007-10-15) which isn't RTL-safe, but that code would
be easy to fix. Let's not introduce new RTL translation problems here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 parse-options.c | 79 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 66 insertions(+), 13 deletions(-)

Comments

Eric Sunshine Sept. 10, 2021, 7:51 a.m. UTC | #1

On 9/1/21 7:12 AM, Ævar Arnfjörð Bjarmason wrote:
> Some commands such as "git stash" emit continued options output with
> e.g. "git stash -h", because usage_with_options_internal() prefixes
> with its own whitespace the resulting output wasn't properly
> aligned. Let's account for the added whitespace, which properly aligns
> the output.
> 
> The "git stash" command has usage output with a N_() translation that
> legitimately stretches across multiple lines;
> 
> 	N_("git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]\n"
> 	   "          [-u|--include-untracked] [-a|--all] [-m|--message <message>]\n"
> 
> We'd like to have that output aligned with the length of the initial
> "git stash " output, but since usage_with_options_internal() adds its
> own whitespace prefixing we fell short, before this change we'd emit:
> 
>      $ git stash -h
>      usage: git stash list [<options>]
>         or: git stash show [<options>] [<stash>]
>         [...]
>         or: git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
>                [-u|--include-untracked] [-a|--all] [-m|--message <message>]
>                [...]
> 
> Now we'll properly emit aligned output.  I.e. the last four lines
> above will instead be (a whitespace-only change to the above):
> 
>         [...]
>         or: git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
>                       [-u|--include-untracked] [-a|--all] [-m|--message <message>]
>                       [...]
> 
> In making this change we can fold the two for-loops over *usagestr
> into one. We had two of them purely to account for the case where an
> empty string in the array delimits the usage output from free-form
> text output.

More on this below...

> We could skip the string_list_split() with a strchr(str, '\n') check,
> but we'd then need to duplicate our state machine for strings that do
> and don't contain a "\n". It's simpler to just always split into a
> "struct string_list", even though the common case is that that "struct
> string_list" will contain only one element. This is not
> performance-sensitive code.

Makes sense.

> This change is relatively more complex since I've accounted for making
> it future-proof for RTL translation support. Later in
> usage_with_options_internal() we have some existing padding code
> dating back to d7a38c54a6c (parse-options: be able to generate usages
> automatically, 2007-10-15) which isn't RTL-safe, but that code would
> be easy to fix. Let's not introduce new RTL translation problems here.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> diff --git a/parse-options.c b/parse-options.c
> @@ -917,25 +917,78 @@ static int usage_with_options_internal(struct parse_opt_ctx_t *ctx,
> +	 * When a translated usage string has an embedded "\n" it's
> +	 * because options have wrapped o the next line. The line

"wrapped o the next line"?

> +	size_t or_len = strlen(or_prefix) - strlen("%s");
> +	int i;
> +	int saw_empty_line = 0;
> +
> -	fprintf_ln(outfile, _("usage: %s"), _(*usagestr++));
> -	while (*usagestr && **usagestr)
> -		/*
> -		 * TRANSLATORS: the colon here should align with the
> -		 * one in "usage: %s" translation.
> -		 */
> -		fprintf_ln(outfile, _("   or: %s"), _(*usagestr++));
> -	while (*usagestr) {
> -		if (**usagestr)
> -			fprintf_ln(outfile, _("    %s"), _(*usagestr));
> -		else
> -			fputc('\n', outfile);
> -		usagestr++;
> +	for (i = 0; *usagestr; i++) {
> +		const char *str = _(*usagestr++);
> +		struct string_list list = STRING_LIST_INIT_DUP;
> +		unsigned int j;
> +
> +		string_list_split(&list, str, '\n', -1);
> +		for (j = 0; j < list.nr; j++) {
> +			const char *line = list.items[j].string;
> +
> +			if (!saw_empty_line && !*line)
> +				saw_empty_line = 1;
> +
> +			if (saw_empty_line && *line)
> +				fprintf_ln(outfile, _("    %s"), line);
> +			else if (saw_empty_line)
> +				fputc('\n', outfile);
> +			else if (!j && !i)
> +				fprintf_ln(outfile, usage_prefix, line);
> +			else if (!j)
> +				fprintf_ln(outfile, or_prefix, line);
> +			else
> +				fprintf_ln(outfile, usage_continued,
> +					   (int)or_len, "", line);
> +		}
> +		string_list_clear(&list, 0);

I may be missing something obvious, but I'm having trouble understanding 
why this single loop is better than the two loops it replaces. The 
cognitive load of the new code is much higher than that of the original. 
With the original code, the logic was obvious at a glance. On the other 
hand, I had to concentrate hard to figure out what the new code is 
trying to do and to wrap my brain around all the cases it is handling. I 
suppose you went with the single loop to avoid code duplication (in 
particular, the call to string_list_split() and the loop over the split 
elements)?

There are other ways this might be accomplished which don't carry such a 
high cognitive load. One (typed-in-email) possibility which closely 
resembles the existing code:

     const char *pfx = usage_prefix;
     while (*usagestr && **usagestr) {
         string_list_split(&list, _(*usagestr++), ...);
         fprintf_ln(outfile, pfx, list.items[0].string);
         for (i = 1; i < list.nr; i++)
             fprintf_ln(outfile, usage_continued,
                 (int)or_len, "", list.items[i].string);
         pfx = or_prefix;
     }
     while (*usagestr) {
         string_list_split(&list, _(*usagestr++), ...);
         for (i = 0; i < list.nr; i++) {
             const char *line = list.items[i].string;
             if (*line)
                 fprintf_ln(outfile, _("    %s"), line);
             else
                 fputc('\n', outfile);
         }
     }

I also wonder if you really need to support the embedded-newline case 
for the free-form text loop. For free-form text, it's just as easy for 
each line of text to be a distinct item in the usage[] array, if I 
understand correctly, so there isn't really a good reason for clients to 
embed newlines in the free-form text portion. Given that there's only a 
single client in the entire project which takes advantage of the 
free-form text support -- and that client doesn't embed newlines -- it 
may be simpler to not bother supporting embedded newlines for the 
free-form text, in which case you don't even need to modify that loop; 
the existing code is good enough.
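
A self-contained sketch of that variant follows, under stated
assumptions: print_split() and print_usage() are hypothetical names,
the prefixes are untranslated literals rather than the _() format
strings the patch uses, and the splitting is done by hand instead of
with string_list_split(). Only the first loop understands embedded
newlines; the free-form loop is the pre-patch logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the line starting at str, up to "\n" or end of string. */
static int line_len(const char *str, const char *nl)
{
	return nl ? (int)(nl - str) : (int)strlen(str);
}

/* Print one usage string; lines after an embedded "\n" get `pad` spaces. */
static void print_split(FILE *out, const char *prefix, int pad,
			const char *str)
{
	const char *nl = strchr(str, '\n');

	fprintf(out, "%s%.*s\n", prefix, line_len(str, nl), str);
	while (nl) {
		str = nl + 1;
		nl = strchr(str, '\n');
		fprintf(out, "%*s%.*s\n", pad, "", line_len(str, nl), str);
	}
}

static void print_usage(FILE *out, const char **usagestr)
{
	const char *prefix = "usage: ";
	int pad = (int)strlen("   or: ");

	/* Usage lines, up to the first empty string or the NULL. */
	while (*usagestr && **usagestr) {
		print_split(out, prefix, pad, *usagestr++);
		prefix = "   or: ";
	}
	/* Free-form text: no embedded-newline handling. */
	while (*usagestr) {
		if (**usagestr)
			fprintf(out, "    %s\n", *usagestr);
		else
			fputc('\n', out);
		usagestr++;
	}
}
```

Because the empty-string delimiter is only ever tested on whole array
elements here, an empty piece produced by splitting cannot switch the
output into free-form mode.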

Anyhow, the above observations are subjective, thus not necessarily 
actionable. However, there is also a subtle yet dramatic behavior change 
in the new code, if I understand correctly. It's not clear if this 
behavior change is intentional (it isn't mentioned in the commit 
message), but it does seem potentially dangerous. Specifically, with the:

     if (!saw_empty_line && !*line)
         saw_empty_line = 1;

check inside the inner loop which iterates over the split lines, this 
means that if someone accidentally embeds an extra newline in some usage 
line:

     static const char *foo_usage[] = {
         N_("git foo --bar\n\n" /* <-- accidental extra newline */
            "        --baz"),
         N_("git boo"),
         NULL
     };

then _all_ following usage lines will incorrectly be treated as 
free-form text lines rather than as the "or:" lines they are intended to 
be. Moving the:

     if (!saw_empty_line && !*line)
         saw_empty_line = 1;

check from the inner loop to the outer loop should restore the original 
(intended) behavior, I believe.
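
The difference between the two placements can be sketched in
isolation. count_free_form() below is a hypothetical helper, not code
from the patch: it only counts how many non-empty lines end up
classified as free-form text, with the empty-line check either on
every split line (check_inner != 0, as in the patch) or on whole array
elements only.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: walk a NULL-terminated usage array, split each
 * element on "\n", and count the non-empty lines that would be
 * printed as free-form text.
 */
static int count_free_form(const char **usagestr, int check_inner)
{
	int saw_empty_line = 0;
	int free_form = 0;

	for (; *usagestr; usagestr++) {
		const char *line = *usagestr;

		/* Outer placement: only an empty array element counts. */
		if (!check_inner && !*line)
			saw_empty_line = 1;

		for (;;) {
			const char *nl = strchr(line, '\n');
			size_t len = nl ? (size_t)(nl - line) : strlen(line);

			/* Inner placement: any empty split line counts. */
			if (check_inner && !len)
				saw_empty_line = 1;
			if (saw_empty_line && len)
				free_form++;
			if (!nl)
				break;
			line = nl + 1;
		}
	}
	return free_form;
}
```

For the foo_usage[] array above, the inner placement classifies both
"        --baz" and "git boo" as free-form text, while the outer
placement classifies neither. For an array with a genuine empty-string
delimiter, both placements agree.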

diff --git a/parse-options.c b/parse-options.c
index 2abff136a17..a06968bf4f5 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -917,25 +917,78 @@  static int usage_with_options_internal(struct parse_opt_ctx_t *ctx,
 	FILE *outfile = err ? stderr : stdout;
 	int need_newline;
 
+	const char *usage_prefix = _("usage: %s");
+	/*
+	 * TRANSLATORS: the colon here should align with the
+	 * one in "usage: %s" translation.
+	 */
+	const char *or_prefix = _("   or: %s");
+	/*
+	 * TRANSLATORS: You should only need to translate this format
+	 * string if your language is a RTL language (e.g. Arabic,
+	 * Hebrew etc.), not if it's a LTR language (e.g. German,
+	 * Russian, Chinese etc.).
+	 *
+	 * When a translated usage string has an embedded "\n" it's
+	 * because options have wrapped o the next line. The line
+	 * after the "\n" will then be padded to align with the
+	 * command name, such as N_("git cmd [opt]\n<8
+	 * spaces>[opt2]"), where the 8 spaces are the same length as
+	 * "git cmd ".
+	 *
+	 * This format string prints out that already-translated
+	 * line. The "%*s" is whitespace padding to account for the
+	 * padding at the start of the line that we add in this
+	 * function, the "%s" is a line in the (hopefully already
+	 * translated) N_() usage string, which contained embedded
+	 * newlines before we split it up.
+	 */
+	const char *usage_continued = _("%*s%s");
+
+	/*
+	 * The translation could be anything, but we can count on
+	 * msgfmt(1)'s --check option to have asserted that "%s" is in
+	 * the translation. So compute the length of the " or: "
+	 * part. We are assuming that the translator wasn't overly
+	 * clever and used e.g. "%1$s" instead of "%s", there's only
+	 * one "%s" in "or_prefix" above, so there's no reason to do
+	 * so even with a RTL language.
+	 */
+	size_t or_len = strlen(or_prefix) - strlen("%s");
+	int i;
+	int saw_empty_line = 0;
+
 	if (!usagestr)
 		return PARSE_OPT_HELP;
 
 	if (!err && ctx && ctx->flags & PARSE_OPT_SHELL_EVAL)
 		fprintf(outfile, "cat <<\\EOF\n");
 
-	fprintf_ln(outfile, _("usage: %s"), _(*usagestr++));
-	while (*usagestr && **usagestr)
-		/*
-		 * TRANSLATORS: the colon here should align with the
-		 * one in "usage: %s" translation.
-		 */
-		fprintf_ln(outfile, _("   or: %s"), _(*usagestr++));
-	while (*usagestr) {
-		if (**usagestr)
-			fprintf_ln(outfile, _("    %s"), _(*usagestr));
-		else
-			fputc('\n', outfile);
-		usagestr++;
+	for (i = 0; *usagestr; i++) {
+		const char *str = _(*usagestr++);
+		struct string_list list = STRING_LIST_INIT_DUP;
+		unsigned int j;
+
+		string_list_split(&list, str, '\n', -1);
+		for (j = 0; j < list.nr; j++) {
+			const char *line = list.items[j].string;
+
+			if (!saw_empty_line && !*line)
+				saw_empty_line = 1;
+
+			if (saw_empty_line && *line)
+				fprintf_ln(outfile, _("    %s"), line);
+			else if (saw_empty_line)
+				fputc('\n', outfile);
+			else if (!j && !i)
+				fprintf_ln(outfile, usage_prefix, line);
+			else if (!j)
+				fprintf_ln(outfile, or_prefix, line);
+			else
+				fprintf_ln(outfile, usage_continued,
+					   (int)or_len, "", line);
+		}
+		string_list_clear(&list, 0);
 	}
 
 	need_newline = 1;
