test-progress: fix test failures on big-endian systems

Message ID	20191019233706.GM29845@szeder.dev (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=aLlQ=YM=vger.kernel.org=git-owner@kernel.org> Date: Sun, 20 Oct 2019 01:37:06 +0200 From: SZEDER =?utf-8?b?R8OhYm9y?= <szeder.dev@gmail.com> To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>, Junio C Hamano <gitster@pobox.com> Cc: Todd Zullinger <tmz@pobox.com>, git@vger.kernel.org Subject: [PATCH] test-progress: fix test failures on big-endian systems Message-ID: <20191019233706.GM29845@szeder.dev> References: <b0bec82e-ad0a-32f6-e2e6-e1f0e6920639@physik.fu-berlin.de> <20190731071755.GF4545@pobox.com> <f1ce445e-6954-8e7b-2dca-3a566ce689a5@physik.fu-berlin.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <f1ce445e-6954-8e7b-2dca-3a566ce689a5@physik.fu-berlin.de> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: git-owner@vger.kernel.org Precedence: bulk
Series	test-progress: fix test failures on big-endian systems \| expand test-progress: fix test failures on big-endian systems

SZEDER Gábor Oct. 19, 2019, 11:37 p.m. UTC

On Sat, Oct 19, 2019 at 11:38:40PM +0200, John Paul Adrian Glaubitz wrote:
> The testsuite is failing again on s390x and all other big-endian targets in
> Debian. For a full build log on s390x see [1].

Gah, my progress display fixes strike again...

I think the patch below should fix it, but I could only test it on
little-endian systems.  Could you please confirm that it indeed works
on big-endian as well?

  --- >8 ---

Subject: [PATCH] test-progress: fix test failures on big-endian systems

In 't0500-progress-display.sh' all tests running 'test-tool progress
--total=<N>' fail on big-endian systems, e.g. like this:

  + test-tool progress --total=3 Working hard
  [...]
  + test_i18ncmp expect out
  --- expect	2019-10-18 23:07:54.765523916 +0000
  +++ out	2019-10-18 23:07:54.773523916 +0000
  @@ -1,4 +1,2 @@
  -Working hard:  33% (1/3)<CR>
  -Working hard:  66% (2/3)<CR>
  -Working hard: 100% (3/3)<CR>
  -Working hard: 100% (3/3), done.
  +Working hard:   0% (1/12884901888)<CR>
  +Working hard:   0% (3/12884901888), done.

The reason for that bogus value is that '--total's parameter is parsed
via parse-options's OPT_INTEGER into a uint64_t variable [1], so the
two bits of 3 end up in the "wrong" bytes on big-endian systems
(12884901888 = 0x300000000).

Change the type of that variable from uint64_t to int, to match what
parse-options expects; in the tests of the progress output we won't
use values that don't fit into an int anyway.

[1] start_progress() expects the total number as an uint64_t, that's
    why I chose the same type when declaring the variable holding the
    value given on the command line.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/helper/test-progress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

John Paul Adrian Glaubitz Oct. 19, 2019, 11:55 p.m. UTC | #1

Hi Gábor!

On 10/20/19 1:37 AM, SZEDER Gábor wrote:
> On Sat, Oct 19, 2019 at 11:38:40PM +0200, John Paul Adrian Glaubitz wrote:
>> The testsuite is failing again on s390x and all other big-endian targets in
>> Debian. For a full build log on s390x see [1].
> 
> Gah, my progress display fixes strike again...
> 
> I think the patch below should fix it, but I could only test it on
> little-endian systems.  Could you please confirm that it indeed works
> on big-endian as well?
> 
> 
>   --- >8 ---
> 
> Subject: [PATCH] test-progress: fix test failures on big-endian systems
> 
> In 't0500-progress-display.sh' all tests running 'test-tool progress
> --total=<N>' fail on big-endian systems, e.g. like this:
> 
>   + test-tool progress --total=3 Working hard
>   [...]
>   + test_i18ncmp expect out
>   --- expect	2019-10-18 23:07:54.765523916 +0000
>   +++ out	2019-10-18 23:07:54.773523916 +0000
>   @@ -1,4 +1,2 @@
>   -Working hard:  33% (1/3)<CR>
>   -Working hard:  66% (2/3)<CR>
>   -Working hard: 100% (3/3)<CR>
>   -Working hard: 100% (3/3), done.
>   +Working hard:   0% (1/12884901888)<CR>
>   +Working hard:   0% (3/12884901888), done.
> 
> The reason for that bogus value is that '--total's parameter is parsed
> via parse-options's OPT_INTEGER into a uint64_t variable [1], so the
> two bits of 3 end up in the "wrong" bytes on big-endian systems
> (12884901888 = 0x300000000).
> 
> Change the type of that variable from uint64_t to int, to match what
> parse-options expects; in the tests of the progress output we won't
> use values that don't fit into an int anyway.
> 
> [1] start_progress() expects the total number as an uint64_t, that's
>     why I chose the same type when declaring the variable holding the
>     value given on the command line.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  t/helper/test-progress.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/helper/test-progress.c b/t/helper/test-progress.c
> index 4e9f7fafdf..42b96cb103 100644
> --- a/t/helper/test-progress.c
> +++ b/t/helper/test-progress.c
> @@ -29,7 +29,7 @@ void progress_test_force_update(void);
>  
>  int cmd__progress(int argc, const char **argv)
>  {
> -	uint64_t total = 0;
> +	int total = 0;
>  	const char *title;
>  	struct strbuf line = STRBUF_INIT;
>  	struct progress *progress;
> 

I can confirm that your patch fixes the testsuite for me on Debian
unstable/ppc64 (big-endian).

Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

Thanks,
Adrian

Todd Zullinger Oct. 20, 2019, 12:19 a.m. UTC | #2

Hi,

John Paul Adrian Glaubitz wrote:
> Hi Gábor!
> 
> On 10/20/19 1:37 AM, SZEDER Gábor wrote:
>> On Sat, Oct 19, 2019 at 11:38:40PM +0200, John Paul Adrian Glaubitz wrote:
>>> The testsuite is failing again on s390x and all other big-endian targets in
>>> Debian. For a full build log on s390x see [1].
>> 
>> Gah, my progress display fixes strike again...
>> 
>> I think the patch below should fix it, but I could only test it on
>> little-endian systems.  Could you please confirm that it indeed works
>> on big-endian as well?
[...]
> I can confirm that your patch fixes the testsuite for me on Debian
> unstable/ppc64 (big-endian).
> 
> Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

Yep, that worked well on the Fedora s390x builders as well
(unsurprisingly).

Thanks!

Junio C Hamano Oct. 21, 2019, 12:52 a.m. UTC | #3

SZEDER Gábor <szeder.dev@gmail.com> writes:

> The reason for that bogus value is that '--total's parameter is parsed
> via parse-options's OPT_INTEGER into a uint64_t variable [1]...
>
> Change the type of that variable from uint64_t to int, to match what
> parse-options expects; in the tests of the progress output we won't
> use values that don't fit into an int anyway.

OK, so when the call to start_progress() is made, the second
argument (i.e. "total" which now is int) is promoted to what the
callee expects, so there needs no other change.  Makes sense.

> [1] start_progress() expects the total number as an uint64_t, that's
>     why I chose the same type when declaring the variable holding the
>     value given on the command line.

I can sympathize, but I do not think it is worth inventing OPT_U64()
or adding "int total_i" whose value is assigned to "u64 total" after
parsing a command line arg with OPT_INTEGER() into the former.

Catching a pointer whose type is not "int*" passed at the third
position of OPT_INTGER() mechanically may be worth it, though.
Would Coccinelle be a suitable tool for that kind of thing?

>  int cmd__progress(int argc, const char **argv)
>  {
> -	uint64_t total = 0;
> +	int total = 0;
>  	const char *title;
>  	struct strbuf line = STRBUF_INIT;
>  	struct progress *progress;

Jeff King Oct. 21, 2019, 3:21 a.m. UTC | #4

On Mon, Oct 21, 2019 at 09:52:11AM +0900, Junio C Hamano wrote:

> I can sympathize, but I do not think it is worth inventing OPT_U64()
> or adding "int total_i" whose value is assigned to "u64 total" after
> parsing a command line arg with OPT_INTEGER() into the former.
> 
> Catching a pointer whose type is not "int*" passed at the third
> position of OPT_INTGER() mechanically may be worth it, though.
> Would Coccinelle be a suitable tool for that kind of thing?

I wondered if we could be a bit more clever with the definition of
"struct option". Something like:

diff --git a/parse-options.h b/parse-options.h
index 38a33a087e..99c7ff466d 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -126,7 +126,10 @@ struct option {
 	enum parse_opt_type type;
 	int short_name;
 	const char *long_name;
-	void *value;
+	union {
+		int *intp;
+		const char *strp;
+	} value;
 	const char *argh;
 	const char *help;

which would let the compiler complain about the type mismatch (of course
it can't help you if you assign to "intp" while trying to parse a
string).

Initializing the union from a compound literal becomes more painful,
but:

  1. That's mostly hidden behind OPT_INTEGER(), etc.

  2. I think we're OK with named initializers these days. I.e., I think:

        { OPTION_INTEGER, 'f', "--foo", { .intp = &foo } }

     would work OK.

I didn't even try compiling to see how painful the fallout might be,
though.

-Peff

Junio C Hamano Oct. 21, 2019, 8:51 a.m. UTC | #5

Jeff King <peff@peff.net> writes:

> I wondered if we could be a bit more clever with the definition of
> "struct option". Something like:
>
> diff --git a/parse-options.h b/parse-options.h
> index 38a33a087e..99c7ff466d 100644
> --- a/parse-options.h
> +++ b/parse-options.h
> @@ -126,7 +126,10 @@ struct option {
>  	enum parse_opt_type type;
>  	int short_name;
>  	const char *long_name;
> -	void *value;
> +	union {
> +		int *intp;
> +		const char *strp;
> +	} value;
>  	const char *argh;
>  	const char *help;
>  
>
> which would let the compiler complain about the type mismatch (of course
> it can't help you if you assign to "intp" while trying to parse a
> string).
>
> Initializing the union from a compound literal becomes more painful,
> but:
>
>   1. That's mostly hidden behind OPT_INTEGER(), etc.
>
>   2. I think we're OK with named initializers these days. I.e., I think:
>
>         { OPTION_INTEGER, 'f', "--foo", { .intp = &foo } }
>
>      would work OK.

The side that actually use .vale would need to change for obvious
reasons, which may be painful, but I agree it would have easily
prevented the regression from happening in the first place.

Thanks for a food for thought.

Jeff King Oct. 21, 2019, 6:49 p.m. UTC | #6

On Mon, Oct 21, 2019 at 05:51:52PM +0900, Junio C Hamano wrote:

> > -	void *value;
> > +	union {
> > +		int *intp;
> > +		const char *strp;
> > +	} value;
> [...]
> The side that actually use .vale would need to change for obvious
> reasons, which may be painful, but I agree it would have easily
> prevented the regression from happening in the first place.

I was curious just how painful, so here's what I found.

The conversion is indeed annoying. There are 330 sites that need
touched to handle the switch to a union (both declarations and places
that access the variables).

Most of the declarations are hidden by the OPT_*() macros, but there's a
fair bit of changes like this sprinkled around:

@@ -4952,13 +4952,13 @@ int apply_parse_options(int argc, const char **argv,
                        const char * const *apply_usage)
 {
        struct option builtin_apply_options[] = {
-               { OPTION_CALLBACK, 0, "exclude", state, N_("path"),
+               { OPTION_CALLBACK, 0, "exclude", { .voidp = state }, N_("path"),
                        N_("don't apply changes matching the given path"),
                        PARSE_OPT_NONEG, apply_option_parse_exclude },
-               { OPTION_CALLBACK, 0, "include", state, N_("path"),
+               { OPTION_CALLBACK, 0, "include", { .voidp = state }, N_("path"),
                        N_("apply changes matching the given path"),
                        PARSE_OPT_NONEG, apply_option_parse_include },
-               { OPTION_CALLBACK, 'p', NULL, state, N_("num"),
+               { OPTION_CALLBACK, 'p', NULL, { .voidp = state }, N_("num"),
                        N_("remove <num> leading slashes from traditional diff paths"),
                        0, apply_option_parse_p },
                OPT_BOOL(0, "no-add", &state->no_add,

which is strictly worse syntactically, and doesn't give us any type
safety (and won't ever, because parse-options is never going to learn
about "struct apply_state").

Likewise the access side gets slightly uglier, but not too bad:

@@ -4768,7 +4768,7 @@ static int apply_patch(struct apply_state *state,
 static int apply_option_parse_exclude(const struct option *opt,
                                      const char *arg, int unset)
 {
-       struct apply_state *state = opt->value;
+       struct apply_state *state = opt->value.voidp;

        BUG_ON_OPT_NEG(unset);

For things that actually use intp, I think the access side is fine (and
possibly even slightly nicer):

@@ -101,65 +101,65 @@ static enum parse_opt_result get_value(struct parse_opt_ctx_t *p,

        case OPTION_BIT:
                if (unset)
-                       *(int *)opt->value &= ~opt->defval;
+                       *opt->value.intp &= ~opt->defval;
                else
-                       *(int *)opt->value |= opt->defval;
+                       *opt->value.intp |= opt->defval;
                return 0;

The declaration side is mostly handled by OPT_INTEGER(), etc, but
hand-written ones still need to adjust as you'd expect:

@@ -298,7 +298,7 @@ static struct option builtin_add_options[] = {
        OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
        OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
        OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
-       { OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
+       { OPTION_CALLBACK, 0, "ignore-removal", { .intp = &addremove_explicit },
          NULL /* takes no arguments */,
          N_("ignore paths removed in the working tree (same as --no-all)"),
          PARSE_OPT_NOARG, ignore_removal_cb },

That's ugly, but at least we're getting some type safety out of it.

But here's where it gets tricky. In addition to catching any size
mismatches, this will also catch signedness problems. I.e., if we make
OPT_INTEGER() use "intp", then everybody passing in &unsigned_var now
gets a compiler warning. Which maybe is a good thing, I dunno. But it
triggers a lot of warnings. We probably ought to be using a "uintp" for
OPT_BIT(), etc, but that just complains about callers passing in signed
integers. ;)

So that's where I gave up. Converting between signed and unsigned
variables needs to be done very carefully, as there are often subtle
impacts (e.g., loop terminations). And because we have so many sign
issues already, compiling with "-Wsign-compare", etc, isn't likely to
help.

But if anybody wants to take a stab at it, the work I've done so far is
can be fetched from:

  https://github.com/peff/git jk/parseopt-intp-wip

-Peff

Junio C Hamano Oct. 23, 2019, 1:58 a.m. UTC | #7

Jeff King <peff@peff.net> writes:

> But here's where it gets tricky. In addition to catching any size
> mismatches, this will also catch signedness problems. I.e., if we make
> OPT_INTEGER() use "intp", then everybody passing in &unsigned_var now
> gets a compiler warning. Which maybe is a good thing, I dunno.

Hmph, true.  I'd agree with back-burnering it for now.  

Perhaps we'd fix the signedness issue one by one in a preparatory
series before converting the value field to a union, if we want to
pursue this idea further (in which I am mildly interested, by the
way), but it does sound like it should be given lower priority.

> So that's where I gave up. Converting between signed and unsigned
> variables needs to be done very carefully, as there are often subtle
> impacts (e.g., loop terminations). And because we have so many sign
> issues already, compiling with "-Wsign-compare", etc, isn't likely to
> help.

True.

Thanks.

SZEDER Gábor Oct. 24, 2019, 5:24 p.m. UTC | #8

On Mon, Oct 21, 2019 at 09:52:11AM +0900, Junio C Hamano wrote:
> I can sympathize, but I do not think it is worth inventing OPT_U64()
> or adding "int total_i" whose value is assigned to "u64 total" after
> parsing a command line arg with OPT_INTEGER() into the former.

I agree, we should wait for the first real use case where specifying a
larger-than-32bit integer actually makes sense in practice.

> Catching a pointer whose type is not "int*" passed at the third
> position of OPT_INTGER() mechanically may be worth it, though.
> Would Coccinelle be a suitable tool for that kind of thing?

The semantic patch below will do that, but this is one of those "I
don't have the slightest idea what I am doing" patches...

It's output looks like this when applied to an older version without
the big-endian fix upthread:

  potential error at apply.c:4982:26:
    passing variable 'state -> p_context' of type 'unsigned int' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at builtin/column.c:29:30:
    passing variable 'colopts' of type 'unsigned int' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at builtin/column.c:32:24:
    passing variable 'copts . nl' of type 'const char *' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at builtin/grep.c:884:38:
    passing variable 'opt . pre_context' of type 'unsigned' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at builtin/grep.c:886:37:
    passing variable 'opt . post_context' of type 'unsigned' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at builtin/upload-pack.c:28:29:
    passing variable 'opts . timeout' of type 'unsigned int' to OPT_INTEGER
    OPT_INTEGER expects an int
  potential error at t/helper/test-progress.c:42:27:
    passing variable 'total' of type 'uint64_t' to OPT_INTEGER
    OPT_INTEGER expects an int

  https://travis-ci.org/szeder/git/jobs/602423358#L436

I think most of them are harmless, like the number of context lines in
apply and grep, or the timeout in seconds in upload-pack.  So I think
the semantic patch should allow 'unsigned' and 'unsigned int' as well.

But note the one in 'builtin/column.c', where we pass a 'const char
*' to OPT_INTEGER.  That can't possibly be good; I suspect copy-paste
error and it should have been OPT_STRING.

 --- >8 ---

Subject: [PATCH] coccinelle: warn about passing a non-int to parse-options'
 OPT_INTEGER

parse-options' OPT_INTEGER wants to parse an integer argument into a
variable of type 'int', and passing e.g. an 'uint64_t' causes troubles
[1].

Add a Coccinelle semantic patch that checks the type of the variable
where the integer argument should be parsed into, and print an error
if that variable is not of type 'int'.

Note that this semantic patch won't result in a proper and applicable
patch, because who knows where that variable of the inappropriate type
is defined.  However, the printed error message will still cause our
static analysis CI jobs to fail, drawing our attention to the issue.

TODO: refusing an 'unsigned int' might be unnecessarily harsh...

[1] 11a803d861 (test-progress: fix test failures on big-endian
    systems, 2019-10-20)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/coccinelle/parse-options.cocci | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 contrib/coccinelle/parse-options.cocci

diff --git a/contrib/coccinelle/parse-options.cocci b/contrib/coccinelle/parse-options.cocci
new file mode 100644
index 0000000000..e0cddef421
--- /dev/null
+++ b/contrib/coccinelle/parse-options.cocci
@@ -0,0 +1,18 @@
+@ optint @
+identifier opts;
+type T;
+T var;
+expression SHORT, LONG, HELP;
+position p;
+@@
+struct option opts[] = { ..., OPT_INTEGER(SHORT, LONG, &var@p, HELP), ...};
+
+@ script:python @
+p << optint.p;
+var << optint.var;
+vartype << optint.T;
+@@
+if vartype != "int":
+	print "potential error at %s:%s:%s:" % (p[0].file, p[0].line, p[0].column)
+	print "  passing variable '%s' of type '%s' to OPT_INTEGER" % (var, vartype)
+	print "  OPT_INTEGER expects an int"

Jeff King Oct. 24, 2019, 5:55 p.m. UTC | #9

On Thu, Oct 24, 2019 at 07:24:42PM +0200, SZEDER Gábor wrote:

> On Mon, Oct 21, 2019 at 09:52:11AM +0900, Junio C Hamano wrote:
> > I can sympathize, but I do not think it is worth inventing OPT_U64()
> > or adding "int total_i" whose value is assigned to "u64 total" after
> > parsing a command line arg with OPT_INTEGER() into the former.
> 
> I agree, we should wait for the first real use case where specifying a
> larger-than-32bit integer actually makes sense in practice.

Anybody who wants to refer to a file or memory size would want that. We
have a few cases in pack-objects already, but they use OPT_MAGNITUDE()
instead. Which makes sense. Anything we expect to be that gigantic would
want to have the shorthand to say "4G" or whatever.

(As a side note, I notice that OPT_MAGNITUDE uses "unsigned long", which
probably needs to be adjusted for Windows. Dealing with all of the
void-pointer type-punning fallout from that will be fun. :) ).

> It's output looks like this when applied to an older version without
> the big-endian fix upthread:
> 
>   potential error at apply.c:4982:26:
>     passing variable 'state -> p_context' of type 'unsigned int' to OPT_INTEGER
>     OPT_INTEGER expects an int

Looks like this found a lot of the same issues my "intp" conversion
found. My recollection is that mine found more, but it might just have
been that the compiler's warning output is rather verbose. TBH, I didn't
carefully catalog them; as soon as I saw the problem, I went to bed in
disgust. ;)

> diff --git a/contrib/coccinelle/parse-options.cocci b/contrib/coccinelle/parse-options.cocci
> new file mode 100644
> index 0000000000..e0cddef421
> --- /dev/null
> +++ b/contrib/coccinelle/parse-options.cocci
> @@ -0,0 +1,18 @@
> +@ optint @
> +identifier opts;
> +type T;
> +T var;
> +expression SHORT, LONG, HELP;
> +position p;
> +@@
> +struct option opts[] = { ..., OPT_INTEGER(SHORT, LONG, &var@p, HELP), ...};
> +
> +@ script:python @
> +p << optint.p;
> +var << optint.var;
> +vartype << optint.T;
> +@@

I avoided relying on the python interpreter for previous cocci patches,
since some builds don't have it (IIRC, the one in Debian experimental
doesn't, but that one has not graduated even to "unstable" after several
years, so maybe it's not worth worrying about).

If we do start using python, we might want to revisit 4d168e742a
(coccinelle: use <...> for function exclusion, 2018-08-28), which
describes a faster python alternative in the commit message.

-Peff

test-progress: fix test failures on big-endian systems

Commit Message

Comments

Patch