
parse-options: make parse_options_check() test-only

Message ID xmqqr17lphav.fsf_-_@gitster.g (mailing list archive)
State New, archived

Commit Message

Junio C Hamano March 1, 2022, 8:08 p.m. UTC
The array of options given to the parse-options API is sanity
checked for reuse of a single-letter option for multiple entries and
other programmer mistakes by calling parse_options_check() from
parse_options_start().  This allows our developers to catch silly
mistakes early, but all callers of the parse-options API pay this cost.
Once the set of options in an array is validated and passes this
check, until a programmer modifies the array, there is no way for it
to fail the check, which is wasteful.

Introduce the GIT_TEST_PARSE_OPTIONS_CHECK environment variable and
perform the sanity check only when it is set to true.  Set it in
t/test-lib.sh so that our tests will continue to catch buggy options
arrays.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

    >  (2) Rethink if parse_options_check() can be made optional at
    >      runtime, which would (a) allow our test to enable it, and allow
    >      us to test all code paths that use parse_options() centrally,
    >      and (b) allow us to bypass the check while the end-user runs
    >      "git", to avoid overhead of checking the same option[] array,
    >      which does not change between invocations of "git", over and
    >      over again all over the world.
    >
    >      We may add the check back to parse_options_check() after doing
    >      the above.  There are already tons of "check sanity of what is
    >      inside option[]" in there, and it would be beneficial if we can
    >      separate out from parse_options_start() the sanity checking
    >      code, regardless of this topic.

    This looked too easy and there may be some pitfalls, but I am
    hoping that we will know soon enough by floating a weather
    balloon like this.
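
As an illustration of the gating described in the commit message, here
is a minimal, self-contained sketch.  It is not git's code: struct
sketch_option, sketch_parse_bool(), sketch_count_dup_shorts() and
sketch_start() are hypothetical names, and the boolean parsing is a
simplified stand-in for git_env_bool() that recognizes only a few
lowercase spellings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified option record; git's struct option has more fields. */
struct sketch_option {
	char short_name;       /* 0 means "no short name" */
	const char *long_name; /* may be NULL */
};

/* Simplified boolean parsing (assumption: only these spellings count as true). */
int sketch_parse_bool(const char *value, int def)
{
	if (!value)
		return def;
	return !strcmp(value, "1") || !strcmp(value, "true") ||
	       !strcmp(value, "yes") || !strcmp(value, "on");
}

/* Count short names that were already used by an earlier entry. */
int sketch_count_dup_shorts(const struct sketch_option *opts)
{
	unsigned char seen[256] = { 0 };
	int dups = 0;

	assert(opts != NULL);
	/* The array ends at an entry with neither a short nor a long name. */
	for (; opts->short_name || opts->long_name; opts++) {
		unsigned char c = (unsigned char)opts->short_name;

		if (!c)
			continue;
		if (seen[c])
			dups++;
		else
			seen[c] = 1;
	}
	return dups;
}

/* Run the check only when the variable is true; return -1 when skipped. */
int sketch_start(const struct sketch_option *opts)
{
	const char *v = getenv("GIT_TEST_PARSE_OPTIONS_CHECK");

	if (!sketch_parse_bool(v, 0))
		return -1;
	return sketch_count_dup_shorts(opts);
}
```

With the variable unset, sketch_start() skips the scan entirely, which
mirrors the default of 0 passed to git_env_bool() in the patch.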

 parse-options.c | 12 +++++++++++-
 t/README        |  5 +++++
 t/test-lib.sh   |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason March 1, 2022, 9:57 p.m. UTC | #1
On Tue, Mar 01 2022, Junio C Hamano wrote:

> The array of options given to the parse-options API is sanity
> checked for reuse of a single-letter option for multiple entries and
> other programmer mistakes by calling parse_options_check() from
> parse_options_start().  This allows our developers to catch silly
> mistakes early, but all callers of the parse-options API pay this cost.
> Once the set of options in an array is validated and passes this
> check, until a programmer modifies the array, there is no way for it
> to fail the check, which is wasteful.

That's not true due to the "git rev-parse --parseopt" interface. I'd be
happy to deprecate it, but I think the last time I brought it up you
were opposed, i.e. it's documented as plumbing in "git-rev-parse", and
it's easy to have it hit some of these BUG()'s.

I see the benefit of Johannes's suggestion of checking this once (but
with t0012-help.sh etc. we're nowhere near being able to do that).

Now this runs for the whole test suite, so our tests will have the
same behavior.

So it's just an optimization? Isn't it premature? If you run
parse_options_check() in a loop, how many checks/sec can we do? I haven't
tested, but I'm betting it's a *lot*.

So aren't we shaving microseconds off the runtime here?

Junio C Hamano March 1, 2022, 10:18 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Mar 01 2022, Junio C Hamano wrote:
>
>> The array of options given to the parse-options API is sanity
>> checked for reuse of a single-letter option for multiple entries and
>> other programmer mistakes by calling parse_options_check() from
>> parse_options_start().  This allows our developers to catch silly
>> mistakes early, but all callers of the parse-options API pay this cost.
>> Once the set of options in an array is validated and passes this
>> check, until a programmer modifies the array, there is no way for it
>> to fail the check, which is wasteful.
>
> That's not true due to the "git rev-parse --parseopt" interface. I'd be

Meaning that a parse-options array can be fed by "rev-parse --parseopt"
and having the sanity check enabled does help the use case?  Even there,
I would say that once the script writer finishes developing the script
that uses "rev-parse --parseopt", setting the parseopt input in stone,
there is no need to check the same thing over and over again.  Am I
mistaken?  Does "rev-parse --parseopt" that is fed the same input
sometimes trigger the sanity check and sometimes not?

> I see the benefit of Johannes's suggestion of checking this once (but
> with t0012-help.sh etc. we're nowhere near being able to do that).
>
> Now this runs for the whole test suite, so our tests will have the
> same behavior.

The code for sanity check is there ONLY to help those who develop
while they develop, and it is logical to enable it during our tests.
There is no reason to trigger the sanity check in the end-user
environment, no?

> So aren't we shaving microseconds off the runtime here?

No, the problem I have with the runtime check is more at the
conceptual level.  Those who remove assert() by defining NDEBUG
would not be doing so to save nanoseconds, either.
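
To make the assert() comparison concrete, here is a small sketch of
what defining the standard NDEBUG macro does: the asserted expression
is not evaluated at all.  The sketch_* names are hypothetical, and
<assert.h> is re-included on purpose, since the C standard redefines
assert() according to the state of NDEBUG at each inclusion.

```c
static int evaluations; /* how many times the checked expression ran */

int sketch_is_sane(int value)
{
	evaluations++;
	return value >= 0;
}

int sketch_evaluations(void)
{
	return evaluations;
}

/* Development-style build: assert() evaluates its argument. */
#undef NDEBUG
#include <assert.h>
int sketch_checked(int value)
{
	assert(sketch_is_sane(value));
	return value;
}

/* Release-style build: with NDEBUG defined, assert() expands to a no-op. */
#define NDEBUG
#include <assert.h>
int sketch_unchecked(int value)
{
	assert(sketch_is_sane(value)); /* sketch_is_sane() is never called */
	return value;
}

/* Restore the active assert() for any code that follows. */
#undef NDEBUG
#include <assert.h>
```

Calling sketch_unchecked(-1) returns normally and leaves the counter
untouched, while sketch_checked(-1) would abort.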

Ævar Arnfjörð Bjarmason March 2, 2022, 10:52 a.m. UTC | #3
On Tue, Mar 01 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Tue, Mar 01 2022, Junio C Hamano wrote:
>>
>>> The array of options given to the parse-options API is sanity
>>> checked for reuse of a single-letter option for multiple entries and
>>> other programmer mistakes by calling parse_options_check() from
>>> parse_options_start().  This allows our developers to catch silly
>>> mistakes early, but all callers of the parse-options API pay this cost.
>>> Once the set of options in an array is validated and passes this
>>> check, until a programmer modifies the array, there is no way for it
>>> to fail the check, which is wasteful.
>>
>> That's not true due to the "git rev-parse --parseopt" interface. I'd be
>
> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
> and having the sanity check enabled does help the use case?  Even there,
> I would say that once the script writer finishes developing the script
> that uses "rev-parse --parseopt", setting the parseopt input in stone,
> there is no need to check the same thing over and over again.  Am I
> mistaken?  Does "rev-parse --parseopt" that is fed the same input
> sometimes trigger the sanity check and sometimes not?

If we're declaring that "git rev-parse --parseopt" is something that was
only ever intended for in-tree usage, sure, that should hold true.

I.e. "git rev-parse" is documented as plumbing, and we document
--parseopt as a generic option parsing mechanism you can use in
shellscripts.

So out-of-tree users wouldn't guard against
GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
e.g. segfault on some subsequent code if some of the sanity checks
aren't happening anymore.

No, I'd be quite happy if we declared that it's for our use only, and
could remove it when the last in-tree *.sh user went away. There's a bit
of complexity in parse_options() required only for its use....

>> I see the benefit of Johannes's suggestion of checking this once (but
>> with t0012-help.sh etc. we're nowhere near being able to do that).
>>
>> Now this runs for the whole test suite, so our tests will have the
>> same behavior.
>
> The code for sanity check is there ONLY to help those who develop
> while they develop, and it is logical to enable it during our tests.
> There is no reason to trigger the sanity check in the end-user
> environment, no?

I don't see the benefit of skipping it. Your commit message mentions
"but all callers of parse-options API pays this cost". As a quick & dumb
perf test I tried:
	
	diff --git a/parse-options.c b/parse-options.c
	index 6e57744fd22..cabea35e8b1 100644
	--- a/parse-options.c
	+++ b/parse-options.c
	@@ -523,7 +523,10 @@ static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
	        if ((flags & PARSE_OPT_ONE_SHOT) &&
	            (flags & PARSE_OPT_KEEP_ARGV0))
	                BUG("Can't keep argv0 if you don't have it");
	-       parse_options_check(options);
	+       while (1) {
	+               printf(".");
	+               parse_options_check(options);
	+       }
	 }
	 
	 void parse_options_start(struct parse_opt_ctx_t *ctx,

And:

    ./git [am|rebase] | pv >/dev/null

I get around 4MiB/s, i.e. we can do this check ~4 million times/sec on my
computer with -O3; with -O0 -g it's ~3MiB/s.

So the performance cost is trivial & not worth worrying about.
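
The measurement above prints one byte per check and lets pv report the
byte rate, so bytes/sec equals checks/sec (printf() overhead included).
A rough in-process variant, as a sketch only, times a fixed number of
iterations with clock().  The sketch_* names and the simplified option
record are hypothetical stand-ins, not git's struct option or
parse_options_check().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified option record. */
struct sketch_option {
	char short_name;       /* 0 means "no short name" */
	const char *long_name; /* may be NULL */
};

/* Return 1 if any short name is used by more than one entry. */
int sketch_has_dup_short(const struct sketch_option *opts)
{
	unsigned char seen[256] = { 0 };

	for (; opts->short_name || opts->long_name; opts++) {
		unsigned char c = (unsigned char)opts->short_name;

		if (!c)
			continue;
		if (seen[c])
			return 1;
		seen[c] = 1;
	}
	return 0;
}

/* Run the check `iterations` times; store elapsed CPU seconds in *seconds. */
long sketch_run_checks(const struct sketch_option *opts, long iterations,
		       double *seconds)
{
	volatile int sink = 0; /* keeps the compiler from dropping the loop */
	clock_t start;
	long done;

	assert(opts != NULL && seconds != NULL);
	start = clock();
	for (done = 0; done < iterations; done++)
		sink += sketch_has_dup_short(opts);
	*seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	return done;
}
```

Dividing the returned count by *seconds gives a checks/sec figure; the
absolute number depends on the machine, the compiler flags and the size
of the array, so it is only comparable within one setup.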

>> So aren't we shaving microseconds off the runtime here?
>
> No, the problem I have with the runtime check is more at the
> conceptual level.  Those who remove assert() by defining NDEBUG
> would not be doing so to save nanoseconds, either.

I think the trade-off of not having to worry about runtime
vs. "development build" checks is one we've handled well with BUG(),
i.e. by not having it be an assert().

E.g. in this case we have parse_options_concat(), so you can dynamically
construct the options to be checked.

I happen to have looked in detail at all of that code in the past, and I
don't *think* it's doing something "actually dynamic". I.e. it should be
the same when the tests run and when git runs in the wild.

But having to know and check that when using or changing the API is just
more state to keep in your head.
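
To illustrate why concatenation matters for the check, here is a sketch
in which two arrays that are each fine on their own collide once they
are joined.  sketch_concat() only loosely imitates the idea behind
parse_options_concat(); all sketch_* names and the simplified option
record are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified option record. */
struct sketch_option {
	char short_name;       /* 0 means "no short name" */
	const char *long_name; /* may be NULL */
};

/* Number of entries before the terminating all-zero entry. */
size_t sketch_len(const struct sketch_option *opts)
{
	size_t n = 0;

	while (opts[n].short_name || opts[n].long_name)
		n++;
	return n;
}

/* Return a newly allocated array holding a's entries followed by b's. */
struct sketch_option *sketch_concat(const struct sketch_option *a,
				    const struct sketch_option *b)
{
	size_t na = sketch_len(a), nb = sketch_len(b);
	struct sketch_option *out = malloc((na + nb + 1) * sizeof(*out));

	if (!out)
		return NULL;
	memcpy(out, a, na * sizeof(*out));
	memcpy(out + na, b, (nb + 1) * sizeof(*out)); /* includes terminator */
	return out;
}

/* Return 1 if any short name is used by more than one entry. */
int sketch_has_dup_short(const struct sketch_option *opts)
{
	unsigned char seen[256] = { 0 };

	for (; opts->short_name || opts->long_name; opts++) {
		unsigned char c = (unsigned char)opts->short_name;

		if (!c)
			continue;
		if (seen[c])
			return 1;
		seen[c] = 1;
	}
	return 0;
}
```

For example, joining { 'v' "verbose", 'q' "quiet" } with { 'v' "verify" }
yields an array that fails the check even though neither input does, so
only a check on the combined array can catch it.  The caller owns the
result and should free() it.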

Junio C Hamano March 2, 2022, 6:59 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
>> and having the sanity check enabled does help the use case?  Even there,
>> I would say that once the script writer finishes developing the script
>> that uses "rev-parse --parseopt", setting the parseopt input in stone,
>> there is no need to check the same thing over and over again.  Am I
>> mistaken?  Does "rev-parse --parseopt" that is fed the same input
>> sometimes trigger the sanity check and sometimes not?
>
> If we're declaring that "git rev-parse --parseopt" is something that was
> only ever intended for in-tree usage sure, that should hold true.

> So out-of-tree users wouldn't guard against
> GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
> e.g. segfault on some subsequent code if some of the sanity checks
> aren't happening anymore.
> ...
> No, I'd be quite happy if we declared that it's for our use only, and
> could remove it when the last in-tree *.sh user went away. there's a bit
> of complexity in parse_options() required only for its use....

I do not see any need for such a declaration.  We are not changing
the behaviour of "git rev-parse --parseopt" plumbing command at all
for those who feed valid input to it.

"rev-parse --parseopt" users can keep using their scripts just the
same as before, debugging their scripts to catch silly mistakes like
duplicated short options may become slightly harder, but they still
have a way to ask for the same debugging support available.

Yes, I am saying that is perfectly fine, and both in-tree and
out-of-tree users have a way to reinstate the sanity checks.  I also
do not mind if your proposal were one of these:

 * introduce --parseopt-with-sanity-check to "rev-parse" and arrange
   the parse_options_check() call to be made when the command was
   invoked with it; or

 * introduce --parseopt-without-sanity-check to "rev-parse", and
   arrange the parse_options_check() call to be still made when
   "--parseopt" is used.  Those who finished developing their
   scripts can rewrite their --parseopt to the "without" version for
   conceptual cleanliness.

> So the performance cost is trivial & not worth worrying about.

I already said I am not worried about it, didn't I?  These numbers
do not matter in this discussion.

Ævar Arnfjörð Bjarmason March 2, 2022, 7:17 p.m. UTC | #5
On Wed, Mar 02 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> [...]
>> So the performance cost is trivial & not worth worrying about.
>
> I already said I am not worried about it, didn't I?  These numbers
> do not matter in this discussion.

Sorry, but I really don't see the point then.

You'd like to keep "git rev-parse --parseopt", but now if you feed bad
input to it you'll get worse error messages from it. If it's not for a
performance benefit, then why? Why would we have worse error reporting
without any upside?

Another common case would be locally hacking a command that uses
parse_options(), having it do the wrong thing for some cryptic reason
we'd catch in parse_options_check().

You'd then eventually remember to turn on this GIT_TEST_* knob (i.e. if
testing via the command-line/debugger instead of the test suite). I for
one do that a lot when working on the parse_options()-using commands
in-tree. If this lands I'll probably remember to add this knob to my
.bashrc, but everyone else will find out the hard way...

Patch

diff --git a/parse-options.c b/parse-options.c
index 6e57744fd2..02cfe3f2cd 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -439,6 +439,14 @@  static void check_typos(const char *arg, const struct option *options)
 	}
 }
 
+/*
+ * Check the sanity of contents of opts[] array to find programmer
+ * mistakes (like duplicated short options).
+ *
+ * This function is supposed to be no-op when it returns without
+ * dying, making a call from parse_options_start_1() to it optional
+ * in end-user builds.
+ */
 static void parse_options_check(const struct option *opts)
 {
 	int err = 0;
@@ -523,7 +531,9 @@  static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
 	if ((flags & PARSE_OPT_ONE_SHOT) &&
 	    (flags & PARSE_OPT_KEEP_ARGV0))
 		BUG("Can't keep argv0 if you don't have it");
-	parse_options_check(options);
+
+	if (git_env_bool("GIT_TEST_PARSE_OPTIONS_CHECK", 0))
+		parse_options_check(options);
 }
 
 void parse_options_start(struct parse_opt_ctx_t *ctx,
diff --git a/t/README b/t/README
index f48e0542cd..b7285531f2 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,11 @@  a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_PARSE_OPTIONS_CHECK=<boolean>, when true, makes every options
+array passed to the parse-options API be sanity checked.  This
+environment variable is set to true by test-lib.sh unless it is already set.
+
+
 Naming Tests
 ------------
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index e4716b0b86..805f495fd4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -474,6 +474,9 @@  export GIT_DEFAULT_HASH
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"
 export GIT_TEST_MERGE_ALGORITHM
 
+: ${GIT_TEST_PARSE_OPTIONS_CHECK:=1}
+export GIT_TEST_PARSE_OPTIONS_CHECK
+
 # Tests using GIT_TRACE typically don't want <timestamp> <file>:<line> output
 GIT_TRACE_BARE=1
 export GIT_TRACE_BARE