help: do not expect built-in commands to be hardlinked

Message ID pull.745.git.1602074589460.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Johannes Schindelin via GitGitGadget Oct. 7, 2020, 12:43 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

When building with SKIP_DASHED_BUILT_INS=YesPlease, the built-in
commands are no longer present in the `PATH` as hardlinks to `git`.

As a consequence, `load_command_list()` needs to be taught to find the
names of the built-in commands from elsewhere.

This only affected the output of `git --list-cmds=main`, but not the
output of `git help -a` because the latter includes the built-in
commands by virtue of them being listed in command-list.txt.

The bug was detected via a patch series that turns the merge strategies
included in Git into built-in commands: `git merge -s help` relies on
`load_command_list()` to determine the list of available merge
strategies.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Fix the command list with SKIP_DASHED_BUILT_INS=YesPlease
    
    In a recent patch series
    [https://lore.kernel.org/git/20201005122646.27994-12-alban.gruin@gmail.com/#r],
    the merge strategies were converted into built-ins, which is good.
    
    Together with the change where we stop hard-linking the built-in
    commands in CI builds, this broke t9902.199.
    
    The actual root cause is that git merge -s help relies on 
    load_command_list() to find all available Git commands, and that
    function had the long-standing bug that it expects the built-in commands
    to be available in the PATH.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-745%2Fdscho%2Falways-include-builtins-in-commands-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-745/dscho/always-include-builtins-in-commands-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/745

 git.c  | 13 +++++++++++++
 help.c |  2 ++
 help.h |  1 +
 3 files changed, 16 insertions(+)


base-commit: 8f7759d2c8c13716bfdb9ae602414fd987787e8d

Comments

Junio C Hamano Oct. 7, 2020, 5:21 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When building with SKIP_DASHED_BUILT_INS=YesPlease, the built-in
> commands are no longer present in the `PATH` as hardlinks to `git`.
>
> As a consequence, `load_command_list()` needs to be taught to find the
> names of the built-in commands from elsewhere.
>
> This only affected the output of `git --list-cmds=main`, but not the
> output of `git help -a` because the latter includes the built-in
> commands by virtue of them being listed in command-list.txt.
>
> The bug was detected via a patch series that turns the merge strategies
> included in Git into built-in commands: `git merge -s help` relies on
> `load_command_list()` to determine the list of available merge
> strategies.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Fix the command list with SKIP_DASHED_BUILT_INS=YesPlease
>     
>     In a recent patch series
>     [https://lore.kernel.org/git/20201005122646.27994-12-alban.gruin@gmail.com/#r]
>     , the merge strategies were converted into built-ins, which is good.
>     
>     Together with the change where we stop hard-linking the built-in
>     commands in CI builds, this broke t9902.199.
>     
>     The actual root cause is that git merge -s help relies on 
>     load_command_list() to find all available Git commands, and that
>     function had the long-standing bug that it expects the built-in commands
>     to be available in the PATH.
>

That is not a bug in "merge -s help" or "longstanding" at all.  It
has been a quite natural and long-standing expectation to find all
the merge strategies on PATH (after GIT_EXEC_PATH is added to it),
because that was the promise we gave to our users a long time ago and
have kept.

The bug is in load_command_list() and it was introduced by the
recent SKIP_DASHED_BUILT_INS series.  We forgot to teach the
function that in the new world order, what we see on disk plus what
we have in the built-in table are the set of subcommands available
to us, and the rule that was valid in the old world order can no
longer be relied upon, and nobody noticed the breakage while
developing or reviewing.

>  git.c  | 13 +++++++++++++
>  help.c |  2 ++
>  help.h |  1 +
>  3 files changed, 16 insertions(+)
>
> diff --git a/git.c b/git.c
> index d51fb5d2bf..a6224badce 100644
> --- a/git.c
> +++ b/git.c
> @@ -641,6 +641,19 @@ static void list_builtins(struct string_list *out, unsigned int exclude_option)
>  	}
>  }
>  
> +void load_builtin_commands(const char *prefix, struct cmdnames *cmds)
> +{
> +	const char *name;
> +	int i;
> +
> +	if (!skip_prefix(prefix, "git-", &prefix))
> +		return;

Do we want to explain that this is for dropping "gitk" and the like
in a comment near here?

> +	for (i = 0; i < ARRAY_SIZE(commands); i++)
> +		if (skip_prefix(commands[i].cmd, prefix, &name))
> +			add_cmdname(cmds, name, strlen(name));
> +}
> +
>  #ifdef STRIP_EXTENSION
>  static void strip_extension(const char **argv)
>  {
> diff --git a/help.c b/help.c
> index 4e2468a44d..919cbb9206 100644
> --- a/help.c
> +++ b/help.c
> @@ -263,6 +263,8 @@ void load_command_list(const char *prefix,
>  	const char *env_path = getenv("PATH");
>  	const char *exec_path = git_exec_path();
>  
> +	load_builtin_commands(prefix, main_cmds);
> +
>  	if (exec_path) {
>  		list_commands_in_dir(main_cmds, exec_path, prefix);
>  		QSORT(main_cmds->names, main_cmds->cnt, cmdname_compare);
> diff --git a/help.h b/help.h
> index dc02458855..5871e93ba2 100644
> --- a/help.h
> +++ b/help.h
> @@ -32,6 +32,7 @@ const char *help_unknown_cmd(const char *cmd);
>  void load_command_list(const char *prefix,
>  		       struct cmdnames *main_cmds,
>  		       struct cmdnames *other_cmds);
> +void load_builtin_commands(const char *prefix, struct cmdnames *cmds);
>  void add_cmdname(struct cmdnames *cmds, const char *name, int len);
>  /* Here we require that excludes is a sorted list. */
>  void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes);
>
> base-commit: 8f7759d2c8c13716bfdb9ae602414fd987787e8d
Junio C Hamano Oct. 7, 2020, 5:48 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> ... in the new world order, what we see on disk plus what
> we have in the built-in table are the set of subcommands available
> to us, and the rule that was valid in the old world order can no
> longer be relied upon, and nobody noticed the breakage while
> developing or reviewing.

>> diff --git a/help.c b/help.c
>> index 4e2468a44d..919cbb9206 100644
>> --- a/help.c
>> +++ b/help.c
>> @@ -263,6 +263,8 @@ void load_command_list(const char *prefix,
>>  	const char *env_path = getenv("PATH");
>>  	const char *exec_path = git_exec_path();
>>  
>> +	load_builtin_commands(prefix, main_cmds);
>> +
>>  	if (exec_path) {
>>  		list_commands_in_dir(main_cmds, exec_path, prefix);
>>  		QSORT(main_cmds->names, main_cmds->cnt, cmdname_compare);

I wondered if we need, after this change, to worry about duplicates,
because some Git subcommands, even after they are made into built-ins
and callable internally, must have on-disk footprint.

It turns out that after the post-context in this hunk we do make a
call to uniq(main_cmds) so it is fine.

This was unexpected to me, as we read only from a single directory
"exec_path" and the need to call uniq() in the old world order would
have meant that readdir in exec_path gave us duplicate entries.

In fact, the very original version of load_command_list() did not
have this unnecessary call to uniq().  It was introduced in 1f08e5ce
(Allow git help work without PATH set, 2008-08-28); perhaps Alex saw
12 years into the future and predicted that we would start needing
it ;-)

In any case, the patch is good thanks to that existing uniq() call.

>> diff --git a/help.h b/help.h
>> index dc02458855..5871e93ba2 100644
>> --- a/help.h
>> +++ b/help.h
>> @@ -32,6 +32,7 @@ const char *help_unknown_cmd(const char *cmd);
>>  void load_command_list(const char *prefix,
>>  		       struct cmdnames *main_cmds,
>>  		       struct cmdnames *other_cmds);
>> +void load_builtin_commands(const char *prefix, struct cmdnames *cmds);
>>  void add_cmdname(struct cmdnames *cmds, const char *name, int len);
>>  /* Here we require that excludes is a sorted list. */
>>  void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes);
>>
>> base-commit: 8f7759d2c8c13716bfdb9ae602414fd987787e8d
Johannes Schindelin Oct. 7, 2020, 9:43 p.m. UTC | #3
Hi Junio,

On Wed, 7 Oct 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > When building with SKIP_DASHED_BUILT_INS=YesPlease, the built-in
> > commands are no longer present in the `PATH` as hardlinks to `git`.
> >
> > As a consequence, `load_command_list()` needs to be taught to find the
> > names of the built-in commands from elsewhere.
> >
> > This only affected the output of `git --list-cmds=main`, but not the
> > output of `git help -a` because the latter includes the built-in
> > commands by virtue of them being listed in command-list.txt.
> >
> > The bug was detected via a patch series that turns the merge strategies
> > included in Git into built-in commands: `git merge -s help` relies on
> > `load_command_list()` to determine the list of available merge
> > strategies.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >     Fix the command list with SKIP_DASHED_BUILT_INS=YesPlease
> >
> >     In a recent patch series
> >     [https://lore.kernel.org/git/20201005122646.27994-12-alban.gruin@gmail.com/#r]
> >     , the merge strategies were converted into built-ins, which is good.
> >
> >     Together with the change where we stop hard-linking the built-in
> >     commands in CI builds, this broke t9902.199.
> >
> >     The actual root cause is that git merge -s help relies on
> >     load_command_list() to find all available Git commands, and that
> >     function had the long-standing bug that it expects the built-in commands
> >     to be available in the PATH.
> >
>
> That is not a bug in "merge -s help" or "longstanding" at all.  It
> has been a quite natural and long-standing expectation to find all
> the merge strategies on PATH (after GIT_EXEC_PATH is added to it),
> because that was the promise we gave to our users a long time ago and
> have kept.

Sure, we promised to the outside world that those built-ins would always
be in the PATH, but we highly recommended against using dashed
invocations.

To me, that means that _internally_ we should have been more stringent
about how we do things ourselves.

In any case, your complaint isn't about the commit message, so I hope it
can advance?

> The bug is in load_command_list() and it was introduced by the
> recent SKIP_DASHED_BUILT_INS series.  We forgot to teach the
> function that in the new world order, what we see on disk plus what
> we have in the built-in table are the set of subcommands available
> to us, and the rule that was valid in the old world order can no
> longer be relied upon, and nobody noticed the breakage while
> developing or reviewing.
>
> >  git.c  | 13 +++++++++++++
> >  help.c |  2 ++
> >  help.h |  1 +
> >  3 files changed, 16 insertions(+)
> >
> > diff --git a/git.c b/git.c
> > index d51fb5d2bf..a6224badce 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -641,6 +641,19 @@ static void list_builtins(struct string_list *out, unsigned int exclude_option)
> >  	}
> >  }
> >
> > +void load_builtin_commands(const char *prefix, struct cmdnames *cmds)
> > +{
> > +	const char *name;
> > +	int i;
> > +
> > +	if (!skip_prefix(prefix, "git-", &prefix))
> > +		return;
>
> Do we want to explain that this is for dropping "gitk" and the like
> in a comment near here?

I guess I have to explain this, as it is too easy to mistake this
`skip_prefix()` as working on the actual command names rather than on
the `prefix` parameter.

The `commands[]` array in `git.c` stores only the command names, but
`load_command_list()` is called with the prefix `git-` or `git-merge-`.
Therefore, `load_builtin_commands()` skips the prefix `git-` *from the
`prefix` itself*.

I'll send the next iteration shortly.

Ciao,
Dscho

>
> > +	for (i = 0; i < ARRAY_SIZE(commands); i++)
> > +		if (skip_prefix(commands[i].cmd, prefix, &name))
> > +			add_cmdname(cmds, name, strlen(name));
> > +}
> > +
> >  #ifdef STRIP_EXTENSION
> >  static void strip_extension(const char **argv)
> >  {
> > diff --git a/help.c b/help.c
> > index 4e2468a44d..919cbb9206 100644
> > --- a/help.c
> > +++ b/help.c
> > @@ -263,6 +263,8 @@ void load_command_list(const char *prefix,
> >  	const char *env_path = getenv("PATH");
> >  	const char *exec_path = git_exec_path();
> >
> > +	load_builtin_commands(prefix, main_cmds);
> > +
> >  	if (exec_path) {
> >  		list_commands_in_dir(main_cmds, exec_path, prefix);
> >  		QSORT(main_cmds->names, main_cmds->cnt, cmdname_compare);
> > diff --git a/help.h b/help.h
> > index dc02458855..5871e93ba2 100644
> > --- a/help.h
> > +++ b/help.h
> > @@ -32,6 +32,7 @@ const char *help_unknown_cmd(const char *cmd);
> >  void load_command_list(const char *prefix,
> >  		       struct cmdnames *main_cmds,
> >  		       struct cmdnames *other_cmds);
> > +void load_builtin_commands(const char *prefix, struct cmdnames *cmds);
> >  void add_cmdname(struct cmdnames *cmds, const char *name, int len);
> >  /* Here we require that excludes is a sorted list. */
> >  void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes);
> >
> > base-commit: 8f7759d2c8c13716bfdb9ae602414fd987787e8d
>
Johannes Schindelin Oct. 7, 2020, 9:49 p.m. UTC | #4
Hi Junio,

On Wed, 7 Oct 2020, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > ... in the new world order, what we see on disk plus what
> > we have in the built-in table are the set of subcommands available
> > to us, and the rule that was valid in the old world order can no
> > longer be relied upon, and nobody noticed the breakage while
> > developing or reviewing.
>
> >> diff --git a/help.c b/help.c
> >> index 4e2468a44d..919cbb9206 100644
> >> --- a/help.c
> >> +++ b/help.c
> >> @@ -263,6 +263,8 @@ void load_command_list(const char *prefix,
> >>  	const char *env_path = getenv("PATH");
> >>  	const char *exec_path = git_exec_path();
> >>
> >> +	load_builtin_commands(prefix, main_cmds);
> >> +
> >>  	if (exec_path) {
> >>  		list_commands_in_dir(main_cmds, exec_path, prefix);
> >>  		QSORT(main_cmds->names, main_cmds->cnt, cmdname_compare);
>
> I wondered if we need, after this change, to worry about duplicates,
> because some Git subcommands, even after they are made into built-ins
> and callable internally, must have on-disk footprint.
>
> It turns out that after the post-context in this hunk we do make a
> call to uniq(main_cmds) so it is fine.
>
> This was unexpected to me, as we read only from a single directory
> "exec_path" and the need to call uniq() in the old world order would
> have meant that readdir in exec_path gave us duplicate entries.
>
> In fact, the very original version of load_command_list() did not
> have this unnecessary call to uniq().  It was introduced in 1f08e5ce
> (Allow git help work without PATH set, 2008-08-28); perhaps Alex saw
> 12 years into the future and predicted that we would start needing
> it ;-)
>
> In any case, the patch is good thanks to that existing uniq() call.

Yep, I was fully prepared to add that `uniq()` call and was surprised to
find it. I guess it was "for good measure" because the same commit also
added the same `qsort(); uniq()` combo another time, a little further down
in that function.

Now, what I would have expected you to say when you found the `uniq()`
function is: Johannes, why don't you call `QSORT(); uniq()` after the call
to `load_builtin_commands()`? After all, `exec_path` and `env_path` might
both be `NULL`...

Well, the answer to that question is _not_ "but without `env_path` nothing
works anyway" even if that would be pretty valid. The answer is that the
`commands[]` list in `git.c` is already sorted alphabetically.

Thanks,
Dscho

>
> >> diff --git a/help.h b/help.h
> >> index dc02458855..5871e93ba2 100644
> >> --- a/help.h
> >> +++ b/help.h
> >> @@ -32,6 +32,7 @@ const char *help_unknown_cmd(const char *cmd);
> >>  void load_command_list(const char *prefix,
> >>  		       struct cmdnames *main_cmds,
> >>  		       struct cmdnames *other_cmds);
> >> +void load_builtin_commands(const char *prefix, struct cmdnames *cmds);
> >>  void add_cmdname(struct cmdnames *cmds, const char *name, int len);
> >>  /* Here we require that excludes is a sorted list. */
> >>  void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes);
> >>
> >> base-commit: 8f7759d2c8c13716bfdb9ae602414fd987787e8d
>
Junio C Hamano Oct. 7, 2020, 10:24 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Now, what I would have expected you to say when you found the `uniq()`
> function is: Johannes, why don't you call `QSORT(); uniq()` after the call
> to `load_builtin_commands()`? After all, `exec_path` and `env_path` might
> both be `NULL`...

Nah, you are expecting too much out of me.  I didn't ask because I
knew we didn't need to, and I didn't particularly care whether you
lucked out or arrived at the right solution with the same
understanding as I had ;-)

Patch

diff --git a/git.c b/git.c
index d51fb5d2bf..a6224badce 100644
--- a/git.c
+++ b/git.c
@@ -641,6 +641,19 @@  static void list_builtins(struct string_list *out, unsigned int exclude_option)
 	}
 }
 
+void load_builtin_commands(const char *prefix, struct cmdnames *cmds)
+{
+	const char *name;
+	int i;
+
+	if (!skip_prefix(prefix, "git-", &prefix))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(commands); i++)
+		if (skip_prefix(commands[i].cmd, prefix, &name))
+			add_cmdname(cmds, name, strlen(name));
+}
+
 #ifdef STRIP_EXTENSION
 static void strip_extension(const char **argv)
 {
diff --git a/help.c b/help.c
index 4e2468a44d..919cbb9206 100644
--- a/help.c
+++ b/help.c
@@ -263,6 +263,8 @@  void load_command_list(const char *prefix,
 	const char *env_path = getenv("PATH");
 	const char *exec_path = git_exec_path();
 
+	load_builtin_commands(prefix, main_cmds);
+
 	if (exec_path) {
 		list_commands_in_dir(main_cmds, exec_path, prefix);
 		QSORT(main_cmds->names, main_cmds->cnt, cmdname_compare);
diff --git a/help.h b/help.h
index dc02458855..5871e93ba2 100644
--- a/help.h
+++ b/help.h
@@ -32,6 +32,7 @@  const char *help_unknown_cmd(const char *cmd);
 void load_command_list(const char *prefix,
 		       struct cmdnames *main_cmds,
 		       struct cmdnames *other_cmds);
+void load_builtin_commands(const char *prefix, struct cmdnames *cmds);
 void add_cmdname(struct cmdnames *cmds, const char *name, int len);
 /* Here we require that excludes is a sorted list. */
 void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes);