diff mbox series

alias: detect loops in mixed execution mode

Message ID 20181018225739.28857-1-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series alias: detect loops in mixed execution mode | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 18, 2018, 10:57 p.m. UTC
Add detection for aliasing loops in cases where one of the aliases
re-invokes git as a shell command. This catches cases like:

    [alias]
    foo = !git bar
    bar = !git foo

Before this change running "git {foo,bar}" would create a
forkbomb. Now using the aliasing loop detection and call history
reporting added in 82f71d9a5a ("alias: show the call history when an
alias is looping", 2018-09-16) and c6d75bc17a ("alias: add support for
aliases of an alias", 2018-09-16) we'll instead report:

    fatal: alias loop detected: expansion of 'foo' does not terminate:
      foo <==
      bar ==>

Since the implementation carries the call history in an environment
variable, using the same sort of trick as used for -c (see
2b64fc894d ("pass "git -c foo=bar" params through environment",
2010-08-23) ). For example:

    [alias]
    one = two
    two = !git three
    three = four
    four = !git five
    five = two

Will, on "git one" report:

    fatal: alias loop detected: expansion of 'one' does not terminate:
      one
      two <==
      three
      four
      five ==>

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

Implements what I suggested in
https://public-inbox.org/git/87o9dar9qc.fsf@evledraar.gmail.com/

 cache.h          |  1 +
 git.c            | 36 ++++++++++++++++++++++++++++++++++--
 t/t0001-init.sh  |  1 +
 t/t0014-alias.sh | 15 ++++++---------
 4 files changed, 42 insertions(+), 11 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Oct. 19, 2018, 8:28 a.m. UTC | #1
On Thu, Oct 18 2018, Ævar Arnfjörð Bjarmason wrote:

> +static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
> +{
> +	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
> +	struct strbuf **cmd_history, **ptr;
> +
> +	if (!old || !*old)
> +		return;
> +
> +	strbuf_addstr(env, old);
> +	strbuf_rtrim(env);
> +
> +	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
> +	for (ptr = cmd_history; *ptr; ptr++) {
> +		strbuf_rtrim(*ptr);
> +		string_list_append(cmd_list, (*ptr)->buf);
> +	}
> +	strbuf_list_free(cmd_history);
> +}
> +
> +static void add_cmd_history(struct strbuf *env, struct string_list *cmd_list,
> +			    const char *cmd)
> +{
> +	string_list_append(cmd_list, cmd);
> +	if (env->len)
> +		strbuf_addch(env, ' ');
> +	strbuf_addstr(env, cmd);
> +	setenv(COMMAND_HISTORY_ENVIRONMENT, env->buf, 1);
> +}
> +
>  static int run_argv(int *argcp, const char ***argv)
>  {
>  	int done_alias = 0;
> -	struct string_list cmd_list = STRING_LIST_INIT_NODUP;
> +	struct string_list cmd_list = STRING_LIST_INIT_DUP;
>  	struct string_list_item *seen;
> +	struct strbuf env = STRBUF_INIT;
>
> +	init_cmd_history(&env, &cmd_list);
>  	while (1) {
>  		/*
>  		 * If we tried alias and futzed with our environment,
> @@ -711,7 +742,7 @@ static int run_argv(int *argcp, const char ***argv)
>  			      " not terminate:%s"), cmd_list.items[0].string, sb.buf);
>  		}
>
> -		string_list_append(&cmd_list, *argv[0]);
> +		add_cmd_history(&env, &cmd_list, *argv[0]);
>
>  		/*
>  		 * It could be an alias -- this works around the insanity

Just to sanity check an assumption of mine: One thing I didn't do is use
sq_quote_buf() and sq_dequote_to_argv() like we do for
CONFIG_DATA_ENVIRONMENT. This is because in the case of config we need
to deal with:

    $ git config alias.cfgdump
    !env
    $ git -c x.y=z -c "foo.bar='baz'" cfgdump|grep baz
    GIT_CONFIG_PARAMETERS='x.y=z' 'foo.bar='\''baz'\'''

But in this case I don't see how a command-name would ever contain
whitespace. So we skip quoting and just delimit by space.

There's also nothing stopping you from doing e.g.:

    $ GIT_COMMAND_HISTORY='foo bar' ~/g/git/git --exec-path=$PWD one
    fatal: alias loop detected: expansion of 'foo' does not terminate:
      foo
      bar
      one
      two <==
      three
      four
      five ==>

Or even confuse the code by adding a whitespace at the beginning:

    $ GIT_COMMAND_HISTORY=' foo bar' ~/g/git/git --exec-path=$PWD one
    fatal: alias loop detected: expansion of '' does not terminate:

      foo
      bar
      one
      two <==
      three
      four
      five ==>

I thought none of this was worth dealing with. Worst case someone's
screwing with this, but I don't see how it would happen accidentally,
and even then we detect the infinite loop and just degrade to confusing
error messages because you decided to screw with git's GIT_* env vars.
Jeff King Oct. 19, 2018, 10:07 p.m. UTC | #2
On Thu, Oct 18, 2018 at 10:57:39PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Add detection for aliasing loops in cases where one of the aliases
> re-invokes git as a shell command. This catches cases like:
> 
>     [alias]
>     foo = !git bar
>     bar = !git foo
> 
> Before this change running "git {foo,bar}" would create a
> forkbomb. Now using the aliasing loop detection and call history
> reporting added in 82f71d9a5a ("alias: show the call history when an
> alias is looping", 2018-09-16) and c6d75bc17a ("alias: add support for
> aliases of an alias", 2018-09-16) we'll instead report:
> 
>     fatal: alias loop detected: expansion of 'foo' does not terminate:
>       foo <==
>       bar ==>

The regular alias expansion can generally assume that there's no
conditional recursion going on, because it's expanding everything
itself. But when we involve multiple processes, things get trickier.

For instance, I could do this:

  [alias]
  countdown = "!f() { echo \"$@\"; test \"$1\" -gt 0 && git countdown $(($1-1)); }; f"

which works now, but not with your patch.

Now obviously that's a silly toy example, but are there real cases which
might trigger this? Some plausible ones I can think of:

  - an alias which handles some special cases, then chains to itself for
    the simpler one (or to another alias or script, which ends up
    chaining back to the original)

  - an alias that runs a git command, which then spawns a hook or other
    user-controlled script, which incidentally uses that same alias

I'd guess this sort of thing is pretty rare. But I wonder if we're
crossing the line of trying to assume too much about what the user's
arbitrary code does.

A simple depth counter can limit the fork bomb, and with a high enough
depth would be unlikely to trigger a false positive. It could also
protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
git process hierarchy, there's a good chance you've found an infinite
loop in git itself).

> +static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
> +{
> +	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
> +	struct strbuf **cmd_history, **ptr;
> +
> +	if (!old || !*old)
> +		return;
> +
> +	strbuf_addstr(env, old);
> +	strbuf_rtrim(env);
> +
> +	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
> +	for (ptr = cmd_history; *ptr; ptr++) {
> +		strbuf_rtrim(*ptr);
> +		string_list_append(cmd_list, (*ptr)->buf);
> +	}
> +	strbuf_list_free(cmd_history);

Maybe string_list_split() would be a little simpler?

-Peff
Jeff King Oct. 19, 2018, 10:09 p.m. UTC | #3
On Fri, Oct 19, 2018 at 10:28:22AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > -		string_list_append(&cmd_list, *argv[0]);
> > +		add_cmd_history(&env, &cmd_list, *argv[0]);
> >
> >  		/*
> >  		 * It could be an alias -- this works around the insanity
> 
> Just to sanity check an assumption of mine: One thing I didn't do is use
> sq_quote_buf() and sq_dequote_to_argv() like we do for
> CONFIG_DATA_ENVIRONMENT. This is because in the case of config we need
> to deal with:
> 
>     $ git config alias.cfgdump
>     !env
>     $ git -c x.y=z -c "foo.bar='baz'" cfgdump|grep baz
>     GIT_CONFIG_PARAMETERS='x.y=z' 'foo.bar='\''baz'\'''
> 
> But in this case I don't see how a command-name would ever contain
> whitespace. So we skip quoting and just delimit by space.

Alias names cannot currently contain whitespace, because it's not
allowed in the key. However, we've discussed making the syntax
alias.<name>.command, which would then make it possible.

Whether anyone would use that is a different question, but hey,
apparently some people think "My Documents" is a good name for a
directory. ;)

-Peff
Ævar Arnfjörð Bjarmason Oct. 20, 2018, 10:52 a.m. UTC | #4
On Fri, Oct 19 2018, Jeff King wrote:

> On Fri, Oct 19, 2018 at 10:28:22AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > -		string_list_append(&cmd_list, *argv[0]);
>> > +		add_cmd_history(&env, &cmd_list, *argv[0]);
>> >
>> >  		/*
>> >  		 * It could be an alias -- this works around the insanity
>>
>> Just to sanity check an assumption of mine: One thing I didn't do is use
>> sq_quote_buf() and sq_dequote_to_argv() like we do for
>> CONFIG_DATA_ENVIRONMENT. This is because in the case of config we need
>> to deal with:
>>
>>     $ git config alias.cfgdump
>>     !env
>>     $ git -c x.y=z -c "foo.bar='baz'" cfgdump|grep baz
>>     GIT_CONFIG_PARAMETERS='x.y=z' 'foo.bar='\''baz'\'''
>>
>> But in this case I don't see how a command-name would ever contain
>> whitespace. So we skip quoting and just delimit by space.
>
> Alias names cannot currently contain whitespace, because it's not
> allowed in the key. However, we've discussed making the syntax
> alias.<name>.command, which would then make it possible.
>
> Whether anyone would use that is a different question, but hey,
> apparently some people think "My Documents" is a good name for a
> directory. ;)

I'll just leave this part as it is for now. If we ever have commands
with whitespace this'll be the least of our worries.
Ævar Arnfjörð Bjarmason Oct. 20, 2018, 11:14 a.m. UTC | #5
On Fri, Oct 19 2018, Jeff King wrote:

> On Thu, Oct 18, 2018 at 10:57:39PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Add detection for aliasing loops in cases where one of the aliases
>> re-invokes git as a shell command. This catches cases like:
>>
>>     [alias]
>>     foo = !git bar
>>     bar = !git foo
>>
>> Before this change running "git {foo,bar}" would create a
>> forkbomb. Now using the aliasing loop detection and call history
>> reporting added in 82f71d9a5a ("alias: show the call history when an
>> alias is looping", 2018-09-16) and c6d75bc17a ("alias: add support for
>> aliases of an alias", 2018-09-16) we'll instead report:
>>
>>     fatal: alias loop detected: expansion of 'foo' does not terminate:
>>       foo <==
>>       bar ==>
>
> The regular alias expansion can generally assume that there's no
> conditional recursion going on, because it's expanding everything
> itself. But when we involve multiple processes, things get trickier.
>
> For instance, I could do this:
>
>   [alias]
>   countdown = "!f() { echo \"$@\"; test \"$1\" -gt 0 && git countdown $(($1-1)); }; f"
>
> which works now, but not with your patch.
>
> Now obviously that's a silly toy example, but are there real cases which
> might trigger this? Some plausible ones I can think of:
>
>   - an alias which handles some special cases, then chains to itself for
>     the simpler one (or to another alias or script, which ends up
>     chaining back to the original)
>
>   - an alias that runs a git command, which then spawns a hook or other
>     user-controlled script, which incidentally uses that same alias
>
> I'd guess this sort of thing is pretty rare. But I wonder if we're
> crossing the line of trying to assume too much about what the user's
> arbitrary code does.
>
> A simple depth counter can limit the fork bomb, and with a high enough
> depth would be unlikely to trigger a false positive. It could also
> protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
> git process hierarchy, there's a good chance you've found an infinite
> loop in git itself).

I don't think this edge case you're describing is very plausible, and I
doubt it exists in the wild.

But going by my personal incredulity and a git release breaking code in
the wild would suck, so agree that I need to re-roll this to anticipate
that.

I don't have time to write it now, but what do you think about a version
of this where we introduce a core.recursionLimit setting, and by default
set it to "1" (for one recursion), so by default die just as we do now,
but with some advice() saying that we've bailed out early because this
looks crazy, but you can set it to e.g. "1000" if you think you know
what you're doing, or "0" for no limit.

The reason I'd like to do that is because I think it's *way* more common
to do this accidentally than intentionally, and by having a default
limit of 1000 we'd print a really long error message, or alternatively
would have to get into the mess of de-duplicating the callstack as we
print the error.

It also has the advantage that if people in the wild really use this
they'll chime in about this new annoying core.recursionLimit=1 setting,
at the cost of me having annoyed them all by breaking their working
code.

>> +static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
>> +{
>> +	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
>> +	struct strbuf **cmd_history, **ptr;
>> +
>> +	if (!old || !*old)
>> +		return;
>> +
>> +	strbuf_addstr(env, old);
>> +	strbuf_rtrim(env);
>> +
>> +	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
>> +	for (ptr = cmd_history; *ptr; ptr++) {
>> +		strbuf_rtrim(*ptr);
>> +		string_list_append(cmd_list, (*ptr)->buf);
>> +	}
>> +	strbuf_list_free(cmd_history);
>
> Maybe string_list_split() would be a little simpler?

Yeah looks like it. I cargo-culted this from elsewhere without looking
at that API. I'll look into it.
Jeff King Oct. 20, 2018, 6:58 p.m. UTC | #6
On Sat, Oct 20, 2018 at 01:14:28PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'd guess this sort of thing is pretty rare. But I wonder if we're
> > crossing the line of trying to assume too much about what the user's
> > arbitrary code does.
> >
> > A simple depth counter can limit the fork bomb, and with a high enough
> > depth would be unlikely to trigger a false positive. It could also
> > protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
> > git process hierarchy, there's a good chance you've found an infinite
> > loop in git itself).
> 
> I don't think this edge case you're describing is very plausible, and I
> doubt it exists in the wild.
> 
> But going by my personal incredulity and a git release breaking code in
> the wild would suck, so agree that I need to re-roll this to anticipate
> that.

I agree it's probably quite rare, if it exists at all. But I also wonder
how important looping alias protection is. It's also rare, and the
outcome is usually "gee, I wonder why this is taking so long? ^C".

At least that's my instinct. I don't remember having run into this at
all myself (though certainly I have written my fair share of infinite
loops in other systems, like bash aliases, and that is what happened).

> I don't have time to write it now, but what do you think about a version
> of this where we introduce a core.recursionLimit setting, and by default
> set it to "1" (for one recursion), so by default die just as we do now,
> but with some advice() saying that we've bailed out early because this
> looks crazy, but you can set it to e.g. "1000" if you think you know
> what you're doing, or "0" for no limit.
> 
> The reason I'd like to do that is because I think it's *way* more common
> to do this accidentally than intentionally, and by having a default
> limit of 1000 we'd print a really long error message, or alternatively
> would have to get into the mess of de-duplicating the callstack as we
> print the error.

Would we print a long error message? I'd assume that we'd just recurse
for longer and print one error message that says:

  fatal: woah, you're 1000-levels deep in Git commands!

That doesn't help the user find the recursion, but re-running with
GIT_TRACE=1 would make it pretty clear, I'd think.

> It also has the advantage that if people in the wild really use this
> they'll chime in about this new annoying core.recursionLimit=1 setting,
> at the cost of me having annoyed them all by breaking their working
> code.

Right, I'm not too happy about that annoyance. But it seems clear that I
think the loop protection is way less important than you do, so I'm
willing to sacrifice (or more accurately, risk the possibility of
sacrificing) a lot less for it. :)

I dunno. I doubt it is likely to help or hinder that many people either
way.

> >> +	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
> >> +	for (ptr = cmd_history; *ptr; ptr++) {
> >> +		strbuf_rtrim(*ptr);
> >> +		string_list_append(cmd_list, (*ptr)->buf);
> >> +	}
> >> +	strbuf_list_free(cmd_history);
> >
> > Maybe string_list_split() would be a little simpler?
> 
> Yeah looks like it. I cargo-culted this from elsewhere without looking
> at that API. I'll look into it.

I cheated before writing that and confirmed that it does seem to work. ;)

Here's the patch in case it is useful. IMHO we should be trying to get
rid of strbuf_split, because it's a pretty crappy interface.

diff --git a/git.c b/git.c
index cba242836c..9d1b66a1fa 100644
--- a/git.c
+++ b/git.c
@@ -675,7 +675,6 @@ static void execv_dashed_external(const char **argv)
 static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
 {
 	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
-	struct strbuf **cmd_history, **ptr;
 
 	if (!old || !*old)
 		return;
@@ -683,12 +682,7 @@ static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
 	strbuf_addstr(env, old);
 	strbuf_rtrim(env);
 
-	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
-	for (ptr = cmd_history; *ptr; ptr++) {
-		strbuf_rtrim(*ptr);
-		string_list_append(cmd_list, (*ptr)->buf);
-	}
-	strbuf_list_free(cmd_history);
+	string_list_split(cmd_list, env->buf, ' ', -1);
 }
 
 static void add_cmd_history(struct strbuf *env, struct string_list *cmd_list,

-Peff
Ævar Arnfjörð Bjarmason Oct. 20, 2018, 7:18 p.m. UTC | #7
On Sat, Oct 20 2018, Jeff King wrote:

> On Sat, Oct 20, 2018 at 01:14:28PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I'd guess this sort of thing is pretty rare. But I wonder if we're
>> > crossing the line of trying to assume too much about what the user's
>> > arbitrary code does.
>> >
>> > A simple depth counter can limit the fork bomb, and with a high enough
>> > depth would be unlikely to trigger a false positive. It could also
>> > protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
>> > git process hierarchy, there's a good chance you've found an infinite
>> > loop in git itself).
>>
>> I don't think this edge case you're describing is very plausible, and I
>> doubt it exists in the wild.
>>
>> But going by my personal incredulity and a git release breaking code in
>> the wild would suck, so agree that I need to re-roll this to anticipate
>> that.
>
> I agree it's probably quite rare, if it exists at all. But I also wonder
> how important looping alias protection is. It's also rare, and the
> outcome is usually "gee, I wonder why this is taking so long? ^C".
>
> At least that's my instinct. I don't remember having run into this at
> all myself (though certainly I have written my fair share of infinite
> loops in other systems, like bash aliases, and that is what happened).
>
>> I don't have time to write it now, but what do you think about a version
>> of this where we introduce a core.recursionLimit setting, and by default
>> set it to "1" (for one recursion), so by default die just as we do now,
>> but with some advice() saying that we've bailed out early because this
>> looks crazy, but you can set it to e.g. "1000" if you think you know
>> what you're doing, or "0" for no limit.
>>
>> The reason I'd like to do that is because I think it's *way* more common
>> to do this accidentally than intentionally, and by having a default
>> limit of 1000 we'd print a really long error message, or alternatively
>> would have to get into the mess of de-duplicating the callstack as we
>> print the error.
>
> Would we print a long error message? I'd assume that we'd just recurse
> for longer and print one error message that says:
>
>   fatal: woah, you're 1000-levels deep in Git commands!
>
> That doesn't help the user find the recursion, but re-running with
> GIT_TRACE=1 would make it pretty clear, I'd think.

Yeah the reason I'd like the core.recursionLimit=1 setting by default is
so that we can also print the same pretty and easy to grok error message
we do now for non-! aliases by default without spewing out ~3-4k lines
of mostly duplicate output (with a default limit of 1000).

We didn't support chained aliases at all before, so I think the odds
that people will run into this now will increase as they add "!" to
existing aliases, and I'd like to have git's UI friendly enough to tell
users what went wrong by default, and not have to resort to the likes of
GIT_TRACE=1 which really should be left to powerusers.

>> It also has the advantage that if people in the wild really use this
>> they'll chime in about this new annoying core.recursionLimit=1 setting,
>> at the cost of me having annoyed them all by breaking their working
>> code.
>
> Right, I'm not too happy about that annoyance. But it seems clear that I
> think the loop protection is way less important than you do, so I'm
> willing to sacrifice (or more accurately, risk the possibility of
> sacrificing) a lot less for it. :)
>
> I dunno. I doubt it is likely to help or hinder that many people either
> way.
>
>> >> +	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
>> >> +	for (ptr = cmd_history; *ptr; ptr++) {
>> >> +		strbuf_rtrim(*ptr);
>> >> +		string_list_append(cmd_list, (*ptr)->buf);
>> >> +	}
>> >> +	strbuf_list_free(cmd_history);
>> >
>> > Maybe string_list_split() would be a little simpler?
>>
>> Yeah looks like it. I cargo-culted this from elsewhere without looking
>> at that API. I'll look into it.
>
> I cheated before writing that and confirmed that it does seem to work. ;)
>
> Here's the patch in case it is useful. IMHO we should be trying to get
> rid of strbuf_split, because it's a pretty crappy interface.
>
> diff --git a/git.c b/git.c
> index cba242836c..9d1b66a1fa 100644
> --- a/git.c
> +++ b/git.c
> @@ -675,7 +675,6 @@ static void execv_dashed_external(const char **argv)
>  static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
>  {
>  	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
> -	struct strbuf **cmd_history, **ptr;
>
>  	if (!old || !*old)
>  		return;
> @@ -683,12 +682,7 @@ static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
>  	strbuf_addstr(env, old);
>  	strbuf_rtrim(env);
>
> -	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
> -	for (ptr = cmd_history; *ptr; ptr++) {
> -		strbuf_rtrim(*ptr);
> -		string_list_append(cmd_list, (*ptr)->buf);
> -	}
> -	strbuf_list_free(cmd_history);
> +	string_list_split(cmd_list, env->buf, ' ', -1);
>  }
>
>  static void add_cmd_history(struct strbuf *env, struct string_list *cmd_list,

Thanks! Will squash it.
Junio C Hamano Oct. 22, 2018, 1:23 a.m. UTC | #8
Jeff King <peff@peff.net> writes:

> I agree it's probably quite rare, if it exists at all. But I also wonder
> how important looping alias protection is. It's also rare, and the
> outcome is usually "gee, I wonder why this is taking so long? ^C".
>
> At least that's my instinct. I don't remember having run into this at
> all myself (though certainly I have written my fair share of infinite
> loops in other systems, like bash aliases, and that is what happened).

Yup, that instict is shared with me, and I tend to prefer something
based on a simple counter for that reason.

> Would we print a long error message? I'd assume that we'd just recurse
> for longer and print one error message that says:
>
>   fatal: woah, you're 1000-levels deep in Git commands!
>
> That doesn't help the user find the recursion, but re-running with
> GIT_TRACE=1 would make it pretty clear, I'd think.

Thanks.
Jeff King Oct. 22, 2018, 9:15 p.m. UTC | #9
On Sat, Oct 20, 2018 at 09:18:21PM +0200, Ævar Arnfjörð Bjarmason wrote:

> We didn't support chained aliases at all before, so I think the odds
> that people will run into this now will increase as they add "!" to
> existing aliases, and I'd like to have git's UI friendly enough to tell
> users what went wrong by default, and not have to resort to the likes of
> GIT_TRACE=1 which really should be left to powerusers.

It's true that non-! aliases couldn't recurse before, but couldn't "!"
ones always do so?

-Peff
Ævar Arnfjörð Bjarmason Oct. 22, 2018, 9:28 p.m. UTC | #10
On Mon, Oct 22 2018, Jeff King wrote:

> On Sat, Oct 20, 2018 at 09:18:21PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> We didn't support chained aliases at all before, so I think the odds
>> that people will run into this now will increase as they add "!" to
>> existing aliases, and I'd like to have git's UI friendly enough to tell
>> users what went wrong by default, and not have to resort to the likes of
>> GIT_TRACE=1 which really should be left to powerusers.
>
> It's true that non-! aliases couldn't recurse before, but couldn't "!"
> ones always do so?

Yes. I meant that maybe now it's a feature that works for that people
will start using it, and then convert some of that to !-aliases they
wouldn't otherwise have written. Just idle speculation...
Jeff King Oct. 26, 2018, 8:39 a.m. UTC | #11
On Sat, Oct 20, 2018 at 02:58:53PM -0400, Jeff King wrote:

> On Sat, Oct 20, 2018 at 01:14:28PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > > I'd guess this sort of thing is pretty rare. But I wonder if we're
> > > crossing the line of trying to assume too much about what the user's
> > > arbitrary code does.
> > >
> > > A simple depth counter can limit the fork bomb, and with a high enough
> > > depth would be unlikely to trigger a false positive. It could also
> > > protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
> > > git process hierarchy, there's a good chance you've found an infinite
> > > loop in git itself).
> > 
> > I don't think this edge case you're describing is very plausible, and I
> > doubt it exists in the wild.
> > 
> > But going by my personal incredulity and a git release breaking code in
> > the wild would suck, so agree that I need to re-roll this to anticipate
> > that.
> 
> I agree it's probably quite rare, if it exists at all. But I also wonder
> how important looping alias protection is. It's also rare, and the
> outcome is usually "gee, I wonder why this is taking so long? ^C".

Hmph. So I was speaking before purely hypothetically, but now that your
patch is in 'next', it is part of my daily build. And indeed, I hit a
false positive within 5 minutes of building it. ;)

I have an alias like this:

  $ git help dotgit
  'dotgit' is aliased to '!git rev-parse 2>/dev/null || cd ~/compile/git; git'

The idea being that I can run "git dotgit foo" to run "git foo" in the
current directory, or if it is not a git repository, in my checkout of
git.git.

I use it in two ways:

  - some of my aliases know about it themselves. So I have an alias "ll"
    that does:

      $ git help ll
      'll' is aliased to '!git dotgit --no-pager log --no-walk=unsorted --format='%h (%s, %ad)' --date=short'

    with the idea being to produce a nice annotation for a commit id.
    Using "git dotgit" there lets me just run it from any directory,
    since 99% of the time I am working on git.git anyway.

  - I have a vim command defined:

      command! -nargs=* Git :call MaybeInlineCommand("git dotgit <args>")

    so I can do ":Git foo" inside vim and it uses either the current
    repo (e.g., if I'm writing a commit message) or git.git (e.g., if
    I'm writing an email and didn't start in the repo).

So of course the alias expansion is something like (in older versions of
Git):

  1. "git dotgit ll" runs the dotgit alias, which sees that we need to go
     to the git.git checkout

  2. that runs "git ll"

  3. that runs "git dotgit log"; this second dotgit invocation sees we're
     already in a repository and is a noop

  4. git-log runs

With your patch, step 3 complains:

  $ git dotgit ll
  fatal: alias loop detected: expansion of 'dotgit' does not terminate:
  dotgit <==
  ll ==>

So I would really prefer a depth counter that can be set sufficiently
high to make this case work. ;)


As an aside, I got to experience this error message as an unsuspecting
user would. Unfortunately the output was not super helpful for figuring
out the cause. I scratched my head for a while before remembering that
"ll" uses "dotgit" explicitly (which was quite apparent when running
GIT_TRACE=1, or "git help ll"). I think showing the alias definitions in
the loop output would have made it much more obvious (if perhaps a bit
uglier).  E.g., something like:

  fatal: alias loop...
  ==> dotgit is aliased to '!git rev-parse ...'
  <== ll is aliased to '!git dotgit ...'

-Peff
Ævar Arnfjörð Bjarmason Oct. 26, 2018, 12:44 p.m. UTC | #12
On Fri, Oct 26 2018, Jeff King wrote:

> On Sat, Oct 20, 2018 at 02:58:53PM -0400, Jeff King wrote:
>
>> On Sat, Oct 20, 2018 at 01:14:28PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > > I'd guess this sort of thing is pretty rare. But I wonder if we're
>> > > crossing the line of trying to assume too much about what the user's
>> > > arbitrary code does.
>> > >
>> > > A simple depth counter can limit the fork bomb, and with a high enough
>> > > depth would be unlikely to trigger a false positive. It could also
>> > > protect non-aliases more reasonably, too (e.g., if you have a 1000-deep
>> > > git process hierarchy, there's a good chance you've found an infinite
>> > > loop in git itself).
>> >
>> > I don't think this edge case you're describing is very plausible, and I
>> > doubt it exists in the wild.
>> >
>> > But going by my personal incredulity and a git release breaking code in
>> > the wild would suck, so agree that I need to re-roll this to anticipate
>> > that.
>>
>> I agree it's probably quite rare, if it exists at all. But I also wonder
>> how important looping alias protection is. It's also rare, and the
>> outcome is usually "gee, I wonder why this is taking so long? ^C".
>
> Hmph. So I was speaking before purely hypothetically, but now that your
> patch is in 'next', it is part of my daily build. And indeed, I hit a
> false positive within 5 minutes of building it. ;)
>
> I have an alias like this:
>
>   $ git help dotgit
>   'dotgit' is aliased to '!git rev-parse 2>/dev/null || cd ~/compile/git; git'
>
> The idea being that I can run "git dotgit foo" to run "git foo" in the
> current directory, or if it is not a git repository, in my checkout of
> git.git.
>
> I use it in two ways:
>
>   - some of my aliases know about it themselves. So I have an alias "ll"
>     that does:
>
>       $ git help ll
>       'll' is aliased to '!git dotgit --no-pager log --no-walk=unsorted --format='%h (%s, %ad)' --date=short'
>
>     with the idea being to produce a nice annotation for a commit id.
>     Using "git dotgit" there lets me just run it from any directory,
>     since 99% of the time I am working on git.git anyway.
>
>   - I have a vim command defined:
>
>       command! -nargs=* Git :call MaybeInlineCommand("git dotgit <args>")
>
>     so I can do ":Git foo" inside vim and it uses either the current
>     repo (e.g., if I'm writing a commit message) or git.git (e.g., if
>     I'm writing an email and didn't start in the repo).
>
> So of course the alias expansion is something like (in older versions of
> Git):
>
>   1. "git dotgit ll" runs the dotgit alias, which sees that we need to go
>      to the git.git checkout
>
>   2. that runs "git ll"
>
>   3. that runs "git dotgit log"; this second dotgit invocation sees we're
>      already in a repository and is a noop
>
>   4. git-log runs
>
> With your patch, step 3 complains:
>
>   $ git dotgit ll
>   fatal: alias loop detected: expansion of 'dotgit' does not terminate:
>   dotgit <==
>   ll ==>
>
> So I would really prefer a depth counter that can be set sufficiently
> high to make this case work. ;)
>
>
> As an aside, I got to experience this error message as an unsuspecting
> user would. Unfortunately the output was not super helpful for figuring
> out the cause. I scratched my head for a while before remembering that
> "ll" uses "dotgit" explicitly (which was quite apparent when running
> GIT_TRACE=1, or "git help ll"). I think showing the alias definitions in
> the loop output would have made it much more obvious (if perhaps a bit
> uglier).  E.g., something like:
>
>   fatal: alias loop...
>   ==> dotgit is aliased to '!git rev-parse ...'
>   <== ll is aliased to '!git dotgit ...'
>
> -Peff

Yikes.

Junio: After your previous "What's cooking" in
<xmqq8t2u1nkh.fsf@gitster-ct.c.googlers.com> I sent
<87ftx0dg4r.fsf@evledraar.gmail.com>, but should have just replied to
"What's cooking".

I.e. I think this topic should just be ejected, I'll try to submit a
re-roll, but don't know if I have time in the next few days.

Can you please queue a "git revert" of it (or rewind next, but not sure
if you want to do that...).
Junio C Hamano Oct. 29, 2018, 3:44 a.m. UTC | #13
Jeff King <peff@peff.net> writes:

> Hmph. So I was speaking before purely hypothetically, but now that your
> patch is in 'next', it is part of my daily build. And indeed, I hit a
> false positive within 5 minutes of building it. ;)

Sounds like somebody is having not-so-fun-a-time having "I told you
so" moment.  The 'dotgit' thing already feels bit convoluted but I
would say that it is still within the realm of reasonable workflow
elements.

> ...
> With your patch, step 3 complains:
>
>   $ git dotgit ll
>   fatal: alias loop detected: expansion of 'dotgit' does not terminate:
>   dotgit <==
>   ll ==>
>
> So I would really prefer a depth counter that can be set sufficiently
> high to make this case work. ;)

Sounds like a concrete enough case to demonstrate why one-level deep
loop detector is not sufficient X-<.
Jeff King Oct. 29, 2018, 2:17 p.m. UTC | #14
On Mon, Oct 29, 2018 at 12:44:58PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Hmph. So I was speaking before purely hypothetically, but now that your
> > patch is in 'next', it is part of my daily build. And indeed, I hit a
> > false positive within 5 minutes of building it. ;)
> 
> Sounds like somebody is having not-so-fun-a-time having "I told you
> so" moment.  The 'dotgit' thing already feels bit convoluted but I
> would say that it is still within the realm of reasonable workflow
> elements.

To be clear, the "dotgit" thing _is_ weird and convoluted. And I imagine
that I push Git more than 99% of our users would. But I also won't be
surprised if somebody else has something similarly disgusting in the
wild. :)

TBH, I'm still not really sold on the idea of doing loop detection at
all in this case. But I can live with it if others feel strongly. What
makes the current patch so bad is that there's no escape hatch (i.e.,
even a depth counter with a default of "1" would have given me something
I could bump).

-Peff
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index d508f3d4f8..00cbd25f1c 100644
--- a/cache.h
+++ b/cache.h
@@ -478,6 +478,7 @@  static inline enum object_type object_type(unsigned int mode)
 #define TEMPLATE_DIR_ENVIRONMENT "GIT_TEMPLATE_DIR"
 #define CONFIG_ENVIRONMENT "GIT_CONFIG"
 #define CONFIG_DATA_ENVIRONMENT "GIT_CONFIG_PARAMETERS"
+#define COMMAND_HISTORY_ENVIRONMENT "GIT_COMMAND_HISTORY"
 #define EXEC_PATH_ENVIRONMENT "GIT_EXEC_PATH"
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
diff --git a/git.c b/git.c
index 5920f8019b..cba242836c 100644
--- a/git.c
+++ b/git.c
@@ -672,12 +672,43 @@  static void execv_dashed_external(const char **argv)
 		exit(128);
 }
 
+static void init_cmd_history(struct strbuf *env, struct string_list *cmd_list)
+{
+	const char *old = getenv(COMMAND_HISTORY_ENVIRONMENT);
+	struct strbuf **cmd_history, **ptr;
+
+	if (!old || !*old)
+		return;
+
+	strbuf_addstr(env, old);
+	strbuf_rtrim(env);
+
+	cmd_history = strbuf_split_buf(old, strlen(old), ' ', 0);
+	for (ptr = cmd_history; *ptr; ptr++) {
+		strbuf_rtrim(*ptr);
+		string_list_append(cmd_list, (*ptr)->buf);
+	}
+	strbuf_list_free(cmd_history);
+}
+
+static void add_cmd_history(struct strbuf *env, struct string_list *cmd_list,
+			    const char *cmd)
+{
+	string_list_append(cmd_list, cmd);
+	if (env->len)
+		strbuf_addch(env, ' ');
+	strbuf_addstr(env, cmd);
+	setenv(COMMAND_HISTORY_ENVIRONMENT, env->buf, 1);
+}
+
 static int run_argv(int *argcp, const char ***argv)
 {
 	int done_alias = 0;
-	struct string_list cmd_list = STRING_LIST_INIT_NODUP;
+	struct string_list cmd_list = STRING_LIST_INIT_DUP;
 	struct string_list_item *seen;
+	struct strbuf env = STRBUF_INIT;
 
+	init_cmd_history(&env, &cmd_list);
 	while (1) {
 		/*
 		 * If we tried alias and futzed with our environment,
@@ -711,7 +742,7 @@  static int run_argv(int *argcp, const char ***argv)
 			      " not terminate:%s"), cmd_list.items[0].string, sb.buf);
 		}
 
-		string_list_append(&cmd_list, *argv[0]);
+		add_cmd_history(&env, &cmd_list, *argv[0]);
 
 		/*
 		 * It could be an alias -- this works around the insanity
@@ -724,6 +755,7 @@  static int run_argv(int *argcp, const char ***argv)
 	}
 
 	string_list_clear(&cmd_list, 0);
+	strbuf_release(&env);
 
 	return done_alias;
 }
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 182da069f1..eb2ca8a172 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -93,6 +93,7 @@  test_expect_success 'No extra GIT_* on alias scripts' '
 		sed -n \
 			-e "/^GIT_PREFIX=/d" \
 			-e "/^GIT_TEXTDOMAINDIR=/d" \
+			-e "/^GIT_COMMAND_HISTORY=/d" \
 			-e "/^GIT_/s/=.*//p" |
 		sort
 	EOF
diff --git a/t/t0014-alias.sh b/t/t0014-alias.sh
index a070e645d7..9ed03a4a4f 100755
--- a/t/t0014-alias.sh
+++ b/t/t0014-alias.sh
@@ -27,14 +27,11 @@  test_expect_success 'looping aliases - internal execution' '
 	test_i18ngrep "^fatal: alias loop detected: expansion of" output
 '
 
-# This test is disabled until external loops are fixed, because would block
-# the test suite for a full minute.
-#
-#test_expect_failure 'looping aliases - mixed execution' '
-#	git config alias.loop-mixed-1 loop-mixed-2 &&
-#	git config alias.loop-mixed-2 "!git loop-mixed-1" &&
-#	test_must_fail git loop-mixed-1 2>output &&
-#	test_i18ngrep "^fatal: alias loop detected: expansion of" output
-#'
+test_expect_success 'looping aliases - mixed execution' '
+	git config alias.loop-mixed-1 loop-mixed-2 &&
+	git config alias.loop-mixed-2 "!git loop-mixed-1" &&
+	test_must_fail git loop-mixed-1 2>output &&
+	test_i18ngrep "^fatal: alias loop detected: expansion of" output
+'
 
 test_done