[11/12] builtin/rebase: fix options.strategy memory lifecycle

Message ID 20210620151204.19260-12-andrzej@ahunt.org (mailing list archive)
State New, archived
Series Fix all leaks in tests t0002-t0099: Part 2

Commit Message

Andrzej Hunt June 20, 2021, 3:12 p.m. UTC
From: Andrzej Hunt <ajrhunt@google.com>

This change:
- xstrdup()'s all strings being used for replace_opts.strategy, to
  guarantee that replace_opts owns these strings. This is needed because
  sequencer_remove_state() will free replace_opts.strategy, and it's
  usually called as part of the usage of replace_opts.
- Removes xstrdup()'s being used to populate options.strategy in
  cmd_rebase(), which avoids leaking options.strategy, even in the
  case where strategy is never moved/copied into replace_opts.

These changes are needed because:
- We would always create a new string for options.strategy if we either
  get a strategy via options (OPT_STRING(...strategy...)), or via
  GIT_TEST_MERGE_ALGORITHM.
- But only sometimes is this string copied into replace_opts - in which
  case it did get free()'d in sequencer_remove_state().
- The rest of the time, the newly allocated string would remain unused,
  causing a leak. But we can't just add a free because that can result
  in a double-free in those cases where replace_opts was populated.

An alternative approach would be to set options.strategy to NULL when
moving the pointer to replace_opts.strategy, combined with always
free()'ing options.strategy, but that seems like a more
complicated and wasteful approach.
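
The ownership rule this patch adopts can be sketched in isolation. This is a minimal illustration, not git's code: the `demo_*` names are hypothetical stand-ins for rebase_options, replay_opts, xstrdup_or_null() and sequencer_remove_state(), and the sketch assumes the options side only ever borrows its string (from argv or getenv()) while the replay side owns a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rebase_options: the string is only borrowed. */
struct demo_rebase_options {
	const char *strategy;	/* points at argv or getenv() storage, never freed */
};

/* Hypothetical stand-in for replay_opts: the string is owned. */
struct demo_replay_opts {
	char *strategy;		/* freed by demo_remove_state() */
};

/* Minimal stand-in for xstrdup_or_null(); aborts on allocation failure. */
static char *demo_strdup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (!copy)
		abort();
	strcpy(copy, s);
	return copy;
}

/* Copy on transfer, so the replay side always owns what it later frees. */
static struct demo_replay_opts demo_get_replay_opts(const struct demo_rebase_options *opts)
{
	struct demo_replay_opts replay = { NULL };

	assert(opts);
	replay.strategy = demo_strdup_or_null(opts->strategy);
	return replay;
}

/* Plays the role of sequencer_remove_state(): releases the owned string. */
static void demo_remove_state(struct demo_replay_opts *replay)
{
	free(replay->strategy);
	replay->strategy = NULL;
}
```

With this split, nothing on the options side needs freeing, and the free on the replay side can never touch borrowed storage.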

This was first seen when running t0021 with LSAN, but t2012 helped catch
the fact that we can't just free(options.strategy) at the end of
cmd_rebase (as that can cause a double-free).

LSAN output from t0021:

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa71eb8 in xstrdup wrapper.c:29:14
    #2 0x61b1cc in cmd_rebase builtin/rebase.c:1779:22
    #3 0x4ce83e in run_builtin git.c:475:11
    #4 0x4ccafe in handle_builtin git.c:729:3
    #5 0x4cb01c in run_argv git.c:818:4
    #6 0x4cb01c in cmd_main git.c:949:19
    #7 0x6b3fad in main common-main.c:52:11
    #8 0x7f267b512349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
---
 builtin/rebase.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Phillip Wood June 20, 2021, 6:14 p.m. UTC | #1
Hi Andrzej

Thanks for working on removing memory leaks from git.

On 20/06/2021 16:12, andrzej@ahunt.org wrote:
> From: Andrzej Hunt <ajrhunt@google.com>
> 
> This change:
> - xstrdup()'s all string being used for replace_opts.strategy, to

I think you mean replay_opts rather than replace_opts.

>    guarantee that replace_opts owns these strings. This is needed because
>    sequencer_remove_state() will free replace_opts.strategy, and it's
>    usually called as part of the usage of replace_opts.
> - Removes xstrdup()'s being used to populate options.strategy in
>    cmd_rebase(), which avoids leaking options.strategy, even in the
>    case where strategy is never moved/copied into replace_opts.


> These changes are needed because:
> - We would always create a new string for options.strategy if we either
>    get a strategy via options (OPT_STRING(...strategy...), or via
>    GIT_TEST_MERGE_ALGORITHM.
> - But only sometimes is this string copied into replace_opts - in which
>    case it did get free()'d in sequencer_remove_state().
> - The rest of the time, the newly allocated string would remain unused,
>    causing a leak. But we can't just add a free because that can result
>    in a double-free in those cases where replace_opts was populated.
> 
> An alternative approach would be to set options.strategy to NULL when
> moving the pointer to replace_opts.strategy, combined with always
> free()'ing options.strategy, but that seems like a more
> complicated and wasteful approach.

read_basic_state() contains
	if (file_exists(state_dir_path("strategy", opts))) {
		strbuf_reset(&buf);
		if (!read_oneliner(&buf, state_dir_path("strategy", opts),
				   READ_ONELINER_WARN_MISSING))
			return -1;
		free(opts->strategy);
		opts->strategy = xstrdup(buf.buf);
	}

So we do try to free opts->strategy when reading the state from disc and 
we allocate a new string. I suspect that opts->strategy is actually NULL 
when this function is called but I haven't checked. Given that we are
allocating a copy above I think maybe your alternative approach of 
always freeing opts->strategy would be better.
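
The free-then-replace pattern quoted above is only safe if the field is always either NULL or heap-allocated. A minimal sketch of that invariant follows, assuming a hypothetical `demo_options` struct and `demo_set_strategy()` helper (neither exists in git); every assignment goes through the helper, so the field never points at argv or getenv() storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rebase_options; strategy is NULL or heap-owned. */
struct demo_options {
	char *strategy;
};

/* Replace the owned string with a private copy of value (which may be NULL). */
static void demo_set_strategy(struct demo_options *opts, const char *value)
{
	char *copy = NULL;

	assert(opts);
	if (value) {
		size_t len = strlen(value) + 1;

		copy = malloc(len);
		if (!copy)
			abort();	/* git's xstrdup() would die() here instead */
		memcpy(copy, value, len);
	}
	/* Free only after copying, so value may alias the old opts->strategy. */
	free(opts->strategy);
	opts->strategy = copy;
}
```

Under this invariant a single unconditional free at the end of the command is correct, whether or not the state was re-read from disk in between.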

Best Wishes

Phillip

> This was first seen when running t0021 with LSAN, but t2012 helped catch
> the fact that we can't just free(options.strategy) at the end of
> cmd_rebase (as that can cause a double-free). LSAN output from t0021:
> 
> LSAN output from t0021:
> 
> Direct leak of 4 byte(s) in 1 object(s) allocated from:
>      #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
>      #1 0xa71eb8 in xstrdup wrapper.c:29:14
>      #2 0x61b1cc in cmd_rebase builtin/rebase.c:1779:22
>      #3 0x4ce83e in run_builtin git.c:475:11
>      #4 0x4ccafe in handle_builtin git.c:729:3
>      #5 0x4cb01c in run_argv git.c:818:4
>      #6 0x4cb01c in cmd_main git.c:949:19
>      #7 0x6b3fad in main common-main.c:52:11
>      #8 0x7f267b512349 in __libc_start_main (/lib64/libc.so.6+0x24349)
> 
> SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
> 
> Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
> ---
>   builtin/rebase.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 12f093121d..9d81db0f3a 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -139,7 +139,7 @@ static struct replay_opts get_replay_opts(const struct rebase_options *opts)
>   	replay.ignore_date = opts->ignore_date;
>   	replay.gpg_sign = xstrdup_or_null(opts->gpg_sign_opt);
>   	if (opts->strategy)
> -		replay.strategy = opts->strategy;
> +		replay.strategy = xstrdup_or_null(opts->strategy);
>   	else if (!replay.strategy && replay.default_strategy) {
>   		replay.strategy = replay.default_strategy;
>   		replay.default_strategy = NULL;
> @@ -1723,7 +1723,6 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   	}
>   
>   	if (options.strategy) {
> -		options.strategy = xstrdup(options.strategy);
>   		switch (options.type) {
>   		case REBASE_APPLY:
>   			die(_("--strategy requires --merge or --interactive"));
> @@ -1776,7 +1775,7 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   	if (options.type == REBASE_MERGE &&
>   	    !options.strategy &&
>   	    getenv("GIT_TEST_MERGE_ALGORITHM"))
> -		options.strategy = xstrdup(getenv("GIT_TEST_MERGE_ALGORITHM"));
> +		options.strategy = getenv("GIT_TEST_MERGE_ALGORITHM");
>   
>   	switch (options.type) {
>   	case REBASE_MERGE:
>
Elijah Newren June 21, 2021, 9:39 p.m. UTC | #2
On Sun, Jun 20, 2021 at 11:29 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Andrzej
>
> Thanks for working on removing memory leaks from git.
>
> On 20/06/2021 16:12, andrzej@ahunt.org wrote:
> > From: Andrzej Hunt <ajrhunt@google.com>
> >
> > This change:
> > - xstrdup()'s all string being used for replace_opts.strategy, to
>
> I think you mean replay_opts rather than replace_opts.
>
> >    guarantee that replace_opts owns these strings. This is needed because
> >    sequencer_remove_state() will free replace_opts.strategy, and it's
> >    usually called as part of the usage of replace_opts.
> > - Removes xstrdup()'s being used to populate options.strategy in
> >    cmd_rebase(), which avoids leaking options.strategy, even in the
> >    case where strategy is never moved/copied into replace_opts.
>
>
> > These changes are needed because:
> > - We would always create a new string for options.strategy if we either
> >    get a strategy via options (OPT_STRING(...strategy...), or via
> >    GIT_TEST_MERGE_ALGORITHM.
> > - But only sometimes is this string copied into replace_opts - in which
> >    case it did get free()'d in sequencer_remove_state().
> > - The rest of the time, the newly allocated string would remain unused,
> >    causing a leak. But we can't just add a free because that can result
> >    in a double-free in those cases where replace_opts was populated.
> >
> > An alternative approach would be to set options.strategy to NULL when
> > moving the pointer to replace_opts.strategy, combined with always
> > free()'ing options.strategy, but that seems like a more
> > complicated and wasteful approach.
>
> read_basic_state() contains
>         if (file_exists(state_dir_path("strategy", opts))) {
>                 strbuf_reset(&buf);
>                 if (!read_oneliner(&buf, state_dir_path("strategy", opts),
>                                    READ_ONELINER_WARN_MISSING))
>                         return -1;
>                 free(opts->strategy);
>                 opts->strategy = xstrdup(buf.buf);
>         }
>
> So we do try to free opts->strategy when reading the state from disc and
> we allocate a new string. I suspect that opts->strategy is actually NULL
> when this function is called but I haven't checked. Given that we are
> allocating a copy above I think maybe your alternative approach of
> always freeing opts->strategy would be better.

Good catches.  sequencer_remove_state() in sequencer.c also has a
free(opts->strategy) call.

To make things even more muddy, we have code like
    replay.strategy = replay.default_strategy;
or
    opts->strategy = opts->default_strategy;
which both will probably work really poorly with the calls to
    free(opts->default_strategy);
    free(opts->strategy);
from sequencer_remove_state().  I suspect we've got a few bugs here...
Phillip Wood June 22, 2021, 9:02 a.m. UTC | #3
Hi Elijah

On 21/06/2021 22:39, Elijah Newren wrote:
> On Sun, Jun 20, 2021 at 11:29 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi Andrzej
>>
>> Thanks for working on removing memory leaks from git.
>>
>> On 20/06/2021 16:12, andrzej@ahunt.org wrote:
>>> From: Andrzej Hunt <ajrhunt@google.com>
>>>
>>> This change:
>>> - xstrdup()'s all string being used for replace_opts.strategy, to
>>
>> I think you mean replay_opts rather than replace_opts.
>>
>>>     guarantee that replace_opts owns these strings. This is needed because
>>>     sequencer_remove_state() will free replace_opts.strategy, and it's
>>>     usually called as part of the usage of replace_opts.
>>> - Removes xstrdup()'s being used to populate options.strategy in
>>>     cmd_rebase(), which avoids leaking options.strategy, even in the
>>>     case where strategy is never moved/copied into replace_opts.
>>
>>
>>> These changes are needed because:
>>> - We would always create a new string for options.strategy if we either
>>>     get a strategy via options (OPT_STRING(...strategy...), or via
>>>     GIT_TEST_MERGE_ALGORITHM.
>>> - But only sometimes is this string copied into replace_opts - in which
>>>     case it did get free()'d in sequencer_remove_state().
>>> - The rest of the time, the newly allocated string would remain unused,
>>>     causing a leak. But we can't just add a free because that can result
>>>     in a double-free in those cases where replace_opts was populated.
>>>
>>> An alternative approach would be to set options.strategy to NULL when
>>> moving the pointer to replace_opts.strategy, combined with always
>>> free()'ing options.strategy, but that seems like a more
>>> complicated and wasteful approach.
>>
>> read_basic_state() contains
>>          if (file_exists(state_dir_path("strategy", opts))) {
>>                  strbuf_reset(&buf);
>>                  if (!read_oneliner(&buf, state_dir_path("strategy", opts),
>>                                     READ_ONELINER_WARN_MISSING))
>>                          return -1;
>>                  free(opts->strategy);
>>                  opts->strategy = xstrdup(buf.buf);
>>          }
>>
>> So we do try to free opts->strategy when reading the state from disc and
>> we allocate a new string. I suspect that opts->strategy is actually NULL
>> when this function is called but I haven't checked. Given that we are
>> allocating a copy above I think maybe your alternative approach of
>> always freeing opts->strategy would be better.
> 
> Good catches.  sequencer_remove_state() in sequencer.c also has a
> free(opts->strategy) call.
> 
> To make things even more muddy, we have code like
>      replay.strategy = replay.default_strategy;
> or
>      opts->strategy = opts->default_strategy;
> which both will probably work really poorly with the calls to
>      free(opts->default_strategy);
>      free(opts->strategy);
> from sequencer_remove_state().  I suspect we've got a few bugs here...

It's not immediately obvious but I think those are actually safe. 
opts->default_strategy is allocated by sequencer_init_config() so it is 
correct to free it and when we assign it in rebase.c we do

	else if (!replay.strategy && replay.default_strategy) {
		replay.strategy = replay.default_strategy;
		replay.default_strategy = NULL;
	}

so there is no double free. There is similar code in builtin/revert.c 
which I think is where your other example came from. I think there is a 
leak in builtin/revert.c though

	if (!opts->strategy && opts->default_strategy) {
		opts->strategy = opts->default_strategy;
		opts->default_strategy = NULL;
	}

	/* do some other stuff */

	/* These option values will be free()d */
	opts->gpg_sign = xstrdup_or_null(opts->gpg_sign);
	opts->strategy = xstrdup_or_null(opts->strategy);

So we copy the default strategy, leaking the original copy from 
sequencer_init_config() if --strategy isn't given on the command line.
I think it would be simple to fix this by making the copy earlier.

	if (!opts->strategy && opts->default_strategy) {
		opts->strategy = opts->default_strategy;
		opts->default_strategy = NULL;
	} else if (opts->strategy) {
		/* This option will be free()d in sequencer_remove_state() */
		opts->strategy = xstrdup(opts->strategy);
	}
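
A self-contained sketch of the same move-or-copy logic, for experimenting outside git. The `demo_*` names are hypothetical, and the sketch assumes that a non-NULL strategy on entry points at borrowed storage (as it would after option parsing) while default_strategy is heap-allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two replay_opts fields discussed here. */
struct demo_opts {
	char *strategy;		/* borrowed on entry, owned after resolving */
	char *default_strategy;	/* heap-allocated by the config reader, or NULL */
};

/* Minimal stand-in for xstrdup(); aborts on allocation failure. */
static char *demo_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/* Leave opts->strategy either NULL or pointing at exactly one owned string. */
static void demo_resolve_strategy(struct demo_opts *opts)
{
	assert(opts);
	if (!opts->strategy && opts->default_strategy) {
		/* Move: ownership passes over, nothing is copied or leaked. */
		opts->strategy = opts->default_strategy;
		opts->default_strategy = NULL;
	} else if (opts->strategy) {
		/* Still borrowed at this point; take an owned copy. */
		opts->strategy = demo_dup(opts->strategy);
	}
}

/* Plays the role of the two free() calls in sequencer_remove_state(). */
static void demo_release(struct demo_opts *opts)
{
	free(opts->default_strategy);
	free(opts->strategy);
	opts->default_strategy = NULL;
	opts->strategy = NULL;
}
```

Because the copy happens only in the branch where the string is borrowed, the moved default is never duplicated and so never leaked.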

I'm going offline for a week or so in a couple of days but I'll have a
look at making a proper patch when I get back.

Best Wishes

Phillip
Andrzej Hunt July 25, 2021, 1:03 p.m. UTC | #4
On 22/06/2021 11:02, Phillip Wood wrote:
> Hi Elijah
> 
> On 21/06/2021 22:39, Elijah Newren wrote:
>> On Sun, Jun 20, 2021 at 11:29 AM Phillip Wood 
>> <phillip.wood123@gmail.com> wrote:
>>>
>>> Hi Andrzej
>>>
>>> Thanks for working on removing memory leaks from git.
>>>
>>> On 20/06/2021 16:12, andrzej@ahunt.org wrote:
>>>> From: Andrzej Hunt <ajrhunt@google.com>
>>>>
>>>> This change:
>>>> - xstrdup()'s all string being used for replace_opts.strategy, to
>>>
>>> I think you mean replay_opts rather than replace_opts.
>>>
>>>>     guarantee that replace_opts owns these strings. This is needed 
>>>> because
>>>>     sequencer_remove_state() will free replace_opts.strategy, and it's
>>>>     usually called as part of the usage of replace_opts.
>>>> - Removes xstrdup()'s being used to populate options.strategy in
>>>>     cmd_rebase(), which avoids leaking options.strategy, even in the
>>>>     case where strategy is never moved/copied into replace_opts.
>>>
>>>
>>>> These changes are needed because:
>>>> - We would always create a new string for options.strategy if we either
>>>>     get a strategy via options (OPT_STRING(...strategy...), or via
>>>>     GIT_TEST_MERGE_ALGORITHM.
>>>> - But only sometimes is this string copied into replace_opts - in which
>>>>     case it did get free()'d in sequencer_remove_state().
>>>> - The rest of the time, the newly allocated string would remain unused,
>>>>     causing a leak. But we can't just add a free because that can 
>>>> result
>>>>     in a double-free in those cases where replace_opts was populated.
>>>>
>>>> An alternative approach would be to set options.strategy to NULL when
>>>> moving the pointer to replace_opts.strategy, combined with always
>>>> free()'ing options.strategy, but that seems like a more
>>>> complicated and wasteful approach.
>>>
>>> read_basic_state() contains
>>>          if (file_exists(state_dir_path("strategy", opts))) {
>>>                  strbuf_reset(&buf);
>>>                  if (!read_oneliner(&buf, state_dir_path("strategy", 
>>> opts),
>>>                                     READ_ONELINER_WARN_MISSING))
>>>                          return -1;
>>>                  free(opts->strategy);
>>>                  opts->strategy = xstrdup(buf.buf);
>>>          }
>>>
>>> So we do try to free opts->strategy when reading the state from disc and
>>> we allocate a new string. I suspect that opts->strategy is actually NULL
>>> when this function is called but I haven't checked.

Thank you for noticing this. I think you're right - running an ASAN 
build past the whole test suite also didn't catch any double-frees which 
mostly confirms that opts->strategy is indeed always NULL here. But 
that's not a good reason for taking the risk.

>>> Given that we are
>>> allocating a copy above I think maybe your alternative approach of
>>> always freeing opts->strategy would be better.

I will go down this route for V2. Although on further thought: instead 
of my original idea of moving the string to replay_opts (and NULL'ing 
out rebase_options->strategy), I think it's better to create a new copy 
when populating replay_opts. The move/NULL approach I suggested in V1 
happens to work OK, but I think it's non-obvious and could break if we 
ever wanted to use get_replay_opts() more than once - creating separate 
copies reduces the number of surprises.
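
The difference between the two approaches shows up as soon as the transfer happens twice. A minimal sketch, assuming hypothetical `demo_*` names and an options struct that owns its string (these are not git functions):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rebase_options, here owning its string. */
struct demo_rebase_options {
	char *strategy;
};

/* Minimal stand-in for xstrdup_or_null(); aborts on allocation failure. */
static char *demo_dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/* Move: hand the pointer over and clear the source. A second call gets NULL. */
static char *demo_move_strategy(struct demo_rebase_options *opts)
{
	char *moved;

	assert(opts);
	moved = opts->strategy;
	opts->strategy = NULL;
	return moved;
}

/* Copy: the source is untouched, and each caller frees its own result. */
static char *demo_copy_strategy(const struct demo_rebase_options *opts)
{
	assert(opts);
	return demo_dup_or_null(opts->strategy);
}
```

With the copy variant every call returns an independent string, at the cost of one small allocation per call; with the move variant the second caller silently loses the strategy.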

>>
>> Good catches.  sequencer_remove_state() in sequencer.c also has a
>> free(opts->strategy) call.
>>
>> To make things even more muddy, we have code like
>>      replay.strategy = replay.default_strategy;
>> or
>>      opts->strategy = opts->default_strategy;
>> which both will probably work really poorly with the calls to
>>      free(opts->default_strategy);
>>      free(opts->strategy);
>> from sequencer_remove_state().  I suspect we've got a few bugs here...
> 
> It's not immediately obvious but I think those are actually safe. 
> opts->default_strategy is allocated by sequencer_init_config() so it is 
> correct to free it and when we assign it in rebase.c we do
> 
>      else if (!replay.strategy && replay.default_strategy) {
>          replay.strategy = replay.default_strategy;
>          replay.default_strategy = NULL;
>      }
> 
> so there is no double free.

As mentioned above, ASAN isn't catching any double-frees here (but I 
guess that depends on whether or not you trust the test suite to be 
reasonably testing all permutations).

But it's still good to take note of sequencer_remove_state() free'ing 
opts->strategy, because I almost did manage to add a double free when I 
added a free(options.strategy) to cmd_rebase without also xstrdup'ing 
strategy in get_replay_opts().

> There is similar code in builtin/revert.c 
> which I think is where your other example came from. I think there is a 
> leak in builtin/revert.c though
> 
>      if (!opts->strategy && opts->default_strategy) {
>          opts->strategy = opts->default_strategy;
>          opts->default_strategy = NULL;
>      }
> 
>      /* do some other stuff */
> 
>      /* These option values will be free()d */
>      opts->gpg_sign = xstrdup_or_null(opts->gpg_sign);
>      opts->strategy = xstrdup_or_null(opts->strategy);
> 
> So we copy the default strategy, leaking the original copy from 
> sequencer_init_config() if --strategy isn't given on the command line.
> I think it would be simple to fix this by making the copy earlier.
> 
>      if (!opts->strategy && opts->default_strategy) {
>          opts->strategy = opts->default_strategy;
>          opts->default_strategy = NULL;
>      } else if (opts->strategy) {
>          /* This option will be free()d in sequencer_remove_state() */
>          opts->strategy = xstrdup(opts->strategy);
>      }
> 

Nice find. I'm noticing a lot of interesting leaks in git's options 
handling, and those leaks also tend to be the trickiest ones to fix (as 
my blunder in the original version of this patch demonstrates :) ).

ATB,

   Andrzej
Phillip Wood July 27, 2021, 7:34 p.m. UTC | #5
Hi Andrzej

On 25/07/2021 14:03, Andrzej Hunt wrote:
> [...]
>>>> Given that we are
>>>> allocating a copy above I think maybe your alternative approach of
>>>> always freeing opts->strategy would be better.
> 
> I will go down this route for V2. Although on further thought: instead 
> of my original idea of moving the string to replay_opts (and NULL'ing 
> out rebase_options->strategy), I think it's better to create a new copy 
> when populating replay_opts. The move/NULL approach I suggested in V1 
> happens to work OK, but I think it's non-obvious and could break if we 
> ever wanted to use get_replay_opts() more than once - creating separate 
> copies reduces the number of surprises.

Copying the string sounds like a good approach. I've looked at the V2 
patch and it looks fine to me.

Thanks

Phillip

Patch

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 12f093121d..9d81db0f3a 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -139,7 +139,7 @@  static struct replay_opts get_replay_opts(const struct rebase_options *opts)
 	replay.ignore_date = opts->ignore_date;
 	replay.gpg_sign = xstrdup_or_null(opts->gpg_sign_opt);
 	if (opts->strategy)
-		replay.strategy = opts->strategy;
+		replay.strategy = xstrdup_or_null(opts->strategy);
 	else if (!replay.strategy && replay.default_strategy) {
 		replay.strategy = replay.default_strategy;
 		replay.default_strategy = NULL;
@@ -1723,7 +1723,6 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 	}
 
 	if (options.strategy) {
-		options.strategy = xstrdup(options.strategy);
 		switch (options.type) {
 		case REBASE_APPLY:
 			die(_("--strategy requires --merge or --interactive"));
@@ -1776,7 +1775,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 	if (options.type == REBASE_MERGE &&
 	    !options.strategy &&
 	    getenv("GIT_TEST_MERGE_ALGORITHM"))
-		options.strategy = xstrdup(getenv("GIT_TEST_MERGE_ALGORITHM"));
+		options.strategy = getenv("GIT_TEST_MERGE_ALGORITHM");
 
 	switch (options.type) {
 	case REBASE_MERGE: