diff mbox series

[v2,1/9] worktree: remove redundant NULL-ing of "cp.argv

Message ID patch-v2-1.9-9cc220ce5a3-20211123T115551Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series run-command API: get rid of "argv" and "env" | expand

Commit Message

Ævar Arnfjörð Bjarmason Nov. 23, 2021, 12:06 p.m. UTC
The clearing of "argv" was added in 7f44e3d1de0 (worktree: make setup
of new HEAD distinct from worktree population, 2015-07-17) when the
"cp" variable wasn't initialized. It hasn't been needed since
542aa25d974 (use CHILD_PROCESS_INIT to initialize automatic variables,
2016-08-05).

Let's remove it to make a later change that gets rid of the "argv"
member from "struct child_process" smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/worktree.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Eric Sunshine Nov. 23, 2021, 3:26 p.m. UTC | #1
On Tue, Nov 23, 2021 at 7:08 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> The clearing of "argv" was added in 7f44e3d1de0 (worktree: make setup
> of new HEAD distinct from worktree population, 2015-07-17) when the
> "cp" variable wasn't initialized. It hasn't been needed since
> 542aa25d974 (use CHILD_PROCESS_INIT to initialize automatic variables,
> 2016-08-05).
>
> Let's remove it to make a later change that gets rid of the "argv"
> member from "struct child_process" smaller.

Let me preface with the caveat that (due to lack of time) I haven't
dug into this deeply and I haven't been following changes to the
run_command() machinery closely, so what I say below may be
inaccurate, but...

Although it will be safe to drop these builtin/worktree.c assignments
when the series eventually removes the child_process::argv member, I
think that the commit message is misleading (or outright wrong), and
that this change so early in the series potentially breaks
git-worktree, leaving it in a state where it works only "by accident".

The reason that this code repeatedly clears `cp.argv` is that the `cp`
structure is reused for multiple run_command() invocations (which, in
retrospect, is pretty ugly since worktree.c is too familiar with the
internals of run-command). The reason that `cp.argv` needs to be
cleared between each invocation is due to this code from
start_command():

    if (!cmd->argv)
         cmd->argv = cmd->args.v;

worktree.c re-populates child_process::args between run_command()
invocations and needs to clear child_process::argv to ensure that the
latter gets re-initialized from child_process::args each time. The use
of CHILD_PROCESS_INIT does not change anything in regard to the
requirement; child_process::argv still needs to be cleared between
run_command() invocations.

As to why this change so early in the series potentially breaks
git-worktree: with the removal of these assignments,
child_process::argv now _always_ points at the _initial_ value of
child_process::args.v even though that vector gets cleared between
run_command() invocations. At best, following this change,
git-worktree is only working "by accident" if the underlying
child_process::args.v doesn't get reallocated between run_command()
invocations. Relying upon this "by accident" behavior feels rather
unsafe.

I think perhaps the simplest thing to do is merely to squash this
patch into the patch which ultimately removes the child_process::argv
member (and the removal of these lines from worktree.c probably
doesn't even need mention in the commit message -- or maybe just a
minor mention).

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index d22ece93e1a..7264a5b5de0 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -355,7 +355,6 @@ static int add_worktree(const char *path, const char *refname,
>                 goto done;
>
>         if (opts->checkout) {
> -               cp.argv = NULL;
>                 strvec_clear(&cp.args);
>                 strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
>                 if (opts->quiet)
> @@ -390,7 +389,6 @@ static int add_worktree(const char *path, const char *refname,
>                         cp.stdout_to_stderr = 1;
>                         cp.dir = path;
>                         cp.env = env;
> -                       cp.argv = NULL;
>                         cp.trace2_hook_name = "post-checkout";
>                         strvec_pushl(&cp.args, absolute_path(hook),
>                                      oid_to_hex(null_oid()),
> --
> 2.34.0.831.gd33babec0d1
Junio C Hamano Nov. 24, 2021, 1:54 a.m. UTC | #2
Eric Sunshine <sunshine@sunshineco.com> writes:

> ... At best, following this change,
> git-worktree is only working "by accident" if the underlying
> child_process::args.v doesn't get reallocated between run_command()
> invocations. Relying upon this "by accident" behavior feels rather
> unsafe.

Very true.  Relying on the "if argv is null, point it at args.v"
assignment at the very beginning of the start_command() function is
safe because by that time the reallocations have happened already
if needed.

The pattern with or without NULLing is

	initialize cp
	push to cp.args
	use cp

	/* cp.argv = NULL */
	strvec_clear(&cp.args);
	push to cp.args

and strvec_clear() frees the underying array, and the first push
will reallocates from NULL, so there is no guarantee that cp.argv
in the first use that used to be pointing at cp.args that has
already been freed is still valid.

Thanks for spotting this.  Has this patch ever been tested with
sanitizer?  Do we have gap in test coverage?
Eric Sunshine Nov. 24, 2021, 5:44 a.m. UTC | #3
On Tue, Nov 23, 2021 at 10:26:56AM -0500, Eric Sunshine wrote:
> I think perhaps the simplest thing to do is merely to squash this
> patch into the patch which ultimately removes the child_process::argv
> member (and the removal of these lines from worktree.c probably
> doesn't even need mention in the commit message -- or maybe just a
> minor mention).

On second thought, squashing this patch into the patch which
ultimately retires child_process::argv is probably not necessary. If
the patch is instead rewritten as below, then it prepares
builtin/worktree.c for eventual retirement of child_process::argv
without breaking git-worktree functionality.

(You could also extend this patch so it prepares for removal of
child_process::env, as well, or keep it minimal as I did here.)

-- >8 --
From: Eric Sunshine <sunshine@sunshineco.com>
Subject: [PATCH] worktree: stop being overly intimate with run_command()
 internals

add_worktree() reuses a `child_process` for three run_command()
invocations, but to do so, it has overly-intimate knowledge of
run-command.c internals. In particular, it knows that it must reset
child_process::argv to NULL for each subsequent invocation[*] in order
for start_command() to latch the newly-populated child_process::args for
each invocation, even though this behavior is not a part of the
documented API. Beyond having overly-intimate knowledge of run-command.c
internals, the reuse of one `child_process` for three run_command()
invocations smells like an unnecessary micro-optimization. Therefore,
stop sharing one `child_process` and instead use a new one for each
run_command() call.

[*] If child_process::argv is not reset to NULL, then subsequent
run_command() invocations will instead incorrectly access a dangling
pointer to freed memory which had been allocated by child_process::args
on the previous run. This is due to the following code in
start_command():

    if (!cmd->argv)
        cmd->argv = cmd->args.v;

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 builtin/worktree.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index d22ece93e1..9edd3e2829 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -355,8 +355,8 @@ static int add_worktree(const char *path, const char *refname,
 		goto done;
 
 	if (opts->checkout) {
-		cp.argv = NULL;
-		strvec_clear(&cp.args);
+		struct child_process cp = CHILD_PROCESS_INIT;
+		cp.git_cmd = 1;
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
@@ -385,12 +385,11 @@ static int add_worktree(const char *path, const char *refname,
 		const char *hook = find_hook("post-checkout");
 		if (hook) {
 			const char *env[] = { "GIT_DIR", "GIT_WORK_TREE", NULL };
-			cp.git_cmd = 0;
+			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.no_stdin = 1;
 			cp.stdout_to_stderr = 1;
 			cp.dir = path;
 			cp.env = env;
-			cp.argv = NULL;
 			cp.trace2_hook_name = "post-checkout";
 			strvec_pushl(&cp.args, absolute_path(hook),
 				     oid_to_hex(null_oid()),
Eric Sunshine Nov. 24, 2021, 6 a.m. UTC | #4
On Tue, Nov 23, 2021 at 8:54 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > ... At best, following this change,
> > git-worktree is only working "by accident" if the underlying
> > child_process::args.v doesn't get reallocated between run_command()
> > invocations. Relying upon this "by accident" behavior feels rather
> > unsafe.
>
> The pattern with or without NULLing is
>         /* cp.argv = NULL */
>         strvec_clear(&cp.args);
>         push to cp.args
>
> and strvec_clear() frees the underying array, and the first push
> will reallocates from NULL, so there is no guarantee that cp.argv
> in the first use that used to be pointing at cp.args that has
> already been freed is still valid.

Indeed, so this is even worse than I thought. I was somewhat pressed
for time when I wrote the review, thus didn't look at the
implementation of strvec_clear(), and incorrectly thought that it only
reset its `nr` member to 0 but kept the array allocated (much like
strbuf_reset()). That's why I thought it was only working "by
accident". But, as you point out, strvec_clear() does free its
allocated array (much like strbuf_release()), so -- with this patch
applied -- each subsequent run_command() invocation is _definitely_
accessing the dangling pointer in child_process::argv, and that
dangling pointer would (in the "best" case) be referencing the
original populated value of child_process::args, not the repopulated
value. So, even if it didn't crash outright, it would just re-run the
same command three times (unless by chance it reallocated the same
memory it had freed earlier.)

> Thanks for spotting this.  Has this patch ever been tested with
> sanitizer?  Do we have gap in test coverage?

The question about potential gap in test coverage is a good one.
Maybe, by chance it reallocated the same memory that it had earlier
freed, thus did indeed work "by accident". Another possibility is that
Ævar only ran the tests after applying the full patch series, in which
case this dangling-pointer bug would be gone, rather than running the
tests after each patch.
Eric Sunshine Nov. 24, 2021, 6:12 a.m. UTC | #5
On Wed, Nov 24, 2021 at 1:00 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, Nov 23, 2021 at 8:54 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Thanks for spotting this.  Has this patch ever been tested with
> > sanitizer?  Do we have gap in test coverage?
>
> The question about potential gap in test coverage is a good one.
> Maybe, by chance it reallocated the same memory that it had earlier
> freed, thus did indeed work "by accident". Another possibility is that
> Ævar only ran the tests after applying the full patch series, in which
> case this dangling-pointer bug would be gone, rather than running the
> tests after each patch.

As a follow-up, I just applied this patch alone and ran the tests, and
they do indeed fail as expected (on my macOS). In
t2400-worktree-add.sh, alone, 44 out of 71 tests failed, thus I don't
think there's a gap in test coverage. So, the most likely explanation
of how this problem slipped through is that Ævar only tested after
applying the full series, in which case the dangling pointer bug would
be gone, rather than testing after each patch.
diff mbox series

Patch

diff --git a/builtin/worktree.c b/builtin/worktree.c
index d22ece93e1a..7264a5b5de0 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -355,7 +355,6 @@  static int add_worktree(const char *path, const char *refname,
 		goto done;
 
 	if (opts->checkout) {
-		cp.argv = NULL;
 		strvec_clear(&cp.args);
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
@@ -390,7 +389,6 @@  static int add_worktree(const char *path, const char *refname,
 			cp.stdout_to_stderr = 1;
 			cp.dir = path;
 			cp.env = env;
-			cp.argv = NULL;
 			cp.trace2_hook_name = "post-checkout";
 			strvec_pushl(&cp.args, absolute_path(hook),
 				     oid_to_hex(null_oid()),