diff mbox series

Re* [PATCH v5] describe: refresh the index when 'broken' flag is used

Message ID xmqqikxv4t1v.fsf_-_@gitster.g (mailing list archive)
State New
Headers show
Series Re* [PATCH v5] describe: refresh the index when 'broken' flag is used | expand

Commit Message

Junio C Hamano June 26, 2024, 3:34 p.m. UTC
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> On 26/06/24 17:00, Karthik Nayak wrote:
>> Not worth a reroll, but you don't have to create file.new twice.
>
> Actually, now that I think of it, those two were better off being
> separate tests.  It might so happen the first call to describe
> refreshes the index, due to which the second call with the --broken
> option does not bug-out in the way it would if the command was run by
> itself. Having them separate would give them enough isolation so that
> previous command does not interfere with the later.

Good thinking.  Yes, we may end up having a few commands that are
duplicated in these two tests (for setting the stage up, for
example), but it would be better to test these two separately.

>>> Range-diff against v4:
>>> 1:  1da5fa48d9 ! 1:  52f590b70f describe: refresh the index when 'broken' flag is used
>>>      @@ builtin/describe.c: int cmd_describe(int argc, const char **argv, const char *pr
>>>       +			cp.git_cmd = 1;
>>>       +			cp.no_stdin = 1;
>>>       +			cp.no_stdout = 1;
>>>      -+			run_command(&cp);
>>>      -+			strvec_clear(&cp.args);
>>>      ++			if (run_command(&cp))
>>>      ++				child_process_clear(&cp);
>>>       +
>>>        			strvec_pushv(&cp.args, diff_index_args);
>>>        			cp.git_cmd = 1;
>>> --
>>> 2.45.2.606.g9005149a4a.dirty
>> Other than this, this looks good to me.
> I am not sure if I follow this one.  Am I expected to not share the
> struct child_process between the two sub-process calls?

Without reusing and instead of using two, we do not have to worry
about the reusablility of the child_process structure in the first
place, which is a huge plus, but in the longer run we should make
sure it is safe to reuse child_process and document the safe way to
reuse it (run-command.h does document a way to use it once and then
clean it up, but the "clean-up" extends only to not leaking
resources after we are done---it does not guarantee that it is OK to
reuse it).

I think with the updated "we clear cp ourselves if run_command() fails",
it should be safe to reuse, but it probably is even safer to do
something like this:

	... the first run ...
	if (run_command(&cp))
		child_process_clear(&cp);

	child_process_init(&cp);
	
        ... setup for the second run ...
	strvec_pushv(&cp.args, diff_index_args);
	cp.git_cmd = 1;
	... full set-up without relying on anything done earlier ...

The extra child_process_init() call may serve as an extra
documentation that we are reusing the same struct here (we often do
"git grep" for use of a specific API function before tree wide code
clean-up, and child_process_init() would be a good key to look for).

... goes and looks ...

Oh, I found an interesting one.  builtin/fsck.c:cmd_fsck() does this
in a loop:

	struct child_process verify = CHILD_PROCESS_INIT;

	... setup ...
	for (... loop ...) {
		child_process_init(&verify);
		... set up various .members of verify struct ...
		strvec_pushl(&verify.args, ... command line ...);
		if (run_command(&verify))
			errors_found |= ...;
	}

This code clearly assumes that it is safe to reuse the child_process
structure after you run_command() and let it clean-up if you do
another child_process_init().  And I think that is a sensible
assumption.

The code in builtin/fsck.c:cmd_fsck() is buggy when run_command()
fails, I think.  Without doing child_process_clear() there, doesn't
it leak the strvec?

------- >8 ------------- >8 ------------- >8 -------
Subject: [PATCH] fsck: clear child_process after failed run_command()

There are two loops that calls run_command() using the same
child_process struct near the end of cmd_fsck().  4d0984be (fsck: do
not reuse child_process structs, 2018-11-12) tightened these code
paths to reinitialize the structure in order to safely reuse it.

    The run-command API makes no promises about what is left in a struct
    child_process after a command finishes, and it's not safe to simply
    reuse it again for a similar command.

Upon failure, run_command() can return without releasing the
resource held by the child_process structure, which is done by
calling finish_command() which in turn calls child_process_clear().

Reinitializing the structure without calling child_process_clear()
for the next round would leak the .args and .env strvecs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 26, 2024, 4:17 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> Subject: [PATCH] fsck: clear child_process after failed run_command()
>
> There are two loops that calls run_command() using the same
> child_process struct near the end of cmd_fsck().  4d0984be (fsck: do
> not reuse child_process structs, 2018-11-12) tightened these code
> paths to reinitialize the structure in order to safely reuse it.
>
>     The run-command API makes no promises about what is left in a struct
>     child_process after a command finishes, and it's not safe to simply
>     reuse it again for a similar command.
>
> Upon failure, run_command() can return without releasing the
> resource held by the child_process structure, which is done by
> calling finish_command() which in turn calls child_process_clear().

Sorry, but I have to take this back.  

The error return code paths in the start_command() function does
seem to call clear_child_process(), which means that there is no
need to call clear_child_process() ourselves after a failed
run_command() call.

The need for reinitializing that established by 4d0984be (fsck: do
not reuse child_process structs, 2018-11-12) still stand, so

	... the first run ...
	run_command(&cp);
	/* no need for separate child_process_clear(&cp) */

	child_process_init(&cp); /* prepare for reuse */
	
        ... setup for the second run ...
	strvec_pushv(&cp.args, diff_index_args);
	cp.git_cmd = 1;
	... full set-up without relying on anything done earlier ...

would still be needed.

Or alternatively, we could do this to ensure that the child_process
structure is always reusable.

 run-command.c | 1 +
 run-command.h | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git c/run-command.c w/run-command.c
index 6ac1d14516..aba250fbe1 100644
--- c/run-command.c
+++ w/run-command.c
@@ -26,6 +26,7 @@ void child_process_clear(struct child_process *child)
 {
 	strvec_clear(&child->args);
 	strvec_clear(&child->env);
+	child_process_init(child);
 }
 
 struct child_to_clean {
diff --git c/run-command.h w/run-command.h
index 55f6631a2a..6e203c22f6 100644
--- c/run-command.h
+++ w/run-command.h
@@ -204,7 +204,8 @@ int start_command(struct child_process *);
 
 /**
  * Wait for the completion of a sub-process that was started with
- * start_command().
+ * start_command().  The child_process structure is cleared and
+ * reinitialized.
  */
 int finish_command(struct child_process *);
 
@@ -214,6 +215,9 @@ int finish_command_in_signal(struct child_process *);
  * A convenience function that encapsulates a sequence of
  * start_command() followed by finish_command(). Takes a pointer
  * to a `struct child_process` that specifies the details.
+ * The child_process structure is cleared and reinitialized,
+ * even when the command fails to start or an error is detected
+ * in finish_command().
  */
 int run_command(struct child_process *);
Abhijeet Sonar June 26, 2024, 5:29 p.m. UTC | #2
On 26/06/24 21:47, Junio C Hamano wrote:
> Or alternatively, we could do this to ensure that the child_process
> structure is always reusable.
> 
>  run-command.c | 1 +
>  run-command.h | 6 +++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git c/run-command.c w/run-command.c
> index 6ac1d14516..aba250fbe1 100644
> --- c/run-command.c
> +++ w/run-command.c
> @@ -26,6 +26,7 @@ void child_process_clear(struct child_process *child)
>  {
>  	strvec_clear(&child->args);
>  	strvec_clear(&child->env);
> +	child_process_init(child);
>  }

To me, this looks much better.  child_process_clear's name already
suggests that is sort of like a destructor, so it makes sense to
re-initialize everything here.  I even wonder why it was not that way to
begin with.  I suppose no callers are assuming that it only clears args
and env though?

>  struct child_to_clean {
> diff --git c/run-command.h w/run-command.h
> index 55f6631a2a..6e203c22f6 100644
> --- c/run-command.h
> +++ w/run-command.h
> @@ -204,7 +204,8 @@ int start_command(struct child_process *);
>  
>  /**
>   * Wait for the completion of a sub-process that was started with
> - * start_command().
> + * start_command().  The child_process structure is cleared and
> + * reinitialized.
>   */
>  int finish_command(struct child_process *);
>  
> @@ -214,6 +215,9 @@ int finish_command_in_signal(struct child_process *);
>   * A convenience function that encapsulates a sequence of
>   * start_command() followed by finish_command(). Takes a pointer
>   * to a `struct child_process` that specifies the details.
> + * The child_process structure is cleared and reinitialized,
> + * even when the command fails to start or an error is detected
> + * in finish_command().
>   */
>  int run_command(struct child_process *);
>
Junio C Hamano June 26, 2024, 5:35 p.m. UTC | #3
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> To me, this looks much better.  child_process_clear's name already
> suggests that is sort of like a destructor, so it makes sense to
> re-initialize everything here.  I even wonder why it was not that way to
> begin with.  I suppose no callers are assuming that it only clears args
> and env though?

I guess that validating that supposition is a prerequisite to
declare the change as "much better" and "makes sense".
Junio C Hamano June 26, 2024, 5:45 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>
>> To me, this looks much better.  child_process_clear's name already
>> suggests that is sort of like a destructor, so it makes sense to
>> re-initialize everything here.  I even wonder why it was not that way to
>> begin with.  I suppose no callers are assuming that it only clears args
>> and env though?
>
> I guess that validating that supposition is a prerequisite to
> declare the change as "much better" and "makes sense".

Having said that, a lot more plausible reason is that most callers
are expected to use a single struct just once without reusing it, so
the cost of re-initialization is wasted cycles for them if it were
part of child_process_clear() which is primarily about releasing
resources (instead of leaking them).

It can be argued that it is a sensible design decision to make those
minority callers that do reuse the same struct to pay the cost of
reinitialization themselves, instead of the majority callers that
use it once and then release resources held in there.
Abhijeet Sonar June 26, 2024, 6:07 p.m. UTC | #5
On 26/06/24 23:05, Junio C Hamano wrote:
> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
> 
>> To me, this looks much better.  child_process_clear's name already
>> suggests that is sort of like a destructor, so it makes sense to
>> re-initialize everything here.  I even wonder why it was not that way to
>> begin with.  I suppose no callers are assuming that it only clears args
>> and env though?
> 
> I guess that validating that supposition is a prerequisite to
> declare the change as "much better" and "makes sense".

OK.  I found one: at the end of submodule.c:push_submodule()

	if (...) {
		...some setup...
		if (run_command(&cp))
			return 0;
		close(cp.out);
	}
Junio C Hamano June 26, 2024, 6:49 p.m. UTC | #6
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> On 26/06/24 23:05, Junio C Hamano wrote:
>> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>> 
>>> To me, this looks much better.  child_process_clear's name already
>>> suggests that is sort of like a destructor, so it makes sense to
>>> re-initialize everything here.  I even wonder why it was not that way to
>>> begin with.  I suppose no callers are assuming that it only clears args
>>> and env though?
>> 
>> I guess that validating that supposition is a prerequisite to
>> declare the change as "much better" and "makes sense".
>
> OK.  I found one: at the end of submodule.c:push_submodule()
>
> 	if (...) {
> 		...some setup...
> 		if (run_command(&cp))
> 			return 0;
> 		close(cp.out);
> 	}

This is curious.

 * What is this thing trying to do?  When run_command() fails, it
   wants to leave cp.out open, so that the caller this returns to
   can write into it???  That cannot be the case, as cp itself is
   internal.  So does this "close(cp.out)" really matter?

 * Even though we are running child_process_clear() to release the
   resources in run_command() we are not closing the file descriptor
   cp.out in the child_process_clear() and force the caller to close
   it instead.  An open file descriptor is a resource, and a file
   descriptor opened but forgotten is considered a leak.  I wonder
   if child_process_clear() should be closing the file descriptor,
   at least the ones it opened or dup2()ed.

In any case, you found a case where child_process_clear() may not
want to do the full re-initialization and at the same time it is not
doing its job sufficiently well.  Let's decide, at least for now,
not to do the reinitialization from child_process_clear(), then.

Thanks.
Jeff King June 26, 2024, 8:34 p.m. UTC | #7
On Wed, Jun 26, 2024 at 11:49:22AM -0700, Junio C Hamano wrote:

> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
> 
> > On 26/06/24 23:05, Junio C Hamano wrote:
> >> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
> >> 
> >>> To me, this looks much better.  child_process_clear's name already
> >>> suggests that is sort of like a destructor, so it makes sense to
> >>> re-initialize everything here.  I even wonder why it was not that way to
> >>> begin with.  I suppose no callers are assuming that it only clears args
> >>> and env though?
> >> 
> >> I guess that validating that supposition is a prerequisite to
> >> declare the change as "much better" and "makes sense".
> >
> > OK.  I found one: at the end of submodule.c:push_submodule()
> >
> > 	if (...) {
> > 		...some setup...
> > 		if (run_command(&cp))
> > 			return 0;
> > 		close(cp.out);
> > 	}
> 
> This is curious.
> 
>  * What is this thing trying to do?  When run_command() fails, it
>    wants to leave cp.out open, so that the caller this returns to
>    can write into it???  That cannot be the case, as cp itself is
>    internal.  So does this "close(cp.out)" really matter?

I think it's totally broken. Using cp.out, cp.in, etc, with
run_command() is a deadlock waiting to happen, since it implies opening
a pipe, not doing anything with our end, and then doing a waitpid() on
the child.

You'd always want to use start_command(), and then do something useful
with the pipe, and then finish_command(). Arguably run_command() should
bug if cp.out, etc are non-zero.

In this case the code leaves cp.out as 0, so we are not asking for a
pipe. That is good, and we are not subject to any race/deadlock. But
then...what is the cleanup trying to do? This will always just close(0),
the parent's stdin. That is certainly surprising, but I guess nobody
ever noticed because git-push does not usually read from its stdin.

So I think the close() is superfluous and should be deleted. It goes all
the way back to eb21c732d6 (push: teach --recurse-submodules the
on-demand option, 2012-03-29). I guess at one point the author thought
we'd read output from a pipe (rather than letting it just go to the
parent's stdout) and this was leftover?

>  * Even though we are running child_process_clear() to release the
>    resources in run_command() we are not closing the file descriptor
>    cp.out in the child_process_clear() and force the caller to close
>    it instead.  An open file descriptor is a resource, and a file
>    descriptor opened but forgotten is considered a leak.  I wonder
>    if child_process_clear() should be closing the file descriptor,
>    at least the ones it opened or dup2()ed.

Usually the caller will have handled this already (since it cares about
exactly _when_ to close), so we'd end up double-closing in most cases.
This is sometimes harmless (you'll get EBADF), but is a problem if the
descriptor was assigned to something else in the meantime.

In most cases callers shouldn't be using child_process_clear() at all
after start_command(), etc. Either:

  - start_command() failed, in which case it cleans up everything
    already (both in the struct and any pipes it opened)

  - we succeeded in starting the command, in which case
    child_process_clear() is insufficient. You need to actually
    finish_command() to avoid leaking the pid.

  - after finish_command(), the struct has been cleaned already. You do
    need to close pipes in the caller, but you'd have to do so before
    calling finish_command() anyway, to avoid the deadlock.

You really only need to call child_process_clear() yourself when you set
up a struct but didn't actually start the process. And from a quick skim
over the grep results, it looks like that's how it's used.

-Peff
Karthik Nayak June 26, 2024, 9:23 p.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:

> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>
>> On 26/06/24 23:05, Junio C Hamano wrote:
>>> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>>>
>>>> To me, this looks much better.  child_process_clear's name already
>>>> suggests that is sort of like a destructor, so it makes sense to
>>>> re-initialize everything here.  I even wonder why it was not that way to
>>>> begin with.  I suppose no callers are assuming that it only clears args
>>>> and env though?
>>>
>>> I guess that validating that supposition is a prerequisite to
>>> declare the change as "much better" and "makes sense".
>>
>> OK.  I found one: at the end of submodule.c:push_submodule()
>>
>> 	if (...) {
>> 		...some setup...
>> 		if (run_command(&cp))
>> 			return 0;
>> 		close(cp.out);
>> 	}
>
> This is curious.
>
>  * What is this thing trying to do?  When run_command() fails, it
>    wants to leave cp.out open, so that the caller this returns to
>    can write into it???  That cannot be the case, as cp itself is
>    internal.  So does this "close(cp.out)" really matter?
>

This is curious one indeed, we only need to close the file descriptor
when we set `cp.out = -1`, otherwise there is no need to close `cp.out`
since `start_command()` takes care of it internally. So the
`close(cp.out)` can really be removed here.

Wonder if this was copied over from other parts of the submodule code,
where we actually set `cp.out = -1`.

>
>  * Even though we are running child_process_clear() to release the
>    resources in run_command() we are not closing the file descriptor
>    cp.out in the child_process_clear() and force the caller to close
>    it instead.  An open file descriptor is a resource, and a file
>    descriptor opened but forgotten is considered a leak.  I wonder
>    if child_process_clear() should be closing the file descriptor,
>    at least the ones it opened or dup2()ed.
>

If you see the documentation for run_command.h::child_process, it states

    /*
     * Using .in, .out, .err:
     * - Specify 0 for no redirections. No new file descriptor is allocated.
     * (child inherits stdin, stdout, stderr from parent).
     * - Specify -1 to have a pipe allocated as follows:
     *     .in: returns the writable pipe end; parent writes to it,
     *          the readable pipe end becomes child's stdin
     *     .out, .err: returns the readable pipe end; parent reads from
     *          it, the writable pipe end becomes child's stdout/stderr
     *   The caller of start_command() must close the returned FDs
     *   after it has completed reading from/writing to it!
     * - Specify > 0 to set a channel to a particular FD as follows:
     *     .in: a readable FD, becomes child's stdin
     *     .out: a writable FD, becomes child's stdout/stderr
     *     .err: a writable FD, becomes child's stderr
     *   The specified FD is closed by start_command(), even in case
     *   of errors!
     */
    int in;
    int out;
    int err;

So this makes sense right? run_command() doesn't support the pipe
options, meaning clients have to call start_command() + finish_command()
manually. When clients do call start_command() with '-1' set to these
options, they are in charge of closing the created pipes.

> In any case, you found a case where child_process_clear() may not
> want to do the full re-initialization and at the same time it is not
> doing its job sufficiently well.  Let's decide, at least for now,
> not to do the reinitialization from child_process_clear(), then.
>
> Thanks.
Jeff King June 27, 2024, 12:33 a.m. UTC | #9
On Wed, Jun 26, 2024 at 04:34:17PM -0400, Jeff King wrote:

> >  * What is this thing trying to do?  When run_command() fails, it
> >    wants to leave cp.out open, so that the caller this returns to
> >    can write into it???  That cannot be the case, as cp itself is
> >    internal.  So does this "close(cp.out)" really matter?
> 
> I think it's totally broken. Using cp.out, cp.in, etc, with
> run_command() is a deadlock waiting to happen, since it implies opening
> a pipe, not doing anything with our end, and then doing a waitpid() on
> the child.
> 
> You'd always want to use start_command(), and then do something useful
> with the pipe, and then finish_command(). Arguably run_command() should
> bug if cp.out, etc are non-zero.

Oh, I guess I should have looked at the code. We do indeed BUG() in this
case already, courtesy of c29b3962af (run-command: forbid using
run_command with piped output, 2015-03-22).

So all is well there. This caller is still completely wrong to call
close() though.

-Peff
diff mbox series

Patch

diff --git c/builtin/fsck.c w/builtin/fsck.c
index d13a226c2e..398b492184 100644
--- c/builtin/fsck.c
+++ w/builtin/fsck.c
@@ -1078,8 +1078,10 @@  int cmd_fsck(int argc, const char **argv, const char *prefix)
 				strvec_push(&commit_graph_verify.args, "--progress");
 			else
 				strvec_push(&commit_graph_verify.args, "--no-progress");
-			if (run_command(&commit_graph_verify))
+			if (run_command(&commit_graph_verify)) {
+				child_process_clear(&commit_graph_verify);
 				errors_found |= ERROR_COMMIT_GRAPH;
+			}
 		}
 	}
 
@@ -1096,8 +1098,10 @@  int cmd_fsck(int argc, const char **argv, const char *prefix)
 				strvec_push(&midx_verify.args, "--progress");
 			else
 				strvec_push(&midx_verify.args, "--no-progress");
-			if (run_command(&midx_verify))
+			if (run_command(&midx_verify)) {
+				child_process_clear(&midx_verify);
 				errors_found |= ERROR_MULTI_PACK_INDEX;
+			}
 		}
 	}