diff mbox series

[6/7] run-command: create start_bg_command

Message ID f97038a563d889d740a7e968fcbdfaadb41e2008.1631738177.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Builtin FSMonitor Part 1 | expand

Commit Message

Jeff Hostetler Sept. 15, 2021, 8:36 p.m. UTC
From: Jeff Hostetler <jeffhost@microsoft.com>

Create a variation of `run_command()` and `start_command()` to launch a command
into the background and optionally wait for it to become "ready" before returning.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
 run-command.h |  48 ++++++++++++++++++++
 2 files changed, 171 insertions(+)

Comments

Taylor Blau Sept. 16, 2021, 4:53 a.m. UTC | #1
On Wed, Sep 15, 2021 at 08:36:16PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create a variation of `run_command()` and `start_command()` to launch a command
> into the background and optionally wait for it to become "ready" before returning.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  run-command.h |  48 ++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 3e4e082e94d..fe75fd08f74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>  	}
>  	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>  }
> +
> +enum start_bg_result start_bg_command(struct child_process *cmd,
> +				      start_bg_wait_cb *wait_cb,
> +				      void *cb_data,
> +				      unsigned int timeout_sec)
> +{
> +	enum start_bg_result sbgr = SBGR_ERROR;
> +	int ret;
> +	int wait_status;
> +	pid_t pid_seen;
> +	time_t time_limit;
> +
> +	/*
> +	 * Silently disallow child cleanup -- even if requested.
> +	 * The child process should persist in the background
> +	 * and possibly/probably after this process exits.  That
> +	 * is, don't kill the child during our atexit routine.
> +	 */
> +	cmd->clean_on_exit = 0;
> +
> +	ret = start_command(cmd);
> +	if (ret) {
> +		/*
> +		 * We assume that if `start_command()` fails, we
> +		 * either get a complete `trace2_child_start() /
> +		 * trace2_child_exit()` pair or it fails before the
> +		 * `trace2_child_start()` is emitted, so we do not
> +		 * need to worry about it here.
> +		 *
> +		 * We also assume that `start_command()` does not add
> +		 * us to the cleanup list.  And that it calls
> +		 * calls `child_process_clear()`.
> +		 */
> +		sbgr = SBGR_ERROR;
> +		goto done;
> +	}
> +
> +	time(&time_limit);
> +	time_limit += timeout_sec;

This jumped out to me as unsafe, since POSIX guarantees time_t to be an
integral value holding a number of seconds (so += timeout_sec is safe
there), but it isn't in the C standard.

But we have lots of other examples of adding a number of seconds
directly the value filled in by time(2), so I think this is fine.

> +
> +wait:
> +	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
> +
> +	if (pid_seen == 0) {

Small nit, probably better to write this as if (!pid_seen), but not
worth a reroll alone.

> +		/*
> +		 * The child is currently running.  Ask the callback
> +		 * if the child is ready to do work or whether we
> +		 * should keep waiting for it to boot up.
> +		 */

This comment is simple and informative, thank you!

> +		ret = (*wait_cb)(cb_data, cmd);
> +		if (!ret) {
> +			/*
> +			 * The child is running and "ready".
> +			 *
> +			 * NEEDSWORK: As we prepare to orphan (release to
> +			 * the background) this child, it is not appropriate
> +			 * to emit a `trace2_child_exit()` event.  Should we
> +			 * create a new event for this case?

Probably. Maybe trace2_child_orphaned() or trace2_child_background()?

> +			 */
> +			sbgr = SBGR_READY;
> +			goto done;
> +		} else if (ret > 0) {
> +			time_t now;
> +
> +			time(&now);
> +			if (now < time_limit)
> +				goto wait;
> +
> +			/*
> +			 * Our timeout has expired.  We don't try to
> +			 * kill the child, but rather let it continue
> +			 * (hopefully) trying to startup.
> +			 *
> +			 * NEEDSWORK: Like the "ready" case, should we
> +			 * log a custom child-something Trace2 event here?
> +			 */
> +			sbgr = SBGR_TIMEOUT;
> +			goto done;
> +		} else {
> +			/*
> +			 * The cb gave up on this child.
> +			 *
> +			 * NEEDSWORK: Like above, should we log a custom
> +			 * Trace2 child-something event here?
> +			 */
> +			sbgr = SBGR_CB_ERROR;
> +			goto done;
> +		}

OK, so assuming that the child is running, then we ask wait_cb what to
do. Returning zero from the callback means to background it, a positive
value means to give it more time, and negative means to cause an error.
And those match the documentation below, good.

> +	if (pid_seen == cmd->pid) {

This could be an "else if", no?

> +		int child_code = -1;
> +
> +		/*
> +		 * The child started, but exited or was terminated
> +		 * before becoming "ready".
> +		 *
> +		 * We try to match the behavior of `wait_or_whine()`
> +		 * and convert the child's status to a return code for
> +		 * tracing purposes and emit the `trace2_child_exit()`
> +		 * event.
> +		 */
> +		if (WIFEXITED(wait_status))
> +			child_code = WEXITSTATUS(wait_status);
> +		else if (WIFSIGNALED(wait_status))
> +			child_code = WTERMSIG(wait_status) + 128;

Do we care about emitting the same error (when it was signaled with
something other than SIGINT/SIGQUIT/SIGPIPE) as is reported by
wait_or_whine()?

If we want that error here, too, we could probably share the same code
here from here and in wait_or_whine(). I would probably write something
like:

    static int handle_awaited_status(int status, int *code)
    {
      if (WIFSIGNALED(status)) {
        *code = WTERMSIG(status);
        if (*code != SIGINT && *code != SIGQUIT && *code != SIGPIPE)
               error("%s died of signal %d", argv0, *code);
        /*
         * This return value is chosen so that code & 0xff
         * mimics the exit code that a POSIX shell would report for
         * a program that died from this signal.
         */
        *code += 128;
        return 1;
      } else if (WIFEXITED(status)) {
        *code = WEXITSTATUS(status);
        return 1;
      }
      return 0;
    }

so that we could call it in wait_or_whine() like:

    } else if (!handle_awaited_status(status, &code)) {
      error("waitpid is confused (%s)", argv0);
    }

and similarly here in this new function. Alternatively, if we don't want
that error, then it may help future readers to add a short comment
explaining why not.

> +/**
> + * Callback used by `start_bg_command()` to ask whether the
> + * child process is ready or needs more time to become ready.
> + *
> + * Returns 1 is child needs more time (subject to the requested timeout).
> + * Returns 0 if child is ready.
> + * Returns -1 on any error and cause `start_bg_command()` to also error out.
> + */
> +typedef int(start_bg_wait_cb)(void *cb_data,
> +			      const struct child_process *cmd);

Nitpicking, but typically I would assume that the "extra" void pointer
is the last argument in a callback. It definitely does not matter,
though.

Thanks,
Taylor
Taylor Blau Sept. 16, 2021, 4:58 a.m. UTC | #2
On Thu, Sep 16, 2021 at 12:53:07AM -0400, Taylor Blau wrote:
> > +/**
> > + * Callback used by `start_bg_command()` to ask whether the
> > + * child process is ready or needs more time to become ready.
> > + *
> > + * Returns 1 is child needs more time (subject to the requested timeout).
> > + * Returns 0 if child is ready.
> > + * Returns -1 on any error and cause `start_bg_command()` to also error out.
> > + */
> > +typedef int(start_bg_wait_cb)(void *cb_data,
> > +			      const struct child_process *cmd);
>
> Nitpicking, but typically I would assume that the "extra" void pointer
> is the last argument in a callback. It definitely does not matter,
> though.

Looking at the last patch (which adds the first implementation of one of
these callbacks) it appears that this cb_data pointer is unused. I
assume that it is used in later patches which aren't in this topic?

If so, then it may help future readers to indicate as much in the patch
message. Perhaps "the cb_data argument in the start_bg_wait_cb callback
is unused in this series, but will be useful in later patches".

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Sept. 16, 2021, 5:56 a.m. UTC | #3
On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create a variation of `run_command()` and `start_command()` to launch a command
> into the background and optionally wait for it to become "ready" before returning.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  run-command.h |  48 ++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 3e4e082e94d..fe75fd08f74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>  	}
>  	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>  }
> +
> +enum start_bg_result start_bg_command(struct child_process *cmd,
> +				      start_bg_wait_cb *wait_cb,
> +				      void *cb_data,
> +				      unsigned int timeout_sec)
> +{
> +	enum start_bg_result sbgr = SBGR_ERROR;
> +	int ret;
> +	int wait_status;
> +	pid_t pid_seen;
> +	time_t time_limit;
> +
> +	/*
> +	 * Silently disallow child cleanup -- even if requested.
> +	 * The child process should persist in the background
> +	 * and possibly/probably after this process exits.  That
> +	 * is, don't kill the child during our atexit routine.
> +	 */
> +	cmd->clean_on_exit = 0;

Since we have no existing users, can we change this to:

if (cmd->clean_on_exit)
    BUG("start_bg_command() doesn't support non-zero clean_on_exit");

Just silently discarding what the caller asked for seems like the wrong
thing to do, why the silence?

> +
> +	ret = start_command(cmd);
> +	if (ret) {
> +		/*
> +		 * We assume that if `start_command()` fails, we
> +		 * either get a complete `trace2_child_start() /
> +		 * trace2_child_exit()` pair or it fails before the
> +		 * `trace2_child_start()` is emitted, so we do not
> +		 * need to worry about it here.
> +		 *
> +		 * We also assume that `start_command()` does not add
> +		 * us to the cleanup list.  And that it calls
> +		 * calls `child_process_clear()`.
> +		 */

These all look like sensible things to assume, but I think commentary /
writing on this would be much better just documenting that the
start_command() API does this in its comment in run-command.h, or
perhaps the comment starting with "The functions: child_process_init".

> +		sbgr = SBGR_ERROR;
> +		goto done;
> +	}
> +
> +	time(&time_limit);
> +	time_limit += timeout_sec;
> +
> +wait:
> +	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
> +
> +	if (pid_seen == 0) {
> +		/*
> +		 * The child is currently running.  Ask the callback
> +		 * if the child is ready to do work or whether we
> +		 * should keep waiting for it to boot up.
> +		 */
> +		ret = (*wait_cb)(cb_data, cmd);
> +		if (!ret) {
> +			/*
> +			 * The child is running and "ready".
> +			 *
> +			 * NEEDSWORK: As we prepare to orphan (release to
> +			 * the background) this child, it is not appropriate
> +			 * to emit a `trace2_child_exit()` event.  Should we
> +			 * create a new event for this case?
> +			 */
> +			sbgr = SBGR_READY;
> +			goto done;

Per api-trace2.txt:

    `"child_exit"`::
            This event is generated after the current process has returned
            from the waitpid() and collected the exit information from the
            child.

My (perhaps wrong) reading of that is that yes, we should do that, after
all if we've released the child are we otherwise going to hear from them
again in any way trace2 could log with a child_exit later?

Ditto the "commentary better elsewhere" whatever the result of this
discussion is, let's add a note to api-trace2.txt about how detached
children are handled by child_exit events.

> [...]
> +			 * NEEDSWORK: Like the "ready" case, should we
> +			 * log a custom child-something Trace2 event here?
> +			 */
> +			sbgr = SBGR_TIMEOUT;
> +			goto done;

[...]

> +		} else {
> +			/*
> +			 * The cb gave up on this child.
> +			 *
> +			 * NEEDSWORK: Like above, should we log a custom
> +			 * Trace2 child-something event here?
> +			 */
> +			sbgr = SBGR_CB_ERROR;
> +			goto done;

*ditto*

> +		}
> +	}
> +
> +	if (pid_seen == cmd->pid) {
> +		int child_code = -1;
> +
> +		/*
> +		 * The child started, but exited or was terminated
> +		 * before becoming "ready".
> +		 *
> +		 * We try to match the behavior of `wait_or_whine()`
> +		 * and convert the child's status to a return code for
> +		 * tracing purposes and emit the `trace2_child_exit()`
> +		 * event.
> +		 */
> +		if (WIFEXITED(wait_status))
> +			child_code = WEXITSTATUS(wait_status);
> +		else if (WIFSIGNALED(wait_status))
> +			child_code = WTERMSIG(wait_status) + 128;
> +		trace2_child_exit(cmd, child_code);

Although with the above: Perhaps I've missed some subtleties...

> [...]
> diff --git a/run-command.h b/run-command.h
> index af1296769f9..58065197d1b 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -496,4 +496,52 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
>   */
>  void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
>  
> +/**
> + * Possible return values for `start_bg_command()`.
> + */
> +enum start_bg_result {
> +	/* child process is "ready" */
> +	SBGR_READY = 0,

Clarity nit, whenever I see an explicit enum assignment I start hunting
for things that depend on the specific value, as we sometimes do, but
there appears to be nothing...

> +	/* child process could not be started */
> +	SBGR_ERROR,
> +
> +	/* callback error when testing for "ready" */
> +	SBGR_CB_ERROR,
> +
> +	/* timeout expired waiting for child to become "ready" */
> +	SBGR_TIMEOUT,
> +
> +	/* child process exited or was signalled before becomming "ready" */
> +	SBGR_DIED,

...I'd think if we were tweaking the value we'd want to put all the
error values at < 0, but I'm fine with this pattern of them being
positive, it encourages users to use switch/case & compiler checks, and
since it's all new users...

> + * This is a combination of `start_command()` and `finish_command()`, but

Doc nit: I think with whatever syntax standard we use for /** comments
that we prefer functions like_this() not quoted `like_this()`, that
being for values like `123` or whatever. But I'm just going by a quick
grep there & what Emacs is highlighting.
diff mbox series

Patch

diff --git a/run-command.c b/run-command.c
index 3e4e082e94d..fe75fd08f74 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1901,3 +1901,126 @@  void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
 	}
 	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
 }
+
+enum start_bg_result start_bg_command(struct child_process *cmd,
+				      start_bg_wait_cb *wait_cb,
+				      void *cb_data,
+				      unsigned int timeout_sec)
+{
+	enum start_bg_result sbgr = SBGR_ERROR;
+	int ret;
+	int wait_status;
+	pid_t pid_seen;
+	time_t time_limit;
+
+	/*
+	 * Silently disallow child cleanup -- even if requested.
+	 * The child process should persist in the background
+	 * and possibly/probably after this process exits.  That
+	 * is, don't kill the child during our atexit routine.
+	 */
+	cmd->clean_on_exit = 0;
+
+	ret = start_command(cmd);
+	if (ret) {
+		/*
+		 * We assume that if `start_command()` fails, we
+		 * either get a complete `trace2_child_start() /
+		 * trace2_child_exit()` pair or it fails before the
+		 * `trace2_child_start()` is emitted, so we do not
+		 * need to worry about it here.
+		 *
+		 * We also assume that `start_command()` does not add
+		 * us to the cleanup list.  And that it calls
+		 * calls `child_process_clear()`.
+		 */
+		sbgr = SBGR_ERROR;
+		goto done;
+	}
+
+	time(&time_limit);
+	time_limit += timeout_sec;
+
+wait:
+	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
+
+	if (pid_seen == 0) {
+		/*
+		 * The child is currently running.  Ask the callback
+		 * if the child is ready to do work or whether we
+		 * should keep waiting for it to boot up.
+		 */
+		ret = (*wait_cb)(cb_data, cmd);
+		if (!ret) {
+			/*
+			 * The child is running and "ready".
+			 *
+			 * NEEDSWORK: As we prepare to orphan (release to
+			 * the background) this child, it is not appropriate
+			 * to emit a `trace2_child_exit()` event.  Should we
+			 * create a new event for this case?
+			 */
+			sbgr = SBGR_READY;
+			goto done;
+		} else if (ret > 0) {
+			time_t now;
+
+			time(&now);
+			if (now < time_limit)
+				goto wait;
+
+			/*
+			 * Our timeout has expired.  We don't try to
+			 * kill the child, but rather let it continue
+			 * (hopefully) trying to startup.
+			 *
+			 * NEEDSWORK: Like the "ready" case, should we
+			 * log a custom child-something Trace2 event here?
+			 */
+			sbgr = SBGR_TIMEOUT;
+			goto done;
+		} else {
+			/*
+			 * The cb gave up on this child.
+			 *
+			 * NEEDSWORK: Like above, should we log a custom
+			 * Trace2 child-something event here?
+			 */
+			sbgr = SBGR_CB_ERROR;
+			goto done;
+		}
+	}
+
+	if (pid_seen == cmd->pid) {
+		int child_code = -1;
+
+		/*
+		 * The child started, but exited or was terminated
+		 * before becoming "ready".
+		 *
+		 * We try to match the behavior of `wait_or_whine()`
+		 * and convert the child's status to a return code for
+		 * tracing purposes and emit the `trace2_child_exit()`
+		 * event.
+		 */
+		if (WIFEXITED(wait_status))
+			child_code = WEXITSTATUS(wait_status);
+		else if (WIFSIGNALED(wait_status))
+			child_code = WTERMSIG(wait_status) + 128;
+		trace2_child_exit(cmd, child_code);
+
+		sbgr = SBGR_DIED;
+		goto done;
+	}
+
+	if (pid_seen < 0 && errno == EINTR)
+		goto wait;
+
+	trace2_child_exit(cmd, -1);
+	sbgr = SBGR_ERROR;
+
+done:
+	child_process_clear(cmd);
+	invalidate_lstat_cache();
+	return sbgr;
+}
diff --git a/run-command.h b/run-command.h
index af1296769f9..58065197d1b 100644
--- a/run-command.h
+++ b/run-command.h
@@ -496,4 +496,52 @@  int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
  */
 void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
 
+/**
+ * Possible return values for `start_bg_command()`.
+ */
+enum start_bg_result {
+	/* child process is "ready" */
+	SBGR_READY = 0,
+
+	/* child process could not be started */
+	SBGR_ERROR,
+
+	/* callback error when testing for "ready" */
+	SBGR_CB_ERROR,
+
+	/* timeout expired waiting for child to become "ready" */
+	SBGR_TIMEOUT,
+
+	/* child process exited or was signalled before becomming "ready" */
+	SBGR_DIED,
+};
+
+/**
+ * Callback used by `start_bg_command()` to ask whether the
+ * child process is ready or needs more time to become ready.
+ *
+ * Returns 1 is child needs more time (subject to the requested timeout).
+ * Returns 0 if child is ready.
+ * Returns -1 on any error and cause `start_bg_command()` to also error out.
+ */
+typedef int(start_bg_wait_cb)(void *cb_data,
+			      const struct child_process *cmd);
+
+/**
+ * Start a command in the background.  Wait long enough for the child to
+ * become "ready".  Capture immediate errors (like failure to start) and
+ * any immediate exit status (such as a shutdown/signal before the child
+ * became "ready").
+ *
+ * This is a combination of `start_command()` and `finish_command()`, but
+ * with a custom `wait_or_whine()` that allows the caller to define when
+ * the child is "ready".
+ *
+ * The caller does not need to call `finish_command()`.
+ */
+enum start_bg_result start_bg_command(struct child_process *cmd,
+				      start_bg_wait_cb *wait_cb,
+				      void *cb_data,
+				      unsigned int timeout_sec);
+
 #endif