diff mbox series

[GSoC] submodule--helper: run update procedures from C

Message ID 20210722134012.99457-1-raykar.ath@gmail.com (mailing list archive)
State Superseded
Headers show
Series [GSoC] submodule--helper: run update procedures from C | expand

Commit Message

Atharva Raykar July 22, 2021, 1:40 p.m. UTC
Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.

This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.

Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.

This patch could have been approached differently, by first changing the
`is_tip_reachable` and `fetch_in_submodule` shell functions to be
`submodule--helper` subcommands, and then following up with a patch that
introduces the `run-update-procedure` subcommand. We have not done it
like that because those functions are trivial enough to convert directly
along with these other changes. This lets us avoid the boilerplate and
the cleanup patches that will need to be introduced in following that
approach.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
---

This patch depends on changes introduced in 559e49fe5c (submodule: prefix die
messages with 'fatal', 2021-07-08), which belongs to the ar/submodule-add
(2021-07-12) series[1]. Other than that commit, it is independent of my other
'submodule add' conversion series.

Opinions on the following would be appreciated:

* Currently there is a lot of special meaning for the exit code, of this
  subcommand, which was needed to handle the various failures of running the
  update mode in the shell code that follows (note the extra handling of exit
  code 128, because a die() in C returns that value). I felt this was okay to do
  because in a later series that converts whatever is left, the handling of exit
  codes will be simplified.

* Is there a way to check if a sha1 is unreachable from all the refs?
  Currently 'is_tip_reachable()' spawns a subprocess with an incantation of:
  'git rev-list -n 1 $sha1 --not --all'
  I suppose I could do this with the revision-walk API [2], but it felt like
  a lot of boilerplate for something that was really succinct in the original
  shell implementation. Maybe worth looking into it for a later patch?

* I added a 'NOTE' comment for `case SM_UPDATE_COMMAND` in
  submodule--helper.c:run_update_command(). I wonder if that comment is
  unnecessary noise or worth mentioning. Is there an edge case where the
  !command in the 'submodule.update' configuration can be improperly handled by
  strvec_split()?

[1] https://lore.kernel.org/git/20210710074801.19917-1-raykar.ath@gmail.com/

Fetch-it-via:
git fetch https://github.com/tfidfwastaken/git.git submodule-run-update-proc-list-1

 builtin/submodule--helper.c | 247 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  95 ++++----------
 2 files changed, 269 insertions(+), 73 deletions(-)

Comments

Ævar Arnfjörð Bjarmason July 23, 2021, 9:37 a.m. UTC | #1
On Thu, Jul 22 2021, Atharva Raykar wrote:

> +/* NEEDSWORK: try to do this without creating a new process */
> +static int is_tip_reachable(const char *path, struct object_id *sha1)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf rev = STRBUF_INIT;
> +	char *sha1_hex = oid_to_hex(sha1);
> +
> +	cp.git_cmd = 1;
> +	cp.dir = xstrdup(path);
> +	cp.no_stderr = 1;
> +	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
> +
> +	prepare_submodule_repo_env(&cp.env_array);
> +
> +	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
> +		return 0;
> +
> +	return 1;
> +}

I think it's fine to do this & leave out the NEEDSWORK commit, just
briefly noting in the commit-message that we're not bothering with
trying to reduce sub-command invocations. It can be done later if anyone
cares.

> [...]
> +		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
> +			    ud->displaypath, sha1);

For all of these you're removing the translation from a message like:

    die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"

Which is easy enough to fix, just use _(), i.e.:

    strbuf_addf(&die_msg, _("Unable to checkout '%s' in submodule path '%s'\n"), [...]

I removed the "fatal: " per a comment below...

> +		break;
> +	case SM_UPDATE_REBASE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "rebase");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
> +			    ud->displaypath, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_MERGE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "merge");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
> +			    ud->displaypath, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_COMMAND:
> +		/* NOTE: this does not handle quoted arguments */
> +		strvec_split(&cp.args, ud->update_strategy.command);
> +		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
> +			    ud->update_strategy.command, sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
> +			    ud->displaypath, ud->update_strategy.command, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_UNSPECIFIED:
> +	case SM_UPDATE_NONE:
> +		BUG("update strategy should have been specified");
> +	}
> +
> +	strvec_push(&cp.args, sha1);
> +
> +	prepare_submodule_repo_env(&cp.env_array);
> +
> +	if (run_command(&cp)) {
> +		if (must_die_on_failure) {
> +			retval = 2;
> +			fputs(_(die_msg.buf), stderr);
> +			goto cleanup;

FWIW I'd find this clearer if we just kept track of what operation we
ran above, and just in this run_command() && must_die_on_failure case
started populating these die messages.

But even if not the reason I dropped the "fatal: " is shouldn't we just
call die() here directly? Why clean up when we're dying anyway?

Also since I see you used _() here that won't work, i.e. with gettet if
you happen to need to declare things earlier, you need to use N_() to
mark the message for translation.

The _() here won't find any message translated (unless the string
happened to exactly match a thing in the *.po file for other reasons,
not the case here).

But in this case we can just die(msg) here and have used the _() above,
or just call die() directly here not having made a die_msg we usually
won't use...

> +static int do_run_update_procedure(struct update_data *ud)
> +{
> +	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
> +	    is_null_oid(&ud->subsha1) || ud->force) {
> +		int subforce = is_null_oid(&ud->subsha1) || ud->force;
> +
> +		if (!ud->nofetch) {
> +			/*
> +			 * Run fetch only if `sha1` isn't present or it
> +			 * is not reachable from a ref.
> +			 */
> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
> +				    !ud->quiet)
> +					fprintf_ln(stderr,
> +						   _("Unable to fetch in submodule path '%s'; "
> +						     "trying to directly fetch %s:"),
> +						   ud->displaypath, oid_to_hex(&ud->sha1));
> +			/*
> +			 * Now we tried the usual fetch, but `sha1` may
> +			 * not be reachable from any of the refs.
> +			 */
> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
> +					die(_("Fetched in submodule path '%s', but it did not "
> +					      "contain %s. Direct fetching of that commit failed."),
> +					    ud->displaypath, oid_to_hex(&ud->sha1));
> +		}
> +
> +		return run_update_command(ud, subforce);
> +	}
> +
> +	return 3;
> +}

Since this has excatly one caller I think it's better for readability
(less indentation) and flow to just remove that "return 3" condition and
do the big "if" you have at the end, i.e. have this function start with
"int subforce =" and...

>  static void update_submodule(struct update_clone_data *ucd)
>  {
>  	fprintf(stdout, "dummy %s %d\t%s\n",
> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>  	return update_submodules(&suc);
>  }
>  
> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
> +{
> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
> +	char *prefixed_path, *update = NULL;
> +	char *sha1 = NULL, *subsha1 = NULL;
> +	struct update_data update_data = UPDATE_DATA_INIT;
> +
> +	struct option options[] = {
> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
> +		OPT_BOOL('N', "no-fetch", &nofetch,
> +			 N_("don't fetch new objects from the remote site")),
> +		OPT_BOOL(0, "just-cloned", &just_cloned,
> +			 N_("overrides update mode in case the repository is a fresh clone")),
> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
> +		OPT_STRING(0, "prefix", &prefix,
> +			   N_("path"),
> +			   N_("path into the working tree")),
> +		OPT_STRING(0, "update", &update,
> +			   N_("string"),
> +			   N_("rebase, merge, checkout or none")),
> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
> +			   N_("path into the working tree, across nested "
> +			      "submodule boundaries")),
> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
> +			   N_("SHA1 expected by superproject")),
> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
> +			   N_("SHA1 of submodule's HEAD")),
> +		OPT_END()
> +	};
> +
> +	const char *const usage[] = {
> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
> +		NULL
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
> +
> +	if (argc != 1)
> +		usage_with_options(usage, options);
> +	update_data.force = !!force;
> +	update_data.quiet = !!quiet;
> +	update_data.nofetch = !!nofetch;
> +	update_data.just_cloned = !!just_cloned;

For all of these just pass the reference to the update_data variable
directly in the OPT_*(). No need to set an "int force", only to copy it
over to update_data.force. Let's just use the latter only.

> +
> +	if (sha1)
> +		get_oid_hex(sha1, &update_data.sha1);
> +	else
> +		oidcpy(&update_data.sha1, null_oid());

Nit: Even if a historical option forces us to support --sha1, let's use
"oid" for the variable etc. But in this case the --sha1 is new, no?
Let's use --object-id or --oid (whatever is more common, I didn't
check)>

> +
> +	if (subsha1)
> +		get_oid_hex(subsha1, &update_data.subsha1);
> +	else
> +		oidcpy(&update_data.subsha1, null_oid());

Ditto. Also I think for both of these you can re-use
parse_opt_object_id. See "squash-onto" and "upstream" in
builtin/rebase.c.

Then you just supply an oid variable directly and let that helper do all
the get_oid etc.

> +	if (update_data.recursive_prefix)
> +		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
> +	else
> +		prefixed_path = xstrdup(update_data.sm_path);
> +
> +	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
> +
> +	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
> +					    update_data.sm_path, update,
> +					    &update_data.update_strategy);
> +
> +	free(prefixed_path);
> +
> +	return do_run_update_procedure(&update_data);

....(continued from above) ...here just do:

    if (that big if condition)
        return do_run_update_procedure(&update_data);
    else
        return 3;
Atharva Raykar July 23, 2021, 4:59 p.m. UTC | #2
On 23-Jul-2021, at 15:07, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> 
> On Thu, Jul 22 2021, Atharva Raykar wrote:
> 
>> +/* NEEDSWORK: try to do this without creating a new process */
>> +static int is_tip_reachable(const char *path, struct object_id *sha1)
>> +{
>> +	struct child_process cp = CHILD_PROCESS_INIT;
>> +	struct strbuf rev = STRBUF_INIT;
>> +	char *sha1_hex = oid_to_hex(sha1);
>> +
>> +	cp.git_cmd = 1;
>> +	cp.dir = xstrdup(path);
>> +	cp.no_stderr = 1;
>> +	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
>> +
>> +	prepare_submodule_repo_env(&cp.env_array);
>> +
>> +	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
>> +		return 0;
>> +
>> +	return 1;
>> +}
> 
> I think it's fine to do this & leave out the NEEDSWORK commit, just
> briefly noting in the commit-message that we're not bothering with
> trying to reduce sub-command invocations. It can be done later if anyone
> cares.

Okay, fair enough. I realise I have not been consistent about leaving such
comments in all my conversion efforts.

>> [...]
>> +		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
>> +			    ud->displaypath, sha1);
> 
> For all of these you're removing the translation from a message like:
> 
>    die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
> 
> Which is easy enough to fix, just use _(), i.e.:
> 
>    strbuf_addf(&die_msg, _("Unable to checkout '%s' in submodule path '%s'\n"), [...]

Thanks for catching this.

> I removed the "fatal: " per a comment below...
> 
>> +		break;
>> +	case SM_UPDATE_REBASE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "rebase");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
>> +			    ud->displaypath, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_MERGE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "merge");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
>> +			    ud->displaypath, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_COMMAND:
>> +		/* NOTE: this does not handle quoted arguments */
>> +		strvec_split(&cp.args, ud->update_strategy.command);
>> +		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
>> +			    ud->update_strategy.command, sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
>> +			    ud->displaypath, ud->update_strategy.command, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_UNSPECIFIED:
>> +	case SM_UPDATE_NONE:
>> +		BUG("update strategy should have been specified");
>> +	}
>> +
>> +	strvec_push(&cp.args, sha1);
>> +
>> +	prepare_submodule_repo_env(&cp.env_array);
>> +
>> +	if (run_command(&cp)) {
>> +		if (must_die_on_failure) {
>> +			retval = 2;
>> +			fputs(_(die_msg.buf), stderr);
>> +			goto cleanup;
> 
> FWIW I'd find this clearer if we just kept track of what operation we
> ran above, and just in this run_command() && must_die_on_failure case
> started populating these die messages.

Could you clarify what you meant here?

I can't think of a way to do this at the moment without introducing one
or more redundant 'switch ()' statements. This is what I interpreted your
statement as:

...
	/* we added the arguments to 'cp' already in the switch () above... */

	if (run_command(&cp)) {
		if (must_die_on_failure) {
			switch (ud->update_strategy.type) {
			case SM_UPDATE_CHECKOUT:
				die(_("Execution of..."));
				break;
			case ...
			case ...
		}
		/*
		 * This signifies to the caller in shell that
		 * the command failed without dying
		 */
		retval = 1;
		goto cleanup;
	}
	
	/* ...another switch to figure out which say_msg() to use? */
...

Or did you mean that instead of storing the die_msg in entirety at the
first switch, I instead store just the action, and in the conditional
finally have something like...?

	die(_("Unable to %s '%s' in submodule path '%s'"),
	    action, sha1, ud->displaypath);

...where 'action' is a value like "merge", "checkout" etc, set in the
first 'switch'.

I am guessing the motivation for this is to make more clear which error
message will be shown for each case?

> But even if not the reason I dropped the "fatal: " is shouldn't we just
> call die() here directly? Why clean up when we're dying anyway?

The reason I did not call die() directly is because the original shell
version originally specifically exited with 2 in the 'must_die_on_error'
case while die() in C exits with 128. I think it would be more semantically
correct for me to have done something like an 'exit(2)' instead of cleaning
up and returning.

That said, it occured to me that the receiving end does not do anything
special with the different exit code, other than just pass it on to another
exit, ie, this bit:

+		2|128)
+			exit $res
+			;;

This code currently sits in a weird middle position where it is neither
fully matching the exit codes as before the conversion (where a failure
unrelated to the command execution should have exited with 1, not 128),
nor is it having a complete disregard for their exact value, which would
somewhat simplify the failure handling code.

I wonder if any script in a machine somewhere cares about the exact exit value
of 'submodule add'. I suspect it would be relatively harmless if I just follow
your suggestion and just die() on command execution failure...

Thoughts?

> Also since I see you used _() here that won't work, i.e. with gettet if
> you happen to need to declare things earlier, you need to use N_() to
> mark the message for translation.
> 
> The _() here won't find any message translated (unless the string
> happened to exactly match a thing in the *.po file for other reasons,
> not the case here).
> 
> But in this case we can just die(msg) here and have used the _() above,
> or just call die() directly here not having made a die_msg we usually
> won't use...

Okay, I'll ensure that the translations are marked properly when I reroll.

>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
>> +	    is_null_oid(&ud->subsha1) || ud->force) {
>> +		int subforce = is_null_oid(&ud->subsha1) || ud->force;
>> +
>> +		if (!ud->nofetch) {
>> +			/*
>> +			 * Run fetch only if `sha1` isn't present or it
>> +			 * is not reachable from a ref.
>> +			 */
>> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
>> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
>> +				    !ud->quiet)
>> +					fprintf_ln(stderr,
>> +						   _("Unable to fetch in submodule path '%s'; "
>> +						     "trying to directly fetch %s:"),
>> +						   ud->displaypath, oid_to_hex(&ud->sha1));
>> +			/*
>> +			 * Now we tried the usual fetch, but `sha1` may
>> +			 * not be reachable from any of the refs.
>> +			 */
>> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
>> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
>> +					die(_("Fetched in submodule path '%s', but it did not "
>> +					      "contain %s. Direct fetching of that commit failed."),
>> +					    ud->displaypath, oid_to_hex(&ud->sha1));
>> +		}
>> +
>> +		return run_update_command(ud, subforce);
>> +	}
>> +
>> +	return 3;
>> +}
> 
> Since this has excatly one caller I think it's better for readability
> (less indentation) and flow to just remove that "return 3" condition and
> do the big "if" you have at the end, i.e. have this function start with
> "int subforce =" and...

Yeah, that would be better. I'll change that.

>> static void update_submodule(struct update_clone_data *ucd)
>> {
>> 	fprintf(stdout, "dummy %s %d\t%s\n",
>> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>> 	return update_submodules(&suc);
>> }
>> 
>> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
>> +{
>> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
>> +	char *prefixed_path, *update = NULL;
>> +	char *sha1 = NULL, *subsha1 = NULL;
>> +	struct update_data update_data = UPDATE_DATA_INIT;
>> +
>> +	struct option options[] = {
>> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
>> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
>> +		OPT_BOOL('N', "no-fetch", &nofetch,
>> +			 N_("don't fetch new objects from the remote site")),
>> +		OPT_BOOL(0, "just-cloned", &just_cloned,
>> +			 N_("overrides update mode in case the repository is a fresh clone")),
>> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
>> +		OPT_STRING(0, "prefix", &prefix,
>> +			   N_("path"),
>> +			   N_("path into the working tree")),
>> +		OPT_STRING(0, "update", &update,
>> +			   N_("string"),
>> +			   N_("rebase, merge, checkout or none")),
>> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
>> +			   N_("path into the working tree, across nested "
>> +			      "submodule boundaries")),
>> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
>> +			   N_("SHA1 expected by superproject")),
>> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
>> +			   N_("SHA1 of submodule's HEAD")),
>> +		OPT_END()
>> +	};
>> +
>> +	const char *const usage[] = {
>> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
>> +		NULL
>> +	};
>> +
>> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
>> +
>> +	if (argc != 1)
>> +		usage_with_options(usage, options);
>> +	update_data.force = !!force;
>> +	update_data.quiet = !!quiet;
>> +	update_data.nofetch = !!nofetch;
>> +	update_data.just_cloned = !!just_cloned;
> 
> For all of these just pass the reference to the update_data variable
> directly in the OPT_*(). No need to set an "int force", only to copy it
> over to update_data.force. Let's just use the latter only.

Hmm, I'm trying to remember why the single bit values are treated this way
in this whole file...

...there seems to be no good reason for it. The API docs for parse options
state that OPT_BOOL() is guaranteed to return either zero or one, so that
double negation does look unnecessary.

>> +
>> +	if (sha1)
>> +		get_oid_hex(sha1, &update_data.sha1);
>> +	else
>> +		oidcpy(&update_data.sha1, null_oid());
> 
> Nit: Even if a historical option forces us to support --sha1, let's use
> "oid" for the variable etc. But in this case the --sha1 is new, no?
> Let's use --object-id or --oid (whatever is more common, I didn't
> check)>

Okay. I can see the confusion this may cause.

>> +
>> +	if (subsha1)
>> +		get_oid_hex(subsha1, &update_data.subsha1);
>> +	else
>> +		oidcpy(&update_data.subsha1, null_oid());
> 
> Ditto. Also I think for both of these you can re-use
> parse_opt_object_id. See "squash-onto" and "upstream" in
> builtin/rebase.c.
> 
> Then you just supply an oid variable directly and let that helper do all
> the get_oid etc.

Thanks for pointing me to this!

>> +	if (update_data.recursive_prefix)
>> +		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
>> +	else
>> +		prefixed_path = xstrdup(update_data.sm_path);
>> +
>> +	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
>> +
>> +	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
>> +					    update_data.sm_path, update,
>> +					    &update_data.update_strategy);
>> +
>> +	free(prefixed_path);
>> +
>> +	return do_run_update_procedure(&update_data);
> 
> ....(continued from above) ...here just do:
> 
>    if (that big if condition)
>        return do_run_update_procedure(&update_data);
>    else
>        return 3;

Okay.

Thanks for the review!
Atharva Raykar Aug. 4, 2021, 8:35 a.m. UTC | #3
Atharva Raykar <raykar.ath@gmail.com> writes:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>> static void update_submodule(struct update_clone_data *ucd)
>>> {
>>> 	fprintf(stdout, "dummy %s %d\t%s\n",
>>> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>>> 	return update_submodules(&suc);
>>> }
>>>
>>> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
>>> +{
>>> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
>>> +	char *prefixed_path, *update = NULL;
>>> +	char *sha1 = NULL, *subsha1 = NULL;
>>> +	struct update_data update_data = UPDATE_DATA_INIT;
>>> +
>>> +	struct option options[] = {
>>> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
>>> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
>>> +		OPT_BOOL('N', "no-fetch", &nofetch,
>>> +			 N_("don't fetch new objects from the remote site")),
>>> +		OPT_BOOL(0, "just-cloned", &just_cloned,
>>> +			 N_("overrides update mode in case the repository is a fresh clone")),
>>> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
>>> +		OPT_STRING(0, "prefix", &prefix,
>>> +			   N_("path"),
>>> +			   N_("path into the working tree")),
>>> +		OPT_STRING(0, "update", &update,
>>> +			   N_("string"),
>>> +			   N_("rebase, merge, checkout or none")),
>>> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
>>> +			   N_("path into the working tree, across nested "
>>> +			      "submodule boundaries")),
>>> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
>>> +			   N_("SHA1 expected by superproject")),
>>> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
>>> +			   N_("SHA1 of submodule's HEAD")),
>>> +		OPT_END()
>>> +	};
>>> +
>>> +	const char *const usage[] = {
>>> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
>>> +		NULL
>>> +	};
>>> +
>>> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
>>> +
>>> +	if (argc != 1)
>>> +		usage_with_options(usage, options);
>>> +	update_data.force = !!force;
>>> +	update_data.quiet = !!quiet;
>>> +	update_data.nofetch = !!nofetch;
>>> +	update_data.just_cloned = !!just_cloned;
>>
>> For all of these just pass the reference to the update_data variable
>> directly in the OPT_*(). No need to set an "int force", only to copy it
>> over to update_data.force. Let's just use the latter only.
>
> Hmm, I'm trying to remember why the single bit values are treated this way
> in this whole file...
>
> ...there seems to be no good reason for it. The API docs for parse options
> state that OPT_BOOL() is guaranteed to return either zero or one, so that
> double negation does look unnecessary.

I forgot to mention why I did not address this change in my v3 patch.
The reason why we are handling boolean values this way is because they
are declared as bitfields in the 'update_data' struct. Since we cannot
take the address of bitfields, we have to use a different variable to
store when using 'parse_options()'.
diff mbox series

Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d55f6262e9..4e16561bf1 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2029,6 +2029,20 @@  struct submodule_update_clone {
 	.max_jobs = 1, \
 }
 
+struct update_data {
+	const char *recursive_prefix;
+	const char *sm_path;
+	const char *displaypath;
+	struct object_id sha1;
+	struct object_id subsha1;
+	struct submodule_update_strategy update_strategy;
+	int depth;
+	unsigned int force: 1;
+	unsigned int quiet: 1;
+	unsigned int nofetch: 1;
+	unsigned int just_cloned: 1;
+};
+#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
 		struct strbuf *out, const char *displaypath)
@@ -2282,6 +2296,165 @@  static int git_update_clone_config(const char *var, const char *value,
 	return 0;
 }
 
+/* NEEDSWORK: try to do this without creating a new process */
+static int is_tip_reachable(const char *path, struct object_id *sha1)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf rev = STRBUF_INIT;
+	char *sha1_hex = oid_to_hex(sha1);
+
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(path);
+	cp.no_stderr = 1;
+	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
+		return 0;
+
+	return 1;
+}
+
+static int fetch_in_submodule(const char *module_path, int depth, int quiet, struct object_id *sha1)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(module_path);
+
+	strvec_push(&cp.args, "fetch");
+	if (quiet)
+		strvec_push(&cp.args, "--quiet");
+	if (depth)
+		strvec_pushf(&cp.args, "--depth=%d", depth);
+	if (sha1) {
+		char *sha1_hex = oid_to_hex(sha1);
+		char *remote = get_default_remote();
+		strvec_pushl(&cp.args, remote, sha1_hex, NULL);
+	}
+
+	return run_command(&cp);
+}
+
+static int run_update_command(struct update_data *ud, int subforce)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf die_msg = STRBUF_INIT;
+	struct strbuf say_msg = STRBUF_INIT;
+	char *sha1 = oid_to_hex(&ud->sha1);
+	int retval, must_die_on_failure = 0;
+
+	cp.dir = xstrdup(ud->sm_path);
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "checkout", "-q", NULL);
+		if (subforce)
+			strvec_push(&cp.args, "-f");
+		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
+			    ud->displaypath, sha1);
+		break;
+	case SM_UPDATE_REBASE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "rebase");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
+			    ud->displaypath, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_MERGE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "merge");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
+			    ud->displaypath, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_COMMAND:
+		/* NOTE: this does not handle quoted arguments */
+		strvec_split(&cp.args, ud->update_strategy.command);
+		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
+			    ud->update_strategy.command, sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
+			    ud->displaypath, ud->update_strategy.command, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+	case SM_UPDATE_NONE:
+		BUG("update strategy should have been specified");
+	}
+
+	strvec_push(&cp.args, sha1);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (run_command(&cp)) {
+		if (must_die_on_failure) {
+			retval = 2;
+			fputs(_(die_msg.buf), stderr);
+			goto cleanup;
+		}
+		/*
+		 * This signifies to the caller in shell that
+		 * the command failed without dying
+		 */
+		retval = 1;
+		goto cleanup;
+	}
+	retval = 0;
+	puts(_(say_msg.buf));
+
+cleanup:
+	strbuf_release(&die_msg);
+	strbuf_release(&say_msg);
+	return retval;
+}
+
+static int do_run_update_procedure(struct update_data *ud)
+{
+	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
+	    is_null_oid(&ud->subsha1) || ud->force) {
+		int subforce = is_null_oid(&ud->subsha1) || ud->force;
+
+		if (!ud->nofetch) {
+			/*
+			 * Run fetch only if `sha1` isn't present or it
+			 * is not reachable from a ref.
+			 */
+			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
+				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
+				    !ud->quiet)
+					fprintf_ln(stderr,
+						   _("Unable to fetch in submodule path '%s'; "
+						     "trying to directly fetch %s:"),
+						   ud->displaypath, oid_to_hex(&ud->sha1));
+			/*
+			 * Now we tried the usual fetch, but `sha1` may
+			 * not be reachable from any of the refs.
+			 */
+			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
+				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
+					die(_("Fetched in submodule path '%s', but it did not "
+					      "contain %s. Direct fetching of that commit failed."),
+					    ud->displaypath, oid_to_hex(&ud->sha1));
+		}
+
+		return run_update_command(ud, subforce);
+	}
+
+	return 3;
+}
+
 static void update_submodule(struct update_clone_data *ucd)
 {
 	fprintf(stdout, "dummy %s %d\t%s\n",
@@ -2379,6 +2552,79 @@  static int update_clone(int argc, const char **argv, const char *prefix)
 	return update_submodules(&suc);
 }
 
+static int run_update_procedure(int argc, const char **argv, const char *prefix)
+{
+	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
+	char *prefixed_path, *update = NULL;
+	char *sha1 = NULL, *subsha1 = NULL;
+	struct update_data update_data = UPDATE_DATA_INIT;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
+		OPT__FORCE(&force, N_("force checkout updates"), 0),
+		OPT_BOOL('N', "no-fetch", &nofetch,
+			 N_("don't fetch new objects from the remote site")),
+		OPT_BOOL(0, "just-cloned", &just_cloned,
+			 N_("overrides update mode in case the repository is a fresh clone")),
+		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("rebase, merge, checkout or none")),
+		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "sha1", &sha1, N_("string"),
+			   N_("SHA1 expected by superproject")),
+		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
+			   N_("SHA1 of submodule's HEAD")),
+		OPT_END()
+	};
+
+	const char *const usage[] = {
+		N_("git submodule--helper run-update-procedure [<options>] <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (argc != 1)
+		usage_with_options(usage, options);
+
+	update_data.force = !!force;
+	update_data.quiet = !!quiet;
+	update_data.nofetch = !!nofetch;
+	update_data.just_cloned = !!just_cloned;
+	update_data.sm_path = argv[0];
+
+	if (sha1)
+		get_oid_hex(sha1, &update_data.sha1);
+	else
+		oidcpy(&update_data.sha1, null_oid());
+
+	if (subsha1)
+		get_oid_hex(subsha1, &update_data.subsha1);
+	else
+		oidcpy(&update_data.subsha1, null_oid());
+
+	if (update_data.recursive_prefix)
+		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
+	else
+		prefixed_path = xstrdup(update_data.sm_path);
+
+	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
+
+	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
+					    update_data.sm_path, update,
+					    &update_data.update_strategy);
+
+	free(prefixed_path);
+
+	return do_run_update_procedure(&update_data);
+}
+
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -2759,6 +3005,7 @@  static struct cmd_struct commands[] = {
 	{"clone", module_clone, 0},
 	{"update-module-mode", module_update_module_mode, 0},
 	{"update-clone", update_clone, 0},
+	{"run-update-procedure", run_update_procedure, 0},
 	{"ensure-core-worktree", ensure_core_worktree, 0},
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index 1187c21260..4d5437f5c2 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -405,13 +405,6 @@  cmd_deinit()
 	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit ${GIT_QUIET:+--quiet} ${prefix:+--prefix "$prefix"} ${force:+--force} ${deinit_all:+--all} -- "$@"
 }
 
-is_tip_reachable () (
-	sanitize_submodule_env &&
-	cd "$1" &&
-	rev=$(git rev-list -n 1 "$2" --not --all 2>/dev/null) &&
-	test -z "$rev"
-)
-
 # usage: fetch_in_submodule <module_path> [<depth>] [<sha1>]
 # Because arguments are positional, use an empty string to omit <depth>
 # but include <sha1>.
@@ -555,14 +548,13 @@  cmd_update()
 
 		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
 
-		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
-
 		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
 		if test $just_cloned -eq 1
 		then
 			subsha1=
 		else
+			just_cloned=
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
 			die "$(eval_gettext "fatal: Unable to find current revision in submodule path '\$displaypath'")"
@@ -583,70 +575,27 @@  cmd_update()
 			die "$(eval_gettext "fatal: Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
 		fi
 
-		if test "$subsha1" != "$sha1" || test -n "$force"
-		then
-			subforce=$force
-			# If we don't already have a -f flag and the submodule has never been checked out
-			if test -z "$subsha1" && test -z "$force"
-			then
-				subforce="-f"
-			fi
+		out=$(git submodule--helper run-update-procedure ${wt_prefix:+--prefix "$wt_prefix"} ${GIT_QUIET:+--quiet} ${force:+--force} ${just_cloned:+--just-cloned} ${nofetch:+--no-fetch} ${depth:+"$depth"} ${update:+--update "$update"} ${prefix:+--recursive-prefix "$prefix"} ${sha1:+--sha1 "$sha1"} ${subsha1:+--subsha1 "$subsha1"} "$sm_path")
 
-			if test -z "$nofetch"
-			then
-				# Run fetch only if $sha1 isn't present or it
-				# is not reachable from a ref.
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" $depth ||
-				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
-
-				# Now we tried the usual fetch, but $sha1 may
-				# not be reachable from any of the refs
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
-				die "$(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
-			fi
-
-			must_die_on_failure=
-			case "$update_module" in
-			checkout)
-				command="git checkout $subforce -q"
-				die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
-				;;
-			rebase)
-				command="git rebase ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			merge)
-				command="git merge ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			!*)
-				command="${update_module#!}"
-				die_msg="$(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
-				must_die_on_failure=yes
-				;;
-			*)
-				die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
-			esac
-
-			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-			then
-				say "$say_msg"
-			elif test -n "$must_die_on_failure"
-			then
-				die_with_status 2 "$die_msg"
-			else
-				err="${err};$die_msg"
-				continue
-			fi
-		fi
+		# exit codes for run-update-procedure:
+		# 0: update was successful, say command output
+		# 128: subcommand died during execution
+		# 1: update procedure failed and must die
+		# 2: update procedure failed, but should not die
+		# 3: no update procedure was run
+		res="$?"
+		case $res in
+		0)
+			say "$out"
+			;;
+		2|128)
+			exit $res
+			;;
+		1)
+			err="${err};$out"
+			continue
+			;;
+		esac
 
 		if test -n "$recursive"
 		then
@@ -661,7 +610,7 @@  cmd_update()
 			if test $res -gt 0
 			then
 				die_msg="$(eval_gettext "fatal: Failed to recurse into submodule path '\$displaypath'")"
-				if test $res -ne 2
+				if test $res -ne 2 && test $res -ne 128
 				then
 					err="${err};$die_msg"
 					continue