[07/10] bisect--helper: reimplement `bisect_state` & `bisect_head` shell functions in C
diff mbox series

Message ID 20200226101429.81327-8-mirucam@gmail.com
State New
Headers show
Series
  • Finish converting git bisect to C part 2
Related show

Commit Message

Miriam Rubio Feb. 26, 2020, 10:14 a.m. UTC
From: Pranit Bauva <pranit.bauva@gmail.com>

Reimplement the `bisect_state()` shell functions in C and also add a
subcommand `--bisect-state` to `git-bisect--helper` to call them from
git-bisect.sh .

Using `--bisect-state` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.

`bisect_head()` is only called from `bisect_state()`, thus it is not
required to introduce another subcommand.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
---
 builtin/bisect--helper.c | 80 ++++++++++++++++++++++++++++++++++++++++
 git-bisect.sh            | 55 ++-------------------------
 2 files changed, 84 insertions(+), 51 deletions(-)

Comments

Johannes Schindelin Feb. 27, 2020, 11:12 p.m. UTC | #1
Hi,

On Wed, 26 Feb 2020, Miriam Rubio wrote:

> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index f9b04bee23..fedcad4d9b 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -833,6 +835,74 @@ static int bisect_autostart(struct bisect_terms *terms)
>  	return BISECT_OK;
>  }
>
> +static char *bisect_head(void)
> +{
> +	if (is_empty_or_missing_file(git_path_bisect_head()))
> +		return "HEAD";
> +	else
> +		return "BISECT_HEAD";

It is relatively common, and makes it easier to read at least to me, to
omit the `else` here: the conditional `return` already leaves the function
early.

> +}
> +
> +static enum bisect_error bisect_state(struct bisect_terms *terms, const char **argv,
> +			int argc)
> +{
> +	const char *state = argv[0];
> +
> +	if (check_and_set_terms(terms, state))
> +		return BISECT_FAILED;
> +
> +	if (!argc)
> +		return error(_("Please call `--bisect-state` with at least one argument"));
> +
> +	if (argc == 1 && one_of(state, terms->term_good,
> +	    terms->term_bad, "skip", NULL)) {
> +		const char *bisected_head = xstrdup(bisect_head());

Do we really want to `strdup()` it? That would leak memory, no? And I
don't think that we need this to be allocated anywhere. It's not like we
are trying to modify the characters or that we need to take custody of the
string: `bisect_head()` will return a static, immutable string.

But then, it is not very natural for C code to let `bisect_head()` return
a string. Why not return the already-parsed object ID? That would also
make that function more robust: instead of expecting `BISECT_HEAD` to be
stored in the file `BISECT_HEAD` in the `.git/` directory, you could
simply call `get_oid("BISECT_HEAD", &oid)` and if that fails, fall back to
do the same with `HEAD`. That way, you tap into the refs machinery and the
bisect code would be less dependent on implementation details.

> +		const char *hex[1];
> +		struct object_id oid;
> +
> +		if (get_oid(bisected_head, &oid))
> +			return error(_("Bad rev input: %s"), bisected_head);
> +		if (bisect_write(state, oid_to_hex(&oid), terms, 0))
> +			return BISECT_FAILED;
> +
> +		*hex = xstrdup(oid_to_hex(&oid));
> +		check_expected_revs(hex, 1);

Hmm. Do we expect `check_expected_revs()` to modify what is passed into
it? If not, we could probably simplify this a lot (and avoid another
memory leak) by something like this:

		const char *hex;

		[...]
		hex = oid_to_hex(&oid);
		check_expected_revs(&hex, 1);

However, taking a step back, I wonder whether it makes sense for
`check_expected_revs()` to accept `const char **revs` instead of `struct
object_id *oids`.

Looking at the definition of `check_expected_revs()`, I would think that
it would be better to pass object IDs in, rather than strings:

static int is_expected_rev(const char *expected_hex)
{
        struct strbuf actual_hex = STRBUF_INIT;
        int res = 0;
        if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 40) {
                strbuf_trim(&actual_hex);
                res = !strcmp(actual_hex.buf, expected_hex);
        }
        strbuf_release(&actual_hex);
        return res;
}

static void check_expected_revs(const char **revs, int rev_nr)
{
        int i;

        for (i = 0; i < rev_nr; i++) {
                if (!is_expected_rev(revs[i])) {
                        unlink_or_warn(git_path_bisect_ancestors_ok());
                        unlink_or_warn(git_path_bisect_expected_rev());
                }
        }
}

There are a couple of observation that immediately come to my mind:

- Reading the bisect_expected_rev file for each rev, only to immediately
  release it before the next iteration, is wasteful

- We can break out of the loop immediately once `is_expected_rev()`
  returns 0 because we are then unlinking the very file that
  `is_expected_rev()` checks against.

- As I suspected above, a much cleaner interface would use `struct
  object_id **revs`.

- Keeping the code of `is_expected_rev()` separate from
  `check_expected_revs()` does not make the code more readable for me, but
  actually less readable instead.

- Why are we hard-coding the 40? At this time and age, we've _got_ to use
  `the_hash_algo->hexsz` instead.

In other words, I would do something like this instead:

static void check_expected_revs(struct object_id **revs, int rev_nr)
{
	struct object_id expected;
	struct strbuf buf = STRBUF_INIT;
	int i;

	if (!rev_nr)
		return;

        if (strbuf_read_file(&buf, git_path_bisect_expected_rev(), 0)
	      < the_hash_algo->hexsz ||
	    get_oid_hex(buf.buf, &expected) < 0)
		return; /* Ignore invalid file contents */

	for (i = 0; i < rev_nr; i++)
		if (!oideq(revs[i], &expected)) {
			... unlink ...
			return;
		}
}

However, taking _yet_ another step back, it seems a bit silly to compare
many revs (that need to be provided as 40-digit hex strings) to the
contents of a file that we expect to contain exactly one 40-digit hex
string. Surely, we do not want to take an arbitrary number of revisions,
then expect all of them to be identical to the _same_ rev, and otherwise
delete that file from where that rev came from?

We could even skip reading that file if two or more revisions were passed
to `check_expected_revs()` unless _all_ of them are identical!

So let's look at the actual caller of this thing:

bisect_state() {
        bisect_autostart
        state=$1
        git bisect--helper --check-and-set-terms $state $TERM_GOOD $TERM_BAD || exit
        get_terms
        case "$#,$state" in
        0,*)
                die "Please call 'bisect_state' with at least one argument." ;;
        1,"$TERM_BAD"|1,"$TERM_GOOD"|1,skip)
                bisected_head=$(bisect_head)
                rev=$(git rev-parse --verify "$bisected_head") ||
                        die "$(eval_gettext "Bad rev input: \$bisected_head")"
                git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit
                git bisect--helper --check-expected-revs "$rev" ;;
        2,"$TERM_BAD"|*,"$TERM_GOOD"|*,skip)
                shift
                hash_list=''
                for rev in "$@"
                do
                        sha=$(git rev-parse --verify "$rev^{commit}") ||
                                die "$(eval_gettext "Bad rev input: \$rev")"
                        hash_list="$hash_list $sha"
                done
                for rev in $hash_list
                do
                        git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit
                done
                git bisect--helper --check-expected-revs $hash_list ;;
        *,"$TERM_BAD")
                die "$(eval_gettext "'git bisect \$TERM_BAD' can take only one argument.")" ;;
        *)
                usage ;;
        esac
        bisect_auto_next
}

So there are actually two callers: the first one in the 1,* arm, the
second one in the 2,* arm. The first one is easy: it only passes a single
rev to `git bisect--helper --check-expected-revs`.

The second caller indeed can pass multiple arguments to
`check_expected_revs()`, but only if the first argument was not `bad`.

But this whole shell function is super ugly, and I do not believe that
we're served well by translating it verbatim to even super uglier C code.

I propose to instead rewrite it substantially, in the following way:

- Handle the `argc == 0` case first thing: in this case, die, complaining
  that we need at least one argument.

  (You already did that. Although `state` was already assigned to
   `argv[0]` at that stage, and I think we don't really want that, just in
   case that _one_ call chain ends up using `NULL, 0` for `argv, argc`.)

- Next, assign `state = argv[0]` and verify that it is one of `good`,
  `bad` or `skip`, otherwise erroring out.

- At this stage, we should do the equivalent of a `shift`: `argv++;
  argc--;`

- Now, we probably need to special-case the `bad` case and verify that
  `argc <= 1`, otherwise error out.

  Side note: I do _not_ understand this restriction. Why shouldn't it be
  valid to pass several revisions to `git bisect bad <revisions>...`?

- At this point, if `argc == 0`, we should parse `bisect_head()` into a
  `struct object_id` and use that as `revs`, setting `argc = 0`.
  Otherwise, if `argc > 0`, we should populate a `struct object_array` and
  use that as `revs`.

- And finally, we should iterate over these revs and call `bisect_write()`
  with them, and in the same loop we can also take care of the expected
  rev. It does not make sense to keep those separate from one another
  (i.e. to have _two_ loops).

That would be a much more natural, and readable, flow, at least to me.

> +	}
> +
> +	if ((argc == 2 && !strcmp(state, terms->term_bad)) ||
> +			one_of(state, terms->term_good, "skip", NULL)) {
> +		int i;
> +		struct string_list hex = STRING_LIST_INIT_DUP;
> +
> +		for (i = 1; i < argc; i++) {
> +			struct object_id oid;
> +
> +			if (get_oid(argv[i], &oid)) {

The original does `git rev-parse --verify "$rev^{commit}"` here, so I
think we really want `get_oid_commit()` here, not just `get_oid()`.

> +				string_list_clear(&hex, 0);
> +				return error(_("Bad rev input: %s"), argv[i]);
> +			}
> +			string_list_append(&hex, oid_to_hex(&oid));

Converting the parsed object ID to hex and storing _that_ as a string and
throwing away the parsed object ID looks not very much like C, but much
more like shell script. Let's not do that. Let's store the parsed data in
a proper `struct object_array`, and only resort to `oid_to_hex()` when we
_have_ to.

> +		}
> +		for (i = 0; i < hex.nr; i++) {
> +			const char **hex_string = (const char **) &hex.items[i].string;
> +			if (bisect_write(state, *hex_string, terms, 0)) {
> +				string_list_clear(&hex, 0);
> +				return BISECT_FAILED;
> +			}
> +			check_expected_revs(hex_string, 1);
> +		string_list_clear(&hex, 0);
> +		return bisect_auto_next(terms, NULL);
> +	}
> +
> +	if (!strcmp(state, terms->term_bad))
> +		return error(_("'git bisect %s' can take only one argument."),
> +		      terms->term_bad);

That is an awful late time in the function for a reviewer to read about a
condition that should be an early error.

> +
> +	return BISECT_FAILED;
> +}
> +
>  int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>  {
>  	enum {
> [...]
> @@ -944,6 +1017,13 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>  		set_terms(&terms, "bad", "good");
>  		res = bisect_autostart(&terms);
>  		break;
> +	case BISECT_STATE:
> +		if (!argc)
> +			return error(_("--bisect-state requires at least one revision"));

Wait, we _already_ test for that here? Then we do not need the very same
check in `bisect_state()`.

I'd rather have it in `bisect_state()`.

Besides, we're in a `cmd_*()` function, so we cannot return `error()`
because that translates to `-1` and `cmd_*()` functions need to return
exit codes, i.e. numbers between 0 and 127.

> +		set_terms(&terms, "bad", "good");
> +		get_terms(&terms);
> +		res = bisect_state(&terms, argv, argc);
> +		break;
>  	default:
>  		return error("BUG: unknown subcommand '%d'", cmdmode);

Granted, this incorrect `return error()` in a `cmd_*()` function was there
before your patch. It is just as incorrect, though.

>  	}
> diff --git a/git-bisect.sh b/git-bisect.sh
> index 049ffacdff..7f10187055 100755
> --- a/git-bisect.sh
> +++ b/git-bisect.sh
> [...]
> @@ -185,8 +139,7 @@ exit code \$res from '\$command' is < 0 or >= 128" >&2
>  			state="$TERM_GOOD"
>  		fi
>
> -		# We have to use a subshell because "bisect_state" can exit.
> -		( bisect_state $state >"$GIT_DIR/BISECT_RUN" )
> +		( git bisect--helper --bisect-state $state >"$GIT_DIR/BISECT_RUN" )

This subshell looks pointless. Before, we wanted to be able to catch the
`die` in `bisect_state`. But since the `bisect--helper` cannot cause the
_shell script_ to exit, we can just drop the subshell here.

Phew, that was quite a lot, and the review of this patch also took two
hours. I'll have to stop here, once again, and hope that I will find time
soon to continue the review.

I left you quite a bit of work here, but I hope that all of my suggestions
are clear enough to act on. If not, please holler.

Ciao,
Dscho

>  		res=$?
>
>  		cat "$GIT_DIR/BISECT_RUN"
> @@ -201,7 +154,7 @@ exit code \$res from '\$command' is < 0 or >= 128" >&2
>  		if [ $res -ne 0 ]
>  		then
>  			eval_gettextln "bisect run failed:
> -'bisect_state \$state' exited with error code \$res" >&2
> +'git bisect--helper --bisect-state \$state' exited with error code \$res" >&2
>  			exit $res
>  		fi
>
> @@ -242,7 +195,7 @@ case "$#" in
>  	start)
>  		git bisect--helper --bisect-start "$@" ;;
>  	bad|good|new|old|"$TERM_BAD"|"$TERM_GOOD")
> -		bisect_state "$cmd" "$@" ;;
> +		git bisect--helper --bisect-state "$cmd" "$@" ;;
>  	skip)
>  		bisect_skip "$@" ;;
>  	next)
> --
> 2.25.0
>
>

Patch
diff mbox series

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index f9b04bee23..fedcad4d9b 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -31,6 +31,8 @@  static const char * const git_bisect_helper_usage[] = {
 	N_("git bisect--helper --bisect-next"),
 	N_("git bisect--helper --bisect-auto-next"),
 	N_("git bisect--helper --bisect-autostart"),
+	N_("git bisect--helper --bisect-state (bad|new) [<rev>]"),
+	N_("git bisect--helper --bisect-state (good|old) [<rev>...]"),
 	NULL
 };
 
@@ -833,6 +835,74 @@  static int bisect_autostart(struct bisect_terms *terms)
 	return BISECT_OK;
 }
 
+static char *bisect_head(void)
+{
+	if (is_empty_or_missing_file(git_path_bisect_head()))
+		return "HEAD";
+	else
+		return "BISECT_HEAD";
+}
+
+static enum bisect_error bisect_state(struct bisect_terms *terms, const char **argv,
+			int argc)
+{
+	const char *state = argv[0];
+
+	if (check_and_set_terms(terms, state))
+		return BISECT_FAILED;
+
+	if (!argc)
+		return error(_("Please call `--bisect-state` with at least one argument"));
+
+	if (argc == 1 && one_of(state, terms->term_good,
+	    terms->term_bad, "skip", NULL)) {
+		const char *bisected_head = xstrdup(bisect_head());
+		const char *hex[1];
+		struct object_id oid;
+
+		if (get_oid(bisected_head, &oid))
+			return error(_("Bad rev input: %s"), bisected_head);
+		if (bisect_write(state, oid_to_hex(&oid), terms, 0))
+			return BISECT_FAILED;
+
+		*hex = xstrdup(oid_to_hex(&oid));
+		check_expected_revs(hex, 1);
+		return bisect_auto_next(terms, NULL);
+	}
+
+	if ((argc == 2 && !strcmp(state, terms->term_bad)) ||
+			one_of(state, terms->term_good, "skip", NULL)) {
+		int i;
+		struct string_list hex = STRING_LIST_INIT_DUP;
+
+		for (i = 1; i < argc; i++) {
+			struct object_id oid;
+
+			if (get_oid(argv[i], &oid)) {
+				string_list_clear(&hex, 0);
+				return error(_("Bad rev input: %s"), argv[i]);
+			}
+			string_list_append(&hex, oid_to_hex(&oid));
+		}
+		for (i = 0; i < hex.nr; i++) {
+			const char **hex_string = (const char **) &hex.items[i].string;
+			if (bisect_write(state, *hex_string, terms, 0)) {
+				string_list_clear(&hex, 0);
+				return BISECT_FAILED;
+			}
+			check_expected_revs(hex_string, 1);
+		}
+		string_list_clear(&hex, 0);
+		return bisect_auto_next(terms, NULL);
+	}
+
+	if (!strcmp(state, terms->term_bad))
+		return error(_("'git bisect %s' can take only one argument."),
+		      terms->term_bad);
+
+	return BISECT_FAILED;
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
 	enum {
@@ -847,6 +917,7 @@  int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 		BISECT_NEXT,
 		BISECT_AUTO_NEXT,
 		BISECT_AUTOSTART,
+		BISECT_STATE
 	} cmdmode = 0;
 	int no_checkout = 0, res = 0, nolog = 0;
 	struct option options[] = {
@@ -872,6 +943,8 @@  int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 			 N_("verify the next bisection state then checkout the next bisection commit"), BISECT_AUTO_NEXT),
 		OPT_CMDMODE(0, "bisect-autostart", &cmdmode,
 			 N_("start the bisection if BISECT_START is empty or missing"), BISECT_AUTOSTART),
+		OPT_CMDMODE(0, "bisect-state", &cmdmode,
+			 N_("mark the state of ref (or refs)"), BISECT_STATE),
 		OPT_BOOL(0, "no-checkout", &no_checkout,
 			 N_("update BISECT_HEAD instead of checking out the current commit")),
 		OPT_BOOL(0, "no-log", &nolog,
@@ -944,6 +1017,13 @@  int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 		set_terms(&terms, "bad", "good");
 		res = bisect_autostart(&terms);
 		break;
+	case BISECT_STATE:
+		if (!argc)
+			return error(_("--bisect-state requires at least one revision"));
+		set_terms(&terms, "bad", "good");
+		get_terms(&terms);
+		res = bisect_state(&terms, argv, argc);
+		break;
 	default:
 		return error("BUG: unknown subcommand '%d'", cmdmode);
 	}
diff --git a/git-bisect.sh b/git-bisect.sh
index 049ffacdff..7f10187055 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -39,16 +39,6 @@  _x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
 TERM_BAD=bad
 TERM_GOOD=good
 
-bisect_head()
-{
-	if test -f "$GIT_DIR/BISECT_HEAD"
-	then
-		echo BISECT_HEAD
-	else
-		echo HEAD
-	fi
-}
-
 bisect_skip() {
 	all=''
 	for arg in "$@"
@@ -61,43 +51,7 @@  bisect_skip() {
 		esac
 		all="$all $revs"
 	done
-	eval bisect_state 'skip' $all
-}
-
-bisect_state() {
-	git bisect--helper --bisect-autostart
-	state=$1
-	git bisect--helper --check-and-set-terms $state $TERM_GOOD $TERM_BAD || exit
-	get_terms
-	case "$#,$state" in
-	0,*)
-		die "Please call 'bisect_state' with at least one argument." ;;
-	1,"$TERM_BAD"|1,"$TERM_GOOD"|1,skip)
-		bisected_head=$(bisect_head)
-		rev=$(git rev-parse --verify "$bisected_head") ||
-			die "$(eval_gettext "Bad rev input: \$bisected_head")"
-		git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit
-		git bisect--helper --check-expected-revs "$rev" ;;
-	2,"$TERM_BAD"|*,"$TERM_GOOD"|*,skip)
-		shift
-		hash_list=''
-		for rev in "$@"
-		do
-			sha=$(git rev-parse --verify "$rev^{commit}") ||
-				die "$(eval_gettext "Bad rev input: \$rev")"
-			hash_list="$hash_list $sha"
-		done
-		for rev in $hash_list
-		do
-			git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit
-		done
-		git bisect--helper --check-expected-revs $hash_list ;;
-	*,"$TERM_BAD")
-		die "$(eval_gettext "'git bisect \$TERM_BAD' can take only one argument.")" ;;
-	*)
-		usage ;;
-	esac
-	git bisect--helper --bisect-auto-next
+	eval git bisect--helper --bisect-state 'skip' $all
 }
 
 bisect_visualize() {
@@ -185,8 +139,7 @@  exit code \$res from '\$command' is < 0 or >= 128" >&2
 			state="$TERM_GOOD"
 		fi
 
-		# We have to use a subshell because "bisect_state" can exit.
-		( bisect_state $state >"$GIT_DIR/BISECT_RUN" )
+		( git bisect--helper --bisect-state $state >"$GIT_DIR/BISECT_RUN" )
 		res=$?
 
 		cat "$GIT_DIR/BISECT_RUN"
@@ -201,7 +154,7 @@  exit code \$res from '\$command' is < 0 or >= 128" >&2
 		if [ $res -ne 0 ]
 		then
 			eval_gettextln "bisect run failed:
-'bisect_state \$state' exited with error code \$res" >&2
+'git bisect--helper --bisect-state \$state' exited with error code \$res" >&2
 			exit $res
 		fi
 
@@ -242,7 +195,7 @@  case "$#" in
 	start)
 		git bisect--helper --bisect-start "$@" ;;
 	bad|good|new|old|"$TERM_BAD"|"$TERM_GOOD")
-		bisect_state "$cmd" "$@" ;;
+		git bisect--helper --bisect-state "$cmd" "$@" ;;
 	skip)
 		bisect_skip "$@" ;;
 	next)