diff mbox series

[v2,3/7] rebase: store orig_head as a commit

Message ID 9daee95d434155742dbb19271eea8c0bc05eb365.1662561470.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series rebase --keep-base: imply --reapply-cherry-picks and --no-fork-point | expand

Commit Message

Phillip Wood Sept. 7, 2022, 2:37 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

Using a struct commit rather than a struct oid to hold orig_head means
that we error out straight away if the branch being rebased does not
point to a commit. It also simplifies the code that handles finding
the merge base and fork point as it no longer has to convert from an
oid to a commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/rebase.c | 61 +++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 34 deletions(-)

Comments

Junio C Hamano Sept. 7, 2022, 6:12 p.m. UTC | #1
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 56e4214b441..a3cf1ef5923 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -68,7 +68,7 @@ struct rebase_options {
>  	const char *upstream_name;
>  	const char *upstream_arg;
>  	char *head_name;
> -	struct object_id orig_head;
> +	struct commit *orig_head;

OK.

> @@ -261,13 +261,13 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
>  	struct replay_opts replay = get_replay_opts(opts);
>  	struct string_list commands = STRING_LIST_INIT_DUP;
>  
> -	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
> +	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head->object.oid,
>  				&revisions, &shortrevisions))
>  		return -1;

This step has ton of changes like this, which are fallouts from the
change of type of a member that used to be an oid to a full blown
commit object.  It is expected and I will strip all of them from my
quotes.  They all look correct, and otherwise the compiler would
complain ;-).

> @@ -448,7 +447,8 @@ static int read_basic_state(struct rebase_options *opts)
>  	} else if (!read_oneliner(&buf, state_dir_path("head", opts),
>  				  READ_ONELINER_WARN_MISSING))
>  		return -1;
> -	if (get_oid(buf.buf, &opts->orig_head))
> +	opts->orig_head = lookup_commit_reference_by_name(buf.buf);

This is not exactly a new problem, but I noticed it while looking
for more iffy uses of lookup_commit_reference_by_name(), so...

At this point in the codepath, we expect buf.buf has full-hex object
name and nothing else; the original should have used get_oid_hex()
to highlight that fact.  lookup_commit_reference_by_name() allows
object names that are not written as full-hex object name, and it
may get confused if a branch or tag with 40-hex (or 64-hex in a
repository with newhash) name exists.  It would be a more sensible
conversion to use get_oid_hex() to turn buf.buf into an object name
and then use lookup_commit_reference() on it.

> @@ -866,15 +866,11 @@ static int is_linear_history(struct commit *from, struct commit *to)
>  
>  static int can_fast_forward(struct commit *onto, struct commit *upstream,
>  			    struct commit *restrict_revision,
> -			    struct object_id *head_oid, struct object_id *merge_base)
> +			    struct commit *head, struct object_id *merge_base)
>  {
> -	struct commit *head = lookup_commit(the_repository, head_oid);
>  	struct commit_list *merge_bases = NULL;
>  	int res = 0;
>  
> -	if (!head)
> -		goto done;
> -

This one benefits from being able to avoid its own lookup_commit()
because the caller already has it in the desired form.

This is not a comment on the new code, but it does make readers
wonder if the conversion changes behaviour.  lookup_commit() takes
an object name and requires it to be a commit object's name, doesn't
it?  If we gave a tag to the program, the old code would have had
the object name of that tag in head_oid and at this point and
lookup_commit() noticed and would have stopped you from fast
forwarding your branch to the tag, which was a good thing.  In the
new code, since we turn the object name we take from the user into a
commit object way before the control reaches this place, we won't
get such an error here, but if we fast-forward to the object, we
will still fast forward to the commit that is pointed by the tag,
so the new behaviour is even better, perhaps?

> @@ -1610,17 +1606,17 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  		/* Is it a local branch? */
>  		strbuf_reset(&buf);
>  		strbuf_addf(&buf, "refs/heads/%s", branch_name);
> -		if (!read_ref(buf.buf, &options.orig_head)) {
> +		options.orig_head = lookup_commit_reference_by_name(buf.buf);
> +		if (options.orig_head) {
>  			die_if_checked_out(buf.buf, 1);
>  			options.head_name = xstrdup(buf.buf);
>  		/* If not is it a valid ref (branch or commit)? */

This is iffy, or it may be just wrong.

The old code is checking if "refs/heads/$branch_name" is a branch
and does the check.  If you had a branch "refs/heads/A" (whose ref
is at "refs/heads/refs/heads/A") but do not have a branch "A", and
if you fed "A" to this part of the code in buf.buf, then the
original code would not have been fooled by the presence of such a
funny branch.  New code (incorrectly) does because it prefixes
"refs/heads/" to "A" and asks to turn string "refs/heads/A" into a
commit object, triggering the usual ref dwim rules.

We end up setting options.head_name to a wrong thing (in this case,
the user said "A", we turned it into a refname "refs/heads/A" that
does not exist, and set options.orig_head to the commit object
pointed by the ref "refs/heads/refs/heads/A", and we use that commit
as orig_head, but use an incorrect head_name).

I didn't look as carefully as this one, but there may be similarly
iffy uses of lookup_commit_reference_by_name() introduced by this
patch that used to be more strict/exact; they may need to be fixed.

>  		} else {
> -			struct commit *commit =
> +			options.orig_head =
>  				lookup_commit_reference_by_name(branch_name);
> -			if (!commit)
> +			if (!options.orig_head)
>  				die(_("no such branch/commit '%s'"),
>  				    branch_name);
> -			oidcpy(&options.orig_head, &commit->object.oid);
>  			options.head_name = NULL;

This side, which is "this is what happens to a random object name
that is not a branch name", is perfectly fine.
Phillip Wood Sept. 8, 2022, 1:19 p.m. UTC | #2
Hi Junio

Thank you for your thoughtful comments

On 07/09/2022 19:12, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/builtin/rebase.c b/builtin/rebase.c
>> index 56e4214b441..a3cf1ef5923 100644
>> --- a/builtin/rebase.c
>> +++ b/builtin/rebase.c
>> @@ -448,7 +447,8 @@ static int read_basic_state(struct rebase_options *opts)
>>   	} else if (!read_oneliner(&buf, state_dir_path("head", opts),
>>   				  READ_ONELINER_WARN_MISSING))
>>   		return -1;
>> -	if (get_oid(buf.buf, &opts->orig_head))
>> +	opts->orig_head = lookup_commit_reference_by_name(buf.buf);
> 
> This is not exactly a new problem, but I noticed it while looking
> for more iffy uses of lookup_commit_reference_by_name(), so...
> 
> At this point in the codepath, we expect buf.buf has full-hex object
> name and nothing else; the original should have used get_oid_hex()
> to highlight that fact.  lookup_commit_reference_by_name() allows
> object names that are not written as full-hex object name, and it
> may get confused if a branch or tag with 40-hex (or 64-hex in a
> repository with newhash) name exists.  It would be a more sensible
> conversion to use get_oid_hex() to turn buf.buf into an object name
> and then use lookup_commit_reference() on it.

That's a good point, we expect orig_head to be a commit but I don't 
think we have a function that takes an oid and parses it as a commit 
(lookup_commit() just looks in the commit in the repository's parsed 
object hash and returns a newly allocated object if the object is not 
there). lookup_commit_reference() de-references a tags which we don't 
really want to do here if we're being strict but I'm not sure there is 
an easy way to avoid that.

We should probably be stricter when reading 'onto' as well which is also 
using get_oid() rather than get_oid_hex().

>> @@ -866,15 +866,11 @@ static int is_linear_history(struct commit *from, struct commit *to)
>>   
>>   static int can_fast_forward(struct commit *onto, struct commit *upstream,
>>   			    struct commit *restrict_revision,
>> -			    struct object_id *head_oid, struct object_id *merge_base)
>> +			    struct commit *head, struct object_id *merge_base)
>>   {
>> -	struct commit *head = lookup_commit(the_repository, head_oid);
>>   	struct commit_list *merge_bases = NULL;
>>   	int res = 0;
>>   
>> -	if (!head)
>> -		goto done;
>> -
> 
> This one benefits from being able to avoid its own lookup_commit()
> because the caller already has it in the desired form.
> 
> This is not a comment on the new code, but it does make readers
> wonder if the conversion changes behaviour.  lookup_commit() takes
> an object name and requires it to be a commit object's name, doesn't
> it?  If we gave a tag to the program, the old code would have had
> the object name of that tag in head_oid and at this point and
> lookup_commit() noticed and would have stopped you from fast
> forwarding your branch to the tag, which was a good thing.  In the
> new code, since we turn the object name we take from the user into a
> commit object way before the control reaches this place, we won't
> get such an error here, but if we fast-forward to the object, we
> will still fast forward to the commit that is pointed by the tag,
> so the new behaviour is even better, perhaps?

I don't think head_oid can point to a tag in the original as we will 
have done

	commit = lookup_commit_reference_by_name(branch)
	oidcpy(&options.orig_head, &commit->object.oid)

when parsing the branch name given on the command line. If the user does 
not give a branch name then we use HEAD which should not be a tag.

>> @@ -1610,17 +1606,17 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>>   		/* Is it a local branch? */
>>   		strbuf_reset(&buf);
>>   		strbuf_addf(&buf, "refs/heads/%s", branch_name);
>> -		if (!read_ref(buf.buf, &options.orig_head)) {
>> +		options.orig_head = lookup_commit_reference_by_name(buf.buf);
>> +		if (options.orig_head) {
>>   			die_if_checked_out(buf.buf, 1);
>>   			options.head_name = xstrdup(buf.buf);
>>   		/* If not is it a valid ref (branch or commit)? */
> 
> This is iffy, or it may be just wrong.

It's wrong, thanks for pointing that out.

> The old code is checking if "refs/heads/$branch_name" is a branch
> and does the check.  If you had a branch "refs/heads/A" (whose ref
> is at "refs/heads/refs/heads/A") but do not have a branch "A", and
> if you fed "A" to this part of the code in buf.buf, then the
> original code would not have been fooled by the presence of such a
> funny branch.  New code (incorrectly) does because it prefixes
> "refs/heads/" to "A" and asks to turn string "refs/heads/A" into a
> commit object, triggering the usual ref dwim rules.
> 
> We end up setting options.head_name to a wrong thing (in this case,
> the user said "A", we turned it into a refname "refs/heads/A" that
> does not exist, and set options.orig_head to the commit object
> pointed by the ref "refs/heads/refs/heads/A", and we use that commit
> as orig_head, but use an incorrect head_name).
> 
> I didn't look as carefully as this one, but there may be similarly
> iffy uses of lookup_commit_reference_by_name() introduced by this
> patch that used to be more strict/exact; they may need to be fixed.

The only other use added in this patch is

-               if (get_oid("HEAD", &options.orig_head))
-                       die(_("Could not resolve HEAD to a revision"));
+               options.orig_head = lookup_commit_reference_by_name("HEAD");
+               if (!options.orig_head)
+                       die(_("Could not resolve HEAD to a commit"));

So now we will de-reference HEAD to a commit if it points to a tag but I 
don't think that can happen with 'git checkout' and we'll complain if it 
somehow points to a tree or blob.

Thanks for your comments, I'll fix and re-roll

Phillip
diff mbox series

Patch

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 56e4214b441..a3cf1ef5923 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -68,7 +68,7 @@  struct rebase_options {
 	const char *upstream_name;
 	const char *upstream_arg;
 	char *head_name;
-	struct object_id orig_head;
+	struct commit *orig_head;
 	struct commit *onto;
 	const char *onto_name;
 	const char *revisions;
@@ -261,13 +261,13 @@  static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct replay_opts replay = get_replay_opts(opts);
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
-	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
+	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head->object.oid,
 				&revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
 			     opts->head_name ? opts->head_name : "detached HEAD",
-			     opts->onto, &opts->orig_head)) {
+			     opts->onto, &opts->orig_head->object.oid)) {
 		free(revisions);
 		free(shortrevisions);
 
@@ -298,9 +298,8 @@  static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 		split_exec_commands(opts->cmd, &commands);
 		ret = complete_action(the_repository, &replay, flags,
 			shortrevisions, opts->onto_name, opts->onto,
-			&opts->orig_head, &commands, opts->autosquash,
-			opts->update_refs,
-			&todo_list);
+			&opts->orig_head->object.oid, &commands,
+			opts->autosquash, opts->update_refs, &todo_list);
 	}
 
 	string_list_clear(&commands, 0);
@@ -448,7 +447,8 @@  static int read_basic_state(struct rebase_options *opts)
 	} else if (!read_oneliner(&buf, state_dir_path("head", opts),
 				  READ_ONELINER_WARN_MISSING))
 		return -1;
-	if (get_oid(buf.buf, &opts->orig_head))
+	opts->orig_head = lookup_commit_reference_by_name(buf.buf);
+	if (!opts->orig_head)
 		return error(_("invalid orig-head: '%s'"), buf.buf);
 
 	if (file_exists(state_dir_path("quiet", opts)))
@@ -517,7 +517,7 @@  static int rebase_write_basic_state(struct rebase_options *opts)
 	write_file(state_dir_path("onto", opts), "%s",
 		   opts->onto ? oid_to_hex(&opts->onto->object.oid) : "");
 	write_file(state_dir_path("orig-head", opts), "%s",
-		   oid_to_hex(&opts->orig_head));
+		   oid_to_hex(&opts->orig_head->object.oid));
 	if (!(opts->flags & REBASE_NO_QUIET))
 		write_file(state_dir_path("quiet", opts), "%s", "");
 	if (opts->flags & REBASE_VERBOSE)
@@ -646,7 +646,7 @@  static int run_am(struct rebase_options *opts)
 			       /* this is now equivalent to !opts->upstream */
 			       &opts->onto->object.oid :
 			       &opts->upstream->object.oid),
-		    oid_to_hex(&opts->orig_head));
+		    oid_to_hex(&opts->orig_head->object.oid));
 
 	rebased_patches = xstrdup(git_path("rebased-patches"));
 	format_patch.out = open(rebased_patches,
@@ -680,7 +680,7 @@  static int run_am(struct rebase_options *opts)
 		free(rebased_patches);
 		strvec_clear(&am.args);
 
-		ropts.oid = &opts->orig_head;
+		ropts.oid = &opts->orig_head->object.oid;
 		ropts.branch = opts->head_name;
 		ropts.default_reflog_action = DEFAULT_REFLOG_ACTION;
 		reset_head(the_repository, &ropts);
@@ -833,7 +833,7 @@  static int checkout_up_to_date(struct rebase_options *options)
 	strbuf_addf(&buf, "%s: checkout %s",
 		    getenv(GIT_REFLOG_ACTION_ENVIRONMENT),
 		    options->switch_to);
-	ropts.oid = &options->orig_head;
+	ropts.oid = &options->orig_head->object.oid;
 	ropts.branch = options->head_name;
 	ropts.flags = RESET_HEAD_RUN_POST_CHECKOUT_HOOK;
 	if (!ropts.branch)
@@ -866,15 +866,11 @@  static int is_linear_history(struct commit *from, struct commit *to)
 
 static int can_fast_forward(struct commit *onto, struct commit *upstream,
 			    struct commit *restrict_revision,
-			    struct object_id *head_oid, struct object_id *merge_base)
+			    struct commit *head, struct object_id *merge_base)
 {
-	struct commit *head = lookup_commit(the_repository, head_oid);
 	struct commit_list *merge_bases = NULL;
 	int res = 0;
 
-	if (!head)
-		goto done;
-
 	merge_bases = get_merge_bases(onto, head);
 	if (!merge_bases || merge_bases->next) {
 		oidcpy(merge_base, null_oid());
@@ -1312,13 +1308,13 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 
 		if (read_basic_state(&options))
 			exit(1);
-		ropts.oid = &options.orig_head;
+		ropts.oid = &options.orig_head->object.oid;
 		ropts.branch = options.head_name;
 		ropts.flags = RESET_HEAD_HARD;
 		ropts.default_reflog_action = DEFAULT_REFLOG_ACTION;
 		if (reset_head(the_repository, &ropts) < 0)
 			die(_("could not move back to %s"),
-			    oid_to_hex(&options.orig_head));
+			    oid_to_hex(&options.orig_head->object.oid));
 		remove_branch_state(the_repository, 0);
 		ret = finish_rebase(&options);
 		goto cleanup;
@@ -1610,17 +1606,17 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 		/* Is it a local branch? */
 		strbuf_reset(&buf);
 		strbuf_addf(&buf, "refs/heads/%s", branch_name);
-		if (!read_ref(buf.buf, &options.orig_head)) {
+		options.orig_head = lookup_commit_reference_by_name(buf.buf);
+		if (options.orig_head) {
 			die_if_checked_out(buf.buf, 1);
 			options.head_name = xstrdup(buf.buf);
 		/* If not is it a valid ref (branch or commit)? */
 		} else {
-			struct commit *commit =
+			options.orig_head =
 				lookup_commit_reference_by_name(branch_name);
-			if (!commit)
+			if (!options.orig_head)
 				die(_("no such branch/commit '%s'"),
 				    branch_name);
-			oidcpy(&options.orig_head, &commit->object.oid);
 			options.head_name = NULL;
 		}
 	} else if (argc == 0) {
@@ -1639,8 +1635,9 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 			FREE_AND_NULL(options.head_name);
 			branch_name = "HEAD";
 		}
-		if (get_oid("HEAD", &options.orig_head))
-			die(_("Could not resolve HEAD to a revision"));
+		options.orig_head = lookup_commit_reference_by_name("HEAD");
+		if (!options.orig_head)
+			die(_("Could not resolve HEAD to a commit"));
 	} else
 		BUG("unexpected number of arguments left to parse");
 
@@ -1672,13 +1669,9 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 				options.onto_name);
 	}
 
-	if (options.fork_point > 0) {
-		struct commit *head =
-			lookup_commit_reference(the_repository,
-						&options.orig_head);
+	if (options.fork_point > 0)
 		options.restrict_revision =
-			get_fork_point(options.upstream_name, head);
-	}
+			get_fork_point(options.upstream_name, options.orig_head);
 
 	if (repo_read_index(the_repository) < 0)
 		die(_("could not read index"));
@@ -1708,7 +1701,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 	 * call it before checking allow_preemptive_ff.
 	 */
 	if (can_fast_forward(options.onto, options.upstream, options.restrict_revision,
-		    &options.orig_head, &merge_base) &&
+		    options.orig_head, &merge_base) &&
 	    allow_preemptive_ff) {
 		int flag;
 
@@ -1785,7 +1778,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 	strbuf_addf(&msg, "%s: checkout %s",
 		    getenv(GIT_REFLOG_ACTION_ENVIRONMENT), options.onto_name);
 	ropts.oid = &options.onto->object.oid;
-	ropts.orig_head = &options.orig_head,
+	ropts.orig_head = &options.orig_head->object.oid,
 	ropts.flags = RESET_HEAD_DETACH | RESET_ORIG_HEAD |
 			RESET_HEAD_RUN_POST_CHECKOUT_HOOK;
 	ropts.head_msg = msg.buf;
@@ -1799,7 +1792,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 	 * we just fast-forwarded.
 	 */
 	strbuf_reset(&msg);
-	if (oideq(&merge_base, &options.orig_head)) {
+	if (oideq(&merge_base, &options.orig_head->object.oid)) {
 		printf(_("Fast-forwarded %s to %s.\n"),
 			branch_name, options.onto_name);
 		strbuf_addf(&msg, "rebase finished: %s onto %s",
@@ -1820,7 +1813,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 		    (options.restrict_revision ?
 		     oid_to_hex(&options.restrict_revision->object.oid) :
 		     oid_to_hex(&options.upstream->object.oid)),
-		    oid_to_hex(&options.orig_head));
+		    oid_to_hex(&options.orig_head->object.oid));
 
 	options.revisions = revisions.buf;