diff mbox series

[RFH] sequencer: simplify logic around stopped-sha file

Message ID xmqq8scymmo1.fsf@gitster.c.googlers.com (mailing list archive)
State Accepted
Commit 9af3360572069b937a146cbf552a12d7e24b7c91
Headers show
Series [RFH] sequencer: simplify logic around stopped-sha file | expand

Commit Message

Junio C Hamano Sept. 25, 2020, 5:48 a.m. UTC
While I was auditing the use of get_oid_committish(), I noticed that
an abbreviated object name is written to rebase-merge/stopped-sha
only to be read later and expanded to a full object name to be
passed to record_in_rewritten().  Nobody seems to expose this to the
end-user and it is unclear if there is a point in keeping it short
by abbreviating and risking to make it ambiguous as the rebase
progresses and creates new objects.

The attached patch tries to simplify the logic involved around this
file, to write and read full object names to/from the file, and the
result seems to pass testsuite---which in the ideal world would be
sufficient signal that this change is safe and sane, but it could be
that original authors thought that a change to stop abbreviating is
nonsense and not worth protecting the code against, hence RFH here.

Is there something obvious I am not seeing that makes this change a
bad idea (other than "somebody may be in the middle of a rebase and
all of a sudden, version of Git gets updated to contain this one,
which is unable to read abbreviated object name the current version
left on disk", which I am deliberately ignoring)?

Thanks.


 sequencer.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Johannes Schindelin Sept. 26, 2020, 9:28 p.m. UTC | #1
Hi Junio,

On Thu, 24 Sep 2020, Junio C Hamano wrote:

> While I was auditing the use of get_oid_committish(), I noticed that
> an abbreviated object name is written to rebase-merge/stopped-sha
> only to be read later and expanded to a full object name to be
> passed to record_in_rewritten().  Nobody seems to expose this to the
> end-user and it is unclear if there is a point in keeping it short
> by abbreviating and risking to make it ambiguous as the rebase
> progresses and creates new objects.
>
> The attached patch tries to simplify the logic involved around this
> file, to write and read full object names to/from the file, and the
> result seems to pass testsuite---which in the ideal world would be
> sufficient signal that this change is safe and sane, but it could be
> that original authors thought that a change to stop abbreviating is
> nonsense and not worth protecting the code against, hence RFH here.
>
> Is there something obvious I am not seeing that makes this change a
> bad idea (other than "somebody may be in the middle of a rebase and
> all of a sudden, version of Git gets updated to contain this one,
> which is unable to read abbreviated object name the current version
> left on disk", which I am deliberately ignoring)?

At least in my understanding, you are not missing anything:

- this file is an implementation detail,

- it is not exposed directly via any user-visible interface,

- any reader will _have_ to be prepared for an unabbreviated object ID (in
  the highly unlikely case that an object ID would be ambiguous if
  abbreviated even by one hex character),

- and most importantly: just like we expand the commit IDs in the todo
  list, we actually want to expand them in `stopped-sha` because it _is_
  possible that a new object is written that makes the previous
  unambiguously abbreviated object ID now ambiguous (e.g. when the user
  commits in a separate worktree while the rebase is interrupted, before
  continuing the rebase).

In short: ACK

Thank you,
Dscho

>
> Thanks.
>
>
>  sequencer.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index fd7701c88a..7dc9088d09 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -120,7 +120,7 @@ static GIT_PATH_FUNC(rebase_path_author_script, "rebase-merge/author-script")
>  static GIT_PATH_FUNC(rebase_path_amend, "rebase-merge/amend")
>  /*
>   * When we stop at a given patch via the "edit" command, this file contains
> - * the abbreviated commit name of the corresponding patch.
> + * the commit object name of the corresponding patch.
>   */
>  static GIT_PATH_FUNC(rebase_path_stopped_sha, "rebase-merge/stopped-sha")
>  /*
> @@ -3012,11 +3012,12 @@ static int make_patch(struct repository *r,
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	struct rev_info log_tree_opt;
> -	const char *subject, *p;
> +	const char *subject;
> +	char hex[GIT_MAX_HEXSZ + 1];
>  	int res = 0;
>
> -	p = short_commit_name(commit);
> -	if (write_message(p, strlen(p), rebase_path_stopped_sha(), 1) < 0)
> +	oid_to_hex_r(hex, &commit->object.oid);
> +	if (write_message(hex, strlen(hex), rebase_path_stopped_sha(), 1) < 0)
>  		return -1;
>  	res |= write_rebase_head(&commit->object.oid);
>
> @@ -4396,7 +4397,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
>
>  		if (read_oneliner(&buf, rebase_path_stopped_sha(),
>  				  READ_ONELINER_SKIP_IF_EMPTY) &&
> -		    !get_oid_committish(buf.buf, &oid))
> +		    !get_oid_hex(buf.buf, &oid))
>  			record_in_rewritten(&oid, peek_command(&todo_list, 0));
>  		strbuf_release(&buf);
>  	}
>
Junio C Hamano Sept. 27, 2020, 12:08 a.m. UTC | #2
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Is there something obvious I am not seeing that makes this change a
>> bad idea (other than "somebody may be in the middle of a rebase and
>> all of a sudden, version of Git gets updated to contain this one,
>> which is unable to read abbreviated object name the current version
>> left on disk", which I am deliberately ignoring)?
>
> At least in my understanding, you are not missing anything:
>
> - this file is an implementation detail,
>
> - it is not exposed directly via any user-visible interface,
>
> - any reader will _have_ to be prepared for an unabbreviated object ID (in
>   the highly unlikely case that an object ID would be ambiguous if
>   abbreviated even by one hex character),

Ah, no that was not what I was worried about.  If an existing code,
after restarting and reading the abbreviated object name back from
the file, compares that shortened string against list of shortened
strings it has in-core (perhaps it reads todo back from the user in
abbreviated form, does comparison with stopped-sha as-is, before it
expands the object names from todo to full length), this change
would break it.

> - and most importantly: just like we expand the commit IDs in the todo
>   list, we actually want to expand them in `stopped-sha` because it _is_
>   possible that a new object is written that makes the previous
>   unambiguously abbreviated object ID now ambiguous (e.g. when the user
>   commits in a separate worktree while the rebase is interrupted, before
>   continuing the rebase).

Exactly.  I just wasn't sure if stopped-sha is handled with the same
carefulness as the object names in todo, which are expanded after
read and shortened before given back to the users.

Thanks.
Johannes Schindelin Sept. 28, 2020, 7:51 p.m. UTC | #3
Hi Junio,

On Sat, 26 Sep 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> Is there something obvious I am not seeing that makes this change a
> >> bad idea (other than "somebody may be in the middle of a rebase and
> >> all of a sudden, version of Git gets updated to contain this one,
> >> which is unable to read abbreviated object name the current version
> >> left on disk", which I am deliberately ignoring)?
> >
> > [...]
> > - and most importantly: just like we expand the commit IDs in the todo
> >   list, we actually want to expand them in `stopped-sha` because it _is_
> >   possible that a new object is written that makes the previous
> >   unambiguously abbreviated object ID now ambiguous (e.g. when the user
> >   commits in a separate worktree while the rebase is interrupted, before
> >   continuing the rebase).
>
> Exactly.  I just wasn't sure if stopped-sha is handled with the same
> carefulness as the object names in todo, which are expanded after
> read and shortened before given back to the users.

The main purpose of `stopped-sha` is to let `git rebase --continue` after
an `edit` command amend the commit where it stopped _iff_ it is still
`HEAD`.

So yes, I think we need to be as careful here.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/sequencer.c b/sequencer.c
index fd7701c88a..7dc9088d09 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -120,7 +120,7 @@  static GIT_PATH_FUNC(rebase_path_author_script, "rebase-merge/author-script")
 static GIT_PATH_FUNC(rebase_path_amend, "rebase-merge/amend")
 /*
  * When we stop at a given patch via the "edit" command, this file contains
- * the abbreviated commit name of the corresponding patch.
+ * the commit object name of the corresponding patch.
  */
 static GIT_PATH_FUNC(rebase_path_stopped_sha, "rebase-merge/stopped-sha")
 /*
@@ -3012,11 +3012,12 @@  static int make_patch(struct repository *r,
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct rev_info log_tree_opt;
-	const char *subject, *p;
+	const char *subject;
+	char hex[GIT_MAX_HEXSZ + 1];
 	int res = 0;
 
-	p = short_commit_name(commit);
-	if (write_message(p, strlen(p), rebase_path_stopped_sha(), 1) < 0)
+	oid_to_hex_r(hex, &commit->object.oid);
+	if (write_message(hex, strlen(hex), rebase_path_stopped_sha(), 1) < 0)
 		return -1;
 	res |= write_rebase_head(&commit->object.oid);
 
@@ -4396,7 +4397,7 @@  int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 		if (read_oneliner(&buf, rebase_path_stopped_sha(),
 				  READ_ONELINER_SKIP_IF_EMPTY) &&
-		    !get_oid_committish(buf.buf, &oid))
+		    !get_oid_hex(buf.buf, &oid))
 			record_in_rewritten(&oid, peek_command(&todo_list, 0));
 		strbuf_release(&buf);
 	}