diff mbox series

[RFC] rebase -m: partial support for copying extra commit headers

Message ID pull.1902.git.1744041163929.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series [RFC] rebase -m: partial support for copying extra commit headers | expand

Commit Message

Phillip Wood April 7, 2025, 3:52 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

There are external projects such as GitButler that add extra headers to
the commit objects that they create. Unfortunately these headers are
lost when the user runs "git rebase". In the absence of merge conflicts
copying these headers across to the rebased commit is relatively
straight forward as the sequencer creates the rebased commits using
commit_tree_extended() rather than forking "git commit". Preserving them
in the presence of merge conflict would mean that we either need to add
a way to creating extra headers when running "git commit" or modifying
the sequencer to stop using git commit when the commit message needs to
be edited. Both of these options involve a significant amount of more
work.

While losing the extra headers if there are merge conflicts is a
significant shortcoming for users rebasing their branches it is not such
a problem on the server where forges typically reject a rebase if it has
conflicts. Therefore even though this commit is far from a complete
solution it is a significant improvement for forges that have not yet
moved to using "git replay" which already preserves extra commit
headers.

In the long run we would also want to preserve extra headers when
cherry-picking a commit. As we cannot currently preserve extra headers
when the user wishes to edit the commit message of the cherry-picked
commit this patch only changes the behavior of "git rebase"

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
    [RFC] rebase -m: partial support for copying extra commit headers
    
    This patch is largely a response to
    https://lore.kernel.org/git/Z-5rpWKAVPmz32jC@pks.im/ . I'm in two minds
    about whether we should consider merging such partial support but if it
    helps forges preserve extra commit headers then it may well be worth it.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1902%2Fphillipwood%2Frebase-preserve-extra-commit-headers-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1902/phillipwood/rebase-preserve-extra-commit-headers-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1902

 sequencer.c                   | 25 +++++++++++++-----
 t/t3404-rebase-interactive.sh | 48 +++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 7 deletions(-)


base-commit: a36e024e989f4d35f35987a60e3af8022cac3420

Comments

brian m. carlson April 8, 2025, 1:22 a.m. UTC | #1
On 2025-04-07 at 15:52:43, Phillip Wood via GitGitGadget wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
>     [RFC] rebase -m: partial support for copying extra commit headers
>     
>     This patch is largely a response to
>     https://lore.kernel.org/git/Z-5rpWKAVPmz32jC@pks.im/ . I'm in two minds
>     about whether we should consider merging such partial support but if it
>     helps forges preserve extra commit headers then it may well be worth it.

I'd like to see command-line options to control this and ideally a
configuration option.  Right now, we know nothing about these extra
headers, including an expected format.  If a future version of Git (say,
3.0) adds a new header and the user includes invalid data in this extra
header (which happens all the time with author and committer
information), then 2.50 will propagate it on rebase and it won't be
fixed until the user uses a version of Git that understands the header
and can fsck it correctly.  That's not really great, since it means we
can unknowingly spread corruption.

I am pretty sure that at $DAYJOB we'll need to have a discussion about
whether we want to propagate these headers during rebase and I'm
personally leaning against it.

Why, you ask?  I've seen at least the following types of corruption:

* Missing timezones
* Timezones with less than four digits
* Valid timezones padded to more than four digits with zeros
* Timezones which don't exist and never have (e.g., +1700)
* Timezones which are so absurdly large that they push the date to a
  year when nobody alive now will still be living
* Date stamps that are larger than 2^64
* Date stamps which are smaller than 2^64 but beyond the expected life
  of the Sun
* Extra angle brackets in the email field
* Nothing in between the email brackets
* Nothing before the email brackets (no name at all)
* Names which are not UTF-8 but without an encoding header
* Names which are not valid in the specified encoding
* Emails which are not valid UTF-8[0]
* Emails which don't meet the (ludicrously generous to the point of
  being nearly unparseable) RFC production
* Encodings which are not valid IANA charsets
* Messages with no body and no blank line (just the newline at the end
  of the final header)
* gpgsig headers that include random non-ASCII bytes and control
  characters[1]

Note that all of these must be parsed in some meaningful way because
users don't want their forge to serve them a 500 despite them having
sent wildly invalid data.  I encountered these during part of our
transition from Rugged to Git (reftable, SHA-256) and they definitely
added a lot of interesting complications (plus the need for lots of
tests).

Considering that writing valid data should not be that hard[2] (and
should definitely be a priority) but apparently is for many people, I'm
very wary of us propagating headers we're not ready to fsck and I'd like
to have an out for users and forges who would like to be a little more
careful.

With those constraints, I'm not totally opposed in principle to this
feature.

I see Patrick is CC'd here and I'm interested in his thoughts, as well
as, of course, those of anyone else as well.

[0] SMTPUTF8 (RFC 6531 et al.) specifies that mailbox names may now
contain UTF-8.  For instance, you can email 
diff mbox series

Patch

diff --git a/sequencer.c b/sequencer.c
index ad0ab75c8d4..5b92f77b660 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1527,6 +1527,9 @@  static int parse_head(struct repository *r, struct commit **head)
 	return 0;
 }
 
+/* Headers to exclude when copying extra commit headers */
+static const char *exclude_gpgsig[] = { "gpgsig", "gpgsig-sha256", NULL };
+
 /*
  * Try to commit without forking 'git commit'. In some cases we need
  * to run 'git commit' to display an error message
@@ -1538,6 +1541,7 @@  static int parse_head(struct repository *r, struct commit **head)
  */
 static int try_to_commit(struct repository *r,
 			 struct strbuf *msg, const char *author,
+			 struct commit_extra_header *extra_header,
 			 struct replay_opts *opts, unsigned int flags,
 			 struct object_id *oid)
 {
@@ -1545,7 +1549,7 @@  static int try_to_commit(struct repository *r,
 	struct object_id tree;
 	struct commit *current_head = NULL;
 	struct commit_list *parents = NULL;
-	struct commit_extra_header *extra = NULL;
+	struct commit_extra_header *extra = extra_header;
 	struct strbuf err = STRBUF_INIT;
 	struct strbuf commit_msg = STRBUF_INIT;
 	char *amend_author = NULL;
@@ -1561,7 +1565,6 @@  static int try_to_commit(struct repository *r,
 		return -1;
 
 	if (flags & AMEND_MSG) {
-		const char *exclude_gpgsig[] = { "gpgsig", "gpgsig-sha256", NULL };
 		const char *out_enc = get_commit_output_encoding();
 		const char *message = repo_logmsg_reencode(r, current_head,
 							   NULL, out_enc);
@@ -1714,7 +1717,8 @@  static int try_to_commit(struct repository *r,
 		commit_post_rewrite(r, current_head, oid);
 
 out:
-	free_commit_extra_headers(extra);
+	if (extra != extra_header)
+		free_commit_extra_headers(extra);
 	free_commit_list(parents);
 	strbuf_release(&err);
 	strbuf_release(&commit_msg);
@@ -1734,6 +1738,7 @@  static int write_rebase_head(struct object_id *oid)
 
 static int do_commit(struct repository *r,
 		     const char *msg_file, const char *author,
+		     struct commit_extra_header *extra_header,
 		     struct replay_opts *opts, unsigned int flags,
 		     struct object_id *oid)
 {
@@ -1749,7 +1754,7 @@  static int do_commit(struct repository *r,
 					   msg_file);
 
 		res = try_to_commit(r, msg_file ? &sb : NULL,
-				    author, opts, flags, &oid);
+				    author, extra_header, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
 			refs_delete_ref(get_main_ref_store(r), "",
@@ -2251,6 +2256,7 @@  static int do_pick_commit(struct repository *r,
 	int res, unborn = 0, reword = 0, allow, drop_commit;
 	enum todo_command command = item->command;
 	struct commit *commit = item->commit;
+	struct commit_extra_header *extra_header = NULL;
 
 	if (opts->no_commit) {
 		/*
@@ -2391,8 +2397,12 @@  static int do_pick_commit(struct repository *r,
 			strbuf_addstr(&ctx->message, oid_to_hex(&commit->object.oid));
 			strbuf_addstr(&ctx->message, ")\n");
 		}
-		if (!is_fixup(command))
+		if (!is_fixup(command)) {
 			author = get_author(msg.message);
+			if (is_rebase_i(opts))
+				extra_header = read_commit_extra_headers(commit,
+								exclude_gpgsig);
+		}
 	}
 	ctx->have_message = 1;
 
@@ -2503,8 +2513,8 @@  static int do_pick_commit(struct repository *r,
 	} /* else allow == 0 and there's nothing special to do */
 	if (!opts->no_commit && !drop_commit) {
 		if (author || command == TODO_REVERT || (flags & AMEND_MSG))
-			res = do_commit(r, msg_file, author, opts, flags,
-					commit? &commit->object.oid : NULL);
+			res = do_commit(r, msg_file, author, extra_header, opts,
+					flags, commit? &commit->object.oid : NULL);
 		else
 			res = error(_("unable to parse commit author"));
 		*check_todo = !!(flags & EDIT_MSG);
@@ -2535,6 +2545,7 @@  fast_forward_edit:
 leave:
 	free_message(commit, &msg);
 	free(author);
+	free_commit_extra_headers(extra_header);
 	update_abort_safety_file();
 
 	return res;
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 2aee9789a2f..200f55824c7 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -2297,6 +2297,54 @@  test_expect_success 'non-merge commands reject merge commits' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unconflicted pick copies extra commit headers' '
+	tree="$(git rev-parse C^{tree})" &&
+	parent="$(git rev-parse C^{commit})" &&
+	for i in 1 2 3 4 5 6 7
+	do
+		cat >commit <<-EOF &&
+		tree $tree
+		parent $parent
+		author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		x-extra-header value $i
+
+		An empty commit with an extra header $i
+		EOF
+
+		parent="$(git hash-object -t commit -w commit)" &&
+		eval "oid$i=\$parent" &&
+		test_tick || return 1
+	done &&
+
+	cat >todo <<-EOF &&
+	pick $oid1
+	pick $oid2
+	fixup $oid3
+	reword $oid4
+	edit $oid5
+	pick $oid6
+	squash $oid7
+	EOF
+
+	(
+		set_replace_editor todo &&
+		FAKE_COMMIT_AMEND=EDITED git rebase -i --onto A $oid1^ $oid5
+	) &&
+	echo changed >file2 &&
+	git add file2 &&
+	FAKE_COMMIT_AMEND=EDITED git rebase --continue &&
+	j=4 &&
+	for i in 1 2 4 5 6
+	do
+		git cat-file commit HEAD~$j >actual &&
+		sed -n -e /^\$/q -e /^x-extra/p actual >actual.extra-header &&
+		echo "x-extra-header value $i" >expect &&
+		test_cmp expect actual.extra-header &&
+		j=$((j-1)) || return 1
+	done
+'
+
 # This must be the last test in this file
 test_expect_success '$EDITOR and friends are unchanged' '
 	test_editor_unchanged