
[v2,8/8] refs: add support for migrating reflogs

Message ID 20241213-320-git-refs-migrate-reflogs-v2-8-f28312cdb6c0@gmail.com
State Superseded
Series refs: add reflog support to `git refs migrate`

Commit Message

Karthik Nayak Dec. 13, 2024, 10:36 a.m. UTC
The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit added support
for adding reflog updates in ref transactions. Use that functionality
to bake in reflog support for `git refs migrate`.

To ensure that the order of the reflog entries is maintained during the
migration, we assign each reflog update an increasing index as we
iterate over the reflogs of the old reference backend, so that the
entries keep their original order in the new backend.
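To illustrate why an explicit index helps: the "reflogs order is retained" test in this patch creates a second commit dated earlier than the first, so ordering entries by timestamp would swap them. The self-contained sketch below is not Git's code and does not model either backend's storage format; `struct entry`, `assign_indices()` and `order_preserved()` are hypothetical. It assigns an increasing index in iteration order and checks that sorting by that index, unlike sorting by timestamp, reproduces the original order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one reflog entry read from the old backend. */
struct entry {
	const char *msg;
	long timestamp;      /* seconds since epoch; may go backwards */
	unsigned int index;  /* position assigned during migration */
};

/* Tag each entry with its position in iteration order. */
static void assign_indices(struct entry *entries, size_t n)
{
	for (size_t i = 0; i < n; i++)
		entries[i].index = (unsigned int)i;
}

static int cmp_index(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	return (x->index > y->index) - (x->index < y->index);
}

static int cmp_timestamp(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
}

/*
 * Sort a copy of the entries with the given comparator and report whether
 * the original order survived (1), changed (0), or allocation failed (-1).
 */
static int order_preserved(const struct entry *entries, size_t n,
			   int (*cmp)(const void *, const void *))
{
	struct entry *copy = malloc(n * sizeof(*copy));
	int same = 1;

	if (!copy)
		return -1;
	memcpy(copy, entries, n * sizeof(*copy));
	qsort(copy, n, sizeof(*copy), cmp);
	for (size_t i = 0; i < n; i++)
		if (strcmp(copy[i].msg, entries[i].msg))
			same = 0;
	free(copy);
	return same;
}
```

With the two dates used in the test, sorting by timestamp moves "second" ahead of "initial", while sorting by the assigned index keeps the iteration order.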

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-refs.txt |  2 --
 refs.c                     | 89 ++++++++++++++++++++++++++++++++--------------
 t/t1460-refs-migrate.sh    | 73 +++++++++++++++++++++++++------------
 3 files changed, 113 insertions(+), 51 deletions(-)

Comments

Patrick Steinhardt Dec. 13, 2024, 12:24 p.m. UTC | #1
On Fri, Dec 13, 2024 at 11:36:53AM +0100, Karthik Nayak wrote:
> diff --git a/refs.c b/refs.c
> index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2708,6 +2710,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
>  	return ret;
>  }
>  
> +struct reflog_migration_data {
> +	unsigned int *index;
> +	const char *refname;
> +	struct ref_store *old_refs;
> +	struct ref_transaction *transaction;
> +	struct strbuf *errbuf;
> +};
> +
> +static int migrate_one_reflog_entry(struct object_id *old_oid,
> +				    struct object_id *new_oid,
> +				    const char *committer,
> +				    timestamp_t timestamp, int tz,
> +				    const char *msg, void *cb_data)
> +{
> +	struct reflog_migration_data *data = cb_data;
> +	struct strbuf sb = STRBUF_INIT;
> +	const char *date;
> +	int ret;
> +
> +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
> +	/* committer contains name and email */
> +	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
> +
> +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
> +					    new_oid, old_oid, sb.buf,
> +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
> +					    (*data->index)++, data->errbuf);

This is where we now increment the reflog index to ensure a proper
ordering.

> +	strbuf_release(&sb);
> +
> +	return ret;
> +}

We're now allocating one buffer per reflog entry. We may want to
optimize this by having a scratch buffer in `migration_data`, which we
could then pass on via `reflog_migration_data`.
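Patrick's suggestion amounts to hoisting the buffer out of the per-entry callback. The sketch below assumes the scratch buffer lives in caller-owned data that is passed down through the callback data and is reset, not freed, for each entry. `struct scratch`, `scratch_addstr()` and `format_entry()` are hypothetical stand-ins for Git's `strbuf`, where `strbuf_reset()` would play the role of `scratch_reset()`. The `allocations` counter exists only to show that later entries reuse the first allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical minimal growable buffer standing in for Git's strbuf.
 * Resetting keeps the allocation so that later entries can reuse it.
 */
struct scratch {
	char *buf;
	size_t len, alloc;
	unsigned int allocations; /* how many times the buffer was (re)allocated */
};

static void scratch_reset(struct scratch *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

static int scratch_addstr(struct scratch *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, want);

		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = want;
		sb->allocations++;
	}
	memcpy(sb->buf + sb->len, s, n + 1); /* copies the terminating NUL too */
	sb->len += n;
	return 0;
}

static void scratch_release(struct scratch *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}

/* Per-entry callback body: reuse the caller-owned buffer instead of a local one. */
static const char *format_entry(struct scratch *sb, const char *committer,
				const char *date)
{
	scratch_reset(sb);
	if (scratch_addstr(sb, committer) < 0 ||
	    scratch_addstr(sb, " ") < 0 ||
	    scratch_addstr(sb, date) < 0)
		return NULL;
	return sb->buf;
}
```

Formatting a second entry of similar length leaves `allocations` unchanged; the buffer is released once with `scratch_release()` after all reflogs have been migrated.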

> @@ -2910,6 +2940,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>  	if (ret < 0)
>  		goto done;
>  
> +	data.reflog_index = 1;

I'm a bit surprised that we initialize the reflog index here, because
that means we now have a globally increasing counter across all reflogs.
Couldn't we initialize the index per reflog instead? It ultimately does
not really matter, but it feels like the more obvious design to me.

Also, is there any specific reason why we start at 1 and not 0? Just curious.

Patrick
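A minimal sketch of the per-reflog counter Patrick proposes, assuming (as Karthik's reply implies) that the index only needs to be unique within a single reflog. The names `struct migrated`, `struct reflog_state`, `migrate_reflog()` and `pairs_unique()` are hypothetical. The point is that the counter lives in the per-reflog callback data and restarts at 0, instead of being a pointer to one counter shared by all reflogs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one migrated entry: which reflog, which position. */
struct migrated {
	const char *refname;
	unsigned int index;
};

/* Hypothetical per-reflog state, mirroring the shape of reflog_migration_data. */
struct reflog_state {
	const char *refname;
	unsigned int index;   /* owned per reflog, not a pointer to a global */
	struct migrated *out; /* caller must size this for all entries */
	size_t *nr;
};

static void migrate_entry(struct reflog_state *st)
{
	st->out[*st->nr].refname = st->refname;
	st->out[*st->nr].index = st->index++;
	(*st->nr)++;
}

/* Migrate one reflog with `entries` entries; the counter restarts at 0. */
static void migrate_reflog(const char *refname, size_t entries,
			   struct migrated *out, size_t *nr)
{
	struct reflog_state st = { refname, 0, out, nr };

	for (size_t i = 0; i < entries; i++)
		migrate_entry(&st);
}

/* Return 1 if no two records share the same (refname, index) pair. */
static int pairs_unique(const struct migrated *m, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		for (size_t j = i + 1; j < nr; j++)
			if (m[i].index == m[j].index &&
			    !strcmp(m[i].refname, m[j].refname))
				return 0;
	return 1;
}
```

Migrating `HEAD` with three entries and `refs/heads/main` with two yields indices 0, 1, 2 and 0, 1: indices repeat across reflogs, but each (refname, index) pair stays unique.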
Karthik Nayak Dec. 15, 2024, 11:09 a.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Dec 13, 2024 at 11:36:53AM +0100, Karthik Nayak wrote:
>> diff --git a/refs.c b/refs.c
>> index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2708,6 +2710,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
>>  	return ret;
>>  }
>>
>> +struct reflog_migration_data {
>> +	unsigned int *index;
>> +	const char *refname;
>> +	struct ref_store *old_refs;
>> +	struct ref_transaction *transaction;
>> +	struct strbuf *errbuf;
>> +};
>> +
>> +static int migrate_one_reflog_entry(struct object_id *old_oid,
>> +				    struct object_id *new_oid,
>> +				    const char *committer,
>> +				    timestamp_t timestamp, int tz,
>> +				    const char *msg, void *cb_data)
>> +{
>> +	struct reflog_migration_data *data = cb_data;
>> +	struct strbuf sb = STRBUF_INIT;
>> +	const char *date;
>> +	int ret;
>> +
>> +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
>> +	/* committer contains name and email */
>> +	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
>> +
>> +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
>> +					    new_oid, old_oid, sb.buf,
>> +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
>> +					    (*data->index)++, data->errbuf);
>
> This is where we now increment the reflog index to ensure a proper
> ordering.
>
>> +	strbuf_release(&sb);
>> +
>> +	return ret;
>> +}
>
> We're now allocating one buffer per reflog entry. We may want to
> optimize this by having a scratch buffer in `migration_data`, which we
> could then pass on via `reflog_migration_data`.
>

That makes sense, let me do that.

>> @@ -2910,6 +2940,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>>  	if (ret < 0)
>>  		goto done;
>>
>> +	data.reflog_index = 1;
>
> I'm a bit surprised that we initialize the reflog index here, because
> that means we now have a globally increasing counter across all reflogs.
> Couldn't we initialize the index per reflog instead? It ultimately does
> not really matter, but it feels like the more obvious design to me.

Yes, this is a leftover: I initially didn't understand how
update_index worked and assumed that two logs couldn't share the same
update_index. I missed changing it. As you said, it works, but I'll
fix it.

> Also, is there any specific reason why we start at 1 and not 0? Just curious.

Not really. I wanted to distinguish indexed entries from non-indexed
ones, but there is no logical need for it, so I'll drop it to avoid
any confusion.

> Patrick

Thanks

Patch

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index ce31f93061db5e5d16aca516dd3d15f6527db870..9829984b0a4c4f54ec7f9b6c6c7072f62b1d198d 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -57,8 +57,6 @@  KNOWN LIMITATIONS
 
 The ref format migration has several known limitations in its current form:
 
-* It is not possible to migrate repositories that have reflogs.
-
 * It is not possible to migrate repositories that have worktrees.
 
 * There is no way to block concurrent writes to the repository during an
diff --git a/refs.c b/refs.c
index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
--- a/refs.c
+++ b/refs.c
@@ -30,6 +30,7 @@ 
 #include "date.h"
 #include "commit.h"
 #include "wildmatch.h"
+#include "ident.h"
 
 /*
  * List of all available backends
@@ -2673,6 +2674,7 @@  int ref_update_check_old_target(const char *referent, struct ref_update *update,
 }
 
 struct migration_data {
+	unsigned int reflog_index;
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
 	struct strbuf *errbuf;
@@ -2708,6 +2710,53 @@  static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 	return ret;
 }
 
+struct reflog_migration_data {
+	unsigned int *index;
+	const char *refname;
+	struct ref_store *old_refs;
+	struct ref_transaction *transaction;
+	struct strbuf *errbuf;
+};
+
+static int migrate_one_reflog_entry(struct object_id *old_oid,
+				    struct object_id *new_oid,
+				    const char *committer,
+				    timestamp_t timestamp, int tz,
+				    const char *msg, void *cb_data)
+{
+	struct reflog_migration_data *data = cb_data;
+	struct strbuf sb = STRBUF_INIT;
+	const char *date;
+	int ret;
+
+	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
+	/* committer contains name and email */
+	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
+
+	ret = ref_transaction_update_reflog(data->transaction, data->refname,
+					    new_oid, old_oid, sb.buf,
+					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
+					    (*data->index)++, data->errbuf);
+	strbuf_release(&sb);
+
+	return ret;
+}
+
+static int migrate_one_reflog(const char *refname, void *cb_data)
+{
+	struct migration_data *migration_data = cb_data;
+	struct reflog_migration_data data;
+
+	data.refname = refname;
+	data.old_refs = migration_data->old_refs;
+	data.transaction = migration_data->transaction;
+	data.errbuf = migration_data->errbuf;
+	data.index = &migration_data->reflog_index;
+
+	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
+					migrate_one_reflog_entry, &data);
+}
+
 static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
 {
 	struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
@@ -2774,13 +2823,6 @@  static int move_files(const char *from_path, const char *to_path, struct strbuf
 	return ret;
 }
 
-static int count_reflogs(const char *reflog UNUSED, void *payload)
-{
-	size_t *reflog_count = payload;
-	(*reflog_count)++;
-	return 0;
-}
-
 static int has_worktrees(void)
 {
 	struct worktree **worktrees = get_worktrees();
@@ -2806,7 +2848,6 @@  int repo_migrate_ref_storage_format(struct repository *repo,
 	struct ref_transaction *transaction = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	struct migration_data data;
-	size_t reflog_count = 0;
 	int did_migrate_refs = 0;
 	int ret;
 
@@ -2818,21 +2859,6 @@  int repo_migrate_ref_storage_format(struct repository *repo,
 
 	old_refs = get_main_ref_store(repo);
 
-	/*
-	 * We do not have any interfaces that would allow us to write many
-	 * reflog entries. Once we have them we can remove this restriction.
-	 */
-	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
-		strbuf_addstr(errbuf, "cannot count reflogs");
-		ret = -1;
-		goto done;
-	}
-	if (reflog_count) {
-		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
-		ret = -1;
-		goto done;
-	}
-
 	/*
 	 * Worktrees complicate the migration because every worktree has a
 	 * separate ref storage. While it should be feasible to implement, this
@@ -2858,17 +2884,21 @@  int repo_migrate_ref_storage_format(struct repository *repo,
 	 *      This operation is safe as we do not yet modify the main
 	 *      repository.
 	 *
-	 *   3. If we're in dry-run mode then we are done and can hand over the
+	 *   3. Enumerate all reflogs and write them into the new ref storage.
+	 *      This operation is safe as we do not yet modify the main
+	 *      repository.
+	 *
+	 *   4. If we're in dry-run mode then we are done and can hand over the
 	 *      directory to the caller for inspection. If not, we now start
 	 *      with the destructive part.
 	 *
-	 *   4. Delete the old ref storage from disk. As we have a copy of refs
+	 *   5. Delete the old ref storage from disk. As we have a copy of refs
 	 *      in the new ref storage it's okay(ish) if we now get interrupted
 	 *      as there is an equivalent copy of all refs available.
 	 *
-	 *   5. Move the new ref storage files into place.
+	 *   6. Move the new ref storage files into place.
 	 *
-	 *   6. Change the repository format to the new ref format.
+	 *   7. Change the repository format to the new ref format.
 	 */
 	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
 	if (!mkdtemp(new_gitdir.buf)) {
@@ -2910,6 +2940,11 @@  int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
+	data.reflog_index = 1;
+	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+	if (ret < 0)
+		goto done;
+
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
 		goto done;
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index 1bfff3a7afd5acc470424dfe7ec3e97d45f5c481..f59bc4860f19c4af82dc6f2984bdb69d61fe3ec2 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -7,23 +7,44 @@  export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
+# Migrate the provided repository from one format to the other and
+# verify that the references and logs are migrated over correctly.
+# Usage: test_migration <repo> <format> <skip_reflog_verify>
+#   <repo> is the relative path to the repo to be migrated.
+#   <format> is the ref format to be migrated to.
+#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
 test_migration () {
-	git -C "$1" for-each-ref --include-root-refs \
+	repo=$1 &&
+	format=$2 &&
+	skip_reflog_verify=${3:-false} &&
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
-	git -C "$1" refs migrate --ref-format="$2" &&
-	git -C "$1" for-each-ref --include-root-refs \
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >expect_logs &&
+		git -C "$repo" reflog list >expect_log_list
+	fi &&
+
+	git -C "$repo" refs migrate --ref-format="$format" &&
+
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
 	test_cmp expect actual &&
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >actual_logs &&
+		git -C "$repo" reflog list >actual_log_list &&
+		test_cmp expect_logs actual_logs &&
+		test_cmp expect_log_list actual_log_list
+	fi &&
 
-	git -C "$1" rev-parse --show-ref-format >actual &&
-	echo "$2" >expect &&
+	git -C "$repo" rev-parse --show-ref-format >actual &&
+	echo "$format" >expect &&
 	test_cmp expect actual
 }
 
 test_expect_success 'setup' '
-	rm -rf .git &&
-	# The migration does not yet support reflogs.
-	git config --global core.logAllRefUpdates false
+	rm -rf .git
 '
 
 test_expect_success "superfluous arguments" '
@@ -78,19 +99,6 @@  do
 			test_cmp expect err
 		'
 
-		test_expect_success "$from_format -> $to_format: migration with reflog fails" '
-			test_when_finished "rm -rf repo" &&
-			git init --ref-format=$from_format repo &&
-			test_config -C repo core.logAllRefUpdates true &&
-			test_commit -C repo logged &&
-			test_must_fail git -C repo refs migrate \
-				--ref-format=$to_format 2>err &&
-			cat >expect <<-EOF &&
-			error: migrating reflogs is not supported yet
-			EOF
-			test_cmp expect err
-		'
-
 		test_expect_success "$from_format -> $to_format: migration with worktree fails" '
 			test_when_finished "rm -rf repo" &&
 			git init --ref-format=$from_format repo &&
@@ -141,7 +149,7 @@  do
 			test_commit -C repo initial &&
 			test-tool -C repo ref-store main update-ref "" refs/heads/broken \
 				"$(test_oid 001)" "$ZERO_OID" REF_SKIP_CREATE_REFLOG,REF_SKIP_OID_VERIFICATION &&
-			test_migration repo "$to_format" &&
+			test_migration repo "$to_format" true &&
 			test_oid 001 >expect &&
 			git -C repo rev-parse refs/heads/broken >actual &&
 			test_cmp expect actual
@@ -195,6 +203,27 @@  do
 			git -C repo rev-parse --show-ref-format >actual &&
 			test_cmp expect actual
 		'
+
+		test_expect_success "$from_format -> $to_format: reflogs of symrefs with target deleted" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			git -C repo branch branch-1 HEAD &&
+			git -C repo symbolic-ref refs/heads/symref refs/heads/branch-1 &&
+			cat >input <<-EOF &&
+			delete refs/heads/branch-1
+			EOF
+			git -C repo update-ref --stdin <input &&
+			test_migration repo "$to_format"
+		'
+
+		test_expect_success "$from_format -> $to_format: reflogs order is retained" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit --date "100005000 +0700" --no-tag -C repo initial &&
+			test_commit --date "100003000 +0700" --no-tag -C repo second &&
+			test_migration repo "$to_format"
+		'
 	done
 done
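The numbered steps in the comment inside `repo_migrate_ref_storage_format()` can be traced with a toy model. This is not Git's code; `struct trace`, `step()` and `migrate()` are hypothetical and only record step names. It shows the ordering this patch sets up: reflogs are queued into the same transaction as the refs before the single commit, and a dry run stops before any destructive step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical log of migration steps; nothing here touches a repository. */
struct trace {
	const char *steps[MAX_STEPS];
	size_t nr;
};

static void step(struct trace *t, const char *name)
{
	assert(t->nr < MAX_STEPS);
	t->steps[t->nr++] = name;
}

static void migrate(struct trace *t, int dry_run)
{
	step(t, "create temporary ref storage");
	step(t, "queue refs");
	step(t, "queue reflogs");        /* the step added by this patch */
	step(t, "commit transaction");   /* refs and reflogs land together */
	if (dry_run)
		return;                  /* hand the directory to the caller */
	step(t, "delete old ref storage");
	step(t, "move new ref storage into place");
	step(t, "change repository format");
}
```

A dry run records four steps, ending with the commit; a full run continues with the three destructive steps.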