mbox series

[0/7] refs: add reflog support to `git refs migrate`

Message ID 20241209-320-git-refs-migrate-reflogs-v1-0-d4bc37ee860f@gmail.com (mailing list archive)
Headers show
Series refs: add reflog support to `git refs migrate` | expand

Message

Karthik Nayak Dec. 9, 2024, 11:07 a.m. UTC
The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the feature was that it didn't support migrating
repositories which contained reflogs. This isn't a requirement on the
server side as repositories are stored as bare repositories (which do
not contain any reflogs). Clients however generally use reflogs and
until now couldn't use the `git refs migrate` command to migrate their
repositories to the new reftable format.

One of the issues for adding reflog support is that the ref transactions
don't support reflogs additions:
  1. While there is REF_LOG_ONLY flag, there is no function to utilize
  the flag and add reflogs.
  2. reference backends generally sort the updates by the refname. This
  wouldn't work for reflogs which need to ensure that they maintain the
  order of creation.
  3. In the files backend, reflog entries are added by obtaining locks
  on the refs themselves. This means each update in the transaction, will
  obtain a ref_lock. This paradigm fails to accompany the fact that there
  could be multiple reflog updates for a refname in a single transaction.
  4. The backends check for duplicate entries, which doesn't make sense
  in the context of adding multiple reflogs for a given refname.

We overcome these issue we make the following changes:
  - Update the ref_update structure to also include the committer
  information. Using this, we can add a new function which only adds
  reflog updates to the transaction.
  - Add an index field to the ref_update structure, this will help order
  updates in pre-defined order, this fixes #2.
  - While the ideal fix for #3 would be to actually introduce reflog
  locks, this wouldn't be possible without breaking backward
  compatibility. So we add a count field to the existing ref_lock. With
  this, multiple reflog updates can share a single ref_lock.

Overall, this series is a bit more involved, and I would appreciate it
if it receives a bit more scrutiny.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Karthik Nayak (7):
      refs: include committer info in `ref_update` struct
      refs: add `index` field to `struct ref_udpate`
      refs/files: add count field to ref_lock
      refs: extract out refname verification in transactions
      refs: introduce the `ref_transaction_update_reflog` function
      refs: allow multiple reflog entries for the same refname
      refs: add support for migrating reflogs

 Documentation/git-refs.txt |   2 -
 refs.c                     | 204 ++++++++++++++++++++++++++++++++-------------
 refs.h                     |  12 +++
 refs/files-backend.c       | 144 ++++++++++++++++++++------------
 refs/refs-internal.h       |  24 ++++--
 refs/reftable-backend.c    |  47 +++++++++--
 t/t1460-refs-migrate.sh    |  73 +++++++++++-----
 7 files changed, 360 insertions(+), 146 deletions(-)
---



--- 

base-commit: e66fd72e972df760a53c3d6da023c17adfc426d6
change-id: 20241111-320-git-refs-migrate-reflogs-a53e3a6cffc9

Thanks
- Karthik

Comments

Junio C Hamano Dec. 10, 2024, 12:13 p.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> The `git refs migrate` command was introduced in
> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
> 2024-06-06) to support migrating from one reference backend to another.

This topic pass the tests standalone for me locally, but seems to
fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
integration result tonight; it would be very much appreciated if you
can help find if there are semantic (or otherwise) mismerges that
are causing this breakage.

Thanks.
Karthik Nayak Dec. 10, 2024, 5:42 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git refs migrate` command was introduced in
>> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
>> 2024-06-06) to support migrating from one reference backend to another.
>
> This topic pass the tests standalone for me locally, but seems to
> fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
> integration result tonight; it would be very much appreciated if you
> can help find if there are semantic (or otherwise) mismerges that
> are causing this breakage.
>

I see. I can reproduce it on 'seen' as you mentioned. Will debug and get
back to you on this. Thanks for letting me know.
Karthik Nayak Dec. 10, 2024, 6:03 p.m. UTC | #3
karthik nayak <karthik.188@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Karthik Nayak <karthik.188@gmail.com> writes:
>>
>>> The `git refs migrate` command was introduced in
>>> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
>>> 2024-06-06) to support migrating from one reference backend to another.
>>
>> This topic pass the tests standalone for me locally, but seems to
>> fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
>> integration result tonight; it would be very much appreciated if you
>> can help find if there are semantic (or otherwise) mismerges that
>> are causing this breakage.
>>
>
> I see. I can reproduce it on 'seen' as you mentioned. Will debug and get
> back to you on this. Thanks for letting me know.

Seems like this is due to 'kn/reftable-writer-log-write-verify', which I
should have totally seen coming. A quick fix like the one below fixes
the issue. I'll merge in 'kn/reftable-writer-log-write-verify' when I
re-roll.

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1badf88df0..5c51a6a226 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1428,6 +1428,7 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	struct strintmap logs_ts;
 	int ret = 0;
@@ -1541,6 +1542,13 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 				log->update_index = update_index;
 				strintmap_set(&logs_ts, u->refname, update_index+1);

+				/*
+				 * Note the max_update_index, so we can reset the limit
+				 * before actually writing the logs.
+				 */
+				if (update_index > max_update_index)
+					max_update_index = update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1604,6 +1612,8 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;