diff mbox series

[v2,7/8] refs: allow multiple reflog entries for the same refname

Message ID 20241213-320-git-refs-migrate-reflogs-v2-7-f28312cdb6c0@gmail.com (mailing list archive)
State Superseded
Headers show
Series refs: add reflog support to `git refs migrate` | expand

Commit Message

Karthik Nayak Dec. 13, 2024, 10:36 a.m. UTC
The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.

So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.

In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only be set when multiple reflog entries for a
given refname are added and as such in most scenarios the old behavior
remains.

This is required to add reflog migration support to `git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 22 +++++++++++++++++++---
 2 files changed, 30 insertions(+), 7 deletions(-)

Comments

Patrick Steinhardt Dec. 13, 2024, 12:24 p.m. UTC | #1
On Fri, Dec 13, 2024 at 11:36:52AM +0100, Karthik Nayak wrote:
> The reference transaction only allows a single update for a given
> reference to avoid conflicts. This, however, isn't an issue for reflogs.
> There are no conflicts to be resolved in reflogs and when migrating
> reflogs between backends we'd have multiple reflog entries for the same
> refname.
> 
> So allow multiple reflog updates within a single transaction. Also the
> reflog creation logic isn't exposed to the end user. While this might
> change in the future, currently, this reduces the scope of issues to
> think about.
> 
> In the reftable backend, the writer sorts all updates based on the
> update_index before writing to the block. When there are multiple
> reflogs for a given refname, it is essential that the order of the
> reflogs is maintained. So add the `index` value to the `update_index`.
> The `index` field is only be set when multiple reflog entries for a

s/only be/only

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  				}
>  
>  				fill_reftable_log_record(log, &c);
> -				log->update_index = ts;
> +
> +				/*
> +				 * Updates are sorted by the writer. So updates for the same
> +				 * refname need to contain different update indices.
> +				 */
> +				log->update_index = ts + u->index;

Okay. So instead of tracking things via a map, we now rely on the caller
to provide the update index. And if they don't provide one then we
cannot guarantee ordering.

I guess that's a good solution. After all, there will only be a very
limited amount of callers in the first place, so I think it's fine to
shift the responsibility onto them to maintain reflog ordering. They're
also the only ones who really know about the actual ordering.

> +				/*
> +				 * Note the max update_index so the limit can be set later on.
> +				 */
> +				if (log->update_index > max_update_index)
> +					max_update_index = log->update_index;

Makes sense, as well.

Patrick
Karthik Nayak Dec. 13, 2024, 8:02 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Dec 13, 2024 at 11:36:52AM +0100, Karthik Nayak wrote:
>> The reference transaction only allows a single update for a given
>> reference to avoid conflicts. This, however, isn't an issue for reflogs.
>> There are no conflicts to be resolved in reflogs and when migrating
>> reflogs between backends we'd have multiple reflog entries for the same
>> refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> In the reftable backend, the writer sorts all updates based on the
>> update_index before writing to the block. When there are multiple
>> reflogs for a given refname, it is essential that the order of the
>> reflogs is maintained. So add the `index` value to the `update_index`.
>> The `index` field is only be set when multiple reflog entries for a
>
> s/only be/only

Thanks.

>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  				}
>>
>>  				fill_reftable_log_record(log, &c);
>> -				log->update_index = ts;
>> +
>> +				/*
>> +				 * Updates are sorted by the writer. So updates for the same
>> +				 * refname need to contain different update indices.
>> +				 */
>> +				log->update_index = ts + u->index;
>
> Okay. So instead of tracking things via a map, we now rely on the caller
> to provide the update index. And if they don't provide one then we
> cannot guarantee ordering.
>

Which works, because for reflog migration, the caller _has_ to specify
ordering.

> I guess that's a good solution. After all, there will only be a very
> limited amount of callers in the first place, so I think it's fine to
> shift the responsibility onto them to maintain reflog ordering. They're
> also the only ones who really know about the actual ordering.
>

Exactly. Currently this is entirely and only used in the migration code.
Perhaps we expose reflog addition to users, but that is not something
being pursued.

>> +				/*
>> +				 * Note the max update_index so the limit can be set later on.
>> +				 */
>> +				if (log->update_index > max_update_index)
>> +					max_update_index = log->update_index;
>
> Makes sense, as well.
>
> Patrick

Thanks
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2611,6 +2611,9 @@  static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2829,13 +2832,16 @@  static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3035,8 +3041,9 @@  static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@  static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1301,6 +1302,7 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1405,7 +1407,19 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}
 
 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				/*
+				 * Updates are sorted by the writer. So updates for the same
+				 * refname need to contain different update indices.
+				 */
+				log->update_index = ts + u->index;
+
+				/*
+				 * Note the max update_index so the limit can be set later on.
+				 */
+				if (log->update_index > max_update_index)
+					max_update_index = log->update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1469,6 +1483,8 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;