Message ID | 20241216-320-git-refs-migrate-reflogs-v4-7-d7cd3f197453@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 297c09eabb1e8b44230bca86fc7fd344175e0be7 |
Series | refs: add reflog support to `git refs migrate` |
Karthik Nayak <karthik.188@gmail.com> writes:

> The reference transaction only allows a single update for a given
> reference to avoid conflicts. This, however, isn't an issue for reflogs.
> There are no conflicts to be resolved in reflogs and when migrating
> reflogs between backends we'd have multiple reflog entries for the same
> refname.
>
> So allow multiple reflog updates within a single transaction. Also the
> reflog creation logic isn't exposed to the end user. While this might
> change in the future, currently, this reduces the scope of issues to
> think about.
>
> In the reftable backend, the writer sorts all updates based on the
> update_index before writing to the block. When there are multiple
> reflogs for a given refname, it is essential that the order of the
> reflogs is maintained. So add the `index` value to the `update_index`.
> The `index` field is only set when multiple reflog entries for a given
> refname are added and as such in most scenarios the old behavior
> remains.
>
> This is required to add reflog migration support to `git refs migrate`.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/files-backend.c    | 15 +++++++++++----
>  refs/reftable-backend.c | 22 +++++++++++++++++++---
>  2 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>
>  	update->backend_data = lock;
>
> +	if (update->flags & REF_LOG_ONLY)
> +		goto out;
> +
>  	if (update->type & REF_ISSYMREF) {
>  		if (update->flags & REF_NO_DEREF) {
>  			/*
> @@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  	 */
>  	for (i = 0; i < transaction->nr; i++) {
>  		struct ref_update *update = transaction->updates[i];
> -		struct string_list_item *item =
> -			string_list_append(&affected_refnames, update->refname);
> +		struct string_list_item *item;
>
>  		if ((update->flags & REF_IS_PRUNING) &&
>  		    !(update->flags & REF_NO_DEREF))
>  			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
>
> +		if (update->flags & REF_LOG_ONLY)
> +			continue;
> +
> +		item = string_list_append(&affected_refnames, update->refname);
>  		/*
>  		 * We store a pointer to update in item->util, but at
>  		 * the moment we never use the value of this field
> @@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>
>  	/* Fail if a refname appears more than once in the transaction: */
>  	for (i = 0; i < transaction->nr; i++)
> -		string_list_append(&affected_refnames,
> -				   transaction->updates[i]->refname);
> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> +			string_list_append(&affected_refnames,
> +					   transaction->updates[i]->refname);
>  	string_list_sort(&affected_refnames);
>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		if (ret)
>  			goto done;
>
> -		string_list_append(&affected_refnames,
> -				   transaction->updates[i]->refname);
> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> +			string_list_append(&affected_refnames,
> +					   transaction->updates[i]->refname);
>  	}
>
>  	/*
> @@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  	struct reftable_log_record *logs = NULL;
>  	struct ident_split committer_ident = {0};
>  	size_t logs_nr = 0, logs_alloc = 0, i;
> +	uint64_t max_update_index = ts;
>  	const char *committer_info;
>  	int ret = 0;
>
> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  			}
>
>  			fill_reftable_log_record(log, &c);
> -			log->update_index = ts;
> +
> +			/*
> +			 * Updates are sorted by the writer. So updates for the same
> +			 * refname need to contain different update indices.
> +			 */
> +			log->update_index = ts + u->index;

During my review I was having a hard time figuring out when `u->index`
was not 0 and where it is being set. Can you maybe explain a bit?

> +
> +			/*
> +			 * Note the max update_index so the limit can be set later on.
> +			 */
> +			if (log->update_index > max_update_index)

Is there a lot of value in having this if clause? I was a bit confused
why it is here, because I think we can do the assignment to
max_update_index unconditionally.

> +				max_update_index = log->update_index;
> +
>  			log->refname = xstrdup(u->refname);
>  			memcpy(log->value.update.new_hash,
>  			       u->new_oid.hash, GIT_MAX_RAWSZ);
> @@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  	 * and log blocks.
>  	 */
>  	if (logs) {
> +		reftable_writer_set_limits(writer, ts, max_update_index);

So max_update_index is used to set the limits on the current writer, but
using reftable_stack_next_update_index() it's also used to give the next
stack its starting point for their range.

Now I'm not familiar enough with the code, but are all stacks handled in
sequential order? And how does a stack relate to a reftable file?

> +
>  		ret = reftable_writer_add_logs(writer, logs, logs_nr);
>  		if (ret < 0)
>  			goto done;
>
> --
> 2.47.1
Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The reference transaction only allows a single update for a given
>> reference to avoid conflicts. This, however, isn't an issue for reflogs.
>> There are no conflicts to be resolved in reflogs and when migrating
>> reflogs between backends we'd have multiple reflog entries for the same
>> refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> In the reftable backend, the writer sorts all updates based on the
>> update_index before writing to the block. When there are multiple
>> reflogs for a given refname, it is essential that the order of the
>> reflogs is maintained. So add the `index` value to the `update_index`.
>> The `index` field is only set when multiple reflog entries for a given
>> refname are added and as such in most scenarios the old behavior
>> remains.
>>
>> This is required to add reflog migration support to `git refs migrate`.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  refs/files-backend.c    | 15 +++++++++++----
>>  refs/reftable-backend.c | 22 +++++++++++++++++++---
>>  2 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>
>>  	update->backend_data = lock;
>>
>> +	if (update->flags & REF_LOG_ONLY)
>> +		goto out;
>> +
>>  	if (update->type & REF_ISSYMREF) {
>>  		if (update->flags & REF_NO_DEREF) {
>>  			/*
>> @@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>>  	 */
>>  	for (i = 0; i < transaction->nr; i++) {
>>  		struct ref_update *update = transaction->updates[i];
>> -		struct string_list_item *item =
>> -			string_list_append(&affected_refnames, update->refname);
>> +		struct string_list_item *item;
>>
>>  		if ((update->flags & REF_IS_PRUNING) &&
>>  		    !(update->flags & REF_NO_DEREF))
>>  			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
>>
>> +		if (update->flags & REF_LOG_ONLY)
>> +			continue;
>> +
>> +		item = string_list_append(&affected_refnames, update->refname);
>>  		/*
>>  		 * We store a pointer to update in item->util, but at
>>  		 * the moment we never use the value of this field
>> @@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>>
>>  	/* Fail if a refname appears more than once in the transaction: */
>>  	for (i = 0; i < transaction->nr; i++)
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	string_list_sort(&affected_refnames);
>>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>>  		ret = TRANSACTION_GENERIC_ERROR;
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>>  		if (ret)
>>  			goto done;
>>
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	}
>>
>>  	/*
>> @@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  	struct reftable_log_record *logs = NULL;
>>  	struct ident_split committer_ident = {0};
>>  	size_t logs_nr = 0, logs_alloc = 0, i;
>> +	uint64_t max_update_index = ts;
>>  	const char *committer_info;
>>  	int ret = 0;
>>
>> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  			}
>>
>>  			fill_reftable_log_record(log, &c);
>> -			log->update_index = ts;
>> +
>> +			/*
>> +			 * Updates are sorted by the writer. So updates for the same
>> +			 * refname need to contain different update indices.
>> +			 */
>> +			log->update_index = ts + u->index;
>
> During my review I was having a hard time figuring out when `u->index`
> was not 0 and where it is being set. Can you maybe explain a bit?
>

As of this patch, there are no users of the index. This patch adds the
infrastructure; the next patch is where we actually set the index.

In short, the index is only needed for the reftable backend. This is
because reflogs have a specific order and we need to retain that order.
In the reftable backend, for optimization, all writes are sorted by
refname. The index provides a parallel system to retain the order of the
updates. There are no real use cases apart from migration of reflogs
from one backend to another, which is added in the next patch.

>> +
>> +			/*
>> +			 * Note the max update_index so the limit can be set later on.
>> +			 */
>> +			if (log->update_index > max_update_index)
>
> Is there a lot of value in having this if clause? I was a bit confused
> why it is here, because I think we can do the assignment to
> max_update_index unconditionally.
>

It is necessary. For reflogs whose index isn't set, their `update_index`
would simply be the `ts` value. So if there is a mix of reflog updates
with and without an index, an unconditional assignment could leave
`max_update_index` lower than the actual maximum.

>> +				max_update_index = log->update_index;
>> +
>>  			log->refname = xstrdup(u->refname);
>>  			memcpy(log->value.update.new_hash,
>>  			       u->new_oid.hash, GIT_MAX_RAWSZ);
>> @@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  	 * and log blocks.
>>  	 */
>>  	if (logs) {
>> +		reftable_writer_set_limits(writer, ts, max_update_index);
>
> So max_update_index is used to set the limits on the current writer, but
> using reftable_stack_next_update_index() it's also used to give the next
> stack its starting point for their range.

Using `reftable_stack_next_update_index()` would return `ts + 1` as that
is the next sequential update. This could be less than the
max_update_index, so we can't use that. Once all the reflogs are
written, the next call to `reftable_stack_next_update_index()` would
return `max_update_index + 1`.

> Now I'm not familiar enough with the code, but are all stacks handled
> in sequential order?

Not sure I understand your question correctly. Updates are handled as
per a given index. Each update is also sequentially stored. Tables are
named after the min and max index that they store.

> And how does a stack relate to a reftable file?

The stack is used to refer to a collection of reftable tables. So for a
given worktree, the tables under '$GIT_DIR/reftable' would constitute a
stack, where the 'tables.list' would state the tables which are part of
the stack.

>> +
>>  		ret = reftable_writer_add_logs(writer, logs, logs_nr);
>>  		if (ret < 0)
>>  			goto done;
>>
>> --
>> 2.47.1
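To make the point about the `if` clause concrete, here is a minimal sketch in plain C. It is not Git code: `max_log_update_index()` and `last_log_update_index()` are hypothetical helpers that only model the loop in `write_transaction_table()`, assuming each reflog update gets `ts + index` and an unset index is 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Git: mirrors the conditional
 * tracking in the patch. Each update gets `ts + indices[i]`, and the
 * maximum only ever moves forward.
 */
uint64_t max_log_update_index(uint64_t ts, const uint64_t *indices, size_t nr)
{
	uint64_t max_update_index = ts;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t update_index = ts + indices[i];

		if (update_index > max_update_index)
			max_update_index = update_index;
	}
	return max_update_index;
}

/*
 * Hypothetical counter-example: the same loop with an unconditional
 * assignment. The value of the last update wins, whatever came before.
 */
uint64_t last_log_update_index(uint64_t ts, const uint64_t *indices, size_t nr)
{
	uint64_t max_update_index = ts;
	size_t i;

	for (i = 0; i < nr; i++)
		max_update_index = ts + indices[i];
	return max_update_index;
}
```

With `ts = 10` and indices `{2, 0, 1}`, the conditional version yields 12 while the unconditional one yields 11, which would put the writer's upper limit below an update index that is actually written.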
diff --git a/refs/files-backend.c b/refs/files-backend.c
index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			}
 
 			fill_reftable_log_record(log, &c);
-			log->update_index = ts;
+
+			/*
+			 * Updates are sorted by the writer. So updates for the same
+			 * refname need to contain different update indices.
+			 */
+			log->update_index = ts + u->index;
+
+			/*
+			 * Note the max update_index so the limit can be set later on.
+			 */
+			if (log->update_index > max_update_index)
+				max_update_index = log->update_index;
+
 			log->refname = xstrdup(u->refname);
 			memcpy(log->value.update.new_hash,
 			       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
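The duplicate check touched by the hunks above can be modelled outside of Git roughly as follows. This is only a sketch under assumptions: `struct sketch_update`, `SKETCH_LOG_ONLY`, and `has_duplicate_ref_updates()` are hypothetical stand-ins for `struct ref_update`, `REF_LOG_ONLY`, and the `string_list` based check, and a plain array with `qsort()` replaces Git's `string_list` helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for REF_LOG_ONLY; the real flag value differs. */
#define SKETCH_LOG_ONLY (1u << 0)

/* Hypothetical, reduced stand-in for Git's `struct ref_update`. */
struct sketch_update {
	const char *refname;
	unsigned int flags;
};

static int refname_cmp(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Returns 1 if a refname is touched by more than one non-log-only
 * update, 0 if not, and -1 if memory could not be allocated.
 */
int has_duplicate_ref_updates(const struct sketch_update *updates, size_t nr)
{
	const char **names;
	size_t names_nr = 0, i;
	int ret = 0;

	if (!nr)
		return 0;
	names = malloc(nr * sizeof(*names));
	if (!names)
		return -1;

	/* Log-only updates never enter the duplicate check. */
	for (i = 0; i < nr; i++)
		if (!(updates[i].flags & SKETCH_LOG_ONLY))
			names[names_nr++] = updates[i].refname;

	/* After sorting, duplicate refnames are adjacent. */
	qsort(names, names_nr, sizeof(*names), refname_cmp);
	for (i = 1; i < names_nr; i++) {
		if (!strcmp(names[i - 1], names[i])) {
			ret = 1;
			break;
		}
	}

	free(names);
	return ret;
}
```

In this model, any number of log-only updates for `refs/heads/main` can share a transaction with one regular update of the same ref, while two regular updates of that ref are still rejected.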
The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.

So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.

In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a given
refname are added and as such in most scenarios the old behavior
remains.

This is required to add reflog migration support to `git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 22 +++++++++++++++++++---
 2 files changed, 30 insertions(+), 7 deletions(-)
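As a rough illustration of why distinct update indices matter once the writer sorts the log records, consider the following sketch. It is not the reftable writer: `struct sketch_log` and `sort_logs()` are hypothetical, and the ordering (refname ascending, then update index descending) is an assumption about how log records are keyed. The point is only that, once every entry for a refname carries its own `ts + index`, the sorted order is fully determined by the data rather than by the order in which the sort happens to visit equal keys.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a reflog record. */
struct sketch_log {
	const char *refname;
	uint64_t update_index;
	const char *message;
};

/*
 * Assumed ordering: refname ascending, then update_index descending
 * (newest entry first).
 */
static int sketch_log_cmp(const void *a, const void *b)
{
	const struct sketch_log *x = a, *y = b;
	int cmp = strcmp(x->refname, y->refname);

	if (cmp)
		return cmp;
	if (x->update_index == y->update_index)
		return 0; /* equal keys: qsort() gives no ordering guarantee */
	return x->update_index > y->update_index ? -1 : 1;
}

/* Sort the records the way the sketch assumes the writer would. */
void sort_logs(struct sketch_log *logs, size_t nr)
{
	qsort(logs, nr, sizeof(*logs), sketch_log_cmp);
}
```

If all three entries for a ref shared the same `update_index`, the comparator would return 0 for each pair and their relative order after `qsort()` would be unspecified; giving them `ts + 0`, `ts + 1`, and `ts + 2` removes that ambiguity.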