Message ID | 20241216-320-git-refs-migrate-reflogs-v4-0-d7cd3f197453@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | refs: add reflog support to `git refs migrate` | expand |
On Mon, Dec 16, 2024 at 05:44:25PM +0100, Karthik Nayak wrote: > Changes in v4: > - Fix broken tests, due to two reasons in patch 8/8: > - The `index` field in `reflog_migration_data` wasn't initialized to > 0. This specifically doesn't break the test, but causes undefined > behavior. Fix this by using designated initializers. > - The strbuf within `migration_data` wasn't initialized when the flow > exited early, causing memory leaks. Fix this too by using designated > initializers. > - Thanks to Junio for reporting and Patrick for helping shed some light > on these broken tests. > - Link to v3: https://lore.kernel.org/r/20241215-320-git-refs-migrate-reflogs-v3-0-4127fe707b98@gmail.com The range-diff looks as expected, so this version of the series looks good to me. Thanks! Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Mon, Dec 16, 2024 at 05:44:25PM +0100, Karthik Nayak wrote: >> Changes in v4: >> - Fix broken tests, due to two reasons in patch 8/8: >> - The `index` field in `reflog_migration_data` wasn't initialized to >> 0. This specifically doesn't break the test, but causes undefined >> behavior. Fix this by using designated initializers. >> - The strbuf within `migration_data` wasn't initialized when the flow >> exited early, causing memory leaks. Fix this too by using designated >> initializers. >> - Thanks to Junio for reporting and Patrick for helping shed some light >> on these broken tests. >> - Link to v3: https://lore.kernel.org/r/20241215-320-git-refs-migrate-reflogs-v3-0-4127fe707b98@gmail.com > > The range-diff looks as expected, so this version of the series looks > good to me. Thanks! > > Patrick Thanks Patrick for your review!
karthik nayak <karthik.188@gmail.com> writes: > Patrick Steinhardt <ps@pks.im> writes: > >> The range-diff looks as expected, so this version of the series looks >> good to me. Thanks! >> >> Patrick > > Thanks Patrick for your review! Thanks, both. Let's queue it again.
Karthik Nayak <karthik.188@gmail.com> writes: > The `git refs migrate` command was introduced in > 25a0023f28 (builtin/refs: new command to migrate ref storage formats, > 2024-06-06) to support migrating from one reference backend to another. > > One limitation of the feature was that it didn't support migrating > repositories which contained reflogs. This isn't a requirement on the > server side as repositories are stored as bare repositories (which do > not contain any reflogs). Clients however generally use reflogs and > until now couldn't use the `git refs migrate` command to migrate their > repositories to the new reftable format. > > One of the issues for adding reflog support is that the ref transactions > don't support reflogs additions: > 1. While there is REF_LOG_ONLY flag, there is no function to utilize > the flag and add reflogs. > 2. reference backends generally sort the updates by the refname. This > wouldn't work for reflogs which need to ensure that they maintain the > order of creation. > 3. In the files backend, reflog entries are added by obtaining locks > on the refs themselves. This means each update in the transaction, will > obtain a ref_lock. This paradigm fails to accompany the fact that there > could be multiple reflog updates for a refname in a single transaction. > 4. The backends check for duplicate entries, which doesn't make sense > in the context of adding multiple reflogs for a given refname. > > We overcome these issue we make the following changes: > - Update the ref_update structure to also include the committer > information. Using this, we can add a new function which only adds > reflog updates to the transaction. Out of interest, I see various changes happen around committer info. But why is the committer info more relevant for reflog updates, in contrast to normal ref updates?
Toon Claes <toon@iotcl.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > >> The `git refs migrate` command was introduced in >> 25a0023f28 (builtin/refs: new command to migrate ref storage formats, >> 2024-06-06) to support migrating from one reference backend to another. >> >> One limitation of the feature was that it didn't support migrating >> repositories which contained reflogs. This isn't a requirement on the >> server side as repositories are stored as bare repositories (which do >> not contain any reflogs). Clients however generally use reflogs and >> until now couldn't use the `git refs migrate` command to migrate their >> repositories to the new reftable format. >> >> One of the issues for adding reflog support is that the ref transactions >> don't support reflogs additions: >> 1. While there is REF_LOG_ONLY flag, there is no function to utilize >> the flag and add reflogs. >> 2. reference backends generally sort the updates by the refname. This >> wouldn't work for reflogs which need to ensure that they maintain the >> order of creation. >> 3. In the files backend, reflog entries are added by obtaining locks >> on the refs themselves. This means each update in the transaction, will >> obtain a ref_lock. This paradigm fails to accompany the fact that there >> could be multiple reflog updates for a refname in a single transaction. >> 4. The backends check for duplicate entries, which doesn't make sense >> in the context of adding multiple reflogs for a given refname. >> >> We overcome these issue we make the following changes: >> - Update the ref_update structure to also include the committer >> information. Using this, we can add a new function which only adds >> reflog updates to the transaction. > > Out of interest, I see various changes happen around committer info. But > why is the committer info more relevant for reflog updates, in contrast > to normal ref updates? > > -- > Toon The committer info is metadata around a ref update. It is only stored in the reflogs. So they're only relevant for reflogs! For e.g. in the files backend, we can see: $ cat .git/logs/HEAD | tail -1 7a929cb27ca6df69d4db64b008e27e002c691028 bc8e00178b671da2b845ccbba175f6c093ed6949 Karthik Nayak <karthik.188@google.com> 1734624674 +0100 commit: c10 Here the 'Karthik Nayak <karthik.188@gmail.com> 1734624673 +0100' section is the committer_info. Whereas if you see the reference itself $ cat .git/HEAD ref: refs/heads/master It only contains reference information. So, the `committer_info` is not needed for regular ref updates. Earlier we dynamically obtain the information when a reflog was being created. But for migration of existing reflogs we need to pass this information from the source to the ref transaction mechanism. So we pass the `committer_info` through the layers. Thank you for the review.