Message ID | 20250212-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v3-1-98b2c4d2bb0c@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] builtin/refs: add '--skip-reflog' flag to bypass reflog migration | expand |
Karthik Nayak <karthik.188@gmail.com> writes: > The 'git-refs(1)' migrate subcommand, which transfers repositories > between reference backends, currently migrates reflogs by default as of > 246cebe320 (refs: add support for migrating reflogs, 2024-12-16). > > While this behavior is desirable for most client-side repositories, > server-side repositories are not expected to contain reflogs. However, > due to historical reasons, some may still have them. This could be > caused, for example, by bugs, misconfiguration, or an administrator > enabling reflogs on the server for debugging purposes. > > To address this, introduce the --skip-reflog flag, allowing users to > bypass reflog migration. This ensures that the repository ends up in the > expected state after migration. I do not quite understand the motivation behind this change. If a repository has reflog that you do not need by mistake or misconfiguration, I agree that there should be a way for you to remove the reflog. Removing it while converting the ref backend may be a convenient way if and only if the reason why you noticed such a repository with unwanted reflog is because you were about to migrate it, but regardless of when you notice such refs with unwanted log, you would want to be able to drop their logs. You may not even be planning to migrate your backend when you noticed that the refs have unwanted log, you may have already migrated long time ago when you noticed that the refs have unwanted log, or you may not even be planning to migrate in the first place. Even after you migrated your backend, an administrator may have to enable reflog for debugging, and then after the administrator is done, then what? Should the backend migrated back from reftable to files and then back again, only to pass this --skip-reflog option? Wouldn't it be a lot more flexible if you add a new subcommand "drop" to the "git reflog" command?
Junio C Hamano <gitster@pobox.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > >> The 'git-refs(1)' migrate subcommand, which transfers repositories >> between reference backends, currently migrates reflogs by default as of >> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16). >> >> While this behavior is desirable for most client-side repositories, >> server-side repositories are not expected to contain reflogs. However, >> due to historical reasons, some may still have them. This could be >> caused, for example, by bugs, misconfiguration, or an administrator >> enabling reflogs on the server for debugging purposes. >> >> To address this, introduce the --skip-reflog flag, allowing users to >> bypass reflog migration. This ensures that the repository ends up in the >> expected state after migration. > > I do not quite understand the motivation behind this change. > > If a repository has reflog that you do not need by mistake or > misconfiguration, I agree that there should be a way for you to > remove the reflog. Removing it while converting the ref backend may > be a convenient way if and only if the reason why you noticed such a > repository with unwanted reflog is because you were about to migrate > it, but regardless of when you notice such refs with unwanted log, > you would want to be able to drop their logs. You may not even be > planning to migrate your backend when you noticed that the refs have > unwanted log, you may have already migrated long time ago when you > noticed that the refs have unwanted log, or you may not even be > planning to migrate in the first place. Even after you migrated > your backend, an administrator may have to enable reflog for > debugging, and then after the administrator is done, then what? > Should the backend migrated back from reftable to files and then > back again, only to pass this --skip-reflog option? > > Wouldn't it be a lot more flexible if you add a new subcommand > "drop" to the "git reflog" command? > Karthik Nayak <karthik.188@gmail.com> writes: > >> The 'git-refs(1)' migrate subcommand, which transfers repositories >> between reference backends, currently migrates reflogs by default as of >> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16). >> >> While this behavior is desirable for most client-side repositories, >> server-side repositories are not expected to contain reflogs. However, >> due to historical reasons, some may still have them. This could be >> caused, for example, by bugs, misconfiguration, or an administrator >> enabling reflogs on the server for debugging purposes. >> >> To address this, introduce the --skip-reflog flag, allowing users to >> bypass reflog migration. This ensures that the repository ends up in the >> expected state after migration. > > I do not quite understand the motivation behind this change. > > If a repository has reflog that you do not need by mistake or > misconfiguration, I agree that there should be a way for you to > remove the reflog. Removing it while converting the ref backend may > be a convenient way if and only if the reason why you noticed such a > repository with unwanted reflog is because you were about to migrate > it, but regardless of when you notice such refs with unwanted log, > you would want to be able to drop their logs. You may not even be > planning to migrate your backend when you noticed that the refs have > unwanted log, you may have already migrated long time ago when you > noticed that the refs have unwanted log, or you may not even be > planning to migrate in the first place. Even after you migrated > your backend, an administrator may have to enable reflog for > debugging, and then after the administrator is done, then what? > Should the backend migrated back from reftable to files and then > back again, only to pass this --skip-reflog option? > > Wouldn't it be a lot more flexible if you add a new subcommand > "drop" to the "git reflog" command? To just get rid of reflogs from a repository, I think 'git reflog drop' or something similar would indeed be a better way to go about it. As you stated, with this patch, we could still face the issue wherein the administartor could re-enable reflog and we're back to square one. Why I think this patch is important, is because while there could be existing reflogs in a repository, if one doesn't care about _reflogs_ there could be significant performance gains while migrating repos from one backend to the other, while also leaving the reflogs behind. The intent of the patch is not to be a detterrent for reflogs. The `git refs migrate` command is not the pathway for that. Rather, the goal is to ensure we skip them during migraiton because we do not want to bother migrating them. Saying this, another option is to have `git reflog drop` and then perform the migration, but I think it does makes sense to provide users with a fine-tuning during migration to allow them to choose what they want to migrate. I know that in GitLab we have repositories with millions of references and while we generally have reflogs disabled. There could be repositories with reflogs enabled during debugging or just old repositories where reflogs always existed. Since migrating backends blocks all operations on the repository, it would be vital to also optimize it as much as possible. Thanks, Karthik
Karthik Nayak <karthik.188@gmail.com> writes: > To just get rid of reflogs from a repository, I think 'git reflog drop' > or something similar would indeed be a better way to go about it. As you > stated, with this patch, we could still face the issue wherein the > administartor could re-enable reflog and we're back to square one. Exactly. > Why I think this patch is important, is because while there could be > existing reflogs in a repository, if one doesn't care about _reflogs_ > there could be significant performance gains while migrating repos from > one backend to the other, while also leaving the reflogs behind. Sure. It could be done with a combination of "git reflog drop && git refs migrate" (or "git refs migrate && git reflog drop", if the migrated-to backend performs better when it drops reflogs). With "git refs migrate --skip-reflog" alone, we are very limited. We can lose reflogs _only_ when we are migrating. Doing it _during_ migration may very well be more efficient than dropping first and then migrate (or the other way around), so I do not have much against the "migrate --skip-reflog" existing. But I find it backwards to add it first _before_ we have a tool that is more generally applicable to wider situations, i.e., "reflog drop". IOW, it feels as if we are worried about icing on the cake long before we actually bake the cake.
Junio C Hamano <gitster@pobox.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > >> To just get rid of reflogs from a repository, I think 'git reflog drop' >> or something similar would indeed be a better way to go about it. As you >> stated, with this patch, we could still face the issue wherein the >> administartor could re-enable reflog and we're back to square one. > > Exactly. > >> Why I think this patch is important, is because while there could be >> existing reflogs in a repository, if one doesn't care about _reflogs_ >> there could be significant performance gains while migrating repos from >> one backend to the other, while also leaving the reflogs behind. > > Sure. It could be done with a combination of "git reflog drop && > git refs migrate" (or "git refs migrate && git reflog drop", if the > migrated-to backend performs better when it drops reflogs). > > With "git refs migrate --skip-reflog" alone, we are very limited. > We can lose reflogs _only_ when we are migrating. > > Doing it _during_ migration may very well be more efficient than > dropping first and then migrate (or the other way around), so I do > not have much against the "migrate --skip-reflog" existing. But I > find it backwards to add it first _before_ we have a tool that is > more generally applicable to wider situations, i.e., "reflog drop". > > IOW, it feels as if we are worried about icing on the cake long > before we actually bake the cake. I understand and agree. I do see a lot of benefits having 'git reflog drop' too, so I'll pick it up and send a patch series towards the same next (as soon as I send in the next version of the partial transactions series :), which is taking me a bit of time with some other ongoing work at $DAYJOB). Thanks!
Karthik Nayak <karthik.188@gmail.com> writes: > The 'git-refs(1)' migrate subcommand, which transfers repositories > between reference backends, currently migrates reflogs by default as of > 246cebe320 (refs: add support for migrating reflogs, 2024-12-16). > > While this behavior is desirable for most client-side repositories, > server-side repositories are not expected to contain reflogs. However, > due to historical reasons, some may still have them. This could be > caused, for example, by bugs, misconfiguration, or an administrator > enabling reflogs on the server for debugging purposes. > > To address this, introduce the --skip-reflog flag, allowing users to > bypass reflog migration. This ensures that the repository ends up in the > expected state after migration. It wasn't really obvious to me this change removes the reflog, instead of "skipping". So I was surprised the reflog was removed after migrating, instead of staying in tact using the files backend. Only after reading through the test and the discussion in this thread I realized that's the intented behavior. So can I suggest to name the option `--no-reflog`? To me that makes it more obvious the reflog won't exist no more after migrating, and is more in line with the common UX of Git. Also emphasizing this more clearly in the commit message and help message also would be advised.
Toon Claes <toon@iotcl.com> writes: > So can I suggest to name the option `--no-reflog`? To me that makes it > more obvious the reflog won't exist no more after migrating, and is more > in line with the common UX of Git. Also emphasizing this more clearly in > the commit message and help message also would be advised. I have always thought, until I saw the message I am responding to, that everybody would expect that "migrate --skip=X --skip=Y" that usually migrates X and Y and Z would lose X and Y with the transition. But I realized that it was most likely because I happen to know that the choice between reftable and files backends is "which one do you take, you cannot have both at the same time", and it was clear that "skip and keep using the old form" is not on the table. For all others, your interpretation of the option name is entirely plausible. So I agree `--no-reflog` is really an excellent suggestion, even though `--reflog` option would be a no-op, and `--no-refs` would be a "Huh?" option that only logically makes sense to have for completeness but nobody would want to use ;-)
Junio C Hamano <gitster@pobox.com> writes: > Toon Claes <toon@iotcl.com> writes: > >> So can I suggest to name the option `--no-reflog`? To me that makes it >> more obvious the reflog won't exist no more after migrating, and is more >> in line with the common UX of Git. Also emphasizing this more clearly in >> the commit message and help message also would be advised. > > I have always thought, until I saw the message I am responding to, > that everybody would expect that "migrate --skip=X --skip=Y" that > usually migrates X and Y and Z would lose X and Y with the > transition. But I realized that it was most likely because I happen > to know that the choice between reftable and files backends is > "which one do you take, you cannot have both at the same time", and > it was clear that "skip and keep using the old form" is not on the > table. For all others, your interpretation of the option name is > entirely plausible. So I agree `--no-reflog` is really an excellent > suggestion, even though `--reflog` option would be a no-op, and > `--no-refs` would be a "Huh?" option that only logically makes sense > to have for completeness but nobody would want to use ;-) I share the same reaction. I didn't consider that flow of thought at all. So I too agree with name change. Let me push in a new version. Although I'm not sure if, Junio, you want to wait for the `git reflog drop` command that we were discussing before accepting this topic [1]. I'll leave that to your discretion. [1]: https://lore.kernel.org/all/xmqq4j0xpvmu.fsf@gitster.g/
Karthik Nayak <karthik.188@gmail.com> writes: > I share the same reaction. I didn't consider that flow of thought at > all. So I too agree with name change. Let me push in a new version. > Although I'm not sure if, Junio, you want to wait for the `git reflog > drop` command that we were discussing before accepting this topic [1]. > I'll leave that to your discretion. Well, from my point of view, "reflog drop", if can be done for both files and reftable without too much hassle, would be a greater addition to the toolset than the value "you can drop while migrating, but you need to remember to pass that option" gives us ;-) So it really depends on how involved the work to add "drop" thing would be. Thanks.
diff --git a/builtin/refs.c b/builtin/refs.c index a29f19583474518ee0942ea53c39cbdf9661c5e2..30be0254c14dd3d07693d70c25dddc9990756e9c 100644 --- a/builtin/refs.c +++ b/builtin/refs.c @@ -30,6 +30,9 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix, OPT_BIT(0, "dry-run", &flags, N_("perform a non-destructive dry-run"), REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN), + OPT_BIT(0, "skip-reflog", &flags, + N_("skip migrating reflogs"), + REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG), OPT_END(), }; struct strbuf errbuf = STRBUF_INIT; diff --git a/refs.c b/refs.c index f4094a326a9f88f979654b668cc9c3d27d83cb5d..5e8f5c06fa68d16c93ee11edd9742995eea994b6 100644 --- a/refs.c +++ b/refs.c @@ -3035,9 +3035,11 @@ int repo_migrate_ref_storage_format(struct repository *repo, if (ret < 0) goto done; - ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data); - if (ret < 0) - goto done; + if (!(flags & REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG)) { + ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data); + if (ret < 0) + goto done; + } ret = ref_transaction_commit(transaction, errbuf); if (ret < 0) diff --git a/refs.h b/refs.h index a0cdd99250e8286b55808b697b0a94afac5d8319..ccee8fc6705e6e93a1c6234e7395180f8dfd815b 100644 --- a/refs.h +++ b/refs.h @@ -1157,8 +1157,11 @@ int is_pseudo_ref(const char *refname); * - REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN: perform a dry-run migration * without touching the main repository. The result will be written into a * temporary ref storage directory. + * + * - REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG: skip migration of reflogs. */ -#define REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN (1 << 0) +#define REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN (1 << 0) +#define REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG (1 << 1) /* * Migrate the ref storage format used by the repository to the diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh index a6d9b35a46eb59350aa0d59d982a2fbfaecf1448..83466bde1865071ee633c738fd5c2ad9ae65bca3 100755 --- a/t/t1460-refs-migrate.sh +++ b/t/t1460-refs-migrate.sh @@ -9,14 +9,21 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME # Migrate the provided repository from one format to the other and # verify that the references and logs are migrated over correctly. -# Usage: test_migration <repo> <format> <skip_reflog_verify> +# Usage: test_migration <repo> <format> [<skip_reflog_verify> [<options...>]] # <repo> is the relative path to the repo to be migrated. # <format> is the ref format to be migrated to. -# <skip_reflog_verify> (true or false) whether to skip reflog verification. +# <skip_reflog_verify> (default: false) whether to skip reflog verification. +# <options...> are other options be passed directly to 'git refs migrate'. test_migration () { repo=$1 && format=$2 && - skip_reflog_verify=${3:-false} && + shift 2 && + skip_reflog_verify=false && + if test $# -ge 1 + then + skip_reflog_verify=$1 + shift + fi && git -C "$repo" for-each-ref --include-root-refs \ --format='%(refname) %(objectname) %(symref)' >expect && if ! $skip_reflog_verify @@ -25,7 +32,7 @@ test_migration () { git -C "$repo" reflog list >expect_log_list fi && - git -C "$repo" refs migrate --ref-format="$2" && + git -C "$repo" refs migrate --ref-format="$format" "$@" && git -C "$repo" for-each-ref --include-root-refs \ --format='%(refname) %(objectname) %(symref)' >actual && @@ -241,6 +248,19 @@ do test_cmp expect.reflog actual.reflog ) ' + + test_expect_success "$from_format -> $to_format: skip reflog with --skip-reflog" ' + test_when_finished "rm -rf repo" && + git init --ref-format=$from_format repo && + test_commit -C repo initial && + # we see that the repository contains reflogs. + git -C repo reflog --all >reflogs && + test_line_count = 2 reflogs && + test_migration repo "$to_format" true --skip-reflog && + # there should be no reflogs post migration. + git -C repo reflog --all >reflogs && + test_must_be_empty reflogs + ' done done
The 'git-refs(1)' migrate subcommand, which transfers repositories between reference backends, currently migrates reflogs by default as of 246cebe320 (refs: add support for migrating reflogs, 2024-12-16). While this behavior is desirable for most client-side repositories, server-side repositories are not expected to contain reflogs. However, due to historical reasons, some may still have them. This could be caused, for example, by bugs, misconfiguration, or an administrator enabling reflogs on the server for debugging purposes. To address this, introduce the --skip-reflog flag, allowing users to bypass reflog migration. This ensures that the repository ends up in the expected state after migration. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- builtin/refs.c | 3 +++ refs.c | 8 +++++--- refs.h | 5 ++++- t/t1460-refs-migrate.sh | 28 ++++++++++++++++++++++++---- 4 files changed, 36 insertions(+), 8 deletions(-) Karthik Nayak (1): builtin/refs: add '--skip-reflog' flag to bypass reflog migration --- Changes in v3: - Make changes to the test: - Use "$@" instead of $@ - Mark optional arguments correctly - Use <options...> instead of <...options> as the former is more widely used. - Link to v2: https://lore.kernel.org/r/20250211-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v2-1-991a2ec9a796@gmail.com Changes in v2: - Fix typo in commit mesasge and clarify the intent. - Modify the test to use `test_line_count` and `test_must_be_empty`. - Link to v1: https://lore.kernel.org/r/20250207-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v1-1-7d40f3b4e30b@gmail.com Range-diff versus v2: 1: fcc313ad3b ! 1: bbc3d154cf builtin/refs: add '--skip-reflog' flag to bypass reflog migration @@ t/t1460-refs-migrate.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME # Migrate the provided repository from one format to the other and # verify that the references and logs are migrated over correctly. -# Usage: test_migration <repo> <format> <skip_reflog_verify> -+# Usage: test_migration <repo> <format> <skip_reflog_verify> <...options> ++# Usage: test_migration <repo> <format> [<skip_reflog_verify> [<options...>]] # <repo> is the relative path to the repo to be migrated. # <format> is the ref format to be migrated to. - # <skip_reflog_verify> (true or false) whether to skip reflog verification. -+# <...options> are other options be passed directly to 'git refs migrate'. +-# <skip_reflog_verify> (true or false) whether to skip reflog verification. ++# <skip_reflog_verify> (default: false) whether to skip reflog verification. ++# <options...> are other options be passed directly to 'git refs migrate'. test_migration () { repo=$1 && format=$2 && - skip_reflog_verify=${3:-false} && -+ shift $(( $# >= 3 ? 3 : 2 )) && +- skip_reflog_verify=${3:-false} && ++ shift 2 && ++ skip_reflog_verify=false && ++ if test $# -ge 1 ++ then ++ skip_reflog_verify=$1 ++ shift ++ fi && git -C "$repo" for-each-ref --include-root-refs \ --format='%(refname) %(objectname) %(symref)' >expect && if ! $skip_reflog_verify @@ t/t1460-refs-migrate.sh: test_migration () { fi && - git -C "$repo" refs migrate --ref-format="$2" && -+ git -C "$repo" refs migrate --ref-format="$format" $@ && ++ git -C "$repo" refs migrate --ref-format="$format" "$@" && git -C "$repo" for-each-ref --include-root-refs \ --format='%(refname) %(objectname) %(symref)' >actual && base-commit: bc204b742735ae06f65bb20291c95985c9633b7f Thanks - Karthik --- builtin/refs.c | 3 +++ refs.c | 8 +++++--- refs.h | 5 ++++- t/t1460-refs-migrate.sh | 28 ++++++++++++++++++++++++---- 4 files changed, 36 insertions(+), 8 deletions(-)