diff mbox series

[v3] builtin/refs: add '--skip-reflog' flag to bypass reflog migration

Message ID 20250212-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v3-1-98b2c4d2bb0c@gmail.com (mailing list archive)
State New
Headers show
Series [v3] builtin/refs: add '--skip-reflog' flag to bypass reflog migration | expand

Commit Message

Karthik Nayak Feb. 12, 2025, 6:38 p.m. UTC
The 'git-refs(1)' migrate subcommand, which transfers repositories
between reference backends, currently migrates reflogs by default as of
246cebe320 (refs: add support for migrating reflogs, 2024-12-16).

While this behavior is desirable for most client-side repositories,
server-side repositories are not expected to contain reflogs. However,
due to historical reasons, some may still have them. This could be
caused, for example, by bugs, misconfiguration, or an administrator
enabling reflogs on the server for debugging purposes.

To address this, introduce the --skip-reflog flag, allowing users to
bypass reflog migration. This ensures that the repository ends up in the
expected state after migration.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/refs.c          |  3 +++
 refs.c                  |  8 +++++---
 refs.h                  |  5 ++++-
 t/t1460-refs-migrate.sh | 28 ++++++++++++++++++++++++----
 4 files changed, 36 insertions(+), 8 deletions(-)

Karthik Nayak (1):

builtin/refs: add '--skip-reflog' flag to bypass reflog migration
---
Changes in v3:
- Make changes to the test:
  - Use "$@" instead of $@
  - Mark optional arguments correctly
  - Use <options...> instead of <...options> as the former is more widely
    used.
- Link to v2: https://lore.kernel.org/r/20250211-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v2-1-991a2ec9a796@gmail.com

Changes in v2:
- Fix typo in commit mesasge and clarify the intent.
- Modify the test to use `test_line_count` and `test_must_be_empty`.
- Link to v1: https://lore.kernel.org/r/20250207-477-refs-migrate-add-a-flag-to-ignore-reflogs-during-migration-v1-1-7d40f3b4e30b@gmail.com

Range-diff versus v2:

1:  fcc313ad3b ! 1:  bbc3d154cf builtin/refs: add '--skip-reflog' flag to bypass reflog migration
    @@ t/t1460-refs-migrate.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
      # Migrate the provided repository from one format to the other and
      # verify that the references and logs are migrated over correctly.
     -# Usage: test_migration <repo> <format> <skip_reflog_verify>
    -+# Usage: test_migration <repo> <format> <skip_reflog_verify> <...options>
    ++# Usage: test_migration <repo> <format> [<skip_reflog_verify> [<options...>]]
      #   <repo> is the relative path to the repo to be migrated.
      #   <format> is the ref format to be migrated to.
    - #   <skip_reflog_verify> (true or false) whether to skip reflog verification.
    -+#   <...options> are other options be passed directly to 'git refs migrate'.
    +-#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
    ++#   <skip_reflog_verify> (default: false) whether to skip reflog verification.
    ++#   <options...> are other options be passed directly to 'git refs migrate'.
      test_migration () {
      	repo=$1 &&
      	format=$2 &&
    - 	skip_reflog_verify=${3:-false} &&
    -+	shift $(( $# >= 3 ? 3 : 2 )) &&
    +-	skip_reflog_verify=${3:-false} &&
    ++	shift 2 &&
    ++	skip_reflog_verify=false &&
    ++	if test $# -ge 1
    ++	then
    ++		skip_reflog_verify=$1
    ++		shift
    ++	fi &&
      	git -C "$repo" for-each-ref --include-root-refs \
      		--format='%(refname) %(objectname) %(symref)' >expect &&
      	if ! $skip_reflog_verify
    @@ t/t1460-refs-migrate.sh: test_migration () {
      	fi &&
      
     -	git -C "$repo" refs migrate --ref-format="$2" &&
    -+	git -C "$repo" refs migrate --ref-format="$format" $@ &&
    ++	git -C "$repo" refs migrate --ref-format="$format" "$@" &&
      
      	git -C "$repo" for-each-ref --include-root-refs \
      		--format='%(refname) %(objectname) %(symref)' >actual &&


base-commit: bc204b742735ae06f65bb20291c95985c9633b7f

Thanks
- Karthik
---
 builtin/refs.c          |  3 +++
 refs.c                  |  8 +++++---
 refs.h                  |  5 ++++-
 t/t1460-refs-migrate.sh | 28 ++++++++++++++++++++++++----
 4 files changed, 36 insertions(+), 8 deletions(-)

Comments

Junio C Hamano Feb. 12, 2025, 10:25 p.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> The 'git-refs(1)' migrate subcommand, which transfers repositories
> between reference backends, currently migrates reflogs by default as of
> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16).
>
> While this behavior is desirable for most client-side repositories,
> server-side repositories are not expected to contain reflogs. However,
> due to historical reasons, some may still have them. This could be
> caused, for example, by bugs, misconfiguration, or an administrator
> enabling reflogs on the server for debugging purposes.
>
> To address this, introduce the --skip-reflog flag, allowing users to
> bypass reflog migration. This ensures that the repository ends up in the
> expected state after migration.

I do not quite understand the motivation behind this change.

If a repository has reflog that you do not need by mistake or
misconfiguration, I agree that there should be a way for you to
remove the reflog.  Removing it while converting the ref backend may
be a convenient way if and only if the reason why you noticed such a
repository with unwanted reflog is because you were about to migrate
it, but regardless of when you notice such refs with unwanted log,
you would want to be able to drop their logs.  You may not even be
planning to migrate your backend when you noticed that the refs have
unwanted log, you may have already migrated long time ago when you
noticed that the refs have unwanted log, or you may not even be
planning to migrate in the first place.  Even after you migrated
your backend, an administrator may have to enable reflog for
debugging, and then after the administrator is done, then what?
Should the backend migrated back from reftable to files and then
back again, only to pass this --skip-reflog option?

Wouldn't it be a lot more flexible if you add a new subcommand
"drop" to the "git reflog" command?
Karthik Nayak Feb. 13, 2025, 9:22 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The 'git-refs(1)' migrate subcommand, which transfers repositories
>> between reference backends, currently migrates reflogs by default as of
>> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16).
>>
>> While this behavior is desirable for most client-side repositories,
>> server-side repositories are not expected to contain reflogs. However,
>> due to historical reasons, some may still have them. This could be
>> caused, for example, by bugs, misconfiguration, or an administrator
>> enabling reflogs on the server for debugging purposes.
>>
>> To address this, introduce the --skip-reflog flag, allowing users to
>> bypass reflog migration. This ensures that the repository ends up in the
>> expected state after migration.
>
> I do not quite understand the motivation behind this change.
>
> If a repository has reflog that you do not need by mistake or
> misconfiguration, I agree that there should be a way for you to
> remove the reflog.  Removing it while converting the ref backend may
> be a convenient way if and only if the reason why you noticed such a
> repository with unwanted reflog is because you were about to migrate
> it, but regardless of when you notice such refs with unwanted log,
> you would want to be able to drop their logs.  You may not even be
> planning to migrate your backend when you noticed that the refs have
> unwanted log, you may have already migrated long time ago when you
> noticed that the refs have unwanted log, or you may not even be
> planning to migrate in the first place.  Even after you migrated
> your backend, an administrator may have to enable reflog for
> debugging, and then after the administrator is done, then what?
> Should the backend migrated back from reftable to files and then
> back again, only to pass this --skip-reflog option?
>
> Wouldn't it be a lot more flexible if you add a new subcommand
> "drop" to the "git reflog" command?

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The 'git-refs(1)' migrate subcommand, which transfers repositories
>> between reference backends, currently migrates reflogs by default as of
>> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16).
>>
>> While this behavior is desirable for most client-side repositories,
>> server-side repositories are not expected to contain reflogs. However,
>> due to historical reasons, some may still have them. This could be
>> caused, for example, by bugs, misconfiguration, or an administrator
>> enabling reflogs on the server for debugging purposes.
>>
>> To address this, introduce the --skip-reflog flag, allowing users to
>> bypass reflog migration. This ensures that the repository ends up in the
>> expected state after migration.
>
> I do not quite understand the motivation behind this change.
>
> If a repository has reflog that you do not need by mistake or
> misconfiguration, I agree that there should be a way for you to
> remove the reflog.  Removing it while converting the ref backend may
> be a convenient way if and only if the reason why you noticed such a
> repository with unwanted reflog is because you were about to migrate
> it, but regardless of when you notice such refs with unwanted log,
> you would want to be able to drop their logs.  You may not even be
> planning to migrate your backend when you noticed that the refs have
> unwanted log, you may have already migrated long time ago when you
> noticed that the refs have unwanted log, or you may not even be
> planning to migrate in the first place.  Even after you migrated
> your backend, an administrator may have to enable reflog for
> debugging, and then after the administrator is done, then what?
> Should the backend migrated back from reftable to files and then
> back again, only to pass this --skip-reflog option?
>
> Wouldn't it be a lot more flexible if you add a new subcommand
> "drop" to the "git reflog" command?

To just get rid of reflogs from a repository, I think 'git reflog drop'
or something similar would indeed be a better way to go about it. As you
stated, with this patch, we could still face the issue wherein the
administartor could re-enable reflog and we're back to square one.

Why I think this patch is important, is because while there could be
existing reflogs in a repository, if one doesn't care about _reflogs_
there could be significant performance gains while migrating repos from
one backend to the other, while also leaving the reflogs behind. The
intent of the patch is not to be a detterrent for reflogs. The `git refs
migrate` command is not the pathway for that. Rather, the goal is to
ensure we skip them during migraiton because we do not want to bother
migrating them.

Saying this, another option is to have `git reflog drop` and then
perform the migration, but I think it does makes sense to provide users
with a fine-tuning during migration to allow them to choose what they
want to migrate.

I know that in GitLab we have repositories with millions of references
and while we generally have reflogs disabled. There could be
repositories with reflogs enabled during debugging or just old
repositories where reflogs always existed. Since migrating backends
blocks all operations on the repository, it would be vital to also
optimize it as much as possible.

Thanks,
Karthik
Junio C Hamano Feb. 13, 2025, 6:06 p.m. UTC | #3
Karthik Nayak <karthik.188@gmail.com> writes:

> To just get rid of reflogs from a repository, I think 'git reflog drop'
> or something similar would indeed be a better way to go about it. As you
> stated, with this patch, we could still face the issue wherein the
> administartor could re-enable reflog and we're back to square one.

Exactly.

> Why I think this patch is important, is because while there could be
> existing reflogs in a repository, if one doesn't care about _reflogs_
> there could be significant performance gains while migrating repos from
> one backend to the other, while also leaving the reflogs behind.

Sure.  It could be done with a combination of "git reflog drop &&
git refs migrate" (or "git refs migrate && git reflog drop", if the
migrated-to backend performs better when it drops reflogs).

With "git refs migrate --skip-reflog" alone, we are very limited.
We can lose reflogs _only_ when we are migrating.

Doing it _during_ migration may very well be more efficient than
dropping first and then migrate (or the other way around), so I do
not have much against the "migrate --skip-reflog" existing.  But I
find it backwards to add it first _before_ we have a tool that is
more generally applicable to wider situations, i.e., "reflog drop".

IOW, it feels as if we are worried about icing on the cake long
before we actually bake the cake.
Karthik Nayak Feb. 14, 2025, 8:50 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> To just get rid of reflogs from a repository, I think 'git reflog drop'
>> or something similar would indeed be a better way to go about it. As you
>> stated, with this patch, we could still face the issue wherein the
>> administartor could re-enable reflog and we're back to square one.
>
> Exactly.
>
>> Why I think this patch is important, is because while there could be
>> existing reflogs in a repository, if one doesn't care about _reflogs_
>> there could be significant performance gains while migrating repos from
>> one backend to the other, while also leaving the reflogs behind.
>
> Sure.  It could be done with a combination of "git reflog drop &&
> git refs migrate" (or "git refs migrate && git reflog drop", if the
> migrated-to backend performs better when it drops reflogs).
>
> With "git refs migrate --skip-reflog" alone, we are very limited.
> We can lose reflogs _only_ when we are migrating.
>
> Doing it _during_ migration may very well be more efficient than
> dropping first and then migrate (or the other way around), so I do
> not have much against the "migrate --skip-reflog" existing.  But I
> find it backwards to add it first _before_ we have a tool that is
> more generally applicable to wider situations, i.e., "reflog drop".
>
> IOW, it feels as if we are worried about icing on the cake long
> before we actually bake the cake.

I understand and agree. I do see a lot of benefits having 'git reflog
drop' too, so I'll pick it up and send a patch series towards the same
next (as soon as I send in the next version of the partial transactions
series :), which is taking me a bit of time with some other ongoing work
at $DAYJOB).

Thanks!
Toon Claes Feb. 19, 2025, 10:06 a.m. UTC | #5
Karthik Nayak <karthik.188@gmail.com> writes:

> The 'git-refs(1)' migrate subcommand, which transfers repositories
> between reference backends, currently migrates reflogs by default as of
> 246cebe320 (refs: add support for migrating reflogs, 2024-12-16).
>
> While this behavior is desirable for most client-side repositories,
> server-side repositories are not expected to contain reflogs. However,
> due to historical reasons, some may still have them. This could be
> caused, for example, by bugs, misconfiguration, or an administrator
> enabling reflogs on the server for debugging purposes.
>
> To address this, introduce the --skip-reflog flag, allowing users to
> bypass reflog migration. This ensures that the repository ends up in the
> expected state after migration.

It wasn't really obvious to me this change removes the reflog, instead
of "skipping". So I was surprised the reflog was removed after
migrating, instead of staying in tact using the files backend.

Only after reading through the test and the discussion in this thread I
realized that's the intented behavior.

So can I suggest to name the option `--no-reflog`? To me that makes it
more obvious the reflog won't exist no more after migrating, and is more
in line with the common UX of Git. Also emphasizing this more clearly in
the commit message and help message also would be advised.
Junio C Hamano Feb. 19, 2025, 5:04 p.m. UTC | #6
Toon Claes <toon@iotcl.com> writes:

> So can I suggest to name the option `--no-reflog`? To me that makes it
> more obvious the reflog won't exist no more after migrating, and is more
> in line with the common UX of Git. Also emphasizing this more clearly in
> the commit message and help message also would be advised.

I have always thought, until I saw the message I am responding to,
that everybody would expect that "migrate --skip=X --skip=Y" that
usually migrates X and Y and Z would lose X and Y with the
transition.  But I realized that it was most likely because I happen
to know that the choice between reftable and files backends is
"which one do you take, you cannot have both at the same time", and
it was clear that "skip and keep using the old form" is not on the
table.  For all others, your interpretation of the option name is
entirely plausible.  So I agree `--no-reflog` is really an excellent
suggestion, even though `--reflog` option would be a no-op, and
`--no-refs` would be a "Huh?" option that only logically makes sense
to have for completeness but nobody would want to use ;-)
Karthik Nayak Feb. 19, 2025, 8:28 p.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Toon Claes <toon@iotcl.com> writes:
>
>> So can I suggest to name the option `--no-reflog`? To me that makes it
>> more obvious the reflog won't exist no more after migrating, and is more
>> in line with the common UX of Git. Also emphasizing this more clearly in
>> the commit message and help message also would be advised.
>
> I have always thought, until I saw the message I am responding to,
> that everybody would expect that "migrate --skip=X --skip=Y" that
> usually migrates X and Y and Z would lose X and Y with the
> transition.  But I realized that it was most likely because I happen
> to know that the choice between reftable and files backends is
> "which one do you take, you cannot have both at the same time", and
> it was clear that "skip and keep using the old form" is not on the
> table.  For all others, your interpretation of the option name is
> entirely plausible.  So I agree `--no-reflog` is really an excellent
> suggestion, even though `--reflog` option would be a no-op, and
> `--no-refs` would be a "Huh?" option that only logically makes sense
> to have for completeness but nobody would want to use ;-)

I share the same reaction. I didn't consider that flow of thought at
all. So I too agree with name change. Let me push in a new version.
Although I'm not sure if, Junio, you want to wait for the `git reflog
drop` command that we were discussing before accepting this topic [1].
I'll leave that to your discretion.

[1]: https://lore.kernel.org/all/xmqq4j0xpvmu.fsf@gitster.g/
Junio C Hamano Feb. 19, 2025, 9:37 p.m. UTC | #8
Karthik Nayak <karthik.188@gmail.com> writes:

> I share the same reaction. I didn't consider that flow of thought at
> all. So I too agree with name change. Let me push in a new version.
> Although I'm not sure if, Junio, you want to wait for the `git reflog
> drop` command that we were discussing before accepting this topic [1].
> I'll leave that to your discretion.

Well, from my point of view, "reflog drop", if can be done for both
files and reftable without too much hassle, would be a greater
addition to the toolset than the value "you can drop while
migrating, but you need to remember to pass that option" gives us
;-)

So it really depends on how involved the work to add "drop" thing
would be.

Thanks.
diff mbox series

Patch

diff --git a/builtin/refs.c b/builtin/refs.c
index a29f19583474518ee0942ea53c39cbdf9661c5e2..30be0254c14dd3d07693d70c25dddc9990756e9c 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -30,6 +30,9 @@  static int cmd_refs_migrate(int argc, const char **argv, const char *prefix,
 		OPT_BIT(0, "dry-run", &flags,
 			N_("perform a non-destructive dry-run"),
 			REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN),
+		OPT_BIT(0, "skip-reflog", &flags,
+			N_("skip migrating reflogs"),
+			REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG),
 		OPT_END(),
 	};
 	struct strbuf errbuf = STRBUF_INIT;
diff --git a/refs.c b/refs.c
index f4094a326a9f88f979654b668cc9c3d27d83cb5d..5e8f5c06fa68d16c93ee11edd9742995eea994b6 100644
--- a/refs.c
+++ b/refs.c
@@ -3035,9 +3035,11 @@  int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
-	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
-	if (ret < 0)
-		goto done;
+	if (!(flags & REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG)) {
+		ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+		if (ret < 0)
+			goto done;
+	}
 
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
diff --git a/refs.h b/refs.h
index a0cdd99250e8286b55808b697b0a94afac5d8319..ccee8fc6705e6e93a1c6234e7395180f8dfd815b 100644
--- a/refs.h
+++ b/refs.h
@@ -1157,8 +1157,11 @@  int is_pseudo_ref(const char *refname);
  *   - REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN: perform a dry-run migration
  *     without touching the main repository. The result will be written into a
  *     temporary ref storage directory.
+ *
+ *   - REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG: skip migration of reflogs.
  */
-#define REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN (1 << 0)
+#define REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN      (1 << 0)
+#define REPO_MIGRATE_REF_STORAGE_FORMAT_SKIP_REFLOG (1 << 1)
 
 /*
  * Migrate the ref storage format used by the repository to the
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index a6d9b35a46eb59350aa0d59d982a2fbfaecf1448..83466bde1865071ee633c738fd5c2ad9ae65bca3 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -9,14 +9,21 @@  export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 # Migrate the provided repository from one format to the other and
 # verify that the references and logs are migrated over correctly.
-# Usage: test_migration <repo> <format> <skip_reflog_verify>
+# Usage: test_migration <repo> <format> [<skip_reflog_verify> [<options...>]]
 #   <repo> is the relative path to the repo to be migrated.
 #   <format> is the ref format to be migrated to.
-#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
+#   <skip_reflog_verify> (default: false) whether to skip reflog verification.
+#   <options...> are other options be passed directly to 'git refs migrate'.
 test_migration () {
 	repo=$1 &&
 	format=$2 &&
-	skip_reflog_verify=${3:-false} &&
+	shift 2 &&
+	skip_reflog_verify=false &&
+	if test $# -ge 1
+	then
+		skip_reflog_verify=$1
+		shift
+	fi &&
 	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
 	if ! $skip_reflog_verify
@@ -25,7 +32,7 @@  test_migration () {
 	   git -C "$repo" reflog list >expect_log_list
 	fi &&
 
-	git -C "$repo" refs migrate --ref-format="$2" &&
+	git -C "$repo" refs migrate --ref-format="$format" "$@" &&
 
 	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
@@ -241,6 +248,19 @@  do
 				test_cmp expect.reflog actual.reflog
 			)
 		'
+
+		test_expect_success "$from_format -> $to_format: skip reflog with --skip-reflog" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			# we see that the repository contains reflogs.
+			git -C repo reflog --all >reflogs &&
+			test_line_count = 2 reflogs &&
+			test_migration repo "$to_format" true --skip-reflog &&
+			# there should be no reflogs post migration.
+			git -C repo reflog --all >reflogs &&
+			test_must_be_empty reflogs
+		'
 	done
 done