[2/3] setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT

Message ID 20250130-b4-pks-reinit-default-ref-format-v1-2-d2769ca01207@pks.im (mailing list archive)
State Accepted
Commit 796fda3f786b3cd5518462b46895244dfecad63c
Series setup: fix reinit of repos with different formats

Commit Message

Patrick Steinhardt Jan. 30, 2025, 4:24 p.m. UTC
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repositories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.

This doesn't work correctly right now:

    $ git init --ref-format=files repo
    Initialized empty Git repository in /tmp/repo/.git/
    $ GIT_DEFAULT_REF_FORMAT=reftable git init repo
    fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory

Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.

Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 setup.c         | 4 +++-
 t/t0001-init.sh | 9 +++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
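
The rule described in the commit message can be sketched in isolation. This is a minimal model, not Git's code: the enum, the struct, and the function name `apply_env_ref_format` are hypothetical stand-ins. It assumes, as the patch's condition suggests, that a negative version means no repository has been set up yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's internal types; not the real definitions. */
enum ref_format {
	REF_FORMAT_UNKNOWN = 0,
	REF_FORMAT_FILES,
	REF_FORMAT_REFTABLE
};

struct repo_format_model {
	int version;                /* assumed: negative means "not initialized" */
	enum ref_format ref_format; /* format recorded by an existing repository */
};

/*
 * Apply the format requested through the environment, but only when there
 * is no existing ref format to preserve.
 * Returns 1 if the environment value was applied, 0 if it was ignored.
 */
int apply_env_ref_format(struct repo_format_model *fmt,
			 enum ref_format from_env)
{
	assert(fmt != NULL);
	if (fmt->version < 0 || fmt->ref_format == REF_FORMAT_UNKNOWN) {
		fmt->ref_format = from_env;
		return 1;
	}
	return 0;
}
```

In this model, a fresh repository (version -1) picks up the environment's format, while a repository that already records "files" keeps it even when the environment asks for "reftable".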

Comments

Junio C Hamano Jan. 30, 2025, 10:40 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
> the default ref format that new repositories shall be initialized with.
> While this is the expected behaviour when creating a new repository, it
> is not when reinitializing a repository: we should retain the ref format
> currently used by it in that case.
>
> This doesn't work correctly right now:
>
>     $ git init --ref-format=files repo
>     Initialized empty Git repository in /tmp/repo/.git/
>     $ GIT_DEFAULT_REF_FORMAT=reftable git init repo
>     fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
>
> Instead of retaining the current ref format, the reinitialization tries
> to reinitialize the repository with the different format. This action
> fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> in the context of the reftable backend is always written as a file so
> that we can detect clients which inadvertently try to access the repo
> with the wrong ref format. Seems like the protection mechanism works for
> this case, as well.

Good finding.  A plausible alternative behaviour could be to do the
backend migration when this is asked, and we might gain consensus to
do so in the (far) future, but I agree that it is a good direction
to go in the short term to match the behaviour of the code to the
documented expectation.

Junio C Hamano Jan. 31, 2025, 10:38 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> Instead of retaining the current ref format, the reinitialization tries
> to reinitialize the repository with the different format. This action
> fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> in the context of the reftable backend is always written as a file so
> that we can detect clients which inadvertently try to access the repo
> with the wrong ref format. Seems like the protection mechanism works for
> this case, as well.
>
> Fix the issue by ignoring the environment variable in case the repo has
> already been initialized with a ref storage format.

It certainly is better than corrupting the repository, but if we are
to do this change, shouldn't we at least issue a warning to tell
users that (a part of) their request was ignored, instead of
silently ignoring the specified ref-format?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  setup.c         | 4 +++-
>  t/t0001-init.sh | 9 +++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/setup.c b/setup.c
> index 8a488f3e7c..53ffeabc5b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2534,7 +2534,9 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  		ref_format = ref_storage_format_by_name(env);
>  		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
>  			die(_("unknown ref storage format '%s'"), env);
> -		repo_fmt->ref_storage_format = ref_format;
> +		if (repo_fmt->version < 0 ||
> +		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> +			repo_fmt->ref_storage_format = ref_format;

Perhaps something silly like this?

		if (0 <= repo_fmt->version &&
		    repo_fmt->ref_storage_format != REF_STORAGE_FORMAT_UNKNOWN)
			warning("ignoring the specified ref-format");
		else
			repo_fmt->ref_storage_format = ref_format;

In the longer term, we might want to consider automatically
migrating the ref backend (by calling into "git ref migrate"),
but it is a good first move to stop damaging the repository.

Thanks.

Patrick Steinhardt Feb. 3, 2025, 5:29 a.m. UTC | #3
On Fri, Jan 31, 2025 at 02:38:20PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Instead of retaining the current ref format, the reinitialization tries
> > to reinitialize the repository with the different format. This action
> > fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> > in the context of the reftable backend is always written as a file so
> > that we can detect clients which inadvertently try to access the repo
> > with the wrong ref format. Seems like the protection mechanism works for
> > this case, as well.
> >
> > Fix the issue by ignoring the environment variable in case the repo has
> > already been initialized with a ref storage format.
> 
> It certainly is better than corrupting the repository, but if we are
> to do this change, shouldn't we at least issue a warning to tell
> users that (a part of) their request was ignored, instead of
> silently ignoring the specified ref-format?

I don't think we should. If this had been passed on the command line,
then yes, we should flag it, and indeed we already die in that case.
But this is an environment variable that allows you to set the default
format. From my point of view it is totally expected that this doesn't
cause the format of existing repositories to change.

> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  setup.c         | 4 +++-
> >  t/t0001-init.sh | 9 +++++++++
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/setup.c b/setup.c
> > index 8a488f3e7c..53ffeabc5b 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -2534,7 +2534,9 @@ static void repository_format_configure(struct repository_format *repo_fmt,
> >  		ref_format = ref_storage_format_by_name(env);
> >  		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
> >  			die(_("unknown ref storage format '%s'"), env);
> > -		repo_fmt->ref_storage_format = ref_format;
> > +		if (repo_fmt->version < 0 ||
> > +		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> > +			repo_fmt->ref_storage_format = ref_format;
> 
> Perhaps something silly like this?
> 
> 		if (0 <= repo_fmt->version &&
> 		    repo_fmt->ref_storage_format != REF_STORAGE_FORMAT_UNKNOWN)
> 			warning("ignoring the specified ref-format");
> 		else
> 			repo_fmt->ref_storage_format = ref_format;
> 
> In the longer term, we might want to consider automatically
> migrating the ref backend (by calling into "git ref migrate"),
> but it is a good first move to stop damaging the repository.

I think keeping migrations explicit is worthwhile. Migrations are a
somewhat risky thing, so explicitly making the user ask for them is not
a bad thing. I personally wouldn't expect git-init(1) to migrate data.
After all, it is supposed to initialize stuff, not rewrite it.

This is doubly true for environment variables, where it is so extremely
easy to accidentally still have them defined. I don't think implicitly
converting every git-init(1) to do migrations would be a good idea there
as it would likely do the wrong thing in many cases.

So from my point of view we should treat the environment variables the
same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
Those indicate defaults, but do not cause us to change the format of
existing repositories.

Patrick

Junio C Hamano Feb. 3, 2025, 2:01 p.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> So from my point of view we should treat the environment variables the
> same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
> Those indicate defaults, but do not cause us to change the format of
> existing repositories.

Hmph, as somebody who often does things like

    $ GIT_EDITOR=: git do-something
    $ GIT_AUTHOR_NAME=foo GIT_AUTHOR_EMAIL=bar@baz git commit -a

I do not necessarily see the environment variables as replacement
for configured defaults.  They are, at least to me, more like a
single-shot override of the configured defaults, so if we were to
complain and error out on command line options (we do do so, don't we?),
I would expect the environment variable that gives a single-shot
setting to be treated the same way.

Thanks.

Patrick Steinhardt Feb. 3, 2025, 3:02 p.m. UTC | #5
On Mon, Feb 03, 2025 at 06:01:33AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > So from my point of view we should treat the environment variables the
> > same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
> > Those indicate defaults, but do not cause us to change the format of
> > existing repositories.
> 
> Hmph, as somebody who often does things like
> 
>     $ GIT_EDITOR=: git do-something
>     $ GIT_AUTHOR_NAME=foo GIT_AUTHOR_EMAIL=bar@baz git commit -a
> 
> I do not necessarily see the environment variables as replacement
> for configured defaults.  They are, at least to me, more like a
> single-shot override of the configured defaults, so if we were to
> complain and error out on command line options (we do do so, don't we?),
> I would expect the environment variable that gives a single-shot
> setting to be treated the same way.

Especially the second one is a good example though that works mostly as
I propose: GIT_AUTHOR_NAME will impact _new_ commits, but not _existing_
ones when you for example `--amend` the commit. So this is somewhat
equivalent to how both GIT_DEFAULT_REF_FORMAT and GIT_DEFAULT_HASH work
with git-init(1), isn't it?

Patrick
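
The policy argued for in Patrick's replies can be written down as a small decision function. This sketch is illustrative only, and every name in it (`choose_format`, `struct request`, the enums) is hypothetical. It takes the precedence for new repositories from the thread and the patch: the command-line option wins over the environment variable, which wins over the configuration. It assumes "files" is the built-in fallback. Under the alternative view raised in the thread, the environment variable would instead be handled like the command-line option on reinit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not Git's internal definitions. */
enum fmt { FMT_UNSET = 0, FMT_FILES, FMT_REFTABLE };
enum outcome { OUTCOME_OK, OUTCOME_ERROR };

struct request {
	enum fmt cmdline; /* explicit --ref-format= option */
	enum fmt env;     /* GIT_DEFAULT_REF_FORMAT */
	enum fmt config;  /* init.defaultRefFormat */
};

/*
 * Decide which ref format an init or reinit ends up with.
 * `existing` is FMT_UNSET when no repository exists yet.
 */
enum outcome choose_format(enum fmt existing, const struct request *req,
			   enum fmt *out)
{
	assert(req != NULL && out != NULL);

	if (existing != FMT_UNSET) {
		/* Reinit: an explicit, conflicting request is an error. */
		if (req->cmdline != FMT_UNSET && req->cmdline != existing)
			return OUTCOME_ERROR;
		/* Defaults from the environment or config are ignored. */
		*out = existing;
		return OUTCOME_OK;
	}

	/* New repository: the most specific source wins. */
	if (req->cmdline != FMT_UNSET)
		*out = req->cmdline;
	else if (req->env != FMT_UNSET)
		*out = req->env;
	else if (req->config != FMT_UNSET)
		*out = req->config;
	else
		*out = FMT_FILES; /* assumed built-in default */
	return OUTCOME_OK;
}
```

Keeping the reinit branch separate from the new-repository branch makes the asymmetry in the discussion explicit: only the explicit option can produce an error, and nothing in this model triggers a migration.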

Patch

diff --git a/setup.c b/setup.c
index 8a488f3e7c..53ffeabc5b 100644
--- a/setup.c
+++ b/setup.c
@@ -2534,7 +2534,9 @@  static void repository_format_configure(struct repository_format *repo_fmt,
 		ref_format = ref_storage_format_by_name(env);
 		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
 			die(_("unknown ref storage format '%s'"), env);
-		repo_fmt->ref_storage_format = ref_format;
+		if (repo_fmt->version < 0 ||
+		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
+			repo_fmt->ref_storage_format = ref_format;
 	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
 		repo_fmt->ref_storage_format = cfg.ref_format;
 	}
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 213d5984b1..6dff8b75f1 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -697,6 +697,15 @@  do
 		git -C refformat rev-parse --show-ref-format >actual &&
 		test_cmp expect actual
 	'
+
+	test_expect_success "reinit repository with GIT_DEFAULT_REF_FORMAT=$format does not change format" '
+		test_when_finished "rm -rf refformat" &&
+		git init refformat &&
+		git -C refformat rev-parse --show-ref-format >expect &&
+		GIT_DEFAULT_REF_FORMAT=$format git init refformat &&
+		git -C refformat rev-parse --show-ref-format >actual &&
+		test_cmp expect actual
+	'
 done
 
 test_expect_success "--ref-format= overrides GIT_DEFAULT_REF_FORMAT" '