Message ID | 20250130-b4-pks-reinit-default-ref-format-v1-2-d2769ca01207@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 796fda3f786b3cd5518462b46895244dfecad63c |
Series | setup: fix reinit of repos with different formats |
Patrick Steinhardt <ps@pks.im> writes:

> The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
> the default ref format that new repositories shall be initialized with.
> While this is the expected behaviour when creating a new repository, it
> is not when reinitializing a repository: we should retain the ref format
> currently used by it in that case.
>
> This doesn't work correctly right now:
>
>     $ git init --ref-format=files repo
>     Initialized empty Git repository in /tmp/repo/.git/
>     $ GIT_DEFAULT_REF_FORMAT=reftable git init repo
>     fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
>
> Instead of retaining the current ref format, the reinitialization tries
> to reinitialize the repository with the different format. This action
> fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> in the context of the reftable backend is always written as a file so
> that we can detect clients which inadvertently try to access the repo
> with the wrong ref format. Seems like the protection mechanism works for
> this case, as well.

Good finding.

A plausible alternative behaviour could be to do the backend
migration when this is asked, and we might gain consensus to do so
in the (far) future, but I agree that it is a good direction to go
in the short term to match the behaviour of the code to the
documented expectation.
Patrick Steinhardt <ps@pks.im> writes:

> Instead of retaining the current ref format, the reinitialization tries
> to reinitialize the repository with the different format. This action
> fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> in the context of the reftable backend is always written as a file so
> that we can detect clients which inadvertently try to access the repo
> with the wrong ref format. Seems like the protection mechanism works for
> this case, as well.
>
> Fix the issue by ignoring the environment variable in case the repo has
> already been initialized with a ref storage format.

It certainly is better than corrupting the repository, but if we are
to do this change, shouldn't we at least issue a warning to tell
users that (a part of) their request was ignored, instead of
silently ignoring the specified ref-format?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  setup.c         | 4 +++-
>  t/t0001-init.sh | 9 +++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/setup.c b/setup.c
> index 8a488f3e7c..53ffeabc5b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2534,7 +2534,9 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  		ref_format = ref_storage_format_by_name(env);
>  		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
>  			die(_("unknown ref storage format '%s'"), env);
> -		repo_fmt->ref_storage_format = ref_format;
> +		if (repo_fmt->version < 0 ||
> +		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> +			repo_fmt->ref_storage_format = ref_format;

Perhaps something silly like this?

	if (0 <= repo_fmt->version &&
	    repo_fmt->ref_storage_format != REF_STORAGE_FORMAT_UNKNOWN)
		warning("ignoring the specified ref-format");
	else
		repo_fmt->ref_storage_format = ref_format;

In the longer term, we might want to consider automatically
migrating the ref backend (by calling into "git ref migrate"),
but it is a good first move to stop damaging the repository.

Thanks.
On Fri, Jan 31, 2025 at 02:38:20PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Instead of retaining the current ref format, the reinitialization tries
> > to reinitialize the repository with the different format. This action
> > fails when git-init(1) tries to write the ".git/refs/heads" stub, which
> > in the context of the reftable backend is always written as a file so
> > that we can detect clients which inadvertently try to access the repo
> > with the wrong ref format. Seems like the protection mechanism works for
> > this case, as well.
> >
> > Fix the issue by ignoring the environment variable in case the repo has
> > already been initialized with a ref storage format.
>
> It certainly is better than corrupting the repository, but if we are
> to do this change, shouldn't we at least issue a warning to tell
> users that (a part of) their request was ignored, instead of
> silently ignoring the specified ref-format?

I don't think we should. If this was passed on the command line then
yes, we should flag this and already die indeed. But this is an
environment variable that allows you to set the default format. From my
point of view it is totally expected that this doesn't cause the format
of existing repositories to change.

> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  setup.c         | 4 +++-
> >  t/t0001-init.sh | 9 +++++++++
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/setup.c b/setup.c
> > index 8a488f3e7c..53ffeabc5b 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -2534,7 +2534,9 @@ static void repository_format_configure(struct repository_format *repo_fmt,
> >  		ref_format = ref_storage_format_by_name(env);
> >  		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
> >  			die(_("unknown ref storage format '%s'"), env);
> > -		repo_fmt->ref_storage_format = ref_format;
> > +		if (repo_fmt->version < 0 ||
> > +		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> > +			repo_fmt->ref_storage_format = ref_format;
>
> Perhaps something silly like this?
>
> 	if (0 <= repo_fmt->version &&
> 	    repo_fmt->ref_storage_format != REF_STORAGE_FORMAT_UNKNOWN)
> 		warning("ignoring the specified ref-format");
> 	else
> 		repo_fmt->ref_storage_format = ref_format;
>
> In the longer term, we might want to consider automatically
> migrating the ref backend (by calling into "git ref migrate"),
> but it is a good first move to stop damaging the repository.

I think keeping migrations explicit is worthwhile. Migrations are a
somewhat risky thing, so explicitly making the user ask for them is not
a bad thing. I personally wouldn't expect git-init(1) to migrate data.
After all, it is supposed to initialize stuff, not rewrite it.

This is doubly true for environment variables, where it is so extremely
easy to accidentally still have them defined. I don't think implicitly
converting every git-init(1) to do migrations would be a good idea there
as it would likely do the wrong thing in many cases.

So from my point of view we should treat the environment variables the
same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
Those indicate defaults, but do not cause us to change the format of
existing repositories.

Patrick
Patrick Steinhardt <ps@pks.im> writes:

> So from my point of view we should treat the environment variables the
> same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
> Those indicate defaults, but do not cause us to change the format of
> existing repositories.

Hmph, as somebody who often does things like

    $ GIT_EDITOR=: git do-something
    $ GIT_AUTHOR_NAME=foo GIT_AUTHOR_EMAIL=bar@baz git commit -a

I do not necessarily see the environment variables as replacement
for configured defaults. They are, at least to me, more like a
single-shot override of the configured defaults, so if we were to
complain and error out command line options (we do do so, don't we?),
I would expect the environment variable that gives a single-shot
setting to be treated the same way.

Thanks.
On Mon, Feb 03, 2025 at 06:01:33AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > So from my point of view we should treat the environment variables the
> > same as we treat "init.defaultRefFormat" and "init.defaultObjectFormat".
> > Those indicate defaults, but do not cause us to change the format of
> > existing repositories.
>
> Hmph, as somebody who often does things like
>
>     $ GIT_EDITOR=: git do-something
>     $ GIT_AUTHOR_NAME=foo GIT_AUTHOR_EMAIL=bar@baz git commit -a
>
> I do not necessarily see the environment variables as replacement
> for configured defaults. They are, at least to me, more like a
> single-shot override of the configured defaults, so if we were to
> complain and error out command line options (we do do so, don't we?),
> I would expect the environment variable that gives a single-shot
> setting to be treated the same way.

Especially the second one is a good example though that works mostly as
I propose: GIT_AUTHOR_NAME will impact _new_ commits, but not _existing_
ones when you for example `--amend` the commit. So this is somewhat
equivalent to how both GIT_DEFAULT_REF_FORMAT and GIT_DEFAULT_HASH work
with git-init(1), isn't it?

Patrick
diff --git a/setup.c b/setup.c
index 8a488f3e7c..53ffeabc5b 100644
--- a/setup.c
+++ b/setup.c
@@ -2534,7 +2534,9 @@ static void repository_format_configure(struct repository_format *repo_fmt,
 		ref_format = ref_storage_format_by_name(env);
 		if (ref_format == REF_STORAGE_FORMAT_UNKNOWN)
 			die(_("unknown ref storage format '%s'"), env);
-		repo_fmt->ref_storage_format = ref_format;
+		if (repo_fmt->version < 0 ||
+		    repo_fmt->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
+			repo_fmt->ref_storage_format = ref_format;
 	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
 		repo_fmt->ref_storage_format = cfg.ref_format;
 	}
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 213d5984b1..6dff8b75f1 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -697,6 +697,15 @@ do
 		git -C refformat rev-parse --show-ref-format >actual &&
 		test_cmp expect actual
 	'
+
+	test_expect_success "reinit repository with GIT_DEFAULT_REF_FORMAT=$format does not change format" '
+		test_when_finished "rm -rf refformat" &&
+		git init refformat &&
+		git -C refformat rev-parse --show-ref-format >expect &&
+		GIT_DEFAULT_REF_FORMAT=$format git init refformat &&
+		git -C refformat rev-parse --show-ref-format >actual &&
+		test_cmp expect actual
+	'
 done
 
 test_expect_success "--ref-format= overrides GIT_DEFAULT_REF_FORMAT" '
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repositories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.

This doesn't work correctly right now:

    $ git init --ref-format=files repo
    Initialized empty Git repository in /tmp/repo/.git/
    $ GIT_DEFAULT_REF_FORMAT=reftable git init repo
    fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory

Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.

Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 setup.c         | 4 +++-
 t/t0001-init.sh | 9 +++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
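
To make the rule in the commit message concrete, here is a minimal standalone sketch of the decision for the environment variable only. It is not the code from setup.c: the names `enum ref_format`, `struct repo_format`, `ref_format_by_name()` and `apply_env_default()` are hypothetical stand-ins for Git's internal types and helpers, and the sketch returns -1 where the real code calls die(). It assumes, as the patch does, that a negative version means no repository exists yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's internal definitions. */
enum ref_format {
	REF_FORMAT_UNKNOWN = 0,
	REF_FORMAT_FILES,
	REF_FORMAT_REFTABLE
};

struct repo_format {
	int version;                /* negative: no repository exists yet */
	enum ref_format ref_format; /* format recorded by an existing repo */
};

/* Map a format name to its enum value; unknown or NULL names map to UNKNOWN. */
enum ref_format ref_format_by_name(const char *name)
{
	if (name && !strcmp(name, "files"))
		return REF_FORMAT_FILES;
	if (name && !strcmp(name, "reftable"))
		return REF_FORMAT_REFTABLE;
	return REF_FORMAT_UNKNOWN;
}

/*
 * Apply the default taken from the environment (env may be NULL when the
 * variable is unset). The default only takes effect when there is no
 * repository yet or the existing one records no ref format; otherwise the
 * recorded format is retained.
 *
 * Returns 0 on success and -1 when env names an unknown format.
 */
int apply_env_default(struct repo_format *fmt, const char *env)
{
	enum ref_format wanted;

	assert(fmt != NULL);
	if (!env)
		return 0;

	wanted = ref_format_by_name(env);
	if (wanted == REF_FORMAT_UNKNOWN)
		return -1;

	if (fmt->version < 0 || fmt->ref_format == REF_FORMAT_UNKNOWN)
		fmt->ref_format = wanted;
	return 0;
}
```

Calling `apply_env_default()` with a struct whose version is -1 adopts the requested format, while calling it with a struct that already records the "files" format leaves that format untouched, which mirrors the reinit test added to t0001. The warning discussed in the thread would go in the branch where the recorded format is retained.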