Message ID | a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | clone: fix init of refdb with wrong object format | expand |
Patrick Steinhardt <ps@pks.im> writes: > We're about to change git-clone(1) so that we set up the reference > database at a later point. This change will cause git-remote-curl(1) to > not detect the repository anymore due to "HEAD" not having been created > yet at the time it spawns, and thus causes it to error out once it is > asked to fetch the references. > > We can address this issue by trying to re-discover the Git repository in > case none was detected at startup time. With this change, the clone will > look as following: > > 1. git-clone(1) sets up the initial repository, excluding the > reference database. > > 2. git-clone(1) spawns git-remote-curl(1), which will be unable to > detect the repository due to a missing "HEAD". > > 3. git-clone(1) asks git-remote-curl(1) to list remote references. > This works just fine as this step does not require a local > repository > > 4. git-clone(1) creates the reference database as it has now learned > about the object format. Sorry, but I am not sure I understand this step. I assume you mean by "the object format" what hash function is used to index the objects (which can be learned from the remote "origin" in step 2 and we can choose to use the one they use), not what ref backend is used (which is purely a local matter and we do not need to know what is used at the "origin"). Why do we need to wait initializing ref backend until we learn what hash is being in use? > 5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile. > The latter notices that it doesn't have a repository available, but > it now knows to try and re-discover it. > > If the re-discovery succeeds in the last step we can continue with the > clone. OK.
On Sat, Dec 09, 2023 at 08:09:32AM +0900, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > We're about to change git-clone(1) so that we set up the reference > > database at a later point. This change will cause git-remote-curl(1) to > > not detect the repository anymore due to "HEAD" not having been created > > yet at the time it spawns, and thus causes it to error out once it is > > asked to fetch the references. > > > > We can address this issue by trying to re-discover the Git repository in > > case none was detected at startup time. With this change, the clone will > > look as following: > > > > 1. git-clone(1) sets up the initial repository, excluding the > > reference database. > > > > 2. git-clone(1) spawns git-remote-curl(1), which will be unable to > > detect the repository due to a missing "HEAD". > > > > 3. git-clone(1) asks git-remote-curl(1) to list remote references. > > This works just fine as this step does not require a local > > repository > > > > 4. git-clone(1) creates the reference database as it has now learned > > about the object format. > > Sorry, but I am not sure I understand this step. I assume you mean > by "the object format" what hash function is used to index the > objects (which can be learned from the remote "origin" in step 2 and > we can choose to use the one they use), not what ref backend is used > (which is purely a local matter and we do not need to know what is > used at the "origin"). Yes, exactly. I'm never quite sure whether I should be saying "hash function" or "object format". I'll convert the message to say "hash function" instead to clarify. > Why do we need to wait initializing ref backend until we learn what > hash is being in use? This is because of the reftable backend. With the files backend it never mattered much, because we do not encode the object format anywhere. But with the reftable backend we do indeed encode the object format in the tables' header, so it's important to initialize it with the correct format right from the start. I'll amend the commit message. Patrick
diff --git a/remote-curl.c b/remote-curl.c index ef05752ca5..fc29757b65 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -1564,8 +1564,11 @@ int cmd_main(int argc, const char **argv) if (buf.len == 0) break; if (starts_with(buf.buf, "fetch ")) { - if (nongit) - die(_("remote-curl: fetch attempted without a local repo")); + if (nongit) { + setup_git_directory_gently(&nongit); + if (nongit) + die(_("remote-curl: fetch attempted without a local repo")); + } parse_fetch(&buf); } else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {
We're about to change git-clone(1) so that we set up the reference database at a later point. This change will cause git-remote-curl(1) to not detect the repository anymore due to "HEAD" not having been created yet at the time it spawns, and thus causes it to error out once it is asked to fetch the references. We can address this issue by trying to re-discover the Git repository in case none was detected at startup time. With this change, the clone will look as following: 1. git-clone(1) sets up the initial repository, excluding the reference database. 2. git-clone(1) spawns git-remote-curl(1), which will be unable to detect the repository due to a missing "HEAD". 3. git-clone(1) asks git-remote-curl(1) to list remote references. This works just fine as this step does not require a local repository 4. git-clone(1) creates the reference database as it has now learned about the object format. 5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile. The latter notices that it doesn't have a repository available, but it now knows to try and re-discover it. If the re-discovery succeeds in the last step we can continue with the clone. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- remote-curl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)