[3/7] remote-curl: rediscover repository when fetching refs

Message ID	a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im (mailing list archive)
State	Superseded
Headers	show Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="hNt2LUmZ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="nSSuuZbQ" Feedback-ID: i197146af:Fastmail Date: Wed, 6 Dec 2023 13:39:57 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Subject: [PATCH 3/7] remote-curl: rediscover repository when fetching refs Message-ID: <a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im> References: <cover.1701863960.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="a1kFyGJ5JJjEf8KA" Content-Disposition: inline In-Reply-To: <cover.1701863960.git.ps@pks.im>
Series	clone: fix init of refdb with wrong object format \| expand [0/7] clone: fix init of refdb with wrong object format [1/7] setup: extract function to create the refdb [2/7] setup: allow skipping creation of the refdb [3/7] remote-curl: rediscover repository when fetching refs [4/7] builtin/clone: fix bundle URIs with mismatching object formats [5/7] builtin/clone: set up sparse checkout later [6/7] builtin/clone: skip reading HEAD when retrieving remote [7/7] builtin/clone: create the refdb with the correct object format

Message ID

a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im (mailing list archive)

State

Superseded

Headers

Feedback-ID: i197146af:Fastmail
Date: Wed, 6 Dec 2023 13:39:57 +0100
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 3/7] remote-curl: rediscover repository when fetching refs
Message-ID: 
 <a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im>
References: <cover.1701863960.git.ps@pks.im>
Precedence: bulk
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="a1kFyGJ5JJjEf8KA"
Content-Disposition: inline
In-Reply-To: <cover.1701863960.git.ps@pks.im>

Series

clone: fix init of refdb with wrong object format | expand

Commit Message

Patrick Steinhardt Dec. 6, 2023, 12:39 p.m. UTC

We're about to change git-clone(1) so that we set up the reference
database at a later point. This change will cause git-remote-curl(1) to
not detect the repository anymore due to "HEAD" not having been created
yet at the time it spawns, and thus causes it to error out once it is
asked to fetch the references.

We can address this issue by trying to re-discover the Git repository in
case none was detected at startup time. With this change, the clone will
look as following:

  1. git-clone(1) sets up the initial repository, excluding the
     reference database.

  2. git-clone(1) spawns git-remote-curl(1), which will be unable to
     detect the repository due to a missing "HEAD".

  3. git-clone(1) asks git-remote-curl(1) to list remote references.
     This works just fine as this step does not require a local
     repository

  4. git-clone(1) creates the reference database as it has now learned
     about the object format.

  5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
     The latter notices that it doesn't have a repository available, but
     it now knows to try and re-discover it.

If the re-discovery succeeds in the last step we can continue with the
clone.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 remote-curl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Dec. 8, 2023, 11:09 p.m. UTC | #1

Patrick Steinhardt <ps@pks.im> writes:

> We're about to change git-clone(1) so that we set up the reference
> database at a later point. This change will cause git-remote-curl(1) to
> not detect the repository anymore due to "HEAD" not having been created
> yet at the time it spawns, and thus causes it to error out once it is
> asked to fetch the references.
>
> We can address this issue by trying to re-discover the Git repository in
> case none was detected at startup time. With this change, the clone will
> look as following:
>
>   1. git-clone(1) sets up the initial repository, excluding the
>      reference database.
>
>   2. git-clone(1) spawns git-remote-curl(1), which will be unable to
>      detect the repository due to a missing "HEAD".
>
>   3. git-clone(1) asks git-remote-curl(1) to list remote references.
>      This works just fine as this step does not require a local
>      repository
>
>   4. git-clone(1) creates the reference database as it has now learned
>      about the object format.

Sorry, but I am not sure I understand this step.  I assume you mean
by "the object format" what hash function is used to index the
objects (which can be learned from the remote "origin" in step 2 and
we can choose to use the one they use), not what ref backend is used
(which is purely a local matter and we do not need to know what is
used at the "origin").  Why do we need to wait initializing ref
backend until we learn what hash is being in use?

>   5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
>      The latter notices that it doesn't have a repository available, but
>      it now knows to try and re-discover it.
>
> If the re-discovery succeeds in the last step we can continue with the
> clone.

OK.

Patrick Steinhardt Dec. 11, 2023, 11:35 a.m. UTC | #2

On Sat, Dec 09, 2023 at 08:09:32AM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We're about to change git-clone(1) so that we set up the reference
> > database at a later point. This change will cause git-remote-curl(1) to
> > not detect the repository anymore due to "HEAD" not having been created
> > yet at the time it spawns, and thus causes it to error out once it is
> > asked to fetch the references.
> >
> > We can address this issue by trying to re-discover the Git repository in
> > case none was detected at startup time. With this change, the clone will
> > look as following:
> >
> >   1. git-clone(1) sets up the initial repository, excluding the
> >      reference database.
> >
> >   2. git-clone(1) spawns git-remote-curl(1), which will be unable to
> >      detect the repository due to a missing "HEAD".
> >
> >   3. git-clone(1) asks git-remote-curl(1) to list remote references.
> >      This works just fine as this step does not require a local
> >      repository
> >
> >   4. git-clone(1) creates the reference database as it has now learned
> >      about the object format.
> 
> Sorry, but I am not sure I understand this step.  I assume you mean
> by "the object format" what hash function is used to index the
> objects (which can be learned from the remote "origin" in step 2 and
> we can choose to use the one they use), not what ref backend is used
> (which is purely a local matter and we do not need to know what is
> used at the "origin").

Yes, exactly. I'm never quite sure whether I should be saying "hash
function" or "object format". I'll convert the message to say "hash
function" instead to clarify.

> Why do we need to wait initializing ref backend until we learn what
> hash is being in use?

This is because of the reftable backend. With the files backend it never
mattered much, because we do not encode the object format anywhere. But
with the reftable backend we do indeed encode the object format in the
tables' header, so it's important to initialize it with the correct
format right from the start.

I'll amend the commit message.

Patrick

diff --git a/remote-curl.c b/remote-curl.c
index ef05752ca5..fc29757b65 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1564,8 +1564,11 @@  int cmd_main(int argc, const char **argv)
 		if (buf.len == 0)
 			break;
 		if (starts_with(buf.buf, "fetch ")) {
-			if (nongit)
-				die(_("remote-curl: fetch attempted without a local repo"));
+			if (nongit) {
+				setup_git_directory_gently(&nongit);
+				if (nongit)
+					die(_("remote-curl: fetch attempted without a local repo"));
+			}
 			parse_fetch(&buf);
 
 		} else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {

[3/7] remote-curl: rediscover repository when fetching refs

Commit Message

Comments

Patch