[2/2] doc/gitremote-helpers: match object-format option docs to code

Message ID 20240307085632.GB2072294@coredump.intra.peff.net (mailing list archive)
State Superseded
Series some transport-helper "option object-format" confusion

Commit Message

Jeff King March 7, 2024, 8:56 a.m. UTC
Git's transport-helper code has always sent "option object-format\n",
and never provided the "true" or "algorithm" arguments. While the
"algorithm" request is something we might need or want to eventually
support, it probably makes sense for now to document the actual
behavior, especially as it has been in place for several years, since
8b85ee4f47 (transport-helper: implement object-format extensions,
2020-05-25).

Signed-off-by: Jeff King <peff@peff.net>
---
As I discussed in patch 1, remote-curl does handle the "true" thing
correctly. And that's really the helper that matters in practice (it's
possible some third party helper is looking for the explicit "true", but
presumably they'd have reported their confusion to the list). So we
could probably just start tacking on the "true" in transport-helper.c
and leave that part of the documentation untouched.

I'm less sure of the specific-algorithm thing, just because it seems
like remote-curl would never make use of it anyway (preferring instead
to match whatever algorithm is used by the http remote). But maybe there
are pending interoperability plans that depend on this?

I guess it would not hurt to leave it in place even if transport-helper
never produces it. On the other hand, any helper which advertises the
"object-format" capability is supposed to support it, and without the
transport-helper side being implemented, I don't know how any helper
program can claim that.

 Documentation/gitremote-helpers.txt | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

brian m. carlson March 7, 2024, 10:20 p.m. UTC | #1
On 2024-03-07 at 08:56:32, Jeff King wrote:
> Git's transport-helper code has always sent "option object-format\n",
> and never provided the "true" or "algorithm" arguments. While the
> "algorithm" request is something we might need or want to eventually
> support, it probably makes sense for now to document the actual
> behavior, especially as it has been in place for several years, since
> 8b85ee4f47 (transport-helper: implement object-format extensions,
> 2020-05-25).
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> As I discussed in patch 1, remote-curl does handle the "true" thing
> correctly. And that's really the helper that matters in practice (it's
> possible some third party helper is looking for the explicit "true", but
> presumably they'd have reported their confusion to the list). So we
> could probably just start tacking on the "true" in transport-helper.c
> and leave that part of the documentation untouched.
> 
> I'm less sure of the specific-algorithm thing, just because it seems
> like remote-curl would never make use of it anyway (preferring instead
> to match whatever algorithm is used by the http remote). But maybe there
> are pending interoperability plans that depend on this?

It was designed to allow indicating that we know how to support both
SHA-1 and SHA-256 and we want one or the other (so we don't need to do
an expensive conversion).  However, if it's not implemented, I agree we
should document what's implemented, and then extend it when interop
comes.
Jeff King March 12, 2024, 7:45 a.m. UTC | #2
On Thu, Mar 07, 2024 at 10:20:16PM +0000, brian m. carlson wrote:

> > As I discussed in patch 1, remote-curl does handle the "true" thing
> > correctly. And that's really the helper that matters in practice (it's
> > possible some third party helper is looking for the explicit "true", but
> > presumably they'd have reported their confusion to the list). So we
> > could probably just start tacking on the "true" in transport-helper.c
> > and leave that part of the documentation untouched.
> > 
> > I'm less sure of the specific-algorithm thing, just because it seems
> > like remote-curl would never make use of it anyway (preferring instead
> > to match whatever algorithm is used by the http remote). But maybe there
> > are pending interoperability plans that depend on this?
> 
> It was designed to allow indicating that we know how to support both
> SHA-1 and SHA-256 and we want one or the other (so we don't need to do
> an expensive conversion).  However, if it's not implemented, I agree we
> should document what's implemented, and then extend it when interop
> comes.

I guess my reservation is that when it _does_ come time to extend, we'll
have to introduce a new capability. The capability "object-format" has a
documented meaning now, and what we send is currently a subset of that
(sort of[1]). If we later start sending an explicit algorithm, then in
theory they're supposed to handle that, too, if they implemented against
the docs.

Whereas if we roll back the explicit-algorithm part of the docs, now we
can't assume any helper claiming "object-format" will understand it. And
we'll need them to say "object-format-extended" or something. That's
both more work, and delays adoption for helpers which implemented what
the current docs say.

So I guess my question was more of: are we thinking this explicit
algorithm thing is coming very soon? If so, it might be worth keeping it
in the docs. But if not, and it's just a hypothetical future, it may be
better to clean things up now. And I ask you as the person who mostly
juggles possible future algorithm plans in his head. ;) Of course if the
answer is some combination of "I don't really remember what the plan
was" and "I don't have time to work on it anytime soon" that's OK, too.

-Peff

[1] In the above I'm really just talking about the explicit-algorithm
    part. The "sort of" is that we claim to send "object-format true"
    but actually just send "object-format". There I'm more inclined to
    just align the docs with practice, as the two are equivalent.
brian m. carlson March 13, 2024, 9:11 p.m. UTC | #3
On 2024-03-12 at 07:45:13, Jeff King wrote:
> So I guess my question was more of: are we thinking this explicit
> algorithm thing is coming very soon? If so, it might be worth keeping it
> in the docs. But if not, and it's just a hypothetical future, it may be
> better to clean things up now. And I ask you as the person who mostly
> juggles possible future algorithm plans in his head. ;) Of course if the
> answer is some combination of "I don't really remember what the plan
> was" and "I don't have time to work on it anytime soon" that's OK, too.

The answer is that I'm not planning on doing the SHA-1/SHA-256 interop
work except as part of my employment, since I'm kinda out of energy in
that area and it's a lot of work, and I don't believe that my employer
is planning to have me do that anytime soon.  Thus, if nobody else is
planning on doing it in short order, it probably won't be getting done.

I know Eric was working on some of the interop work, so perhaps he can
speak to whether he's planning on working in this area soonish.
Eric W. Biederman March 14, 2024, 12:47 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2024-03-12 at 07:45:13, Jeff King wrote:
>> So I guess my question was more of: are we thinking this explicit
>> algorithm thing is coming very soon? If so, it might be worth keeping it
>> in the docs. But if not, and it's just a hypothetical future, it may be
>> better to clean things up now. And I ask you as the person who mostly
>> juggles possible future algorithm plans in his head. ;) Of course if the
>> answer is some combination of "I don't really remember what the plan
>> was" and "I don't have time to work on it anytime soon" that's OK, too.

Given the rest of the conversation I thought something about the
object-format option was going to depend upon work that I am doing.

Reading up on object-format this seems to be something that should
be sorted out now.

Fundamentally the object-format code is about a client representing a
SHA256 repository encountering a server representing a SHA1 repository
and detecting and handling that case cleanly.  Or the other way around.

This is a current concern as SHA1 and SHA256 repositories are both
currently supported.

The only future concern is what happens when a client for a SHA256
repository encounters a server serving a SHA1 repository and wants to
switch into a compatibility mode, before it starts sending something
that will confuse the server.

That said I think we do a lot of that today in practice
by simply detecting the length of the hash.

In general the plan is that all of the multiple hash interop work
happens on the client and the server worries about handling a single
hash efficiently.

That said I haven't worked with the git protocol so I don't know
what is needed in detail for a client to figure out what the server
is speaking and cleanly abort, or quickly switch to the server's
language.  Jeff do you have any insight into that?

> The answer is that I'm not planning on doing the SHA-1/SHA-256 interop
> work except as part of my employment, since I'm kinda out of energy in
> that area and it's a lot of work, and I don't believe that my employer
> is planning to have me do that anytime soon.  Thus, if nobody else is
> planning on doing it in short order, it probably won't be getting done.
>
> I know Eric was working on some of the interop work, so perhaps he can
> speak to whether he's planning on working in this area soonish.

Soon-ish.

Getting the SHA1/SHA256 interop working is something that I feel pretty
strongly about.  So once I can set aside some time I am going to
push forward with it.

I have code with pretty much everything else working and tested,
except the actual interop, at this point.  That is I have code
for bi-hash repositories.

Breaking everything into small enough chunks that people don't feel
daunted looking at the code has been a bit of a challenge.  My current
plan is to write some ``unit tests'' (that is tests that test a single
abstraction in the code at a time), so I can feel comfortable feeding
things in much smaller pieces.

Once the core infrastructure is merged for bi-hash repositories then
I plan to work on the actual interop between the repositories.  The
challenging technical problem I have been looking at is quickly
and efficiently writing a pack in the repository hash, while
retaining a translation to its original hash.

Once the translation is done the rest is fiddly bits that should come
fairly quickly and should be comparatively easy to review.  AKA things
like the client detecting the other end is using a different hash
algorithm and using that information to send heads in a format the
server understands.


That said I will be happy to help sort out object-format now.
That is maintenance and it has no dependencies that I am aware
of.


...

That said.  Sorting out object-format has no dependencies on anything
else I have been doing.  I will be happy to help sort that out right
now.

Eric
Junio C Hamano March 14, 2024, 3:33 p.m. UTC | #5
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> The answer is that I'm not planning on doing the SHA-1/SHA-256 interop
> work except as part of my employment, since I'm kinda out of energy in
> that area and it's a lot of work, and I don't believe that my employer
> is planning to have me do that anytime soon.

It is sad to hear that it is deprioritised, even though it is one of
the larger areas with high importance for the longer term.  Thank
you very much for the progress in this area so far.

> Thus, if nobody else is
> planning on doing it in short order, it probably won't be getting done.
>
> I know Eric was working on some of the interop work, so perhaps he can
> speak to whether he's planning on working in this area soonish.
brian m. carlson March 14, 2024, 9:21 p.m. UTC | #6
On 2024-03-14 at 12:47:16, Eric W. Biederman wrote:
> That said I think we do a lot of that today in practice
> by simply detecting the length of the hash.

That's only true for the dumb HTTP protocol.  Everything else should not
do that and we specifically want to avoid doing that, since we may very
well end up with SHA-3-256 or another 256-bit hash instead of SHA-256 if
there are sufficient cryptographic advances.

In fact, if we're going to support reftables via the dumb HTTP protocol,
then we should add some sort of capability advertisement that tells the
remote side what functionality is supported, and simply specify the hash
in that format.
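
The ambiguity described above can be illustrated with a minimal sketch. The function name guess_algo_by_hexlen() is hypothetical and not part of Git; it only shows why the length of a hex object ID cannot identify the algorithm once more than one 256-bit hash exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a Git API: guess the hash algorithm from
 * the length of a hex object ID.
 */
static const char *guess_algo_by_hexlen(const char *hex_oid)
{
	assert(hex_oid != NULL);

	switch (strlen(hex_oid)) {
	case 40:
		return "sha1";   /* 160 bits, 40 hex digits */
	case 64:
		/*
		 * Every 256-bit hash is 64 hex digits long, so "sha256"
		 * is only a guess; SHA-3-256 would look identical here.
		 */
		return "sha256";
	default:
		return NULL;     /* length matches no known algorithm */
	}
}
```

An explicit algorithm name sent by the other side avoids the guess in the 64-digit case.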
brian m. carlson March 14, 2024, 9:54 p.m. UTC | #7
On 2024-03-14 at 15:33:24, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > The answer is that I'm not planning on doing the SHA-1/SHA-256 interop
> > work except as part of my employment, since I'm kinda out of energy in
> > that area and it's a lot of work, and I don't believe that my employer
> > is planning to have me do that anytime soon.
> 
> It is sad to hear that it is deprioritised, even though it is one of
> the larger areas with high importance for the longer term.  Thank
> you very much for the progress in this area so far.

I don't want to claim that my employer is not prioritizing SHA-256, it's
just that the focus right now is not having me write the interop code.
Other work is ongoing which has and probably will in the future result
in Git contributions, although not necessarily directly related to the
interop work.  Some of our work porting away from libgit2 to Git to
get better SHA-256 support has resulted in us writing new features which
we upstream.

As far as my personal contributions, I'm focusing on other, smaller
Git-related things right now[0], and I'm just writing less code in C
(and effectively no code in C other than Git).  And I'm also doing other
things in my life which leave me less time to work on Git.

[0] Including, hopefully soon, some credential helper improvements.
Eric W. Biederman March 15, 2024, 3:41 p.m. UTC | #8
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2024-03-14 at 12:47:16, Eric W. Biederman wrote:
>> That said I think we do a lot of that today in practice
>> by simply detecting the length of the hash.
>
> That's only true for the dumb HTTP protocol.  Everything else should not
> do that and we specifically want to avoid doing that, since we may very
> well end up with SHA-3-256 or another 256-bit hash instead of SHA-256 if
> there are sufficient cryptographic advances.

My apologies.  I thought Jeff King was reporting that object-format
extension did not work, and that had been masked by a test.

I see what you are saying, and a quick grep through the code confirms
that the object-format extension is implemented, and that the primary
problem is that the Documentation varies slightly from what is
implemented.


Looking at the code I am left with the question:
 Is the object-format extension properly implemented in all cases?


If the object-format extension is properly implemented such that a
client and server mismatch can be detected I am for just Documenting
what is currently implemented and calling it good.

The reason for that is
Documentation/technical/hash-function-transition.txt does not expect
servers to support more than one hash function.  I don't have a perspective
that differs.  So detecting what the client and server support and
failing if they differ should be good enough.



I am concerned that the current code may not report its hash function
in all of the cases it needs to, to be able to detect a mismatch.

I look at commit 8b85ee4f47aa ("transport-helper: implement
object-format extensions") and I don't see anything that generates
":object-format=" after it has been asked for except the code
in remote-curl.c added in commit 7f60501775b2 ("remote-curl: implement
object-format extensions").

Maybe I am mistaken but a name like remote-curl has me strongly
suspecting that it does not cover all of the cases that git supports
that implement protocol v2.

I think I see some omissions in updating the protocol v2 Documentation.


Can some folks who understand how git protocol v2 is implemented better
than I do, tell me if I am seeing things or if it indeed looks like
there are some omissions in the object-format implementation?

Eric
Jeff King March 16, 2024, 6:04 a.m. UTC | #9
On Fri, Mar 15, 2024 at 10:41:24AM -0500, Eric W. Biederman wrote:

> I see what you are saying, and a quick grep through the code confirms
> that the object-format extension is implemented, and that the primary
> problem is that the Documentation varies slightly from what is
> implemented.
> 
> 
> Looking at the code I am left with the question:
>  Is the object-format extension properly implemented in all cases?
> 
> 
> If the object-format extension is properly implemented such that a
> client and server mismatch can be detected I am for just Documenting
> what is currently implemented and calling it good.
> 
> The reason for that is
> Documentation/technical/hash-function-transition.txt does not expect
> servers to support more than one hash function.  I don't have a perspective
> that differs.  So detecting what the client and server support and
> failing if they differ should be good enough.

AFAIK the code all works correctly, and there are no cases where we fail
to notice a mismatch. The two code/doc inconsistencies (and bearing in
mind this is for the transport-helper protocol, not the v2 protocol
itself) are:

  - the docs say "object-format true", but the code just says
    "object-format". They're semantically equivalent, so it's just a
    minor syntax issue.

  - the docs say that Git may write "object-format sha256" to the
    helper, but the code will never do that.

So my big question is for the second case: is that something that we'll
need to be able to do (possibly to support interop, but possibly for
some other case)? If not, we should probably just fix the docs. If so,
then we need to either fix the code, or accept that we'll need to add a
new capability/extension later.

> I am concerned that the current code may not report its hash function
> in all of the cases it needs to, to be able to detect a mismatch.
> 
> I look at commit 8b85ee4f47aa ("transport-helper: implement
> object-format extensions") and I don't see anything that generates
> ":object-format=" after it has been asked for except the code
> in remote-curl.c added in commit 7f60501775b2 ("remote-curl: implement
> object-format extensions").
> 
> Maybe I am mistaken but a name like remote-curl has me strongly
> suspecting that it does not cover all of the cases that git supports
> that implement protocol v2.

That all sounds right. We are talking just about the transport-helper
protocol here, where Git speaks to a separate program that actually
contacts the remote server. And the main helper we ship is remote-curl
(which handles https, http, etc). Everything else is linked directly and
does not need to use a separate process (we use a separate process to
avoid linking curl, openssl, etc into the main Git binary).

We do ship remote-fd and remote-ext, but they don't support most options
(and probably don't need to, because they're mostly pass-throughs that
just use the "connect" feature).

The other major helpers people tend to use are adapters to other version
control systems (e.g., remote-hg, cinnabar). We don't ship any of those
ourselves. They'll obviously need to learn about the transport-helper
object-format capability before they're ready to handle sha256 repos,
but I suspect that work has not really started.

> I think I see some omissions in updating the protocol v2 Documentation.

If you mean from the commits listed above, I don't think so; they are
just touching the transport-helper protocol, not the v2 wire protocol.

-Peff
Eric W. Biederman March 17, 2024, 8:47 p.m. UTC | #10
Jeff King <peff@peff.net> writes:

> On Fri, Mar 15, 2024 at 10:41:24AM -0500, Eric W. Biederman wrote:
>
>> I see what you are saying, and a quick grep through the code confirms
>> that the object-format extension is implemented, and that the primary
>> problem is that the Documentation varies slightly from what is
>> implemented.
>> 
>> 
>> Looking at the code I am left with the question:
>>  Is the object-format extension properly implemented in all cases?
>> 
>> 
>> If the object-format extension is properly implemented such that a
>> client and server mismatch can be detected I am for just Documenting
>> what is currently implemented and calling it good.
>> 
>> The reason for that is
>> Documentation/technical/hash-function-transition.txt does not expect
>> servers to support more than one hash function.  I don't have a perspective
>> that differs.  So detecting what the client and server support and
>> failing if they differ should be good enough.
>
> AFAIK the code all works correctly, and there are no cases where we fail
> to notice a mismatch. The two code/doc inconsistencies (and bearing in
> mind this is for the transport-helper protocol, not the v2 protocol
> itself)

Thank you for the explanation below of the transport-helper protocol
versus the v2 protocol.

> are:
>
>   - the docs say "object-format true", but the code just says
>     "object-format". They're semantically equivalent, so it's just a
>     minor syntax issue.

I am a bit confused on this point after having read the code.  It
appears that when "object-format" is sent remote-curl
experiences "object-format true".

Assuming remote-curl is the only remote helper that currently implements
the object-format capability.  I think we want to fix transport-helper to
send "object-format true" just to be consistent with all of the other
options.

Among other things that will allow using the set_helper_option helper
function, and it will generally keep the code robust as then the code
doesn't develop a special case for the one option that doesn't take an
option value.
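
As a rough sketch of the exchange this would produce, the following formats an "option <name> <value>" command and classifies the helper's one-line reply ("ok", "unsupported", or "error <msg>", as described in gitremote-helpers). The names format_option_command() and classify_option_reply() are hypothetical and are not the actual transport-helper.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum option_reply { OPTION_OK, OPTION_UNSUPPORTED, OPTION_ERROR };

/*
 * Hypothetical: write "option <name> <value>\n" into buf.
 * Returns 0 on success, -1 if the command does not fit.
 */
static int format_option_command(char *buf, size_t size,
				 const char *name, const char *value)
{
	int n;

	assert(buf && name && value);
	n = snprintf(buf, size, "option %s %s\n", name, value);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical: classify a reply line whose newline is already stripped. */
static enum option_reply classify_option_reply(const char *line)
{
	assert(line != NULL);
	if (!strcmp(line, "ok"))
		return OPTION_OK;
	if (!strcmp(line, "unsupported"))
		return OPTION_UNSUPPORTED;
	return OPTION_ERROR;   /* "error <msg>" or anything unexpected */
}
```

With every option carrying a value, "object-format" needs no special case on the sending side.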

>   - the docs say that Git may write "object-format sha256" to the
>     helper, but the code will never do that.

It looks like remote-curl will get confused in that case when it
processes "object-format sha256" as well, since it stores that value in
options.hash_algo, which in all other cases holds the hash algorithm
detected from the remote side.

> So my big question is for the second case: is that something that we'll
> need to be able to do (possibly to support interop, but possibly for
> some other case)? If not, we should probably just fix the docs. If so,
> then we need to either fix the code, or accept that we'll need to add a
> new capability/extension later.

Since this is the transport helper understanding this enough
to give a good reply is challenging.

As I read things the happy path for most connections is either going to
turn into git protocol v2, git-fast-export, or git-fast-import.
Unless I am misunderstanding something all of those will bypass
the code paths the remote helper object-format capability affects.
It is only when the remote helper sends "fallback" during connect
that the remote helper format capability might be used.

The only practical need I can imagine for this is if the client
is going to send oids before asking the remote side what its oids
are.  The only case I can imagine doing this is the initial push
of a repository.

My sense is that unless we can find a current case that was overlooked
during the initial conversion we should remove "object-format
<hash-function>" support from the code and the documentation.

Any new cases that are not currently implemented will almost
certainly be handled by the "smart" protocols.

Looking at the code in transport-helper.c:push_refs it appears the one
use case I can think of is explicitly not supported. The code says:
>	if (!remote_refs) {
>		fprintf(stderr,
>			_("No refs in common and none specified; doing nothing.\n"
>			  "Perhaps you should specify a branch.\n"));
>		return 0;
>	}


>> I think I see some omissions in updating the protocol v2 Documentation.
>
> If you mean from the commits listed above, I don't think so; they are
> just touching the transport-helper protocol, not the v2 wire protocol.

This just proves I haven't dug through these protocol bits enough to
have a good understanding of how they operate yet.

So I think at the end of the day we just want to do something
like the diff below.

Mostly it deletes and simplifies code, but I found one case where
a malfunctioning remote helper could confuse us, so I added a check
to ensure :object-format is sent when we expect it to be sent.

Does that jibe with how you are reading the situation?

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index ed8da428c98b..47e5bb2cc925 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -542,7 +542,7 @@ set by Git if the remote helper has the 'option' capability.
 	transaction.  If successful, all refs will be updated, or none will.  If the
 	remote side does not support this capability, the push will fail.
 
-'option object-format' {'true'|algorithm}::
+'option object-format' {'true'}::
 	If 'true', indicate that the caller wants hash algorithm information
 	to be passed back from the remote.  This mode is used when fetching
 	refs.
diff --git a/remote-curl.c b/remote-curl.c
index 1161dc7fed68..6f4cb3467458 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -213,12 +213,8 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "object-format")) {
 		int algo;
 		options.object_format = 1;
-		if (strcmp(value, "true")) {
-			algo = hash_algo_by_name(value);
-			if (algo == GIT_HASH_UNKNOWN)
-				die("unknown object format '%s'", value);
-			options.hash_algo = &hash_algos[algo];
-		}
+		if (strcmp(value, "true"))
+			die("unknown object format '%s'", value);
 		return 0;
 	} else {
 		return 1 /* unsupported */;
diff --git a/transport-helper.c b/transport-helper.c
index b660b7942f9f..e648f136287d 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1206,13 +1206,13 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	struct ref **tail = &ret;
 	struct ref *posn;
 	struct strbuf buf = STRBUF_INIT;
+	bool received_object_format = false;
 
 	data->get_refs_list_called = 1;
 	helper = get_helper(transport);
 
 	if (data->object_format) {
-		write_str_in_full(helper->in, "option object-format\n");
-		if (recvline(data, &buf) || strcmp(buf.buf, "ok"))
+		if (set_helper_option(transport, "object-format", "true"))
 			exit(128);
 	}
 
@@ -1236,9 +1236,13 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 					die(_("unsupported object format '%s'"),
 					    value);
 				transport->hash_algo = &hash_algos[algo];
+				received_object_format = true;
 			}
 			continue;
 		}
+		else if (data->object_format && !received_object_format) {
+			die(_("missing :object-format"));
+		}
 
 		eov = strchr(buf.buf, ' ');
 		if (!eov)

Eric
Jeff King March 18, 2024, 8:49 a.m. UTC | #11
On Sun, Mar 17, 2024 at 03:47:18PM -0500, Eric W. Biederman wrote:

> >   - the docs say "object-format true", but the code just says
> >     "object-format". They're semantically equivalent, so it's just a
> >     minor syntax issue.
> 
> I am a bit confused on this point after having read the code.  It
> appears that when "object-format" is sent remote-curl
> experiences "object-format true".

Right, this is due to this code in remote-curl.c:

                  } else if (skip_prefix(buf.buf, "option ", &arg)) {
                          char *value = strchr(arg, ' ');
                          int result;
  
                          if (value)
                                  *value++ = '\0';
                          else
                                  value = "true";

which goes way back to the beginning of remote-curl, even though I don't
think Git ever sends a value-less option. Anyway, that's presumably why
nobody noticed that "option object-format" is unusual.
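
A self-contained sketch of that defaulting behavior, assuming the trailing newline has already been stripped from the command line. The function parse_option_line() is hypothetical; it mirrors the excerpt above but is not the remote-curl.c code itself.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical: split "option <name> [<value>]" in place.
 * On success returns 0 and sets *name and *value; a missing value
 * defaults to "true". Returns -1 if the line is not an option command.
 */
static int parse_option_line(char *line, const char **name, const char **value)
{
	static const char prefix[] = "option ";
	char *arg, *sep;

	assert(line && name && value);
	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return -1;

	arg = line + sizeof(prefix) - 1;
	sep = strchr(arg, ' ');
	if (sep) {
		*sep = '\0';         /* terminate the name */
		*value = sep + 1;
	} else {
		*value = "true";     /* value-less option */
	}
	*name = arg;
	return 0;
}
```

Under this parsing, "option object-format" and "option object-format true" yield the same name and value, which is why the two forms are equivalent for remote-curl.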

> Assuming remote-curl is the only remote helper that currently implements
> the object-format capability.  I think we want to fix transport-helper to
> send "object-format true" just to be consistent with all of the other
> options.

We could be breaking third-party helpers that we don't know about. Of
course, those helpers would have to have ignored the documentation. And
I suspect they simply don't exist, or somebody would have showed up and
asked about it (coupled with how new and relatively obscure the hash
algorithm work has been so far).

So maybe we can get away with fixing it now. We should definitely break
it out into its own patch so we can decide independently, though.

> >   - the docs say that Git may write "object-format sha256" to the
> >     helper, but the code will never do that.
> 
> It looks like remote-curl will get confused in that case when it
> processes "object-format sha256" as well, since it stores that value in
> options.hash_algo, which in all other cases holds the hash algorithm
> detected from the remote side.

Yeah, it ends up in the same variable. I _suspect_ it would simply be
overwritten by the remote repo's idea of the hash. I'm not sure if
that's a bug (if the specific algorithm given by the main process is
supposed to take precedence) or a feature (if it's just a suggestion,
and then the helper says "tough luck, the remote is using sha1"). It's
hard to tell because Git never sends it. ;)

> As I read things the happy path for most connections is either going to
> turn into git protocol v2, git-fast-export, or git-fast-import.
> Unless I am misunderstanding something all of those will bypass
> the code paths the remote helper object-format capability affects.
> It is only when the remote helper sends "fallback" during connect
> that the remote helper format capability might be used.

Yeah, I suspect that is true for remote-curl. It may not be for other
helpers which don't support "connect".

> The only practical need I can imagine for this is if the client
> is going to send oids before asking the remote side what its oids
> are.  The only case I can imagine doing this is the initial push
> of a repository.

Hmm, I _think_ we are OK there in practice. Even if there are no refs on
the remote repo (running git-receive-pack), it will still issue a
capability line with its object-format. And then the helper (say,
remote-curl) will report that back to the caller (git-push) who might
say "hey, wait, there's a mismatch". And indeed, it seems to work in
practice with remote-curl, where the push yields:

  fatal: the receiving end does not support this repository's hash algorithm

In theory I suppose Git could directly issue a "push" command to the
helper (which would then specify oids along with refs to push) without
ever issuing "list for-push" (which is what causes the helper to contact
the remote to discover and report back the object format). But it
doesn't do that, and I don't see why it ever would.

This is all neglecting dumb protocols that don't even know how to figure
out the object format of the other side, but I think that's an
orthogonal problem. Either it remains unsolved, or whatever solution we
come up with then gets pushed back over the transport-helper protocol in
the same way.

> My sense is that unless we can find a current case that was overlooked
> during the initial conversion we should remove "object-format
> <hash-function>" support from the code and the documentation.

Yeah, if you don't have any plans to use it for interop work, then I
think we can declare it useless. I'll rework my patch series a bit to
remove the useless sending-side code, and then add a patch on top to
switch the "true" syntax as discussed above.

> Looking at the code in transport-helper.c:push_refs it appears the one
> use case I can think of is explicitly not supported. The code says:
> >	if (!remote_refs) {
> >		fprintf(stderr,
> >			_("No refs in common and none specified; doing nothing.\n"
> >			  "Perhaps you should specify a branch.\n"));
> >		return 0;
> >	}

That only triggers if you didn't ask to push anything. You might have a
sha256 ref locally and say "git push origin my-branch", and then we'd
need to communicate the ref/oid combo for my-branch to the helper. But
as above, I think by that point the helper will have discovered and
reported back the object-format to push.

> Mostly it deletes and simplifies code, but I found one case where
> a malfunctioning remote helper could confuse us, so I added a check
> to ensure :object-format is sent when we expect it to be sent.

That's probably a reasonable thing to check. We should update the docs
to indicate that it's required to send back ":object-format" if the
helper negotiated that capability. I'll add a patch to do that.
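
One way such a check might look, as a simplified sketch: scan the lines of the helper's "list" response for an ":object-format <algo>" keyword line and reject the response if the option was negotiated but the keyword never appears. The functions find_object_format() and check_list_response() are hypothetical, and unlike the diff quoted above this version does not require the keyword to precede the first ref line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: return the algorithm name from the first
 * ":object-format <algo>" line, or NULL if there is none.
 * Lines are assumed to have their newlines stripped.
 */
static const char *find_object_format(const char *const *lines, size_t nr)
{
	static const char key[] = ":object-format ";
	size_t i;

	assert(lines != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (!strncmp(lines[i], key, sizeof(key) - 1))
			return lines[i] + sizeof(key) - 1;
	}
	return NULL;
}

/*
 * Hypothetical: returns 0 if the response is acceptable, -1 if the
 * object-format option was negotiated but the keyword is missing.
 */
static int check_list_response(const char *const *lines, size_t nr,
			       int negotiated)
{
	if (negotiated && !find_object_format(lines, nr))
		return -1;
	return 0;
}
```

A caller would treat a -1 result as a malfunctioning helper rather than silently assuming a default algorithm.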

> Does that jibe with how you are reading the situation?

Yep, I think I have a good sense how to proceed. It may be a day or so
before I produce a series. Thanks for the discussion!

-Peff
Patch

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 07c8439a6f..12dffbf383 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -542,13 +542,10 @@  set by Git if the remote helper has the 'option' capability.
 	transaction.  If successful, all refs will be updated, or none will.  If the
 	remote side does not support this capability, the push will fail.
 
-'option object-format' {'true'|algorithm}::
-	If 'true', indicate that the caller wants hash algorithm information
+'option object-format'::
+	Indicate that the caller wants hash algorithm information
 	to be passed back from the remote.  This mode is used when fetching
 	refs.
-+
-If set to an algorithm, indicate that the caller wants to interact with
-the remote side using that algorithm.
 
 SEE ALSO
 --------