Message ID | 20200513005424.81369-35-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Series | SHA-256 part 2/3: protocol functionality |
On Wed, 13 May 2020 at 02:58, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>
> ls-remote may or may not operate within a repository, and as such will
> not have been initialized with the repository's hash algorithm. Even if
> it were, the remote side could be using a different algorithm and we
> would still want to display those refs properly. Find the hash
> algorithm used by the remote side by querying the transport object and
> set our hash algorithm accordingly.
>
> Without this change, if the remote side is using SHA-256, we truncate
> the refs to 40 hex characters, since that's the length of the default
> hash algorithm (SHA-1).

Could we add a test that passes now but would have failed before?

>         ref = transport_get_remote_refs(transport, &ref_prefixes);
> +       if (ref) {
> +               int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
> +               repo_set_hash_algo(the_repository, hash_algo);
> +       }

This will modify `the_hash_algo`. Quoting commit 78a6766802 ("Integrate
hash algorithm support with repo setup", 2017-11-12):

    Add a constant, the_hash_algo, which points to the hash_algo structure
    pointer in the repository global. Note that this is the hash which is
    used to serialize data to disk, not the hash which is used to display
    items to the user. The transition plan anticipates that these may be
    different. We can add an additional element in the future (say,
    ui_hash_algo) to provide for this case.

Don't we violate that here? Is it mostly luck that we can go on to list
what we want to list and that we will never write to disk based on
`the_hash_algo` being "wrong"(?)? Or am I missing something?

Martin
On 2020-05-16 at 11:16:46, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:58, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > ls-remote may or may not operate within a repository, and as such will
> > not have been initialized with the repository's hash algorithm. Even if
> > it were, the remote side could be using a different algorithm and we
> > would still want to display those refs properly. Find the hash
> > algorithm used by the remote side by querying the transport object and
> > set our hash algorithm accordingly.
> >
> > Without this change, if the remote side is using SHA-256, we truncate
> > the refs to 40 hex characters, since that's the length of the default
> > hash algorithm (SHA-1).
>
> Could we add a test that passes now but would have failed before?

The existing tests that call "git ls-remote" actually fail with SHA-256
if we don't do this, specifically "ls-remote works outside repository"
in t5512.

That's the thing with a lot of this series: our existing test suite is
enormously effective at catching these things, but writing a new test is
hard because we can't actually instantiate a SHA-256 repository (because
then users could, and it's broken until the end of the series). Perhaps
unsurprisingly, that's how I found this problem.

So while I would love to write a test for this case, I can't without
allowing users to corrupt and destroy their data in the mean time (or
tacking the final six commits to this series).

> >         ref = transport_get_remote_refs(transport, &ref_prefixes);
> > +       if (ref) {
> > +               int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
> > +               repo_set_hash_algo(the_repository, hash_algo);
> > +       }
>
> This will modify `the_hash_algo`. Quoting commit 78a6766802 ("Integrate
> hash algorithm support with repo setup", 2017-11-12):
>
>     Add a constant, the_hash_algo, which points to the hash_algo structure
>     pointer in the repository global. Note that this is the hash which is
>     used to serialize data to disk, not the hash which is used to display
>     items to the user. The transition plan anticipates that these may be
>     different. We can add an additional element in the future (say,
>     ui_hash_algo) to provide for this case.
>
> Don't we violate that here? Is it mostly luck that we can go on to list
> what we want to list and that we will never write to disk based on
> `the_hash_algo` being "wrong"(?)? Or am I missing something?

We do violate that and we also rely on it never having any effect on our
current repository. Unfortunately, as things stand now, we don't
support multiple hash algorithms in the same running binary, and we
can't until we allow a member of struct object_id to vary based on the
hash algorithm.

That work is coming in a future series (after we have a fully
functioning SHA-256 stage 4 implementation), but at this point, I'm
still working through all of the crashes we get from random places where
we make assumptions about initializing things, so it's not a
straightforward fix. For now, I think this is the best we can do
without major additional surgery to the codebase.

I'm fine with stating that git ls-remote can read the repository (to
parse remotes) but can't write to it, since that's the behavior users
will expect anyway. I'll update the commit message to reflect that wart
and assumption, since it would be good to document it.
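For illustration only, and not part of the patch or of Git: the separation the quoted commit message anticipates (one algorithm for on-disk data, another for display) could look roughly like the sketch below. Every name here (`sketch_algo`, `sketch_repo`, `sketch_set_ui_algo`, `sketch_display_hexsz`) is hypothetical; only the idea of a `ui_hash_algo` element comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified algorithm descriptor; not Git's own structure. */
struct sketch_algo {
	const char *name;
	size_t hexsz; /* length of an object ID in hex characters */
};

const struct sketch_algo sketch_sha1 = { "sha1", 40 };
const struct sketch_algo sketch_sha256 = { "sha256", 64 };

/* Hypothetical repository state with separate storage and display algorithms. */
struct sketch_repo {
	const struct sketch_algo *hash_algo;    /* used to serialize data to disk */
	const struct sketch_algo *ui_hash_algo; /* used only to display object IDs */
};

/* Record the remote's algorithm for display; the storage algorithm is untouched. */
void sketch_set_ui_algo(struct sketch_repo *repo, const struct sketch_algo *remote)
{
	assert(repo && remote);
	repo->ui_hash_algo = remote;
}

/* Display code consults the UI algorithm, falling back to the storage one. */
size_t sketch_display_hexsz(const struct sketch_repo *repo)
{
	return repo->ui_hash_algo ? repo->ui_hash_algo->hexsz
				  : repo->hash_algo->hexsz;
}
```

With a split like this, a command that only lists remote refs would change what is printed without any risk of affecting what is written. The patch under discussion instead sets the single global algorithm, which is the wart acknowledged above.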
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 6ef519514b..3a4dd12903 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -118,6 +118,10 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		transport->server_options = &server_options;
 
 	ref = transport_get_remote_refs(transport, &ref_prefixes);
+	if (ref) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
 	if (transport_disconnect(transport)) {
 		UNLEAK(sorting);
 		return 1;
ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm. Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly. Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.

Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/ls-remote.c | 4 ++++
 1 file changed, 4 insertions(+)
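The truncation described in the commit message can be reproduced with a small standalone sketch. It assumes a simplified descriptor, `demo_hash_algo`, and a formatter, `demo_hash_to_hex`, both hypothetical and not Git's actual code; the only facts taken from the thread are that SHA-1 object IDs are 40 hex characters and that SHA-256 ones are longer (32 bytes, so 64 hex characters).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an algorithm descriptor; only the sizes matter. */
struct demo_hash_algo {
	const char *name;
	size_t rawsz; /* digest length in bytes */
	size_t hexsz; /* digest length in hex characters */
};

const struct demo_hash_algo demo_sha1 = { "sha1", 20, 40 };
const struct demo_hash_algo demo_sha256 = { "sha256", 32, 64 };

/*
 * Write the first algo->rawsz bytes of raw as NUL-terminated hex into out,
 * which must have room for algo->hexsz + 1 bytes.  Returns the number of
 * hex characters written.
 */
size_t demo_hash_to_hex(const unsigned char *raw,
			const struct demo_hash_algo *algo, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(algo->hexsz == 2 * algo->rawsz);
	for (i = 0; i < algo->rawsz; i++) {
		out[2 * i] = digits[raw[i] >> 4];
		out[2 * i + 1] = digits[raw[i] & 0x0f];
	}
	out[algo->hexsz] = '\0';
	return algo->hexsz;
}
```

Formatting a 32-byte object ID with `demo_sha1` yields only the first 40 hex characters, while `demo_sha256` yields all 64. That is why the formatter has to be told which algorithm the remote side uses before the refs are printed.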