Message ID | cover.1580413221.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
Headers | show |
Series | fix inconsistent uses of the_repo in parse_object()'s call chain | expand |
Matheus Tavares <matheus.bernardino@usp.br> writes: > Note: I also tried to replace some uses of the_hash_algo with the struct > git_hash_algo from the received repository, for consitency. (In > practice, I'm not sure if this is very useful right now, but maybe it > will be relevant for the change to SHA256?) Still, many functions in > parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo. > Since changing this would require a much bigger operation, I decided to > leave it as is, for now. Passing a repo instance throughout the callchain and redusing the use of the_repo would be a worthwhile longer term clean-up. Moving off of the_hash_algo to repo->hash_algo is, too, and it is very much relevant to the "newhash" work Brian (CCed) has been working on. You two do not want to step on each other's toes. Please coordinate. Thanks.
On 2020-01-30 at 21:26:02, Junio C Hamano wrote: > Matheus Tavares <matheus.bernardino@usp.br> writes: > > > Note: I also tried to replace some uses of the_hash_algo with the struct > > git_hash_algo from the received repository, for consitency. (In > > practice, I'm not sure if this is very useful right now, but maybe it > > will be relevant for the change to SHA256?) Still, many functions in > > parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo. > > Since changing this would require a much bigger operation, I decided to > > leave it as is, for now. > > Passing a repo instance throughout the callchain and redusing the > use of the_repo would be a worthwhile longer term clean-up. Yes, I think this would be a nice change to see. > Moving off of the_hash_algo to repo->hash_algo is, too, and it is > very much relevant to the "newhash" work Brian (CCed) has been > working on. You two do not want to step on each other's toes. > Please coordinate. This series is fine from my end. I think it provides a nice improvement that I can take advantage of in the future, so I don't see a reason to hold it up on my account. The few patches I have in this area would be easy to rebase or adjust.
On Thu, Jan 30, 2020 at 05:32:16PM -0300, Matheus Tavares wrote: > The motivation for this patchset is another series I'm working on, to > make git-grep stop adding submodules' odbs to the alternates list. With > that, parse_object() will have to be called with the subdmodules' struct > repository. But it seemed that this function doesn't pass on the > received repo to some inner calls, which, instead, always use > the_repository. This series seeks to fix these inconsistencies. I read over the patches and they all seem sensible. There may be spots you missed where these functions are subtly using the_repository under the hood (or where you used the_repository but could have used some repository pointer tucked away in a struct). But even if that is the case, it seems clear that all of the changes are going in the correct direction, and we can continue to iterate on top. > Note: I also tried to replace some uses of the_hash_algo with the struct > git_hash_algo from the received repository, for consitency. (In > practice, I'm not sure if this is very useful right now, but maybe it > will be relevant for the change to SHA256?) Still, many functions in > parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo. > Since changing this would require a much bigger operation, I decided to > leave it as is, for now. I'm not sure I look forward to a day where oid_to_hex() needs to take an extra parameter, just because it will make it more annoying to use. But we'll have to go there eventually unless we plan to forbid mixing of repositories with different hashes within the same process. The hash transition document says: To convert recorded submodule pointers, you need to have the converted submodule repository in place. The translation table of the submodule can be used to look up the new hash. which I take to mean that the local copy of the submodule needs to match the superproject hash (this says nothing about what the upstream owner of the submodule wants to do; we'd be translating on the fly to the new hash in the local repo). So using the_hash_algo would work either way. I don't think we're particularly interested in supporting multiple unrelated repositories within the same process. While that would be convenient for some cases (e.g., you have a million repositories and want to look at all of their trees without creating a million processes), I don't think it's a goal anyone is really working towards with this "struct repository" layer. > Note II: Despite receiving a repo through the apply_state struct, > apply.c:apply_binary() call functions which uses the_repository > internally. Because of that, I used the_hash_algo in this function in > patch 6. Should I change it to use apply_state->repo->hash_algo > anyway? I think this follows my "it's OK for now because we're going in the right direction from above". I.e., we probably do eventually want to use apply_state->repo->hash_algo, just like we probably do eventually want all those functions to take a repository argument. But by even using the_hash_algo, you've at least marked the spot for later fixing (the very final step of this long process will be eradicating all of the_hash_algo and the_repository from the bottom of the call-stack on up). -Peff
Hi, Peff Thanks for the great feedback. On Fri, Jan 31, 2020 at 6:03 AM Jeff King <peff@peff.net> wrote: > [...] > The hash transition document says: > > To convert recorded submodule pointers, you need to have the converted > submodule repository in place. The translation table of the submodule > can be used to look up the new hash. > > which I take to mean that the local copy of the submodule needs to match > the superproject hash (this says nothing about what the upstream owner > of the submodule wants to do; we'd be translating on the fly to the new > hash in the local repo). So using the_hash_algo would work either way. > > I don't think we're particularly interested in supporting multiple > unrelated repositories within the same process. While that would be > convenient for some cases (e.g., you have a million repositories and > want to look at all of their trees without creating a million > processes), I don't think it's a goal anyone is really working towards > with this "struct repository" layer. Thanks for these explanations. One thing that left me thinking, though, is about changing the_hash_algo to r->hash_algo (not only in oid_to_hex()). I previously thought this would eventually be required for the hash transition. But if I understood your comment correctly, it might not be necessary since the submodule should always match the superproject hash. And, as you mentioned, supporting multiple unrelated repos [in a single process] is not one of the main goals, as well. So wouldn't keep using the_hash_algo be OK?
On Sat, Feb 01, 2020 at 01:44:04AM -0300, Matheus Tavares Bernardino wrote: > > I don't think we're particularly interested in supporting multiple > > unrelated repositories within the same process. While that would be > > convenient for some cases (e.g., you have a million repositories and > > want to look at all of their trees without creating a million > > processes), I don't think it's a goal anyone is really working towards > > with this "struct repository" layer. > > Thanks for these explanations. One thing that left me thinking, > though, is about changing the_hash_algo to r->hash_algo (not only in > oid_to_hex()). I previously thought this would eventually be required > for the hash transition. But if I understood your comment correctly, > it might not be necessary since the submodule should always match the > superproject hash. And, as you mentioned, supporting multiple > unrelated repos [in a single process] is not one of the main goals, as > well. So wouldn't keep using the_hash_algo be OK? Yes, by the reasoning in my message, it wouldn't be a big problem. But I literally hadn't thought about it until writing that, so I may either be missing something, or there may be plans from other folks working on the hash transition that contradict that. Even if we don't care too much about it for now, though, it still seems like a step in the right direction. -Peff