mbox series

[0/7] fix inconsistent uses of the_repo in parse_object()'s call chain

Message ID cover.1580413221.git.matheus.bernardino@usp.br (mailing list archive)
Headers show
Series fix inconsistent uses of the_repo in parse_object()'s call chain | expand

Message

Matheus Tavares Jan. 30, 2020, 8:32 p.m. UTC
The motivation for this patchset is another series I'm working on, to
make git-grep stop adding submodules' odbs to the alternates list. With
that, parse_object() will have to be called with the subdmodules' struct
repository. But it seemed that this function doesn't pass on the
received repo to some inner calls, which, instead, always use
the_repository. This series seeks to fix these inconsistencies.

Note: I also tried to replace some uses of the_hash_algo with the struct
git_hash_algo from the received repository, for consitency. (In
practice, I'm not sure if this is very useful right now, but maybe it
will be relevant for the change to SHA256?) Still, many functions in
parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
Since changing this would require a much bigger operation, I decided to
leave it as is, for now.

Note II: Despite receiving a repo through the apply_state struct,
apply.c:apply_binary() call functions which uses the_repository
internally. Because of that, I used the_hash_algo in this function in
patch 6. Should I change it to use apply_state->repo->hash_algo
anyway?

travis build: https://travis-ci.org/matheustavares/git/builds/644022000

Matheus Tavares (7):
  diff: make diff_populate_filespec() honor its repo argument
  cache-tree: use given repo's hash_algo at verify_one()
  pack-check: use given repo's hash_algo at verify_packfile()
  streaming: allow open_istream() to handle any repo
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: allow check_object_signature() to handle any repo

 apply.c                  |  6 +++--
 archive-tar.c            |  6 ++---
 archive-zip.c            |  3 ++-
 builtin/fast-export.c    |  3 ++-
 builtin/index-pack.c     | 10 +++++---
 builtin/mktag.c          |  7 +++--
 builtin/pack-objects.c   |  3 ++-
 builtin/replace.c        |  3 ++-
 builtin/unpack-objects.c |  3 ++-
 cache-tree.c             | 11 +++++---
 cache.h                  |  3 ++-
 convert.c                |  2 +-
 diff.c                   |  2 +-
 diffcore-rename.c        |  4 +--
 dir.c                    |  4 +--
 log-tree.c               |  3 ++-
 object-store.h           |  5 ++--
 object.c                 |  5 ++--
 pack-check.c             | 12 ++++-----
 sha1-file.c              | 55 ++++++++++++++++++++++------------------
 streaming.c              | 28 ++++++++++----------
 streaming.h              |  4 ++-
 22 files changed, 106 insertions(+), 76 deletions(-)

Comments

Junio C Hamano Jan. 30, 2020, 9:26 p.m. UTC | #1
Matheus Tavares <matheus.bernardino@usp.br> writes:

> Note: I also tried to replace some uses of the_hash_algo with the struct
> git_hash_algo from the received repository, for consitency. (In
> practice, I'm not sure if this is very useful right now, but maybe it
> will be relevant for the change to SHA256?) Still, many functions in
> parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> Since changing this would require a much bigger operation, I decided to
> leave it as is, for now.

Passing a repo instance throughout the callchain and redusing the
use of the_repo would be a worthwhile longer term clean-up.  

Moving off of the_hash_algo to repo->hash_algo is, too, and it is
very much relevant to the "newhash" work Brian (CCed) has been
working on.  You two do not want to step on each other's toes.
Please coordinate.

Thanks.
brian m. carlson Jan. 30, 2020, 11:46 p.m. UTC | #2
On 2020-01-30 at 21:26:02, Junio C Hamano wrote:
> Matheus Tavares <matheus.bernardino@usp.br> writes:
> 
> > Note: I also tried to replace some uses of the_hash_algo with the struct
> > git_hash_algo from the received repository, for consitency. (In
> > practice, I'm not sure if this is very useful right now, but maybe it
> > will be relevant for the change to SHA256?) Still, many functions in
> > parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> > Since changing this would require a much bigger operation, I decided to
> > leave it as is, for now.
> 
> Passing a repo instance throughout the callchain and redusing the
> use of the_repo would be a worthwhile longer term clean-up.

Yes, I think this would be a nice change to see.

> Moving off of the_hash_algo to repo->hash_algo is, too, and it is
> very much relevant to the "newhash" work Brian (CCed) has been
> working on.  You two do not want to step on each other's toes.
> Please coordinate.

This series is fine from my end.  I think it provides a nice improvement
that I can take advantage of in the future, so I don't see a reason to
hold it up on my account.  The few patches I have in this area would be
easy to rebase or adjust.
Jeff King Jan. 31, 2020, 9:03 a.m. UTC | #3
On Thu, Jan 30, 2020 at 05:32:16PM -0300, Matheus Tavares wrote:

> The motivation for this patchset is another series I'm working on, to
> make git-grep stop adding submodules' odbs to the alternates list. With
> that, parse_object() will have to be called with the subdmodules' struct
> repository. But it seemed that this function doesn't pass on the
> received repo to some inner calls, which, instead, always use
> the_repository. This series seeks to fix these inconsistencies.

I read over the patches and they all seem sensible. There may be spots
you missed where these functions are subtly using the_repository under
the hood (or where you used the_repository but could have used some
repository pointer tucked away in a struct). But even if that is the
case, it seems clear that all of the changes are going in the correct
direction, and we can continue to iterate on top.

> Note: I also tried to replace some uses of the_hash_algo with the struct
> git_hash_algo from the received repository, for consitency. (In
> practice, I'm not sure if this is very useful right now, but maybe it
> will be relevant for the change to SHA256?) Still, many functions in
> parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> Since changing this would require a much bigger operation, I decided to
> leave it as is, for now.

I'm not sure I look forward to a day where oid_to_hex() needs to take an
extra parameter, just because it will make it more annoying to use. But
we'll have to go there eventually unless we plan to forbid mixing of
repositories with different hashes within the same process.

The hash transition document says:

  To convert recorded submodule pointers, you need to have the converted
  submodule repository in place. The translation table of the submodule
  can be used to look up the new hash.

which I take to mean that the local copy of the submodule needs to match
the superproject hash (this says nothing about what the upstream owner
of the submodule wants to do; we'd be translating on the fly to the new
hash in the local repo). So using the_hash_algo would work either way.

I don't think we're particularly interested in supporting multiple
unrelated repositories within the same process. While that would be
convenient for some cases (e.g., you have a million repositories and
want to look at all of their trees without creating a million
processes), I don't think it's a goal anyone is really working towards
with this "struct repository" layer.

> Note II: Despite receiving a repo through the apply_state struct,
> apply.c:apply_binary() call functions which uses the_repository
> internally. Because of that, I used the_hash_algo in this function in
> patch 6. Should I change it to use apply_state->repo->hash_algo
> anyway?

I think this follows my "it's OK for now because we're going in the
right direction from above". I.e., we probably do eventually want to use
apply_state->repo->hash_algo, just like we probably do eventually want
all those functions to take a repository argument. But by even using
the_hash_algo, you've at least marked the spot for later fixing (the
very final step of this long process will be eradicating all of
the_hash_algo and the_repository from the bottom of the call-stack on
up).

-Peff
Matheus Tavares Feb. 1, 2020, 4:44 a.m. UTC | #4
Hi, Peff

Thanks for the great feedback.

On Fri, Jan 31, 2020 at 6:03 AM Jeff King <peff@peff.net> wrote:
>
[...]
> The hash transition document says:
>
>   To convert recorded submodule pointers, you need to have the converted
>   submodule repository in place. The translation table of the submodule
>   can be used to look up the new hash.
>
> which I take to mean that the local copy of the submodule needs to match
> the superproject hash (this says nothing about what the upstream owner
> of the submodule wants to do; we'd be translating on the fly to the new
> hash in the local repo). So using the_hash_algo would work either way.
>
> I don't think we're particularly interested in supporting multiple
> unrelated repositories within the same process. While that would be
> convenient for some cases (e.g., you have a million repositories and
> want to look at all of their trees without creating a million
> processes), I don't think it's a goal anyone is really working towards
> with this "struct repository" layer.

Thanks for these explanations. One thing that left me thinking,
though, is about changing the_hash_algo to r->hash_algo (not only in
oid_to_hex()). I previously thought this would eventually be required
for the hash transition. But if I understood your comment correctly,
it might not be necessary since the submodule should always match the
superproject hash. And, as you mentioned, supporting multiple
unrelated repos [in a single process] is not one of the main goals, as
well. So wouldn't keep using the_hash_algo be OK?
Jeff King Feb. 1, 2020, 11:04 a.m. UTC | #5
On Sat, Feb 01, 2020 at 01:44:04AM -0300, Matheus Tavares Bernardino wrote:

> > I don't think we're particularly interested in supporting multiple
> > unrelated repositories within the same process. While that would be
> > convenient for some cases (e.g., you have a million repositories and
> > want to look at all of their trees without creating a million
> > processes), I don't think it's a goal anyone is really working towards
> > with this "struct repository" layer.
> 
> Thanks for these explanations. One thing that left me thinking,
> though, is about changing the_hash_algo to r->hash_algo (not only in
> oid_to_hex()). I previously thought this would eventually be required
> for the hash transition. But if I understood your comment correctly,
> it might not be necessary since the submodule should always match the
> superproject hash. And, as you mentioned, supporting multiple
> unrelated repos [in a single process] is not one of the main goals, as
> well. So wouldn't keep using the_hash_algo be OK?

Yes, by the reasoning in my message, it wouldn't be a big problem. But I
literally hadn't thought about it until writing that, so I may either be
missing something, or there may be plans from other folks working on the
hash transition that contradict that.

Even if we don't care too much about it for now, though, it still seems
like a step in the right direction.

-Peff