[RFC,0/2] Bring the_repository into cmd_foo

Message ID 20181018183758.81186-1-sbeller@google.com

Message

Stefan Beller Oct. 18, 2018, 6:37 p.m. UTC
> On Wed, Oct 17, 2018 at 5:41 AM Derrick Stolee <stolee@gmail.com> wrote:
>> I had one high-level question: How are we testing that these "arbitrary
>> repository" changes are safe?
> [...]
> Or instead we could accelerate the long term plan of removing a
> hard coded the_repository and have each cmd builtin take an additional
> repository pointer from the init code, such that we'd bring all of Git to
> work on arbitrary repositories. Then the standard test suite should be
> okay, as there is no special case for the_repository any more.

Demo'd in this RFC series for git-merge-base.

The core idea is found in patch 1,
and the proof of concept is found in patch 2.

What do you think?

Thanks,
Stefan

Stefan Beller (2):
  repository: have get_the_repository() to remove the_repository
    dependency
  builtin/merge-base.c: do not rely on the_repository any more

 builtin/merge-base.c  | 67 ++++++++++++++++++++++++++-----------------
 repository.c          | 10 +++++++
 repository.h          | 13 ++++++++-
 t/t6010-merge-base.sh |  3 +-

Comments

Jonathan Tan Oct. 18, 2018, 9:01 p.m. UTC | #1
> > Or instead we could accelerate the long term plan of removing a
> > hard coded the_repository and have each cmd builtin take an additional
> > repository pointer from the init code, such that we'd bring all of Git to
> > work on arbitrary repositories. Then the standard test suite should be
> > okay, as there is no special case for the_repository any more.
> 
> Demo'd in this RFC series for git-merge-base.
> 
> The core idea is found in patch 1,
> and the proof of concept is found in patch 2.

I don't think working around the_repository is sufficient, as there are
other ways to access the same repository state (the_index, directly
accessing files through file I/O). Instead I would prefer a test like in
t/test-repository.c - each patch set would probably only need one test
for the last function converted, since typically the last function uses
every other function converted.

Also, even if we decided that working around the_repository is
sufficient, I don't think this get_the_repository() is a good approach.
If (or when) we decide to convert all builtins to not use
the_repository, we would have to clean up all such calls.

Better would be to pass the_repository from the invoker of the cmd
builtin, and reuse NO_THE_REPOSITORY_COMPATIBILITY_MACROS in the
builtin. (I haven't thought much about how to transition to this, but
one way might be to extend "struct cmd_struct" in git.c to also have a
new-style function pointer, and #define GIT_CMD(c, f, o) {c, NULL, o, f}
or something like that.)
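
A rough sketch of what that cmd_struct extension might look like, as a standalone toy rather than real git.c code. The field order follows the GIT_CMD macro above; struct repository is reduced to a stand-in, and the names cmd_repo_fn, repo_fn, run_builtin, last_repo_seen and the two toy builtins are hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for git's struct repository (assumption: the real one is far larger). */
struct repository {
	const char *gitdir;
};

/* Legacy builtin signature, and a hypothetical new-style one taking a repository. */
typedef int (*cmd_fn)(int argc, const char **argv, const char *prefix);
typedef int (*cmd_repo_fn)(struct repository *r, int argc,
			   const char **argv, const char *prefix);

struct cmd_struct {
	const char *cmd;
	cmd_fn fn;		/* legacy entry point, NULL for converted builtins */
	unsigned int option;
	cmd_repo_fn repo_fn;	/* hypothetical new-style entry point */
};

/* Macro as suggested in the message: the legacy slot stays NULL. */
#define GIT_CMD(c, f, o) { c, NULL, o, f }

/* Records which repository the converted builtin was handed (demo only). */
static struct repository *last_repo_seen;

/* Toy "converted" builtin: only touches the repository it is given. */
static int cmd_merge_base(struct repository *r, int argc,
			  const char **argv, const char *prefix)
{
	(void)argc; (void)argv; (void)prefix;
	last_repo_seen = r;
	printf("merge-base on %s\n", r->gitdir);
	return 0;
}

/* Toy legacy builtin: no repository argument. */
static int cmd_version(int argc, const char **argv, const char *prefix)
{
	(void)argc; (void)argv; (void)prefix;
	printf("version (legacy entry point)\n");
	return 0;
}

static struct cmd_struct commands[] = {
	{ "version", cmd_version, 0, NULL },
	GIT_CMD("merge-base", cmd_merge_base, 0),
};

/* The invoker passes the repository down; builtins never reach for a global. */
static int run_builtin(struct repository *r, const char *name,
		       int argc, const char **argv)
{
	size_t i;

	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		struct cmd_struct *p = &commands[i];

		if (strcmp(p->cmd, name))
			continue;
		if (p->repo_fn)
			return p->repo_fn(r, argc, argv, NULL);
		assert(p->fn);	/* an entry must provide one of the two */
		return p->fn(argc, argv, NULL);
	}
	return -1;	/* unknown command */
}
```

In this sketch the dispatcher prefers the new-style pointer when it is set, so builtins could be converted one table entry at a time while unconverted ones keep their old signature.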

This doesn't directly address the fact that the builtin might call lib
code that indirectly references the_repository, but I think that this
won't be an issue because by the time we're ready to convert builtins to
not use the_repository, most if not all of the lib code would have
NO_THE_REPOSITORY_COMPATIBILITY_MACROS defined anyway.

Stefan Beller Oct. 18, 2018, 11:23 p.m. UTC | #2
On Thu, Oct 18, 2018 at 2:01 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > > Or instead we could accelerate the long term plan of removing a
> > > hard coded the_repository and have each cmd builtin take an additional
> > > repository pointer from the init code, such that we'd bring all of Git to
> > > work on arbitrary repositories. Then the standard test suite should be
> > > okay, as there is no special case for the_repository any more.
> >
> > Demo'd in this RFC series for git-merge-base.
> >
> > The core idea is found in patch 1,
> > and the proof of concept is found in patch 2.
>
> I don't think working around the_repository is sufficient, as there are
> other ways to access the same repository state (the_index, directly
> accessing files through file I/O).

Sure, but that would stick out like a sore thumb?

> Instead I would prefer a test like in
> t/test-repository.c - each patch set would probably only need one test
> for the last function converted, since typically the last function uses
> every other function converted.

I'll look into that.

>
> Also, even if we decided that working around the_repository is
> sufficient, I don't think this get_the_repository() is a good approach.
> If (or when) we decide to convert all builtins to not use
> the_repository, we would have to clean up all such calls.

Just like we have to clean up the calls to the_repository or the_index
in general now (cf. nd/the-index).

> Better would be to pass the_repository from the invoker of the cmd
> builtin, and reuse NO_THE_REPOSITORY_COMPATIBILITY_MACROS in the
> builtin. (I haven't thought much about how to transition to this, but
> one way might be to extend "struct cmd_struct" in git.c to also have a
> new-style function pointer, and #define GIT_CMD(c, f, o) {c, NULL, o, f}
> or something like that.)

This sounds like the next step to me.

> This doesn't directly address the fact that the builtin might call lib
> code that indirectly references the_repository, but I think that this
> won't be an issue because by the time we're ready to convert builtins to
> not use the_repository, most if not all of the lib code would have
> NO_THE_REPOSITORY_COMPATIBILITY_MACROS defined anyway.

And until then we double up on tests: the regular end-to-end tests, plus
additional tests for the repository-agnostic units in test-repository.c?

The whole point of this approach is to keep testing at the level that
we currently have and make the existing tests do double duty: (a) they
test existing behavior, and (b) they give fairly good coverage of the
repository-fication of the code base, all through these 2 simple knobs.