mbox series

[00/12] Remove more index compatibility macros

Message ID pull.830.git.1609506428.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Remove more index compatibility macros | expand

Message

Linus Arver via GitGitGadget Jan. 1, 2021, 1:06 p.m. UTC
I noticed that Duy's project around USE_THE_INDEX_COMPATIBILITY_MACROS has
been on pause for a while. Here is my attempt to continue that project a
little.

I started going through the builtins that still use cache_name_pos() and the
first few were easy: merge-inex, mv, rm.

Then I hit update-index and it was a bit bigger. It's included here as well.

My strategy for update-index was to create static globals "repo" and
"istate" that point to the_repository and the_index, respectively. Then, I
was able to remove macros one-by-one without changing method prototypes
within the file.

I had started trying to keep everything local to the method signatures, but
I hit a snag when reaching the command-line parsing callbacks, which I could
not modify their call signature. At that point, I had something that was
already much more complicated than what I present now. Outside of the first
update-index commit, everything was a mechanical find/replace.

In total, this allows us to remove four of the compatibility macros because
they are no longer used.

Thanks, -Stolee

Derrick Stolee (12):
  merge-index: drop index compatibility macros
  mv: remove index compatibility macros
  rm: remove compatilibity macros
  update-index: drop the_index, the_repository
  update-index: use istate->cache over active_cache
  update-index: use index->cache_nr over active_nr
  update-index: use istate->cache_changed
  update-index: use index_name_pos() over cache_name_pos()
  update-index: use remove_file_from_index()
  update-index: use add_index_entry()
  update-index: replace several compatibility macros
  update-index: remove ce_match_stat(), all macros

 Documentation/technical/racy-git.txt |   6 +-
 builtin/merge-index.c                |  33 +++---
 builtin/mv.c                         |  42 ++++----
 builtin/rm.c                         |  56 ++++++-----
 builtin/update-index.c               | 145 ++++++++++++++-------------
 cache.h                              |   4 -
 6 files changed, 149 insertions(+), 137 deletions(-)


base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-830%2Fderrickstolee%2Findex-compatibility-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-830/derrickstolee/index-compatibility-1-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/830

Comments

Elijah Newren Jan. 1, 2021, 9:16 p.m. UTC | #1
On Fri, Jan 1, 2021 at 5:10 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> I noticed that Duy's project around USE_THE_INDEX_COMPATIBILITY_MACROS has
> been on pause for a while. Here is my attempt to continue that project a
> little.
>
> I started going through the builtins that still use cache_name_pos() and the
> first few were easy: merge-inex, mv, rm.
>
> Then I hit update-index and it was a bit bigger. It's included here as well.
>
> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.
>
> I had started trying to keep everything local to the method signatures, but
> I hit a snag when reaching the command-line parsing callbacks, which I could
> not modify their call signature. At that point, I had something that was
> already much more complicated than what I present now. Outside of the first
> update-index commit, everything was a mechanical find/replace.
>
> In total, this allows us to remove four of the compatibility macros because
> they are no longer used.

This series is divided nicely in a way that makes review easy.  I've
made some of these same types of changes elsewhere, and the whole
series is really just a long sequence of mechanical changes (plus a
case or two of fixing formatting on a line you were changing anyway,
such as adding spaces around an operator).  I think the work to
continue dropping the implicit dependency on the_index is helpful.

I only found two minor suggestions for improving the commit messages;
the patches all look good to me.
Eric Sunshine Jan. 2, 2021, 6:12 a.m. UTC | #2
On Fri, Jan 1, 2021 at 8:09 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.
>
> I had started trying to keep everything local to the method signatures, but
> I hit a snag when reaching the command-line parsing callbacks, which I could
> not modify their call signature. [...]

You should be able to do this, not by modifying the callback
signature, but by taking advantage of the `extra` member of `struct
option` which is available to callback functions or arbitrary use. If
you need to access the index in a callback, then assign a `struct
index_state *` to `extra`; likewise assign a `struct repository *` to
`extra` to access the repository. If you need access to both the index
and the repository, then just store the repository in `extra` since
the repository has an `index` field.

You won't be able to use any of the canned OPT_FOO() macros to
initialize an entry in the update-index.c `options[]` array which
needs `extra`-initialization since the macros don't let you specify
`extra`, but you can easily bypass the macro and initialize the
`struct option` manually. (After all, the macros exist for
convenience; they are not a hard requirement.)

Within the callback, extract the `repository` or `index_state` as you
would any other field. For instance:

    const struct repository *repo = opt->extra;

This should allow you to get rid of the globals introduced by patch
[4/12] (assuming passing the index and repo arguments around
everywhere doesn't get overly hairy).
Derrick Stolee Jan. 4, 2021, 1:01 a.m. UTC | #3
On 1/2/2021 1:12 AM, Eric Sunshine wrote:
> On Fri, Jan 1, 2021 at 8:09 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> My strategy for update-index was to create static globals "repo" and
>> "istate" that point to the_repository and the_index, respectively. Then, I
>> was able to remove macros one-by-one without changing method prototypes
>> within the file.
>>
>> I had started trying to keep everything local to the method signatures, but
>> I hit a snag when reaching the command-line parsing callbacks, which I could
>> not modify their call signature. [...]
> 
> You should be able to do this, not by modifying the callback
> signature, but by taking advantage of the `extra` member of `struct
> option` which is available to callback functions or arbitrary use. If
> you need to access the index in a callback, then assign a `struct
> index_state *` to `extra`; likewise assign a `struct repository *` to
> `extra` to access the repository. If you need access to both the index
> and the repository, then just store the repository in `extra` since
> the repository has an `index` field.
> 
> You won't be able to use any of the canned OPT_FOO() macros to
> initialize an entry in the update-index.c `options[]` array which
> needs `extra`-initialization since the macros don't let you specify
> `extra`, but you can easily bypass the macro and initialize the
> `struct option` manually. (After all, the macros exist for
> convenience; they are not a hard requirement.)
> 
> Within the callback, extract the `repository` or `index_state` as you
> would any other field. For instance:
> 
>     const struct repository *repo = opt->extra;

Yes, this is definitely the way to make it possible.

> This should allow you to get rid of the globals introduced by patch
> [4/12] (assuming passing the index and repo arguments around
> everywhere doesn't get overly hairy).

My attempts just getting to the point of hitting these callbacks was
already making me frustrated with how complicated the code became with
that approach.

Perhaps now that I've removed the compatibility macros, it would be
easier to insert the method parameters since most of the lines that
need to change would be method prototypes and the calls to those methods
(plus the callback function details).

Is that a valuable effort? I could give it a try, but I want to be sure
that adjusting all of those helper methods in the builtin would indeed
have valuable improvements over the static globals used here.

Thanks,
-Stolee
Eric Sunshine Jan. 4, 2021, 6:22 a.m. UTC | #4
On Sun, Jan 3, 2021 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
> On 1/2/2021 1:12 AM, Eric Sunshine wrote:
> > This should allow you to get rid of the globals introduced by patch
> > [4/12] (assuming passing the index and repo arguments around
> > everywhere doesn't get overly hairy).
>
> Perhaps now that I've removed the compatibility macros, it would be
> easier to insert the method parameters since most of the lines that
> need to change would be method prototypes and the calls to those methods
> (plus the callback function details).
>
> Is that a valuable effort? I could give it a try, but I want to be sure
> that adjusting all of those helper methods in the builtin would indeed
> have valuable improvements over the static globals used here.

My impression was that the goal of the earlier work was to pass the
index and repository to each function specifically to avoid tying the
function to a particular index or repository. This helps in cases in
which client code needs to operate on a different index or repository
(for instance, a submodule). Generally speaking, making the index and
repository file-static rather than global does not help reach that
goal since functions are still tied to state which is not local to the
function itself.

Would the extra effort be valuable in this particular case? I'm not
familiar with this code, but given that `update-index` is a builtin,
such effort may not be too meaningful. If, however, any of the code
from `buildin/update-index.c` ever gets "libified" and moved into the
core library, then that would be a good time to update the functions
to take those values as arguments rather than relying on file-static
or globals. But that's not something that this series necessarily
needs to do; the task can wait until the code needs to be shared by
other modules, I would think.
Derrick Stolee Jan. 5, 2021, 4:41 a.m. UTC | #5
On 1/4/2021 1:22 AM, Eric Sunshine wrote:
> On Sun, Jan 3, 2021 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
>> On 1/2/2021 1:12 AM, Eric Sunshine wrote:
>>> This should allow you to get rid of the globals introduced by patch
>>> [4/12] (assuming passing the index and repo arguments around
>>> everywhere doesn't get overly hairy).
>>
>> Perhaps now that I've removed the compatibility macros, it would be
>> easier to insert the method parameters since most of the lines that
>> need to change would be method prototypes and the calls to those methods
>> (plus the callback function details).
>>
>> Is that a valuable effort? I could give it a try, but I want to be sure
>> that adjusting all of those helper methods in the builtin would indeed
>> have valuable improvements over the static globals used here.
> 
> My impression was that the goal of the earlier work was to pass the
> index and repository to each function specifically to avoid tying the
> function to a particular index or repository. This helps in cases in
> which client code needs to operate on a different index or repository
> (for instance, a submodule). Generally speaking, making the index and
> repository file-static rather than global does not help reach that
> goal since functions are still tied to state which is not local to the
> function itself.
> 
> Would the extra effort be valuable in this particular case? I'm not
> familiar with this code, but given that `update-index` is a builtin,
> such effort may not be too meaningful. If, however, any of the code
> from `buildin/update-index.c` ever gets "libified" and moved into the
> core library, then that would be a good time to update the functions
> to take those values as arguments rather than relying on file-static
> or globals. But that's not something that this series necessarily
> needs to do; the task can wait until the code needs to be shared by
> other modules, I would think.

I tried again tonight, and it started getting messy, but then I
realized that I could group the callbacks that need the repo and
index to use a common struct that holds the other parameters they
need. It's still a bigger patch than I'd like, but it is more
reasonable.

v2 is incoming with my attempt at this.

Thanks,
-Stolee
Junio C Hamano Jan. 6, 2021, 3:55 a.m. UTC | #6
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.

Knee-jerk reaction: swapping one pair of global with another?  Would
that give us enough upside?  It may allow some codepaths involved to
work on an in-core index instance that is different from the_index,
but does not make them reentrant.

Do we now have callers that actually pass an in-core index instance
that is different from the_index, and more importantly, that fail
loudly if the codepaths involved in this conversion forgets to
update some accesses to the_index not to the specified one?

If not, ... 

> In total, this allows us to remove four of the compatibility macros because
> they are no longer used.

... a conversion like this, removing the use of the compatibility
macros for the sake of removing them, invites future headaches by
leaving untested code churn behind with potential bugs that will
only get discovered after somebody actually starts making calls
with the non-default in-core index instances.

I've come to know the competence of you well enough to trust your
patches like patches from other proficient, prolific and prominent
contributors (I won't name names, but you know who you are), but we
are all human and are prone to introduce bugs.

That's all my knee-jerk impression before actually reading the
series through, though.  I'll certainloy know more after reading
them.

Thanks.

> Derrick Stolee (12):
>   merge-index: drop index compatibility macros
>   mv: remove index compatibility macros
>   rm: remove compatilibity macros
>   update-index: drop the_index, the_repository
>   update-index: use istate->cache over active_cache
>   update-index: use index->cache_nr over active_nr
>   update-index: use istate->cache_changed
>   update-index: use index_name_pos() over cache_name_pos()
>   update-index: use remove_file_from_index()
>   update-index: use add_index_entry()
>   update-index: replace several compatibility macros
>   update-index: remove ce_match_stat(), all macros
>
>  Documentation/technical/racy-git.txt |   6 +-
>  builtin/merge-index.c                |  33 +++---
>  builtin/mv.c                         |  42 ++++----
>  builtin/rm.c                         |  56 ++++++-----
>  builtin/update-index.c               | 145 ++++++++++++++-------------
>  cache.h                              |   4 -
>  6 files changed, 149 insertions(+), 137 deletions(-)
>
>
> base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-830%2Fderrickstolee%2Findex-compatibility-1-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-830/derrickstolee/index-compatibility-1-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/830
Derrick Stolee Jan. 6, 2021, 11:35 a.m. UTC | #7
On 1/5/2021 10:55 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> My strategy for update-index was to create static globals "repo" and
>> "istate" that point to the_repository and the_index, respectively. Then, I
>> was able to remove macros one-by-one without changing method prototypes
>> within the file.
> 
> Knee-jerk reaction: swapping one pair of global with another?  Would
> that give us enough upside?  It may allow some codepaths involved to
> work on an in-core index instance that is different from the_index,
> but does not make them reentrant.

My intention was to reduce the use of globals in libgit.a while keeping
with existing patterns of static globals in the builtin code. While
this can be thought of "module variables" instead of true globals, they
aren't exactly desirable. In v2, these static globals are temporary to
the series and are completely removed by the end.

The new patch sequence can hopefully be seen as "this preprocessor
macro was expanded" and then "static globals are replaced with
method parameters" which are pretty straightforward.

> Do we now have callers that actually pass an in-core index instance
> that is different from the_index, and more importantly, that fail
> loudly if the codepaths involved in this conversion forgets to
> update some accesses to the_index not to the specified one?
> 
> If not, ... 
> 
>> In total, this allows us to remove four of the compatibility macros because
>> they are no longer used.
> 
> ... a conversion like this, removing the use of the compatibility
> macros for the sake of removing them, invites future headaches by
> leaving untested code churn behind with potential bugs that will
> only get discovered after somebody actually starts making calls
> with the non-default in-core index instances.

Perhaps I had misunderstood the state of the conversion project. I
thought that the full conversion was just paused because Duy moved
on to other things. I thought it might be valuable to pick up the
baton while also thinking about the space.

If this is _not_ a valuable project to continue, then I can hold
off for now.

Unfortunately, we'll never know if everything is safe from assuming
the_index until the macro itself is gone. It helps that libgit.a
doesn't use it at 

> I've come to know the competence of you well enough to trust your
> patches like patches from other proficient, prolific and prominent
> contributors (I won't name names, but you know who you are), but we
> are all human and are prone to introduce bugs.

That means a lot, thanks. And yes, I'm well aware that bugs can be
introduced. I've added my share.

> That's all my knee-jerk impression before actually reading the
> series through, though.  I'll certainloy know more after reading
> them.

I look forward to your thoughts.

Thanks,
-Stolee
Junio C Hamano Jan. 6, 2021, 8:52 p.m. UTC | #8
Derrick Stolee <stolee@gmail.com> writes:

> My intention was to reduce the use of globals in libgit.a while keeping
> with existing patterns of static globals in the builtin code.

The above (without having thought about it too hard or long enough
yet) sounds a sensible guideline to go by.  Thanks for a helpful
hint to keep in mind while reading the series.

> While
> this can be thought of "module variables" instead of true globals, they
> aren't exactly desirable. In v2, these static globals are temporary to
> the series and are completely removed by the end.

;-)