
[RFC,00/20] submodule: remove git-submodule.sh, create bare builtin/submodule.c

Message ID RFC-cover-00.20-00000000000-20220610T011725Z-avarab@gmail.com (mailing list archive)

Message

Ævar Arnfjörð Bjarmason June 10, 2022, 2:01 a.m. UTC
On Fri, Jun 10 2022, Glen Choo via GitGitGadget wrote:

> As a follow up to ar/submodule-update [1] and its successors
> gc/submodule-update-part* [2] [3], this series converts the last remaining
> piece of "git submodule update" into C, namely, the option parsing in
> git-submodule.sh.

Aside at the end at [2].

> As a result, git-submodule.sh::cmd_update() is now an (almost) one-liner:
>
> cmd_update() {
>     git ${wt_prefix:+-C "$wt_prefix"} submodule--helper update \
>         ${wt_prefix:+--prefix "$wt_prefix"} \
>         "$@"
> }
>
> and best of all, "git submodule update" now shows a usage string for its own
> subcommand instead of a giant usage string for all of "git submodule" :)
>
> Given how many options "git submodule update" accepts, this series takes a
> gradual approach:
>
>  1. Create a variable opts, which holds the literal options we want to pass
>     to "git submodule--helper update". Then, for each option...
>  2. If "git submodule--helper update" already understands the string option,
>     append it to opts and remove any special handling (1-3/8).
>  3. Otherwise, if the option makes sense, teach "git submodule--helper
>     update" to understand the option. Goto 2. (4-5/8).
>  4. Otherwise, if the option makes no sense, drop it (6/8).
>  5. When we've processed all options, delete all the option parsing code
>     (7/8) and clean up (8/8).
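
For readers following along, steps 1-2 of the quoted list can be
sketched roughly as below. This is only an illustration, not code from
either series: helper_update and cmd_update_sketch are hypothetical
names, the helper is a stand-in that just echoes its arguments, and
the option list is an arbitrary subset.

```shell
# Hypothetical stand-in for "git submodule--helper update"; it only
# echoes the arguments it would have received.
helper_update () {
	echo "submodule--helper update $*"
}

# Sketch of steps 1-2: options the helper already understands are
# appended to $opts verbatim instead of being parsed into variables.
cmd_update_sketch () {
	opts=
	while test $# -ne 0
	do
		case "$1" in
		-q|--quiet|--init|--remote|-N|--no-fetch)
			opts="$opts $1"
			;;
		--)
			shift
			break
			;;
		-*)
			# The "if unknown option, usage" end of the loop.
			echo "usage: update [options] [--] [<path>...]" >&2
			return 1
			;;
		*)
			break
			;;
		esac
		shift
	done
	# $opts is deliberately unquoted so it splits back into words.
	helper_update $opts "$@"
}
```

Once every option is either passed through like this or dropped, the
loop has nothing left to do and the function can collapse into the
one-liner quoted above.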

That's quite the timing coincidence. I hacked this up yesterday,
thinking that the submodule topic had been too quiet for a while, and
wondering how hard it was to convert the rest of git-submodule.sh.

It's more than 2x the length of yours, but gets to the point where we
can "git rm git-submodule.sh".

Some brief comparison/commentary:

> Glen Choo (8):
>   submodule update: remove intermediate parsing
>   submodule update: pass options containing "[no-]"
>   submodule update: pass options with stuck forms

Yeah, this is the alternate approach I considered and ended up
discarding. I.e. to make forward progress with migrating things away
from the cmd_*() functions you either have to prepare things in
advance and then sweep the rug from under them in one go.

Or, as you're doing here teaching them about the options they're
not-really-parsing anymore, but must know about because they're in a
loop that ends with a "if unknown option, usage".

>   submodule update: pass --require-init and --init

Almost the same as my 12/20.

>   submodule--helper update: use one param per type

Same as my 13/20, but I ended up doing it in a more narrow/smaller
way. I tried your way and ran into some bug, then figured I'd do it
more narrowly instead of debugging it.

>   submodule update: remove -v, pass --quiet

Hrm, so we don't need it at all then. Well, that's a bit simpler than
my 1[45]/20 and 17/20 :)

So yeah, definitely RFC-quality, but I ran into that one test that
used -v, and then saw the missing docs etc. But no cheating, so I've
left it in :)

I do wonder if we should leave it in anyway, we never documented -v,
but we *did* understand it, and if you look at:

    git log -p -Gsay -- git-submodule.sh

We used to have a lot more code impacted by it, but looking at this
again now it would have only been for users of command-lines like:

    git submodule --quiet update -v [...]

I.e. where we already set the flag to the non-default quiet, and then
used -v to flip it.
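
A minimal model of that interaction, assuming a single "quiet" flag
shared by the top-level command and the subcommand (submodule_sketch
is a hypothetical name, and this is not the real parsing code):

```shell
# Hypothetical model: the top-level command may set quiet, and
# "update -v" clears it again.
submodule_sketch () {
	quiet=
	# Top-level options come before the subcommand name.
	while test $# -ne 0
	do
		case "$1" in
		-q|--quiet) quiet=1 ;;
		*) break ;;
		esac
		shift
	done
	command=$1
	shift
	# Subcommand options: -v undoes an earlier --quiet.
	for arg
	do
		case "$arg" in
		-q|--quiet) quiet=1 ;;
		-v) quiet= ;;
		esac
	done
	echo "$command: quiet=${quiet:-0}"
}

submodule_sketch --quiet update -v   # prints "update: quiet=0"
submodule_sketch --quiet update      # prints "update: quiet=1"
submodule_sketch update -v           # prints "update: quiet=0"
```

In this model -v only changes the outcome in the first invocation; in
the last one it is a no-op because quiet was never set.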

I think at this point I've talked myself into "let's just remove it",
but maybe...

>   submodule update: stop parsing options in .sh

Same effect as my 16/20, but it's the last one I converted, the
cmd_update() case being the trickiest.

>   submodule update: remove never-used expansion

Same as my 02/20, but as seen there I think you missed several
"prefix" non-uses.

Brief commentary on my patches, details in commit messages:

Ævar Arnfjörð Bjarmason (20):
  git-submodule.sh: remove unused sanitize_submodule_env()
  git-submodule.sh: remove unused $prefix variable
  git-submodule.sh: remove unused --super-prefix logic

I removed a bit more dead code here than yours.

  git-submodule.sh: normalize parsing of "--branch"
  git-submodule.sh: normalize parsing of --cached

This & various other prep commits (hereafter "easy prep") make
subsequent one-time conversions of whole cmd_*() easier.

  submodule--helper: rename "absorb-git-dirs" to "absorbgitdirs"
  git-submodule.sh: create a "case" dispatch statement

easy prep
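
To illustrate why the rename helps the "case" dispatch, here is a
rough sketch. The names helper and dispatch_sketch are hypothetical,
the helper only echoes the command it would run, and the subcommand
list is an arbitrary subset. The assumption is that once the helper's
subcommand names match the user-facing ones, "$command" can be passed
through without a mapping table:

```shell
# Hypothetical stand-in that echoes instead of running git.
helper () {
	echo "git submodule--helper $*"
}

# Sketch of a "case" dispatch: subcommands whose options map directly
# onto the helper are forwarded unchanged.
dispatch_sketch () {
	command=$1
	shift
	case "$command" in
	absorbgitdirs|sync|foreach)
		helper "$command" "$@"
		;;
	*)
		echo "unknown subcommand: $command" >&2
		return 1
		;;
	esac
}

dispatch_sketch sync --recursive
# prints "git submodule--helper sync --recursive"
```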

  submodule--helper: pretend to be "git submodule" in "-h" output

easy prep & bug fix for existing (on master) output bugs.

  git-submodule.sh: dispatch "sync" to helper
  git-submodule.sh: dispatch directly to helper
  git-submodule.sh: dispatch "foreach" to helper

These are easy conversions as the options map 1:1 after the above
prep.

  submodule--helper: have --require-init imply --init
  submodule--helper: understand --checkout, --merge and --rebase
    synonyms
  git-submodule doc: document the "-v" option to "update"
  submodule--helper: understand -v option for "update"

not-so-easy prep for "cmd_update()"

  git-submodule.sh: dispatch "update" to helper

Full cmd_update() migration in one go.

  git-submodule.sh: use "$quiet", not "$GIT_QUIET"

"easy prep", but this one is less overall churn if done at the end,
but as noted above could/should maybe be dropped entirely.

  git-submodule.sh: simplify parsing loop

Not really needed, but I wanted to get the code as close to minimal
for the next step, to eyeball the resulting sh vs. C version.

  submodule: make it a built-in, remove git-submodule.sh

We now have a builtin/submodule.c *and* the current
builtin/submodule--helper.c, and we even dispatch to "git
submodule--helper" via run_command()!

The idea is to be as close as possible to a bug-for-bug implementation
of the shellscript, and that reviewers should be confident in being
able to trace what commands we invoked before/after, we're invoking
the same "git submodule--helper" commands.

Of course we eventually want to get to some full union of
builtin/submodule{,--helper}.c, but that can wait.

  submodule: add a subprocess-less submodule.useBuiltin setting

Wait, a useBuiltin setting to switch between two built-ins? Yeah,
maybe it makes little sense, but here we get rid of the run_command()
overhead, and could generally use the built-in to experiment with
deeper integration between the two.

Performance is around ~2x faster with the "real" built-in than the
run_command() version, which in turn is more than 3x as fast on basic
overhead as the shellscript version (so the "real" built-in is more
than 6x as fast as the shellscript), to the extent that anyone cares
about "git submodule" overhead. See [1] at the end for a benchmark.

That last change adds a CI target for
GIT_TEST_SUBMODULE_USE_BUILTIN=true, full CI run here:
https://github.com/avar/git/actions/runs/2472131257

 Documentation/config/submodule.txt |   4 +
 Documentation/git-submodule.txt    |   8 +-
 Makefile                           |   2 +-
 builtin.h                          |   1 +
 builtin/submodule--helper.c        | 118 +++---
 builtin/submodule.c                | 169 ++++++++
 ci/run-build-and-tests.sh          |   1 +
 git-sh-setup.sh                    |   7 -
 git-submodule.sh                   | 637 -----------------------------
 git.c                              |   1 +
 submodule.c                        |   2 +-
 t/README                           |   4 +
 12 files changed, 255 insertions(+), 699 deletions(-)
 create mode 100644 builtin/submodule.c
 delete mode 100755 git-submodule.sh

1. GIT_TEST_SUBMODULE_USE_BUILTIN=true git hyperfine -L rev origin/master,HEAD~0 -L v false,true -s 'make CFLAGS=-O3' 'GIT_TEST_SUBMODULE_USE_BUILTIN={v} ./git --exec-path=$PWD submodule status' -r 20
Benchmark 1: GIT_TEST_SUBMODULE_USE_BUILTIN=false ./git --exec-path=$PWD submodule status' in 'origin/master
  Time (mean ± σ):      40.9 ms ±   0.3 ms    [User: 33.3 ms, System: 9.7 ms]
  Range (min … max):    40.2 ms …  41.5 ms    20 runs

Benchmark 2: GIT_TEST_SUBMODULE_USE_BUILTIN=false ./git --exec-path=$PWD submodule status' in 'HEAD~0
  Time (mean ± σ):      12.4 ms ±   0.1 ms    [User: 9.9 ms, System: 2.5 ms]
  Range (min … max):    12.2 ms …  12.7 ms    20 runs

Benchmark 3: GIT_TEST_SUBMODULE_USE_BUILTIN=true ./git --exec-path=$PWD submodule status' in 'origin/master
  Time (mean ± σ):      40.9 ms ±   0.5 ms    [User: 35.6 ms, System: 7.2 ms]
  Range (min … max):    40.1 ms …  41.8 ms    20 runs

Benchmark 4: GIT_TEST_SUBMODULE_USE_BUILTIN=true ./git --exec-path=$PWD submodule status' in 'HEAD~0
  Time (mean ± σ):       6.4 ms ±   0.1 ms    [User: 3.9 ms, System: 2.5 ms]
  Range (min … max):     6.3 ms …   6.6 ms    20 runs

Summary
  'GIT_TEST_SUBMODULE_USE_BUILTIN=true ./git --exec-path=$PWD submodule status' in 'HEAD~0' ran
    1.94 ± 0.03 times faster than 'GIT_TEST_SUBMODULE_USE_BUILTIN=false ./git --exec-path=$PWD submodule status' in 'HEAD~0'
    6.40 ± 0.11 times faster than 'GIT_TEST_SUBMODULE_USE_BUILTIN=true ./git --exec-path=$PWD submodule status' in 'origin/master'
    6.40 ± 0.10 times faster than 'GIT_TEST_SUBMODULE_USE_BUILTIN=false ./git --exec-path=$PWD submodule status' in 'origin/master'
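
The ratios can be cross-checked from the rounded means printed above.
A small sketch, where ratio is a hypothetical helper name and awk does
the floating-point division:

```shell
# Print a / b to two decimal places.
ratio () {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 12.4 6.4    # run_command() built-in vs. subprocess-less: 1.94
ratio 40.9 12.4   # shell script vs. run_command() built-in: 3.30
ratio 40.9 6.4    # shell script vs. subprocess-less built-in: 6.39
```

The last figure comes out as 6.39 rather than the 6.40 in the summary,
presumably because hyperfine divides the unrounded means.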

2. Aside: I don't think these ever made it on-list but Atharva's
   version of what we're trying to do here is at:
   https://github.com/tfidfwastaken/git/tree/submodule-make-builtin-2

   I'd looked those over at some distant point in the past, and
   skimmed them again yesterday, but thought they were too much
   all-at-once to be confident in testing it myself, hence coming up
   with this alternate & smaller approach.

Comments

Glen Choo June 13, 2022, 7:07 p.m. UTC | #1
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> As a result, git-submodule.sh::cmd_update() is now an (almost) one-liner:
>>
>> cmd_update() {
>>     git ${wt_prefix:+-C "$wt_prefix"} submodule--helper update \
>>         ${wt_prefix:+--prefix "$wt_prefix"} \
>>         "$@"
>> }
>>
>> and best of all, "git submodule update" now shows a usage string for its own
>> subcommand instead of a giant usage string for all of "git submodule" :)
>>
>> Given how many options "git submodule update" accepts, this series takes a
>> gradual approach:
>>
>>  1. Create a variable opts, which holds the literal options we want to pass
>>     to "git submodule--helper update". Then, for each option...
>>  2. If "git submodule--helper update" already understands the string option,
>>     append it to opts and remove any special handling (1-3/8).
>>  3. Otherwise, if the option makes sense, teach "git submodule--helper
>>     update" to understand the option. Goto 2. (4-5/8).
>>  4. Otherwise, if the option makes no sense, drop it (6/8).
>>  5. When we've processed all options, delete all the option parsing code
>>     (7/8) and clean up (8/8).
>
> That's quite the timing coincidence. I hacked this up yesterday,
> thinking that the submodule topic had been too quiet for a while, and
> wondering how hard it was to convert the rest of git-submodule.sh.
>
> It's more than 2x the length of yours, but gets to the point where we
> can "git rm git-submodule.sh".

Very cool. I've skimmed through all of the patches, which mostly look
good except for ~1-2 things.

Your series shows that there isn't any prohibitively difficult work left
to finish the conversion, which is great! The real problem IMO is the
potential for mechanical errors given how many lines this touches.

Here's a way of breaking apart the work that makes sense to me:

- Reuse the patches that prepare git-submodule.sh for the conversion,
  particularly 1-7/20 (create a "case" dispatch statement and its
  preceding patches).
- Keep my series that prepares "update", since that's the most tedious
  one to convert. If I don't dispatch to the "case" statement, I don't
  think it will even conflict with the preparatory series.

  Some of your patches make more sense than mine, and I'll incorporate
  them as necessary :)
- Dispatch subcommands using the "case" dispatch, including "update". We
  might have to do this slowly if we want things to be easy to eyeball.
- "git rm git-submodule.sh"!

>> Glen Choo (8):
>>   submodule update: remove intermediate parsing
>>   submodule update: pass options containing "[no-]"
>>   submodule update: pass options with stuck forms
>
> Yeah, this is the alternate approach I considered and ended up
> discarding. I.e. to make forward progress with migrating things away
> from the cmd_*() functions you either have to prepare things in
> advance and then sweep the rug from under them in one go.
>
> Or, as you're doing here teaching them about the options they're
> not-really-parsing anymore, but must know about because they're in a
> loop that ends with a "if unknown option, usage".

Yes, if you took as many steps as I did, your series would be way too
long :P

To convert "update", I don't think this many steps are necessary; I
prepared it this way primarily to make it easier for everyone to spot
how the options changed so that they can give feedback. Some of these
can be squashed in my reroll.

>>   submodule--helper update: use one param per type
>
> Same as my 13/20, but I ended up doing it in a more narrow/smaller
> way. I tried your way and ran into some bug, then figured I'd do it
> more narrowly instead of debugging it.

Yeah your approach is easier to eyeball, so I'll do this instead.

>>   submodule update: remove -v, pass --quiet
>
> Hrm, so we don't need it at all then. Well, that's a bit simpler than
> my 1[45]/20 and 17/20 :)
>
> So yeah, definitely RFC-quality, but I ran into that one test that
> used -v, and then saw the missing docs etc. But no cheating, so I've
> left it in :)
>
> I do wonder if we should leave it in anyway, we never documented -v,
> but we *did* understand it, and if you look at:
>
>     git log -p -Gsay -- git-submodule.sh
>
> We used to have a lot more code impacted by it, but looking at this
> again now it would have only been for users of command-lines like:
>
>     git submodule --quiet update -v [...]
>
> I.e. where we already set the flag to the non-default quiet, and then
> used -v to flip it.
>
> I think at this point I've talked myself into "let's just remove it",
> but maybe...

In hindsight, what I did is definitely cheating ;)

My series also breaks the way we'd handle --quiet in "git submodule",
i.e.

   git submodule --quiet update

should be quiet, but isn't.

Your approach actually handles --quiet as per the original shell script,
which is a good enough reason to do it your way. We can think about
removing it later.

> Brief commentary on my patches, details in commit messages:
>
> Ævar Arnfjörð Bjarmason (20):
>   git-submodule.sh: remove unused sanitize_submodule_env()
>   git-submodule.sh: remove unused $prefix variable
>   git-submodule.sh: remove unused --super-prefix logic
>
> I removed a bit more dead code here than yours.
>
>   git-submodule.sh: normalize parsing of "--branch"
>   git-submodule.sh: normalize parsing of --cached
>
> This & various other prep commits (hereafter "easy prep") make
> subsequent one-time conversions of whole cmd_*() easier.
>
>   submodule--helper: rename "absorb-git-dirs" to "absorbgitdirs"
>   git-submodule.sh: create a "case" dispatch statement
>
> easy prep

This would all make sense in a preparatory series, with the exception of 
3/20 git-submodule.sh: remove unused --super-prefix logic.

We have several instances where we invoke submodule--helper directly
with --super-prefix, e.g. inside sync_submodule():
    
    if (flags & OPT_RECURSIVE) {
      struct child_process cpr = CHILD_PROCESS_INIT;

      cpr.git_cmd = 1;
      cpr.dir = path;
      prepare_submodule_repo_env(&cpr.env_array);

      strvec_push(&cpr.args, "--super-prefix"); /* Here */

I even have a (as of now private) patch that replaces "update"'s
--recursive-prefix with --super-prefix.

This probably wasn't caught in the tests because this only affects how
we calculate the submodule 'displayname'.

>   submodule--helper: pretend to be "git submodule" in "-h" output
>
> easy prep & bug fix for existing (on master) output bugs.
>
>   git-submodule.sh: dispatch "sync" to helper
>   git-submodule.sh: dispatch directly to helper
>   git-submodule.sh: dispatch "foreach" to helper
>
> These are easy conversions as the options map 1:1 after the above
> prep.

Yes, these are pretty easy. I'm worried about the number of lines
changed and the potential for mechanical errors, but we can roll these
more slowly if necessary.

>   submodule--helper: have --require-init imply --init
>   submodule--helper: understand --checkout, --merge and --rebase
>     synonyms
>   git-submodule doc: document the "-v" option to "update"
>   submodule--helper: understand -v option for "update"
>
> not-so-easy prep for "cmd_update()"
>
>   git-submodule.sh: dispatch "update" to helper
>
> Full cmd_update() migration in one go.

Yeah, and since it's not-so-easy, it probably makes sense to continue to
keep my series around. I'll borrow some of these patches if that's ok :)

>   git-submodule.sh: use "$quiet", not "$GIT_QUIET"
>
> "easy prep", but this one is less overall churn if done at the end,
> but as noted above could/should maybe be dropped entirely.
>
>   git-submodule.sh: simplify parsing loop
>
> Not really needed, but I wanted to get the code as close to minimal
> for the next step, to eyeball the resulting sh vs. C version.
>
>   submodule: make it a built-in, remove git-submodule.sh
>
> We now have a builtin/submodule.c *and* the current
> builtin/submodule--helper.c, and we even dispatch to "git
> submodule--helper" via run_command()!
>
> The idea is to be as close as possible to a bug-for-bug implementation
> of the shellscript, and that reviewers should be confident in being
> able to trace what commands we invoked before/after, we're invoking
> the same "git submodule--helper" commands.
>
> Of course we eventually want to get to some full union of
> builtin/submodule{,--helper}.c, but that can wait.
>
>   submodule: add a subprocess-less submodule.useBuiltin setting
>
> Wait, a useBuiltin setting to switch between two built-ins? Yeah,
> maybe it makes little sense, but here we get rid of the run_command()
> overhead, and could generally use the built-in to experiment with
> deeper integration between the two.
>
> ...

Interesting approach. It looks ok to me, but if we break up this series,
maybe this will be stale by the time we integrate the rest of the
changes?