diff mbox series

[v3,4/4] submodule: port submodule subcommand 'summary' from shell to C

Message ID 20200812194404.17028-5-shouryashukla.oo@gmail.com (mailing list archive)
State Accepted
Commit e83e3333b5770815e9c2e3aee48c305257385bbb
Headers show
Series submodule: port subcommand 'summary' from shell to C | expand

Commit Message

Shourya Shukla Aug. 12, 2020, 7:44 p.m. UTC
From: Prathamesh Chavan <pc44800@gmail.com>

Convert submodule subcommand 'summary' to a builtin and call it via
'git-submodule.sh'.

The shell version had to call $diff_cmd twice, once to find the modified
modules cared by the user and then again, with that list of modules
to do various operations for computing the summary of those modules.
On the other hand, the C version does not need a second call to
$diff_cmd since it reuses the module list from the first call to do the
aforementioned tasks.

In the C version, we use the combination of setting a child process'
working directory to the submodule path and then calling
'prepare_submodule_repo_env()' which also sets the 'GIT_DIR' to '.git',
so that we can be certain that those spawned processes will not access
the superproject's ODB by mistake.

A behavioural difference between the C and the shell version is that the
shell version outputs two line feeds after the 'git log' output when run
outside of the tests while the C version outputs one line feed in any
case. The reason for this is that the shell version calls log with
'--pretty=format:<fmt>' whose output is followed by two echo
calls; 'format' does not have "terminator" semantics like its 'tformat'
counterpart. So, the log output is terminated by a newline only when
invoked by the user and not when invoked from the scripts. This results
in the one & two line feed differences in the shell version.
On the other hand, the C version calls log with '--pretty=<fmt>'
which is equivalent to '--pretty:tformat:<fmt>' which is then
followed by a 'printf("\n")'. Due to its "terminator" semantics the
log output is always terminated by newline and hence one line feed in
any case.

Also, when we try to pass an option-like argument after a non-option
argument, for instance:

    git submodule summary HEAD --foo-bar

    (or)

    git submodule summary HEAD --cached

That argument would be treated like a path to the submodule for which
the user is requesting a summary. So, the option ends up having no
effect. Though, passing '--quiet' is an exception to this:

    git submodule summary HEAD --quiet

While 'summary' doesn't support '--quiet', we don't get an output for
the above command as '--quiet' is treated as a path which means we get
an output only if a submodule whose path is '--quiet' exists.

The error message in case of computing a summary for non-existent
submodules in the C version is different from that of the shell version.
Since the new error message is not marked for translation, change the
'test_i18ngrep' in t7421.4 to 'grep'.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Stefan Beller <stefanbeller@gmail.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 builtin/submodule--helper.c      | 429 +++++++++++++++++++++++++++++++
 git-submodule.sh                 | 186 +-------------
 t/t7421-submodule-summary-add.sh |   2 +-
 3 files changed, 431 insertions(+), 186 deletions(-)

Comments

Jeff King Aug. 18, 2020, 2:08 a.m. UTC | #1
On Thu, Aug 13, 2020 at 01:14:04AM +0530, Shourya Shukla wrote:

> +static void print_submodule_summary(struct summary_cb *info, char* errmsg,
> +				    int total_commits, const char *displaypath,
> +				    const char *src_abbrev, const char *dst_abbrev,
> +				    int missing_src, int missing_dst,
> +				    struct module_cb *p)

The "missing_src" and "missing_dst" parameters in this function are
unused.

I _think_ they can be safely removed, and are not a sign of a bug. We
seem to fully handle them in the calling function. But this is the first
time I looked at the code, and I didn't dig too deeply.

-Peff
Shourya Shukla Aug. 21, 2020, 5:22 a.m. UTC | #2
> The "missing_src" and "missing_dst" parameters in this function are
> unused.

Yes, they are unused.

> I _think_ they can be safely removed, and are not a sign of a bug. We
> seem to fully handle them in the calling function. But this is the first
> time I looked at the code, and I didn't dig too deeply.

Sure. They an be removed. I am sending a patch for the same.
Johannes Schindelin Aug. 21, 2020, 3:17 p.m. UTC | #3
Hi Shourya,

On Thu, 13 Aug 2020, Shourya Shukla wrote:

> [...]
> diff --git a/t/t7421-submodule-summary-add.sh b/t/t7421-submodule-summary-add.sh
> index 829fe26d6d..59a9b00467 100755
> --- a/t/t7421-submodule-summary-add.sh
> +++ b/t/t7421-submodule-summary-add.sh
> @@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
>  	git commit -m "change submodule path" &&
>  	rev=$(git -C sm rev-parse --short HEAD^) &&
>  	git submodule summary HEAD^^ -- my-subm >actual 2>err &&
> -	test_i18ngrep "fatal:.*my-subm" err &&
> +	grep "fatal:.*my-subm" err &&

Sadly, this breaks on Windows: on Linux (and before this patch, also on
Windows), the error message reads somewhat like this:

	fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory

However, with the built-in `git submodule summary`, on Windows the error
message reads like this:

	error: cannot spawn git: No such file or directory

Now, this is of course not the best way to present this error message, but
please note that even providing a better error message does not fix the
erroneous expectation of the `fatal:` prefix (Git typically produces this
when `die()`ing, which can be done in the POSIX version that uses `fork()`
and `exec()` but not in the Windows version that needs to use
`CreateProcessW()` instead).

Therefore, I propose this patch on top:

-- snipsnap --
[PATCH] mingw: mention if `mingw_spawnve()` failed due to a missing directory

When we recently converted the `summary` subcommand of `git submodule`
to be mostly built-in, a bug was uncovered where a very unhelpful error
message was produced when a process could not be spawned because the
directory in which it was supposed to be run does not exist.

Even so, we _still_ have to adjust the `git submodule summary` test, to
accommodate for the fact that the `mingw_spawnve()` function will return
with an error instead of `die()`ing.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 compat/mingw.c                   | 4 ++++
 t/t7421-submodule-summary-add.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 1a64d4efb26b..3c30d0cab589 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1850,6 +1850,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen
 	/* Make sure to override previous errors, if any */
 	errno = 0;

+	if (dir && !is_directory(dir))
+		return error_errno(_("could not exec '%s' in '%s'"),
+				   argv[0], dir);
+
 	if (restrict_handle_inheritance < 0)
 		restrict_handle_inheritance = core_restrict_inherited_handles;
 	/*
diff --git a/t/t7421-submodule-summary-add.sh b/t/t7421-submodule-summary-add.sh
index 59a9b00467dc..f00d69ca29ea 100755
--- a/t/t7421-submodule-summary-add.sh
+++ b/t/t7421-submodule-summary-add.sh
@@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
 	git commit -m "change submodule path" &&
 	rev=$(git -C sm rev-parse --short HEAD^) &&
 	git submodule summary HEAD^^ -- my-subm >actual 2>err &&
-	grep "fatal:.*my-subm" err &&
+	grep "my-subm" err &&
 	cat >expected <<-EOF &&
 	* my-subm ${rev}...0000000:

--
2.28.0.windows.1
Junio C Hamano Aug. 21, 2020, 4:35 p.m. UTC | #4
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Sadly, this breaks on Windows: on Linux (and before this patch, also on
> Windows), the error message reads somewhat like this:
>
> 	fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
>
> However, with the built-in `git submodule summary`, on Windows the error
> message reads like this:
>
> 	error: cannot spawn git: No such file or directory

I think a test that relies on platform-specific error string is a
bug.  It's like expecting an exact string out of strerror(), which
we had to fix a few times.

So I am not sure we would want to butcher compat/mingw.c only to
match such an expectation by a (buggy) test.
Shourya Shukla Aug. 21, 2020, 5:17 p.m. UTC | #5
> I think a test that relies on platform-specific error string is a
> bug.  It's like expecting an exact string out of strerror(), which
> we had to fix a few times.

> So I am not sure we would want to butcher compat/mingw.c only to
> match such an expectation by a (buggy) test.

Alright. That is understandable. What alternative do you suggest? Should
we change the check in the test?
Junio C Hamano Aug. 21, 2020, 6:09 p.m. UTC | #6
Shourya Shukla <shouryashukla.oo@gmail.com> writes:

>> I think a test that relies on platform-specific error string is a
>> bug.  It's like expecting an exact string out of strerror(), which
>> we had to fix a few times.
>
>> So I am not sure we would want to butcher compat/mingw.c only to
>> match such an expectation by a (buggy) test.
>
> Alright. That is understandable. What alternative do you suggest? Should
> we change the check in the test?

A buggy check should of course be changed.

It should be sufficient to ensure "git submodule summary" fails,
regardless of what exact error message it issues, no?

If the command does not exit with non-zero exit status, when it
gives a "fatal" error message, that may indicate another bug,
though.
Kaartic Sivaraam Aug. 21, 2020, 6:54 p.m. UTC | #7
On Fri, 2020-08-21 at 11:09 -0700, Junio C Hamano wrote:
> Shourya Shukla <shouryashukla.oo@gmail.com> writes:
> 
> > > I think a test that relies on platform-specific error string is a
> > > bug.  It's like expecting an exact string out of strerror(),
> > > which
> > > we had to fix a few times.
> > > So I am not sure we would want to butcher compat/mingw.c only to
> > > match such an expectation by a (buggy) test.
> > 
> > Alright. That is understandable. What alternative do you suggest?
> > Should
> > we change the check in the test?
> 
> A buggy check should of course be changed.
> 
> It should be sufficient to ensure "git submodule summary" fails,
> regardless of what exact error message it issues, no?
> 

Unfortunately, we can't do that here. See below.

> If the command does not exit with non-zero exit status, when it
> gives a "fatal" error message, that may indicate another bug,
> though.

Here's the error message with context of the trash directory of that
test:

-- 8< --
$ cd t
$ ./t7421-submodule-summary-add.sh  -d
$ cd trash\ directory.t7421-submodule-summary-add/

$ git submodule summary HEAD^^
fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
* my-subm 35b40f1...0000000:

* subm 0000000...dbd5fc8 (2):
  > add file2

-- >8 --

That 'fatal' is a consequence of spawning a process in
`verify_submodule_committish` of `builtin/submodule--helper.c` even for
non-existent submodules. I don't think that 'fatal: ' message is giving
any useful information here. The fact that submodule 'my-subm' doesn't
exist can easily be inferred just by looking at the destination mode
(0000000). If anything that 'fatal' message is just confusing and
unnecessary, IMO.

So, we could easily suppress it by doing something like this (while
also fixing the test):

-- 8< --
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 63ea39025d..d45be7fbdf 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -972,7 +972,7 @@ static char* verify_submodule_committish(const char *sm_path,
        strvec_pushf(&cp_rev_parse.args, "%s^0", committish);
        strvec_push(&cp_rev_parse.args, "--");
 
-       if (capture_command(&cp_rev_parse, &result, 0))
+       if (!is_directory(sm_path) || capture_command(&cp_rev_parse, &result, 0))
                return NULL;
 
        strbuf_trim_trailing_newline(&result);
diff --git a/t/t7421-submodule-summary-add.sh b/t/t7421-submodule-summary-add.sh
index 59a9b00467..8a2c2b38b6 100755
--- a/t/t7421-submodule-summary-add.sh
+++ b/t/t7421-submodule-summary-add.sh
@@ -58,7 +58,6 @@ test_expect_success 'submodule summary output for submodules with changed paths'
        git commit -m "change submodule path" &&
        rev=$(git -C sm rev-parse --short HEAD^) &&
        git submodule summary HEAD^^ -- my-subm >actual 2>err &&
-       grep "fatal:.*my-subm" err &&
        cat >expected <<-EOF &&
        * my-subm ${rev}...0000000:
 
-- >8 --

BTW, I noted that `print_submodule_summary` has the following
definition:

   static void print_submodule_summary(struct summary_cb *info, char* errmsg
   				    ...

Note how '*' is placed near 'char' for 'errmsg' with an incorrect style. Ditto
for the return type of `verify_submodule_committish`. This might make
for a nice cleanup patch.
Junio C Hamano Aug. 21, 2020, 7:54 p.m. UTC | #8
Kaartic Sivaraam <kaartic.sivaraam@gmail.com> writes:

> Here's the error message with context of the trash directory of that
> test:
>
> -- 8< --
> $ cd t
> $ ./t7421-submodule-summary-add.sh  -d
> $ cd trash\ directory.t7421-submodule-summary-add/
>
> $ git submodule summary HEAD^^
> fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
> * my-subm 35b40f1...0000000:
>
> * subm 0000000...dbd5fc8 (2):
>   > add file2
>
> -- >8 --
>
> That 'fatal' is a consequence of spawning a process in
> `verify_submodule_committish` of `builtin/submodule--helper.c` even for
> non-existent submodules.

Oh, so doing something that would cause the error message to be
emitted itself is a bug.

> I don't think that 'fatal: ' message is giving
> any useful information here. The fact that submodule 'my-subm' doesn't
> exist can easily be inferred just by looking at the destination mode
> (0000000). If anything that 'fatal' message is just confusing and
> unnecessary, IMO.

Yes, I 100% agree.

> So, we could easily suppress it by doing something like this (while
> also fixing the test):

Yup.  That is a very good idea.  

Or the caller of verify_submodule_committish() should refrain from
calling it for the path?  After all, it is checking sm_path is a
path to where a submodule should be before calling the function
(instead of calling it for every random path), iow its criteria to
make the call currently may be "the path in the index says it is a
submodule", but it should easily be updated to "the path in the
index says it is a submodule, and the submodule actually is
populated", right?

> @@ -972,7 +972,7 @@ static char* verify_submodule_committish(const char *sm_path,
>         strvec_pushf(&cp_rev_parse.args, "%s^0", committish);
> ...
> BTW, I noted that `print_submodule_summary` has the following
> definition:
>
>    static void print_submodule_summary(struct summary_cb *info, char* errmsg
>    				    ...
>
> Note how '*' is placed near 'char' for 'errmsg' with an incorrect style. Ditto
> for the return type of `verify_submodule_committish`. This might make
> for a nice cleanup patch.

Yup.  It would have been nicer to catch these before the topic hit
'next'.

Thanks.
Kaartic Sivaraam Aug. 23, 2020, 8:03 p.m. UTC | #9
On Fri, 2020-08-21 at 12:54 -0700, Junio C Hamano wrote:
> Kaartic Sivaraam <kaartic.sivaraam@gmail.com> writes:
> 
> > Here's the error message with context of the trash directory of that
> > test:
> > 
> > -- 8< --
> > $ cd t
> > $ ./t7421-submodule-summary-add.sh  -d
> > $ cd trash\ directory.t7421-submodule-summary-add/
> > 
> > $ git submodule summary HEAD^^
> > fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
> > * my-subm 35b40f1...0000000:
> > 
> > * subm 0000000...dbd5fc8 (2):
> >   > add file2
> > 
> > -- >8 --
> > 
> > That 'fatal' is a consequence of spawning a process in
> > `verify_submodule_committish` of `builtin/submodule--helper.c` even for
> > non-existent submodules.
> 
> Oh, so doing something that would cause the error message to be
> emitted itself is a bug.
> 

Exactly.

> > I don't think that 'fatal: ' message is giving
> > any useful information here. The fact that submodule 'my-subm' doesn't
> > exist can easily be inferred just by looking at the destination mode
> > (0000000). If anything that 'fatal' message is just confusing and
> > unnecessary, IMO.
> 
> Yes, I 100% agree.
> 
> > So, we could easily suppress it by doing something like this (while
> > also fixing the test):
> 
> Yup.  That is a very good idea.  
> 
> Or the caller of verify_submodule_committish() should refrain from
> calling it for the path?  After all, it is checking sm_path is a
> path to where a submodule should be before calling the function
> (instead of calling it for every random path), iow its criteria to
> make the call currently may be "the path in the index says it is a
> submodule", but it should easily be updated to "the path in the
> index says it is a submodule, and the submodule actually is
> populated", right?
> 

Ah, this reminds me of the initial version of the patch which did
exactly that. Quoting it here for reference:

+	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
+	if (is_nonbare_repository_dir(&sm_git_dir_sb))
+		is_sm_git_dir = 1;
+
+	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
+		missing_src = verify_submodule_object_name(p->sm_path,
+							   oid_to_hex(&p->oid_src));
+
+	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
+		missing_dst = verify_submodule_object_name(p->sm_path,
+							   oid_to_hex(&p->oid_dst));
+

Note: `verify_submodule_object_name` is now renamed to
`verify_submodule_committish`.

That does sound like a sane approach to me. There's not much point in
invoking `rev-parse` in a non-populated (a.k.a. de-initialized) or non-
existent submodule but we removed that check as we thought it was
unnecessary redundant because `capture_command` would fail anyway.
Looks like we failed to notice the additional `fatal` message fallout
then.

Also, I think it would be better to something like the following in
t7421 to ensure that `fatal` doesn't sneak up accidentally in the
future:

-- 8< --
diff --git t/t7421-submodule-summary-add.sh t/t7421-submodule-summary-add.sh
index 59a9b00467..b070f13714 100755
--- t/t7421-submodule-summary-add.sh
+++ t/t7421-submodule-summary-add.sh
@@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
        git commit -m "change submodule path" &&
        rev=$(git -C sm rev-parse --short HEAD^) &&
        git submodule summary HEAD^^ -- my-subm >actual 2>err &&
-       grep "fatal:.*my-subm" err &&
+       test_must_be_empty err &&
        cat >expected <<-EOF &&
        * my-subm ${rev}...0000000:
 
-- >8 --
Kaartic Sivaraam Aug. 23, 2020, 8:12 p.m. UTC | #10
On Mon, 2020-08-24 at 01:33 +0530, Kaartic Sivaraam wrote:
> On Fri, 2020-08-21 at 12:54 -0700, Junio C Hamano wrote:
> > Kaartic Sivaraam <kaartic.sivaraam@gmail.com> writes:
> > 
> > > So, we could easily suppress it by doing something like this (while
> > > also fixing the test):
> > 
> > Yup.  That is a very good idea.  
> > 
> > Or the caller of verify_submodule_committish() should refrain from
> > calling it for the path?  After all, it is checking sm_path is a
> > path to where a submodule should be before calling the function
> > (instead of calling it for every random path), iow its criteria to
> > make the call currently may be "the path in the index says it is a
> > submodule", but it should easily be updated to "the path in the
> > index says it is a submodule, and the submodule actually is
> > populated", right?
> > 
> 
> Ah, this reminds me of the initial version of the patch which did
> exactly that. Quoting it here for reference:
> 
> +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
> +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
> +		is_sm_git_dir = 1;
> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> +		missing_src = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_src));
> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> +		missing_dst = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_dst));
> +
> 
> Note: `verify_submodule_object_name` is now renamed to
> `verify_submodule_committish`.
> 
> That does sound like a sane approach to me. There's not much point in
> invoking `rev-parse` in a non-populated (a.k.a. de-initialized) or non-
> existent submodule but we removed that check as we thought it was
> unnecessary redundant because `capture_command` would fail anyway.
> Looks like we failed to notice the additional `fatal` message fallout
> then.
> 

Here's a link to the start of the relevant discussion, just in case:

https://lore.kernel.org/git/nycvar.QRO.7.76.6.2007031712160.50@tvgsbejvaqbjf.bet/

> Also, I think it would be better to something like the following in
> t7421 to ensure that `fatal` doesn't sneak up accidentally in the
> future:
> 
> -- 8< --
> diff --git t/t7421-submodule-summary-add.sh t/t7421-submodule-summary-add.sh
> index 59a9b00467..b070f13714 100755
> --- t/t7421-submodule-summary-add.sh
> +++ t/t7421-submodule-summary-add.sh
> @@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
>         git commit -m "change submodule path" &&
>         rev=$(git -C sm rev-parse --short HEAD^) &&
>         git submodule summary HEAD^^ -- my-subm >actual 2>err &&
> -       grep "fatal:.*my-subm" err &&
> +       test_must_be_empty err &&
>         cat >expected <<-EOF &&
>         * my-subm ${rev}...0000000:
>  
> -- >8 --
>
Shourya Shukla Aug. 24, 2020, 7:26 a.m. UTC | #11
On 24/08 01:33, Kaartic Sivaraam wrote:
> > Or the caller of verify_submodule_committish() should refrain from
> > calling it for the path?  After all, it is checking sm_path is a
> > path to where a submodule should be before calling the function
> > (instead of calling it for every random path), iow its criteria to
> > make the call currently may be "the path in the index says it is a
> > submodule", but it should easily be updated to "the path in the
> > index says it is a submodule, and the submodule actually is
> > populated", right?
> > 
> 
> Ah, this reminds me of the initial version of the patch which did
> exactly that. Quoting it here for reference:
> 
> +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
> +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
> +		is_sm_git_dir = 1;
> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> +		missing_src = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_src));
> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> +		missing_dst = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_dst));
> +
> 
> Note: `verify_submodule_object_name` is now renamed to
> `verify_submodule_committish`.
> 
> That does sound like a sane approach to me. There's not much point in
> invoking `rev-parse` in a non-populated (a.k.a. de-initialized) or non-
> existent submodule but we removed that check as we thought it was
> unnecessary redundant because `capture_command` would fail anyway.
> Looks like we failed to notice the additional `fatal` message fallout
> then.

This is what I have tried to implement after your suggestion:

-----8<-----
strbuf_addstr(&sb, p->sm_path);
	if (is_nonbare_repository_dir(&sb) && S_ISGITLINK(p->mod_src)) {
		src_abbrev = verify_submodule_committish(p->sm_path,
							 oid_to_hex(&p->oid_src));
		if (!src_abbrev) {
			missing_src = 1;
			/*
			 * As `rev-parse` failed, we fallback to getting
			 * the abbreviated hash using oid_src. We do
			 * this as we might still need the abbreviated
			 * hash in cases like a submodule type change, etc.
			 */
			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
		}
	} else {
		/*
		 * The source does not point to a submodule.
		 * So, we fallback to getting the abbreviation using
		 * oid_src as we might still need the abbreviated
		 * hash in cases like submodule add, etc.
		 */
		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
	}

	if (is_nonbare_repository_dir(&sb) && S_ISGITLINK(p->mod_dst)) {
		dst_abbrev = verify_submodule_committish(p->sm_path,
							 oid_to_hex(&p->oid_dst));
		if (!dst_abbrev) {
			missing_dst = 1;
			/*
			 * As `rev-parse` failed, we fallback to getting
			 * the abbreviated hash using oid_dst. We do
			 * this as we might still need the abbreviated
			 * hash in cases like a submodule type change, etc.
			 */
			dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
		}
	} else {
		/*
		 * The destination does not point to a submodule.
		 * So, we fallback to getting the abbreviation using
		 * oid_dst as we might still need the abbreviated
		 * hash in cases like a submodule removal, etc.
		 */
		dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
	}
----->8-----

That is, add another check along with the 'S_ISGITLINK()' one. Now, the
thing is that 'rev-list' (called just after this part) starts to bother
and comes up with its own 'fatal' that the directory rev-list does not
exist.

The thing is that 'missing{src,dst}' should be set to 1 in two cases:

    1. If the hash is not found, i.e, when 'verify_submodule..()'
       returns a NULL. Something which is happening right now as well.

    2. If the SM is not reachable for some reason (maybe it does not
       exist like in our case). Something which is NOT happening right
       now.

Or if having the same variable denote two things does not please you,
then we can create another variable for the second check BUT, we will
have to incorporate checking of that variable in the 

-----8<-----
if (!missing_src && !missing_dst) {
		struct child_process cp_rev_list = CHILD_PROCESS_INIT;
		struct strbuf sb_rev_list = STRBUF_INIT;

		strvec_pushl(&cp_rev_list.args, "rev-list",
			     "--first-parent", "--count", NULL);
                 .........
----->8-----

check.

This way, we hit two birds with one stone:

    1. Bypass the 'verify_submodule..()' call when the SM directory does
       not exist. We can then remove the is_directory() test from the
       'verify_submodule_..()' function.

    2. Avoid a 'rev-{parse,list}' fatal error message and thus pass all
       the tests successfully.

Therefore, the final outcome is something like this:

-----8<-----
	if (is_directory(p->sm_path) && S_ISGITLINK(p->mod_src)) {
		src_abbrev = verify_submodule_committish(p->sm_path,
							 oid_to_hex(&p->oid_src));
		if (!src_abbrev) {
			missing_src = 1;
			/*
			 * As `rev-parse` failed, we fallback to getting
			 * the abbreviated hash using oid_src. We do
			 * this as we might still need the abbreviated
			 * hash in cases like a submodule type change, etc.
			 */
			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
		}
	} else {
		missing_src = 1;
		/*
		 * The source does not point to a submodule.
		 * So, we fallback to getting the abbreviation using
		 * oid_src as we might still need the abbreviated
		 * hash in cases like submodule add, etc.
		 */
		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
	}

	if (is_directory(p->sm_path) && S_ISGITLINK(p->mod_dst)) {
		dst_abbrev = verify_submodule_committish(p->sm_path,
							 oid_to_hex(&p->oid_dst));
		if (!dst_abbrev) {
			missing_dst = 1;
			/*
			 * As `rev-parse` failed, we fallback to getting
			 * the abbreviated hash using oid_dst. We do
			 * this as we might still need the abbreviated
			 * hash in cases like a submodule type change, etc.
			 */
			dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
		}
	} else {
		missing_dst = 1;
		/*
		 * The destination does not point to a submodule.
		 * So, we fallback to getting the abbreviation using
		 * oid_dst as we might still need the abbreviated
		 * hash in cases like a submodule removal, etc.
		 */
		dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
	}
----->8-----

Or if is_directory() does not please you then we can make it
'is_nonbare_..()' too. The outcome will be unchanged.

What are your opinions on this?

> Also, I think it would be better to something like the following in
> t7421 to ensure that `fatal` doesn't sneak up accidentally in the
> future:
> 
> -- 8< --
> diff --git t/t7421-submodule-summary-add.sh t/t7421-submodule-summary-add.sh
> index 59a9b00467..b070f13714 100755
> --- t/t7421-submodule-summary-add.sh
> +++ t/t7421-submodule-summary-add.sh
> @@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
>         git commit -m "change submodule path" &&
>         rev=$(git -C sm rev-parse --short HEAD^) &&
>         git submodule summary HEAD^^ -- my-subm >actual 2>err &&
> -       grep "fatal:.*my-subm" err &&
> +       test_must_be_empty err &&
>         cat >expected <<-EOF &&
>         * my-subm ${rev}...0000000:
>  
> -- >8 --

Yes, this I will do.
Shourya Shukla Aug. 24, 2020, 8:46 a.m. UTC | #12
Or rather, we can do this:

-----8<-----
if (S_ISGITLINK(p->mod_src)) {
		struct strbuf sb = STRBUF_INIT;
		strbuf_addstr(&sb, p->sm_path);
		if (is_nonbare_repository_dir(&sb))
			src_abbrev = verify_submodule_committish(p->sm_path,
								                     oid_to_hex(&p->oid_src));
		strbuf_release(&sb);
		if (!src_abbrev) {
			missing_src = 1;
			/*
			 * As `rev-parse` failed, we fallback to getting
			 * the abbreviated hash using oid_src. We do
			 * this as we might still need the abbreviated
			 * hash in cases like a submodule type change, etc.
			 */
			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
		}
	} else {
		/*
		 * The source does not point to a submodule.
		 * So, we fallback to getting the abbreviation using
		 * oid_src as we might still need the abbreviated
		 * hash in cases like submodule add, etc.
		 */
		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
	}
----->8-----

Similarly for dst as well. This solution passes all the tests and does
not call 'verify_submodule_committish()' all the time. The previous
approach failed a couple of tests, this one seems fine to me.

How is this one?
Kaartic Sivaraam Aug. 24, 2020, 11:08 a.m. UTC | #13
On Mon, 2020-08-24 at 14:16 +0530, Shourya Shukla wrote:
> Or rather, we can do this:
> 
> -----8<-----
> if (S_ISGITLINK(p->mod_src)) {
> 		struct strbuf sb = STRBUF_INIT;
> 		strbuf_addstr(&sb, p->sm_path);
> 		if (is_nonbare_repository_dir(&sb))
> 			src_abbrev = verify_submodule_committish(p->sm_path,
> 								                     oid_to_hex(&p->oid_src));
> 		strbuf_release(&sb);
> 		if (!src_abbrev) {
> 			missing_src = 1;
> 			/*
> 			 * As `rev-parse` failed, we fallback to getting
> 			 * the abbreviated hash using oid_src. We do
> 			 * this as we might still need the abbreviated
> 			 * hash in cases like a submodule type change, etc.
> 			 */
> 			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
> 		}
> 	} else {
> 		/*
> 		 * The source does not point to a submodule.
> 		 * So, we fallback to getting the abbreviation using
> 		 * oid_src as we might still need the abbreviated
> 		 * hash in cases like submodule add, etc.
> 		 */
> 		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
> 	}
> ----->8-----
> 
> Similarly for dst as well. This solution passes all the tests and does
> not call 'verify_submodule_committish()' all the time. The previous
> approach failed a couple of tests, this one seems fine to me.
> 
> How is this one?
> 

This is more or less what I had in mind initially. But later after
being reminded about the fact that there's a code path which calls
`generate_submodule_summary` only when `is_nonbare_repository_dir`
succeeds, I realized any conditional that uses
`is_nonbare_repository_dir` or the likes of it would be confusing. So,
I think a better approach would be something like:

-- 8< --
diff --git builtin/submodule--helper.c builtin/submodule--helper.c
index 63ea39025d..b490108cd9 100644
--- builtin/submodule--helper.c
+++ builtin/submodule--helper.c
@@ -1036,7 +1036,7 @@ static void print_submodule_summary(struct summary_cb *info, char* errmsg,
 static void generate_submodule_summary(struct summary_cb *info,
                                       struct module_cb *p)
 {
-       char *displaypath, *src_abbrev, *dst_abbrev;
+       char *displaypath, *src_abbrev = NULL, *dst_abbrev;
        int missing_src = 0, missing_dst = 0;
        char *errmsg = NULL;
        int total_commits = -1;
@@ -1062,8 +1062,9 @@ static void generate_submodule_summary(struct summary_cb *info,
        }
 
        if (S_ISGITLINK(p->mod_src)) {
-               src_abbrev = verify_submodule_committish(p->sm_path,
-                                                        oid_to_hex(&p->oid_src));
+               if (p->status != 'D')
+                       src_abbrev = verify_submodule_committish(p->sm_path,
+                                                                oid_to_hex(&p->oid_src));
                if (!src_abbrev) {
                        missing_src = 1;
                        /*
diff --git t/t7421-submodule-summary-add.sh t/t7421-submodule-summary-add.sh
index 59a9b00467..b070f13714 100755
--- t/t7421-submodule-summary-add.sh
+++ t/t7421-submodule-summary-add.sh
@@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
        git commit -m "change submodule path" &&
        rev=$(git -C sm rev-parse --short HEAD^) &&
        git submodule summary HEAD^^ -- my-subm >actual 2>err &&
-       grep "fatal:.*my-subm" err &&
+       test_must_be_empty err &&
        cat >expected <<-EOF &&
        * my-subm ${rev}...0000000:
 
-- >8 --

I suggest this as the other code path that calls
`generate_submodule_summary` without going through the
`is_nonbare_repository_dir` condition is the one where we get
`p->status` as 'T' (typechange) or 'D' (deleted). We don't have to
worry about 'T' as we would want the hash for the new object anyway.
That leaves us with 'D' which we indeed have to handle.

Note that no such handling is required for the similar portion
corresponding to `dst_abbrev` as the conditional `if (S_ISGITLINK(p-
>mod_dst))` already guards the `verify_submodule_committish` when we
have a status of 'D'.
Shourya Shukla Aug. 24, 2020, 5:50 p.m. UTC | #14
On 24/08 04:38, Kaartic Sivaraam wrote:
> On Mon, 2020-08-24 at 14:16 +0530, Shourya Shukla wrote:
> > Or rather, we can do this:
> > 
> > -----8<-----
> > if (S_ISGITLINK(p->mod_src)) {
> > 		struct strbuf sb = STRBUF_INIT;
> > 		strbuf_addstr(&sb, p->sm_path);
> > 		if (is_nonbare_repository_dir(&sb))
> > 			src_abbrev = verify_submodule_committish(p->sm_path,
> > 								                     oid_to_hex(&p->oid_src));
> > 		strbuf_release(&sb);
> > 		if (!src_abbrev) {
> > 			missing_src = 1;
> > 			/*
> > 			 * As `rev-parse` failed, we fallback to getting
> > 			 * the abbreviated hash using oid_src. We do
> > 			 * this as we might still need the abbreviated
> > 			 * hash in cases like a submodule type change, etc.
> > 			 */
> > 			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
> > 		}
> > 	} else {
> > 		/*
> > 		 * The source does not point to a submodule.
> > 		 * So, we fallback to getting the abbreviation using
> > 		 * oid_src as we might still need the abbreviated
> > 		 * hash in cases like submodule add, etc.
> > 		 */
> > 		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
> > 	}
> > ----->8-----
> > 
> > Similarly for dst as well. This solution passes all the tests and does
> > not call 'verify_submodule_committish()' all the time. The previous
> > approach failed a couple of tests, this one seems fine to me.
> > 
> > How is this one?
> > 
> 
> This is more or less what I had in mind initially. But later after
> being reminded about the fact that there's a code path which calls
> `generate_submodule_summary` only when `is_nonbare_repository_dir`
> succeeds, I realized any conditional that uses
> `is_nonbare_repository_dir` or the likes of it would be confusing. So,
> I think a better approach would be something like:

Alright. I understand. The case for which we faced the problem got
called using this part:

		if (p->status == 'D' || p->status == 'T') {
			generate_submodule_summary(info, p);
			continue;
		}

But I understand your concern. I will change this.

> -- 8< --
> diff --git builtin/submodule--helper.c builtin/submodule--helper.c
> index 63ea39025d..b490108cd9 100644
> --- builtin/submodule--helper.c
> +++ builtin/submodule--helper.c
> @@ -1036,7 +1036,7 @@ static void print_submodule_summary(struct summary_cb *info, char* errmsg,
>  static void generate_submodule_summary(struct summary_cb *info,
>                                        struct module_cb *p)
>  {
> -       char *displaypath, *src_abbrev, *dst_abbrev;
> +       char *displaypath, *src_abbrev = NULL, *dst_abbrev;
>         int missing_src = 0, missing_dst = 0;
>         char *errmsg = NULL;
>         int total_commits = -1;
> @@ -1062,8 +1062,9 @@ static void generate_submodule_summary(struct summary_cb *info,
>         }
>  
>         if (S_ISGITLINK(p->mod_src)) {
> -               src_abbrev = verify_submodule_committish(p->sm_path,
> -                                                        oid_to_hex(&p->oid_src));
> +               if (p->status != 'D')
> +                       src_abbrev = verify_submodule_committish(p->sm_path,
> +                                                                oid_to_hex(&p->oid_src));
>                 if (!src_abbrev) {
>                         missing_src = 1;
>                         /*
> diff --git t/t7421-submodule-summary-add.sh t/t7421-submodule-summary-add.sh
> index 59a9b00467..b070f13714 100755
> --- t/t7421-submodule-summary-add.sh
> +++ t/t7421-submodule-summary-add.sh
> @@ -58,7 +58,7 @@ test_expect_success 'submodule summary output for submodules with changed paths'
>         git commit -m "change submodule path" &&
>         rev=$(git -C sm rev-parse --short HEAD^) &&
>         git submodule summary HEAD^^ -- my-subm >actual 2>err &&
> -       grep "fatal:.*my-subm" err &&
> +       test_must_be_empty err &&
>         cat >expected <<-EOF &&
>         * my-subm ${rev}...0000000:
>  
> -- >8 --
> 
> I suggest this as the other code path that calls
> `generate_submodule_summary` without going through the
> `is_nonbare_repository_dir` condition is the one where we get
> `p->status` as 'T' (typechange) or 'D' (deleted). We don't have to
> worry about 'T' as we would want the hash for the new object anyway.
> That leaves us with 'D' which we indeed have to handle.

Oh you did mention it here. Yeah, this is perfect.

> Note that no such handling is required for the similar portion
> corresponding to `dst_abbrev` as the conditional `if (S_ISGITLINK(p-
> >mod_dst))` already guards the `verify_submodule_committish` when we
> have a status of 'D'.

Sure I will keep this in mind.
Junio C Hamano Aug. 24, 2020, 5:54 p.m. UTC | #15
(A few miniscule things I noticed that are irrelevant to the code
structure discussion).

> +static char* verify_submodule_committish(const char *sm_path,

Style: in C, asterisk sticks to the identifier, not type, i.e.

    static char *verify_submodule_committish(const char *sm_path, ...);

> +static void print_submodule_summary(struct summary_cb *info, char* errmsg,

Likewise; "char *errmsg".

> +static void generate_submodule_summary(struct summary_cb *info,
> +				       struct module_cb *p)
> +{
> +	char *displaypath, *src_abbrev, *dst_abbrev;
> +	int missing_src = 0, missing_dst = 0;
> +	char *errmsg = NULL;
> +	int total_commits = -1;
> +
> +	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
> +		if (S_ISGITLINK(p->mod_dst)) {
> +...
> +		} else {
> +			/* for a submodule removal (mode:0000000), don't warn */
> +			if (p->mod_dst)
> +				warning(_("unexpected mode %d\n"), p->mod_dst);
> +		}
> +	}

Nobody can read mode bits written in decimal.  Use "%o" instead,
perhaps?

Thanks.
diff mbox series

Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index a03dc84ea4..63ea39025d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -927,6 +927,434 @@  static int module_name(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+struct module_cb {
+	unsigned int mod_src;
+	unsigned int mod_dst;
+	struct object_id oid_src;
+	struct object_id oid_dst;
+	char status;
+	const char *sm_path;
+};
+#define MODULE_CB_INIT { 0, 0, NULL, NULL, '\0', NULL }
+
+struct module_cb_list {
+	struct module_cb **entries;
+	int alloc, nr;
+};
+#define MODULE_CB_LIST_INIT { NULL, 0, 0 }
+
+struct summary_cb {
+	int argc;
+	const char **argv;
+	const char *prefix;
+	unsigned int cached: 1;
+	unsigned int for_status: 1;
+	unsigned int files: 1;
+	int summary_limit;
+};
+#define SUMMARY_CB_INIT { 0, NULL, NULL, 0, 0, 0, 0 }
+
+enum diff_cmd {
+	DIFF_INDEX,
+	DIFF_FILES
+};
+
+static char* verify_submodule_committish(const char *sm_path,
+					 const char *committish)
+{
+	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
+	struct strbuf result = STRBUF_INIT;
+
+	cp_rev_parse.git_cmd = 1;
+	cp_rev_parse.dir = sm_path;
+	prepare_submodule_repo_env(&cp_rev_parse.env_array);
+	strvec_pushl(&cp_rev_parse.args, "rev-parse", "-q", "--short", NULL);
+	strvec_pushf(&cp_rev_parse.args, "%s^0", committish);
+	strvec_push(&cp_rev_parse.args, "--");
+
+	if (capture_command(&cp_rev_parse, &result, 0))
+		return NULL;
+
+	strbuf_trim_trailing_newline(&result);
+	return strbuf_detach(&result, NULL);
+}
+
+static void print_submodule_summary(struct summary_cb *info, char* errmsg,
+				    int total_commits, const char *displaypath,
+				    const char *src_abbrev, const char *dst_abbrev,
+				    int missing_src, int missing_dst,
+				    struct module_cb *p)
+{
+	if (p->status == 'T') {
+		if (S_ISGITLINK(p->mod_dst))
+			printf(_("* %s %s(blob)->%s(submodule)"),
+				 displaypath, src_abbrev, dst_abbrev);
+		else
+			printf(_("* %s %s(submodule)->%s(blob)"),
+				 displaypath, src_abbrev, dst_abbrev);
+	} else {
+		printf("* %s %s...%s",
+			displaypath, src_abbrev, dst_abbrev);
+	}
+
+	if (total_commits < 0)
+		printf(":\n");
+	else
+		printf(" (%d):\n", total_commits);
+
+	if (errmsg) {
+		printf(_("%s"), errmsg);
+	} else if (total_commits > 0) {
+		struct child_process cp_log = CHILD_PROCESS_INIT;
+
+		cp_log.git_cmd = 1;
+		cp_log.dir = p->sm_path;
+		prepare_submodule_repo_env(&cp_log.env_array);
+		strvec_pushl(&cp_log.args, "log", NULL);
+
+		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
+			if (info->summary_limit > 0)
+				strvec_pushf(&cp_log.args, "-%d",
+					     info->summary_limit);
+
+			strvec_pushl(&cp_log.args, "--pretty=  %m %s",
+				     "--first-parent", NULL);
+			strvec_pushf(&cp_log.args, "%s...%s",
+				     src_abbrev, dst_abbrev);
+		} else if (S_ISGITLINK(p->mod_dst)) {
+			strvec_pushl(&cp_log.args, "--pretty=  > %s",
+				     "-1", dst_abbrev, NULL);
+		} else {
+			strvec_pushl(&cp_log.args, "--pretty=  < %s",
+				     "-1", src_abbrev, NULL);
+		}
+		run_command(&cp_log);
+	}
+	printf("\n");
+}
+
+static void generate_submodule_summary(struct summary_cb *info,
+				       struct module_cb *p)
+{
+	char *displaypath, *src_abbrev, *dst_abbrev;
+	int missing_src = 0, missing_dst = 0;
+	char *errmsg = NULL;
+	int total_commits = -1;
+
+	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
+		if (S_ISGITLINK(p->mod_dst)) {
+			struct ref_store *refs = get_submodule_ref_store(p->sm_path);
+			if (refs)
+				refs_head_ref(refs, handle_submodule_head_ref, &p->oid_dst);
+		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
+			struct stat st;
+			int fd = open(p->sm_path, O_RDONLY);
+
+			if (fd < 0 || fstat(fd, &st) < 0 ||
+			    index_fd(&the_index, &p->oid_dst, fd, &st, OBJ_BLOB,
+				     p->sm_path, 0))
+				error(_("couldn't hash object from '%s'"), p->sm_path);
+		} else {
+			/* for a submodule removal (mode:0000000), don't warn */
+			if (p->mod_dst)
+				warning(_("unexpected mode %d\n"), p->mod_dst);
+		}
+	}
+
+	if (S_ISGITLINK(p->mod_src)) {
+		src_abbrev = verify_submodule_committish(p->sm_path,
+							 oid_to_hex(&p->oid_src));
+		if (!src_abbrev) {
+			missing_src = 1;
+			/*
+			 * As `rev-parse` failed, we fallback to getting
+			 * the abbreviated hash using oid_src. We do
+			 * this as we might still need the abbreviated
+			 * hash in cases like a submodule type change, etc.
+			 */
+			src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
+		}
+	} else {
+		/*
+		 * The source does not point to a submodule.
+		 * So, we fallback to getting the abbreviation using
+		 * oid_src as we might still need the abbreviated
+		 * hash in cases like submodule add, etc.
+		 */
+		src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
+	}
+
+	if (S_ISGITLINK(p->mod_dst)) {
+		dst_abbrev = verify_submodule_committish(p->sm_path,
+							 oid_to_hex(&p->oid_dst));
+		if (!dst_abbrev) {
+			missing_dst = 1;
+			/*
+			 * As `rev-parse` failed, we fallback to getting
+			 * the abbreviated hash using oid_dst. We do
+			 * this as we might still need the abbreviated
+			 * hash in cases like a submodule type change, etc.
+			 */
+			dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
+		}
+	} else {
+		/*
+		 * The destination does not point to a submodule.
+		 * So, we fallback to getting the abbreviation using
+		 * oid_dst as we might still need the abbreviated
+		 * hash in cases like a submodule removal, etc.
+		 */
+		dst_abbrev = xstrndup(oid_to_hex(&p->oid_dst), 7);
+	}
+
+	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
+
+	if (!missing_src && !missing_dst) {
+		struct child_process cp_rev_list = CHILD_PROCESS_INIT;
+		struct strbuf sb_rev_list = STRBUF_INIT;
+
+		strvec_pushl(&cp_rev_list.args, "rev-list",
+			     "--first-parent", "--count", NULL);
+		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
+			strvec_pushf(&cp_rev_list.args, "%s...%s",
+				     src_abbrev, dst_abbrev);
+		else
+			strvec_push(&cp_rev_list.args, S_ISGITLINK(p->mod_src) ?
+				    src_abbrev : dst_abbrev);
+		strvec_push(&cp_rev_list.args, "--");
+
+		cp_rev_list.git_cmd = 1;
+		cp_rev_list.dir = p->sm_path;
+		prepare_submodule_repo_env(&cp_rev_list.env_array);
+
+		if (!capture_command(&cp_rev_list, &sb_rev_list, 0))
+			total_commits = atoi(sb_rev_list.buf);
+
+		strbuf_release(&sb_rev_list);
+	} else {
+		/*
+		 * Don't give error msg for modification whose dst is not
+		 * submodule, i.e., deleted or changed to blob
+		 */
+		if (S_ISGITLINK(p->mod_dst)) {
+			struct strbuf errmsg_str = STRBUF_INIT;
+			if (missing_src && missing_dst) {
+				strbuf_addf(&errmsg_str, "  Warn: %s doesn't contain commits %s and %s\n",
+					    displaypath, oid_to_hex(&p->oid_src),
+					    oid_to_hex(&p->oid_dst));
+			} else {
+				strbuf_addf(&errmsg_str, "  Warn: %s doesn't contain commit %s\n",
+					    displaypath, missing_src ?
+					    oid_to_hex(&p->oid_src) :
+					    oid_to_hex(&p->oid_dst));
+			}
+			errmsg = strbuf_detach(&errmsg_str, NULL);
+		}
+	}
+
+	print_submodule_summary(info, errmsg, total_commits,
+				displaypath, src_abbrev,
+				dst_abbrev, missing_src,
+				missing_dst, p);
+
+	free(displaypath);
+	free(src_abbrev);
+	free(dst_abbrev);
+}
+
+static void prepare_submodule_summary(struct summary_cb *info,
+				      struct module_cb_list *list)
+{
+	int i;
+	for (i = 0; i < list->nr; i++) {
+		const struct submodule *sub;
+		struct module_cb *p = list->entries[i];
+		struct strbuf sm_gitdir = STRBUF_INIT;
+
+		if (p->status == 'D' || p->status == 'T') {
+			generate_submodule_summary(info, p);
+			continue;
+		}
+
+		if (info->for_status && p->status != 'A' &&
+		    (sub = submodule_from_path(the_repository,
+					       &null_oid, p->sm_path))) {
+			char *config_key = NULL;
+			const char *value;
+			int ignore_all = 0;
+
+			config_key = xstrfmt("submodule.%s.ignore",
+					     sub->name);
+			if (!git_config_get_string_const(config_key, &value))
+				ignore_all = !strcmp(value, "all");
+			else if (sub->ignore)
+				ignore_all = !strcmp(sub->ignore, "all");
+
+			free(config_key);
+			if (ignore_all)
+				continue;
+		}
+
+		/* Also show added or modified modules which are checked out */
+		strbuf_addstr(&sm_gitdir, p->sm_path);
+		if (is_nonbare_repository_dir(&sm_gitdir))
+			generate_submodule_summary(info, p);
+		strbuf_release(&sm_gitdir);
+	}
+}
+
+static void submodule_summary_callback(struct diff_queue_struct *q,
+				       struct diff_options *options,
+				       void *data)
+{
+	int i;
+	struct module_cb_list *list = data;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		struct module_cb *temp;
+
+		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
+			continue;
+		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
+		temp->mod_src = p->one->mode;
+		temp->mod_dst = p->two->mode;
+		temp->oid_src = p->one->oid;
+		temp->oid_dst = p->two->oid;
+		temp->status = p->status;
+		temp->sm_path = xstrdup(p->one->path);
+
+		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
+		list->entries[list->nr++] = temp;
+	}
+}
+
+static const char *get_diff_cmd(enum diff_cmd diff_cmd)
+{
+	switch (diff_cmd) {
+	case DIFF_INDEX: return "diff-index";
+	case DIFF_FILES: return "diff-files";
+	default: BUG("bad diff_cmd value %d", diff_cmd);
+	}
+}
+
+static int compute_summary_module_list(struct object_id *head_oid,
+				       struct summary_cb *info,
+				       enum diff_cmd diff_cmd)
+{
+	struct strvec diff_args = STRVEC_INIT;
+	struct rev_info rev;
+	struct module_cb_list list = MODULE_CB_LIST_INIT;
+
+	strvec_push(&diff_args, get_diff_cmd(diff_cmd));
+	if (info->cached)
+		strvec_push(&diff_args, "--cached");
+	strvec_pushl(&diff_args, "--ignore-submodules=dirty", "--raw", NULL);
+	if (head_oid)
+		strvec_push(&diff_args, oid_to_hex(head_oid));
+	strvec_push(&diff_args, "--");
+	if (info->argc)
+		strvec_pushv(&diff_args, info->argv);
+
+	git_config(git_diff_basic_config, NULL);
+	init_revisions(&rev, info->prefix);
+	rev.abbrev = 0;
+	precompose_argv(diff_args.nr, diff_args.v);
+	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = submodule_summary_callback;
+	rev.diffopt.format_callback_data = &list;
+
+	if (!info->cached) {
+		if (diff_cmd == DIFF_INDEX)
+			setup_work_tree();
+		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
+			perror("read_cache_preload");
+			return -1;
+		}
+	} else if (read_cache() < 0) {
+		perror("read_cache");
+		return -1;
+	}
+
+	if (diff_cmd == DIFF_INDEX)
+		run_diff_index(&rev, info->cached);
+	else
+		run_diff_files(&rev, 0);
+	prepare_submodule_summary(info, &list);
+	strvec_clear(&diff_args);
+	return 0;
+}
+
+static int module_summary(int argc, const char **argv, const char *prefix)
+{
+	struct summary_cb info = SUMMARY_CB_INIT;
+	int cached = 0;
+	int for_status = 0;
+	int files = 0;
+	int summary_limit = -1;
+	enum diff_cmd diff_cmd = DIFF_INDEX;
+	struct object_id head_oid;
+	int ret;
+
+	struct option module_summary_options[] = {
+		OPT_BOOL(0, "cached", &cached,
+			 N_("use the commit stored in the index instead of the submodule HEAD")),
+		OPT_BOOL(0, "files", &files,
+			 N_("to compare the commit in the index with that in the submodule HEAD")),
+		OPT_BOOL(0, "for-status", &for_status,
+			 N_("skip submodules with 'ignore_config' value set to 'all'")),
+		OPT_INTEGER('n', "summary-limit", &summary_limit,
+			     N_("limit the summary size")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_summary_options,
+			     git_submodule_helper_usage, 0);
+
+	if (!summary_limit)
+		return 0;
+
+	if (!get_oid(argc ? argv[0] : "HEAD", &head_oid)) {
+		if (argc) {
+			argv++;
+			argc--;
+		}
+	} else if (!argc || !strcmp(argv[0], "HEAD")) {
+		/* before the first commit: compare with an empty tree */
+		oidcpy(&head_oid, the_hash_algo->empty_tree);
+		if (argc) {
+			argv++;
+			argc--;
+		}
+	} else {
+		if (get_oid("HEAD", &head_oid))
+			die(_("could not fetch a revision for HEAD"));
+	}
+
+	if (files) {
+		if (cached)
+			die(_("--cached and --files are mutually exclusive"));
+		diff_cmd = DIFF_FILES;
+	}
+
+	info.argc = argc;
+	info.argv = argv;
+	info.prefix = prefix;
+	info.cached = !!cached;
+	info.files = !!files;
+	info.for_status = !!for_status;
+	info.summary_limit = summary_limit;
+
+	ret = compute_summary_module_list((diff_cmd == DIFF_INDEX) ? &head_oid : NULL,
+					  &info, diff_cmd);
+	return ret;
+}
+
 struct sync_cb {
 	const char *prefix;
 	unsigned int flags;
@@ -2341,6 +2769,7 @@  static struct cmd_struct commands[] = {
 	{"print-default-remote", print_default_remote, 0},
 	{"sync", module_sync, SUPPORT_SUPER_PREFIX},
 	{"deinit", module_deinit, 0},
+	{"summary", module_summary, SUPPORT_SUPER_PREFIX},
 	{"remote-branch", resolve_remote_submodule_branch, 0},
 	{"push-check", push_check, 0},
 	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
diff --git a/git-submodule.sh b/git-submodule.sh
index 43eb6051d2..6fb12585cb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -59,31 +59,6 @@  die_if_unmatched ()
 	fi
 }
 
-#
-# Print a submodule configuration setting
-#
-# $1 = submodule name
-# $2 = option name
-# $3 = default value
-#
-# Checks in the usual git-config places first (for overrides),
-# otherwise it falls back on .gitmodules.  This allows you to
-# distribute project-wide defaults in .gitmodules, while still
-# customizing individual repositories if necessary.  If the option is
-# not in .gitmodules either, print a default value.
-#
-get_submodule_config () {
-	name="$1"
-	option="$2"
-	default="$3"
-	value=$(git config submodule."$name"."$option")
-	if test -z "$value"
-	then
-		value=$(git submodule--helper config submodule."$name"."$option")
-	fi
-	printf '%s' "${value:-$default}"
-}
-
 isnumber()
 {
 	n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
@@ -831,166 +806,7 @@  cmd_summary() {
 		shift
 	done
 
-	test $summary_limit = 0 && return
-
-	if rev=$(git rev-parse -q --verify --default HEAD ${1+"$1"})
-	then
-		head=$rev
-		test $# = 0 || shift
-	elif test -z "$1" || test "$1" = "HEAD"
-	then
-		# before the first commit: compare with an empty tree
-		head=$(git hash-object -w -t tree --stdin </dev/null)
-		test -z "$1" || shift
-	else
-		head="HEAD"
-	fi
-
-	if [ -n "$files" ]
-	then
-		test -n "$cached" &&
-		die "$(gettext "The --cached option cannot be used with the --files option")"
-		diff_cmd=diff-files
-		head=
-	fi
-
-	cd_to_toplevel
-	eval "set $(git rev-parse --sq --prefix "$wt_prefix" -- "$@")"
-	# Get modified modules cared by user
-	modules=$(git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- "$@" |
-		sane_egrep '^:([0-7]* )?160000' |
-		while read -r mod_src mod_dst sha1_src sha1_dst status sm_path
-		do
-			# Always show modules deleted or type-changed (blob<->module)
-			if test "$status" = D || test "$status" = T
-			then
-				printf '%s\n' "$sm_path"
-				continue
-			fi
-			# Respect the ignore setting for --for-status.
-			if test -n "$for_status"
-			then
-				name=$(git submodule--helper name "$sm_path")
-				ignore_config=$(get_submodule_config "$name" ignore none)
-				test $status != A && test $ignore_config = all && continue
-			fi
-			# Also show added or modified modules which are checked out
-			GIT_DIR="$sm_path/.git" git rev-parse --git-dir >/dev/null 2>&1 &&
-			printf '%s\n' "$sm_path"
-		done
-	)
-
-	test -z "$modules" && return
-
-	git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- $modules |
-	sane_egrep '^:([0-7]* )?160000' |
-	cut -c2- |
-	while read -r mod_src mod_dst sha1_src sha1_dst status name
-	do
-		if test -z "$cached" &&
-			is_zero_oid $sha1_dst
-		then
-			case "$mod_dst" in
-			160000)
-				sha1_dst=$(GIT_DIR="$name/.git" git rev-parse HEAD)
-				;;
-			100644 | 100755 | 120000)
-				sha1_dst=$(git hash-object $name)
-				;;
-			000000)
-				;; # removed
-			*)
-				# unexpected type
-				eval_gettextln "unexpected mode \$mod_dst" >&2
-				continue ;;
-			esac
-		fi
-		missing_src=
-		missing_dst=
-
-		test $mod_src = 160000 &&
-		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
-		missing_src=t
-
-		test $mod_dst = 160000 &&
-		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
-		missing_dst=t
-
-		display_name=$(git submodule--helper relative-path "$name" "$wt_prefix")
-
-		total_commits=
-		case "$missing_src,$missing_dst" in
-		t,)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
-			;;
-		,t)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
-			;;
-		t,t)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$sha1_dst")"
-			;;
-		*)
-			errmsg=
-			total_commits=$(
-			if test $mod_src = 160000 && test $mod_dst = 160000
-			then
-				range="$sha1_src...$sha1_dst"
-			elif test $mod_src = 160000
-			then
-				range=$sha1_src
-			else
-				range=$sha1_dst
-			fi
-			GIT_DIR="$name/.git" \
-			git rev-list --first-parent $range -- | wc -l
-			)
-			total_commits=" ($(($total_commits + 0)))"
-			;;
-		esac
-
-		sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
-			echo $sha1_src | cut -c1-7)
-		sha1_abbr_dst=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_dst 2>/dev/null ||
-			echo $sha1_dst | cut -c1-7)
-
-		if test $status = T
-		then
-			blob="$(gettext "blob")"
-			submodule="$(gettext "submodule")"
-			if test $mod_dst = 160000
-			then
-				echo "* $display_name $sha1_abbr_src($blob)->$sha1_abbr_dst($submodule)$total_commits:"
-			else
-				echo "* $display_name $sha1_abbr_src($submodule)->$sha1_abbr_dst($blob)$total_commits:"
-			fi
-		else
-			echo "* $display_name $sha1_abbr_src...$sha1_abbr_dst$total_commits:"
-		fi
-		if test -n "$errmsg"
-		then
-			# Don't give error msg for modification whose dst is not submodule
-			# i.e. deleted or changed to blob
-			test $mod_dst = 160000 && echo "$errmsg"
-		else
-			if test $mod_src = 160000 && test $mod_dst = 160000
-			then
-				limit=
-				test $summary_limit -gt 0 && limit="-$summary_limit"
-				GIT_DIR="$name/.git" \
-				git log $limit --pretty='format:  %m %s' \
-				--first-parent $sha1_src...$sha1_dst
-			elif test $mod_dst = 160000
-			then
-				GIT_DIR="$name/.git" \
-				git log --pretty='format:  > %s' -1 $sha1_dst
-			else
-				GIT_DIR="$name/.git" \
-				git log --pretty='format:  < %s' -1 $sha1_src
-			fi
-			echo
-		fi
-		echo
-	done
+	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper summary ${prefix:+--prefix "$prefix"} ${files:+--files} ${cached:+--cached} ${for_status:+--for-status} ${summary_limit:+-n $summary_limit} -- "$@"
 }
 #
 # List all submodules, prefixed with:
diff --git a/t/t7421-submodule-summary-add.sh b/t/t7421-submodule-summary-add.sh
index 829fe26d6d..59a9b00467 100755
--- a/t/t7421-submodule-summary-add.sh
+++ b/t/t7421-submodule-summary-add.sh
@@ -58,7 +58,7 @@  test_expect_success 'submodule summary output for submodules with changed paths'
 	git commit -m "change submodule path" &&
 	rev=$(git -C sm rev-parse --short HEAD^) &&
 	git submodule summary HEAD^^ -- my-subm >actual 2>err &&
-	test_i18ngrep "fatal:.*my-subm" err &&
+	grep "fatal:.*my-subm" err &&
 	cat >expected <<-EOF &&
 	* my-subm ${rev}...0000000: