diff mbox series

[v5,3/3] ls-files.c: add --deduplicate option

Message ID e9c5318670658b032ba921129859f9fb3b2ca017.1611037846.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series builtin/ls-files.c:add git ls-file --dedup option | expand

Commit Message

ZheNing Hu Jan. 19, 2021, 6:30 a.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

In order to provide users a better experience
when viewing information about files in the index
and the working tree, the `--deduplicate` option will suppress
some duplicate name under some conditions.

In a merge conflict, one file name of "git ls-files" output may
appear multiple times. For example,now there is an unmerged path
`a.c`,`a.c` will appear three times in the output of
"git ls-files".We can use "git ls-files --deduplicate" to output
`a.c` only one time.(unless `--stage` or `--unmerged` is
used to view all the detailed information in the index)

In addition, if you use both `--delete` and `--modify` at
the same time, The `--deduplicate` option
can also suppress file name output.

Additional instructions:
In order to display entries information,`deduplicate` suppresses
the output of duplicate file names, not the output of duplicate
entries information, so under the option of `-t`, `--stage`, `--unmerge`,
`--deduplicate` will have no effect.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 32 ++++++++++++++---
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+), 5 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

Comments

Junio C Hamano Jan. 20, 2021, 9:26 p.m. UTC | #1
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  
>  		construct_fullname(&fullname, repo, ce);
>  
> +		if (skipping_duplicates && last_shown_ce &&
> +			!strcmp(last_shown_ce->name,ce->name))
> +				continue;

Style.  Missing SP after comma.

>  		if ((dir->flags & DIR_SHOW_IGNORED) &&
>  			!ce_excluded(dir, repo->index, fullname.buf, ce))
>  			continue;
>  		if (ce->ce_flags & CE_UPDATE)
>  			continue;
>  		if (show_cached || show_stage) {
> +			if (skipping_duplicates && last_shown_ce &&
> +				!strcmp(last_shown_ce->name,ce->name))
> +					continue;

OK.  When show_stage is set, skipping_duplicates is automatically
turned off (and show_unmerged is automatically covered as it turns
show_stage on automatically).  So this feature has really become
"are we showing only names, and if so, did we show an entry of the
same name before?".

>  			if (!show_unmerged || ce_stage(ce))
>  				show_ce(repo, dir, ce, fullname.buf,
>  					ce_stage(ce) ? tag_unmerged :
>  					(ce_skip_worktree(ce) ? tag_skip_worktree :
>  						tag_cached));
> +			if (show_cached && skipping_duplicates)
> +				last_shown_ce = ce;

The code that calls show_ce() belonging to a totally separate if()
statement makes my stomach hurt---how are we going to guarantee that
"last shown" really will keep track of what was shown last?

Shouldn't the above be more like this?

- 			if (!show_unmerged || ce_stage(ce))
+ 			if (!show_unmerged || ce_stage(ce)) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+				last_shown_ce = ce;
+			}

It does maintain last_shown_ce even when skipping_duplicates is not
set, but I think that is overall win.  Assigning unconditionally
would be cheaper than making a conditional jump on the variable and
make assignment (or not).

>  		}
>  		if (ce_skip_worktree(ce))
>  			continue;
> +		if (skipping_duplicates && last_shown_ce &&
> +			!strcmp(last_shown_ce->name,ce->name))
> +				continue;

Style.  Missing SP after comma.

OK, if we've shown an entry of the same name under skip-duplicates
mode, and the code that follows will show the same entry (if they
decide to show it), so we can go to the next entry early.

>  		err = lstat(fullname.buf, &st);
>  		if (err) {
> -			if (errno != ENOENT && errno != ENOTDIR)
> -				error_errno("cannot lstat '%s'", fullname.buf);
> -			if (show_deleted)
> +			if (skipping_duplicates && show_deleted && show_modified)
>  				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> -			if (show_modified)
> -				show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			else {
> +				if (errno != ENOENT && errno != ENOTDIR)
> +					error_errno("cannot lstat '%s'", fullname.buf);
> +				if (show_deleted)
> +					show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +				if (show_modified)
> +					show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			}
>  		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
>  			show_ce(repo, dir, ce, fullname.buf, tag_modified);

This part will change shape quite a bit when we follow the
suggestion I made on 1/3, so I won't analyze how correct this
version is.

> +		last_shown_ce = ce;
>  	}
>  
>  	strbuf_release(&fullname);
> @@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>  			N_("pretend that paths removed since <tree-ish> are still present")),
>  		OPT__ABBREV(&abbrev),
>  		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> +		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
>  		OPT_END()
>  	};
>  
> @@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>  		 * you also show the stage information.
>  		 */
>  		show_stage = 1;
> +	if (show_tag || show_stage)
> +		skipping_duplicates = 0;

OK.

>  	if (dir.exclude_per_dir)
>  		exc_given = 1;
>  

Thanks.
ZheNing Hu Jan. 21, 2021, 11 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2021年1月21日周四 上午5:26写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >
> >               construct_fullname(&fullname, repo, ce);
> >
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
Get it.
>
> >               if ((dir->flags & DIR_SHOW_IGNORED) &&
> >                       !ce_excluded(dir, repo->index, fullname.buf, ce))
> >                       continue;
> >               if (ce->ce_flags & CE_UPDATE)
> >                       continue;
> >               if (show_cached || show_stage) {
> > +                     if (skipping_duplicates && last_shown_ce &&
> > +                             !strcmp(last_shown_ce->name,ce->name))
> > +                                     continue;
>
> OK.  When show_stage is set, skipping_duplicates is automatically
> turned off (and show_unmerged is automatically covered as it turns
> show_stage on automatically).  So this feature has really become
> "are we showing only names, and if so, did we show an entry of the
> same name before?".
Yeah,showing only names,so I yesterday ask such question :)
>
> >                       if (!show_unmerged || ce_stage(ce))
> >                               show_ce(repo, dir, ce, fullname.buf,
> >                                       ce_stage(ce) ? tag_unmerged :
> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >                                               tag_cached));
> > +                     if (show_cached && skipping_duplicates)
> > +                             last_shown_ce = ce;
>
> The code that calls show_ce() belonging to a totally separate if()
> statement makes my stomach hurt---how are we going to guarantee that
> "last shown" really will keep track of what was shown last?
>
> Shouldn't the above be more like this?
>
> -                       if (!show_unmerged || ce_stage(ce))
> +                       if (!show_unmerged || ce_stage(ce)) {
>                                 show_ce(repo, dir, ce, fullname.buf,
>                                         ce_stage(ce) ? tag_unmerged :
>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
>                                                 tag_cached));
> +                               last_shown_ce = ce;
> +                       }
>
well,I am also thinking about this question :"last_shown_ce" is not true
last shown ce,but may be If "last_shown_ce" truly seen every last shown
ce ,We may need more cumbersome logic to make the program correct.
I have tried the processing method of your above code before, but found
 that some errors may have occurred.
> It does maintain last_shown_ce even when skipping_duplicates is not
> set, but I think that is overall win.  Assigning unconditionally
> would be cheaper than making a conditional jump on the variable and
> make assignment (or not).
>
> >               }
> >               if (ce_skip_worktree(ce))
> >                       continue;
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
>
> OK, if we've shown an entry of the same name under skip-duplicates
> mode, and the code that follows will show the same entry (if they
> decide to show it), so we can go to the next entry early.
>
> >               err = lstat(fullname.buf, &st);
> >               if (err) {
> > -                     if (errno != ENOENT && errno != ENOTDIR)
> > -                             error_errno("cannot lstat '%s'", fullname.buf);
> > -                     if (show_deleted)
> > +                     if (skipping_duplicates && show_deleted && show_modified)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > -                     if (show_modified)
> > -                             show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     else {
> > +                             if (errno != ENOENT && errno != ENOTDIR)
> > +                                     error_errno("cannot lstat '%s'", fullname.buf);
> > +                             if (show_deleted)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     }
> >               } else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> >                       show_ce(repo, dir, ce, fullname.buf, tag_modified);
>
> This part will change shape quite a bit when we follow the
> suggestion I made on 1/3, so I won't analyze how correct this
> version is.
>
Fine...
> > +             last_shown_ce = ce;
> >       }
> >
> >       strbuf_release(&fullname);
> > @@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                       N_("pretend that paths removed since <tree-ish> are still present")),
> >               OPT__ABBREV(&abbrev),
> >               OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> > +             OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
> >               OPT_END()
> >       };
> >
> > @@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                * you also show the stage information.
> >                */
> >               show_stage = 1;
> > +     if (show_tag || show_stage)
> > +             skipping_duplicates = 0;
>
> OK.
>
> >       if (dir.exclude_per_dir)
> >               exc_given = 1;
> >
>
> Thanks.

Thanks,Junio,I find my PR in gitgitgadget have been accepted.
By the way,
I found the problem "leftoverbit" and "good first issue" on gitgitgadget
It may not have been updated for a long time, and most of the above
may have been resolved.

Should it do an update?
Then we can happily be a "bounty hunter" in the git community, haha!
Junio C Hamano Jan. 21, 2021, 8:45 p.m. UTC | #3
胡哲宁 <adlternative@gmail.com> writes:

>> OK.  When show_stage is set, skipping_duplicates is automatically
>> turned off (and show_unmerged is automatically covered as it turns
>> show_stage on automatically).  So this feature has really become
>> "are we showing only names, and if so, did we show an entry of the
>> same name before?".
> Yeah,showing only names,so I yesterday ask such question :)
>>
>> >                       if (!show_unmerged || ce_stage(ce))
>> >                               show_ce(repo, dir, ce, fullname.buf,
>> >                                       ce_stage(ce) ? tag_unmerged :
>> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
>> >                                               tag_cached));
>> > +                     if (show_cached && skipping_duplicates)
>> > +                             last_shown_ce = ce;
>>
>> The code that calls show_ce() belonging to a totally separate if()
>> statement makes my stomach hurt---how are we going to guarantee that
>> "last shown" really will keep track of what was shown last?
>>
>> Shouldn't the above be more like this?
>>
>> -                       if (!show_unmerged || ce_stage(ce))
>> +                       if (!show_unmerged || ce_stage(ce)) {
>>                                 show_ce(repo, dir, ce, fullname.buf,
>>                                         ce_stage(ce) ? tag_unmerged :
>>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
>>                                                 tag_cached));
>> +                               last_shown_ce = ce;
>> +                       }
>>
> well,I am also thinking about this question :"last_shown_ce" is not true
> last shown ce,but may be If "last_shown_ce" truly seen every last shown
> ce ,We may need more cumbersome logic to make the program correct.
> I have tried the processing method of your above code before, but found
>  that some errors may have occurred.

I think judicious use of "goto" without introducing the last_shown
would probably result in a much more maintainable code.  It may look
somewhat like so:

	for (i = 0; i < repo->index->cache_nr; i++) {
		const struct cache_entry *ce = repo->index->cache[i];
		struct stat st;
		int stat_err;

		construct_fullname(&fullname, repo, ce);

		if ((dir->flags & DIR_SHOW_IGNORED) &&
			!ce_excluded(dir, repo->index, fullname.buf, ce))
			continue;
		if (ce->ce_flags & CE_UPDATE)
			continue;
		if ((show_cached || show_stage) &&
		    (!show_unmerged || ce_stage(ce))) {
			show_ce(repo, dir, ce, fullname.buf,
				ce_stage(ce) ? tag_unmerged :
				(ce_skip_worktree(ce) ? tag_skip_worktree :
				 tag_cached));
			if (skip_duplicates)
				goto skip_to_next_name;
		}

		if (!show_deleted && !show_modified)
			continue;
		if (ce_skip_worktree(ce))
			continue;
		stat_err = lstat(fullname.buf, &st);
		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
			error_errno("cannot lstat '%s'", fullname.buf);

		if (show_deleted) {
			show_ce(repo, dir, ce, fullname.buf, tag_removed);
			if (skip_duplicates)
				goto skip_to_next_name;
		}
		if (show_modified &&
		    (stat_err || ie_modified(repo->index, ce, &st, 0)))
			show_ce(repo, dir, ce, fullname.buf, tag_modified);
		continue;

	skip_to_next_name:
		{
			int j;
			const struct cache_entry **cache = repo->index->cache;
			for (j = i + 1; j < repo->index->cache_nr; j++)
				if (strcmp(ce->ce_name, cache[j]->ce_name))
					break;
			i = j - 1; /* compensate for outer for loop */
		}
	}
ZheNing Hu Jan. 22, 2021, 9:50 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> 于2021年1月22日周五 上午4:45写道:
>
> 胡哲宁 <adlternative@gmail.com> writes:
>
> >> OK.  When show_stage is set, skipping_duplicates is automatically
> >> turned off (and show_unmerged is automatically covered as it turns
> >> show_stage on automatically).  So this feature has really become
> >> "are we showing only names, and if so, did we show an entry of the
> >> same name before?".
> > Yeah,showing only names,so I yesterday ask such question :)
> >>
> >> >                       if (!show_unmerged || ce_stage(ce))
> >> >                               show_ce(repo, dir, ce, fullname.buf,
> >> >                                       ce_stage(ce) ? tag_unmerged :
> >> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >> >                                               tag_cached));
> >> > +                     if (show_cached && skipping_duplicates)
> >> > +                             last_shown_ce = ce;
> >>
> >> The code that calls show_ce() belonging to a totally separate if()
> >> statement makes my stomach hurt---how are we going to guarantee that
> >> "last shown" really will keep track of what was shown last?
> >>
> >> Shouldn't the above be more like this?
> >>
> >> -                       if (!show_unmerged || ce_stage(ce))
> >> +                       if (!show_unmerged || ce_stage(ce)) {
> >>                                 show_ce(repo, dir, ce, fullname.buf,
> >>                                         ce_stage(ce) ? tag_unmerged :
> >>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
> >>                                                 tag_cached));
> >> +                               last_shown_ce = ce;
> >> +                       }
> >>
> > well,I am also thinking about this question :"last_shown_ce" is not true
> > last shown ce,but may be If "last_shown_ce" truly seen every last shown
> > ce ,We may need more cumbersome logic to make the program correct.
> > I have tried the processing method of your above code before, but found
> >  that some errors may have occurred.
>
> I think judicious use of "goto" without introducing the last_shown
> would probably result in a much more maintainable code.  It may look
> somewhat like so:
>
>         for (i = 0; i < repo->index->cache_nr; i++) {
>                 const struct cache_entry *ce = repo->index->cache[i];
>                 struct stat st;
>                 int stat_err;
>
>                 construct_fullname(&fullname, repo, ce);
>
>                 if ((dir->flags & DIR_SHOW_IGNORED) &&
>                         !ce_excluded(dir, repo->index, fullname.buf, ce))
>                         continue;
>                 if (ce->ce_flags & CE_UPDATE)
>                         continue;
>                 if ((show_cached || show_stage) &&
>                     (!show_unmerged || ce_stage(ce))) {
>                         show_ce(repo, dir, ce, fullname.buf,
>                                 ce_stage(ce) ? tag_unmerged :
>                                 (ce_skip_worktree(ce) ? tag_skip_worktree :
>                                  tag_cached));
>                         if (skip_duplicates)
>                                 goto skip_to_next_name;
>                 }
>
>                 if (!show_deleted && !show_modified)
>                         continue;
>                 if (ce_skip_worktree(ce))
>                         continue;
>                 stat_err = lstat(fullname.buf, &st);
>                 if (stat_err && (errno != ENOENT && errno != ENOTDIR))
>                         error_errno("cannot lstat '%s'", fullname.buf);
>
>                 if (show_deleted) {
>                         show_ce(repo, dir, ce, fullname.buf, tag_removed);
>                         if (skip_duplicates)
>                                 goto skip_to_next_name;
>                 }
>                 if (show_modified &&
>                     (stat_err || ie_modified(repo->index, ce, &st, 0)))
>                         show_ce(repo, dir, ce, fullname.buf, tag_modified);
>                 continue;
>
>         skip_to_next_name:
>                 {
>                         int j;
>                         const struct cache_entry **cache = repo->index->cache;
>                         for (j = i + 1; j < repo->index->cache_nr; j++)
>                                 if (strcmp(ce->ce_name, cache[j]->ce_name))
>                                         break;
>                         i = j - 1; /* compensate for outer for loop */
>                 }
>         }
I have to admit that this is indeed a good way to skip with "goto".
Thanks for your help.
And should I still use gitgitgadget PR on my origin branch "dedup"or
send patch on branch "zh/ls-files-deduplicate"?
Johannes Schindelin Jan. 22, 2021, 4:04 p.m. UTC | #5
Hi 胡哲宁,

On Fri, 22 Jan 2021, 胡哲宁 wrote:

> And should I still use gitgitgadget PR on my origin branch "dedup"or
> send patch on branch "zh/ls-files-deduplicate"?

The way GitGitGadget is designed asks for contributors to adjust their
patch(es) via interactive rebase, implementing the suggestions and
addressing the concerns while doing so, then force-pushing, optionally
amending the first PR comment (i.e. the description) with a list of
those changes, and then submitting a new iteration via `/submit`.

Ciao,
Johannes
Junio C Hamano Jan. 22, 2021, 6:02 p.m. UTC | #6
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> And should I still use gitgitgadget PR on my origin branch "dedup"or
>> send patch on branch "zh/ls-files-deduplicate"?
>
> The way GitGitGadget is designed asks for contributors to adjust their
> patch(es) via interactive rebase, implementing the suggestions and
> addressing the concerns while doing so, then force-pushing, optionally
> amending the first PR comment (i.e. the description) with a list of
> those changes, and then submitting a new iteration via `/submit`.

Thanks for clearly explaining the rules.

As I suspect many people are afraid of forcing their pushes, it
would assure them to explain that it is OK to force when them
restart the series from scratch by replacing the commits.

And it would very much help on the receiving end when the
description gets updated.

Just being curious, but when a series hits 'next', would the way in
which the user interacts with GGG change?  With or without GGG, what
is done on the local side is not all that different---you build new
commits on top without disturbing the commits that are in 'next'.
Then what?  Push it again (this time there is no need to force) and
submit the additional ones via `/submit`?

Thanks.
ZheNing Hu Jan. 23, 2021, 8:20 a.m. UTC | #7
Hi Johannes Schindelin,
Thanks for prompt me how to choose, maybe I was a little confused
about the git workflow before, "zh/ls-files-deduplicate" this kind of
 branch I don't need to operate,right?

Well, then I will modify my code on the original branch.

Johannes Schindelin <Johannes.Schindelin@gmx.de> 于2021年1月23日周六 上午12:04写道:
>
> Hi 胡哲宁,
>
> On Fri, 22 Jan 2021, 胡哲宁 wrote:
>
> > And should I still use gitgitgadget PR on my origin branch "dedup"or
> > send patch on branch "zh/ls-files-deduplicate"?
>
> The way GitGitGadget is designed asks for contributors to adjust their
> patch(es) via interactive rebase, implementing the suggestions and
> addressing the concerns while doing so, then force-pushing, optionally
> amending the first PR comment (i.e. the description) with a list of
> those changes, and then submitting a new iteration via `/submit`.
>
> Ciao,
> Johannes
Johannes Schindelin March 19, 2021, 1:54 p.m. UTC | #8
Hi Junio,

I just noticed that this still waited in my inbox for me to answer it.

On Fri, 22 Jan 2021, Junio C Hamano wrote:

> Just being curious, but when a series hits 'next', would the way in
> which the user interacts with GGG change?

My hunch is that we should probably tell new users (for who GitGitGadget
now uses the "new user" PR label) about the expectations of only adding
patches on top (i.e. in a new PR), unless the branch gets kicked out of
`next`.

> With or without GGG, what is done on the local side is not all that
> different---you build new commits on top without disturbing the commits
> that are in 'next'. Then what?  Push it again (this time there is no
> need to force) and submit the additional ones via `/submit`?

GitGitGadget would send the entire patch series, which is probably not a
good idea.

Ciao,
Dscho
Junio C Hamano March 19, 2021, 6:11 p.m. UTC | #9
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 22 Jan 2021, Junio C Hamano wrote:
>
>> Just being curious, but when a series hits 'next', would the way in
>> which the user interacts with GGG change?
>
> My hunch is that we should probably tell new users (for who GitGitGadget
> now uses the "new user" PR label) about the expectations of only adding
> patches on top (i.e. in a new PR), unless the branch gets kicked out of
> `next`.
>
>> With or without GGG, what is done on the local side is not all that
>> different---you build new commits on top without disturbing the commits
>> that are in 'next'. Then what?  Push it again (this time there is no
>> need to force) and submit the additional ones via `/submit`?
>
> GitGitGadget would send the entire patch series, which is probably not a
> good idea.

Thanks for a clarification.

While we are on the topic of GGG, if I may ask for a new feature or
two (or perhaps such a feature already exists), it would be nice if
contributors are allowed to tweak who are CC'ed in the outgoing
patch mail in various ways:

 - I may author a commit as <gitster@work.addre.ss> and make a pull
   request on GitHub, but the <gitster@pobox.com> is the address
   associated with the GitHub account making the pull request.  I
   think GGG sends CC to the author (at work) as well as me, but I
   may prefer to get correspondence on the patch at either one of my
   addresses not both.  "Mr GGG, please compute the CC list the
   normal way, and drop this address from the CC list" that I can
   say when I say "/submit" might be a good way to do so.

 - Also, at "/submit" time, being able to say "Also CC: these
   addresses" would be a good feature, without contaminating the
   commit log message with CC: trailer lines.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..d11c8ade402 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@  SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@  OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1454ab1ae6f..709d727c574 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@  static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_shown_ce;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -314,6 +316,7 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 	}
 	if (! (show_cached || show_stage || show_deleted || show_modified))
 		return;
+	last_shown_ce = NULL;
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -321,30 +324,46 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 
 		construct_fullname(&fullname, repo, ce);
 
+		if (skipping_duplicates && last_shown_ce &&
+			!strcmp(last_shown_ce->name,ce->name))
+				continue;
 		if ((dir->flags & DIR_SHOW_IGNORED) &&
 			!ce_excluded(dir, repo->index, fullname.buf, ce))
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
 		if (show_cached || show_stage) {
+			if (skipping_duplicates && last_shown_ce &&
+				!strcmp(last_shown_ce->name,ce->name))
+					continue;
 			if (!show_unmerged || ce_stage(ce))
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (show_cached && skipping_duplicates)
+				last_shown_ce = ce;
 		}
 		if (ce_skip_worktree(ce))
 			continue;
+		if (skipping_duplicates && last_shown_ce &&
+			!strcmp(last_shown_ce->name,ce->name))
+				continue;
 		err = lstat(fullname.buf, &st);
 		if (err) {
-			if (errno != ENOENT && errno != ENOTDIR)
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (show_deleted)
+			if (skipping_duplicates && show_deleted && show_modified)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified)
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else {
+				if (errno != ENOENT && errno != ENOTDIR)
+					error_errno("cannot lstat '%s'", fullname.buf);
+				if (show_deleted)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified)
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		last_shown_ce = ce;
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..2682b1f43a6
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@ 
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done