diff mbox series

[v3] ls-files.c: add --dedup option

Message ID pull.832.v3.git.1610626942677.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v3] ls-files.c: add --dedup option | expand

Commit Message

ZheNing Hu Jan. 14, 2021, 12:22 p.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

In order to provide users a better experience
when viewing information about files in the index
and the working tree, the `--dedup` option will suppress
some duplicate options under some conditions.

In a merge conflict, one item of "git ls-files" output may
appear multiple times. For example,now the file `a.c` has
a conflict,`a.c` will appear three times in the output of
"git ls-files".We can use "git ls-files --dedup" to output
`a.c` only one time.(unless `--stage` or `--unmerged` is
used to view all the detailed information in the index)

In addition, if you use both `--delete` and `--modify` in
the same time, The `--dedup` option can also suppress modified
entries output.

`--dedup` option relevant descriptions in
`Documentation/git-ls-files.txt`,
the test script in `t/t3012-ls-files-dedup.sh`
prove the correctness of the `--dedup` option.

this patch fixed:
https://github.com/gitgitgadget/git/issues/198
Thanks.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    builtin/ls-files.c:add git ls-file --dedup option
    
    I am reading the source code of git ls-files and learned that git ls
    -files may have duplicate entries when conflict occurs in a branch merge
    or when different options are used at the same time. Users may fell
    confuse when they see these duplicate entries.
    
    As Junio C Hamano said ,it have odd behaviour.
    
    Therefore, we can provide an additional option to git ls-files to delete
    those repeated information.
    
    This fixes https://github.com/gitgitgadget/git/issues/198
    
    Thanks!

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v2:

 1:  0261e5d245e < -:  ----------- builtin/ls-files.c:add git ls-file --dedup option
 2:  a09a5098aa6 ! 1:  5ce52c8b7a4 builtin:ls-files.c:add git ls-file --dedup option
     @@ Metadata
      Author: ZheNing Hu <adlternative@gmail.com>
      
       ## Commit message ##
     -    builtin:ls-files.c:add git ls-file --dedup option
     +    ls-files.c: add --dedup option
      
     -    This commit standardizes the code format.
     -    For git ls-file --dedup option added
     -    relevant descriptions in Documentation/git-ls-files.txt
     -    and wrote t/t3012-ls-files-dedup.sh test script
     -    to prove the correctness of--dedup option.
     +    In order to provide users a better experience
     +    when viewing information about files in the index
     +    and the working tree, the `--dedup` option will suppress
     +    some duplicate options under some conditions.
      
     -    this patch fixed: https://github.com/gitgitgadget/git/issues/198
     +    In a merge conflict, one item of "git ls-files" output may
     +    appear multiple times. For example,now the file `a.c` has
     +    a conflict,`a.c` will appear three times in the output of
     +    "git ls-files".We can use "git ls-files --dedup" to output
     +    `a.c` only one time.(unless `--stage` or `--unmerged` is
     +    used to view all the detailed information in the index)
     +
     +    In addition, if you use both `--delete` and `--modify` in
     +    the same time, The `--dedup` option can also suppress modified
     +    entries output.
     +
     +    `--dedup` option relevant descriptions in
     +    `Documentation/git-ls-files.txt`,
     +    the test script in `t/t3012-ls-files-dedup.sh`
     +    prove the correctness of the `--dedup` option.
     +
     +    this patch fixed:
     +    https://github.com/gitgitgadget/git/issues/198
          Thanks.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
     @@ Documentation/git-ls-files.txt: OPTIONS
       	See OUTPUT below for more information.
       
      +--dedup::
     -+	Suppress duplicates entries when conflicts happen or
     -+	specify -d -m at the same time.
     ++	Suppress duplicate entries when conflict happen or `--deleted`
     ++	and `--modified` are combined.
     ++
       -x <pattern>::
       --exclude=<pattern>::
       	Skip untracked files matching pattern.
      
       ## builtin/ls-files.c ##
     +@@ builtin/ls-files.c: static int line_terminator = '\n';
     + static int debug_mode;
     + static int show_eol;
     + static int recurse_submodules;
     ++static int delete_dup;
     + 
     + static const char *prefix;
     + static int max_prefix_len;
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
       {
       	int i;
       	struct strbuf fullname = STRBUF_INIT;
     --	const struct cache_entry *last_stage=NULL;
      +	const struct cache_entry *last_stage = NULL;
       
       	/* For cached/deleted files we don't need to even do the readdir */
       	if (show_others || show_killed) {
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 	if (show_cached || show_stage) {
       		for (i = 0; i < repo->index->cache_nr; i++) {
       			const struct cache_entry *ce = repo->index->cache[i];
     --			if(show_cached && delete_dup){
     -+
     + 
      +			if (show_cached && delete_dup) {
     - 				switch (ce_stage(ce)) {
     - 				case 0:
     - 				default:
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 					if (last_stage &&
     - 					!strcmp(last_stage->name, ce->name))
     - 						continue;
     --					last_stage=ce;
     ++				switch (ce_stage(ce)) {
     ++				case 0:
     ++				default:
     ++					break;
     ++				case 1:
     ++				case 2:
     ++				case 3:
     ++					if (last_stage &&
     ++					!strcmp(last_stage->name, ce->name))
     ++						continue;
      +					last_stage = ce;
     - 				}
     - 			}
     ++				}
     ++			}
       			construct_fullname(&fullname, repo, ce);
     + 
     + 			if ((dir->flags & DIR_SHOW_IGNORED) &&
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 			const struct cache_entry *ce = repo->index->cache[i];
       			struct stat st;
       			int err;
     --			if(delete_dup){
     -+
     + 
      +			if (delete_dup) {
     - 				switch (ce_stage(ce)) {
     - 				case 0:
     - 				default:
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 					if (last_stage &&
     - 					!strcmp(last_stage->name, ce->name))
     - 						continue;
     --					last_stage=ce;
     ++				switch (ce_stage(ce)) {
     ++				case 0:
     ++				default:
     ++					break;
     ++				case 1:
     ++				case 2:
     ++				case 3:
     ++					if (last_stage &&
     ++					!strcmp(last_stage->name, ce->name))
     ++						continue;
      +					last_stage = ce;
     - 				}
     - 			}
     ++				}
     ++			}
       			construct_fullname(&fullname, repo, ce);
     + 
     + 			if ((dir->flags & DIR_SHOW_IGNORED) &&
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
       			if (ce_skip_worktree(ce))
       				continue;
       			err = lstat(fullname.buf, &st);
     --			if(delete_dup && show_deleted && show_modified && err)
     +-			if (show_deleted && err)
      +			if (delete_dup && show_deleted && show_modified && err)
       				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --			else{
     --				if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
     +-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
     +-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
      +			else {
      +				if (show_deleted && err)
     - 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     - 				if (show_modified && ie_modified(repo->index, ce, &st, 0))
     - 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++				if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++			}
     + 		}
     + 	}
     + 
      @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
       			N_("pretend that paths removed since <tree-ish> are still present")),
       		OPT__ABBREV(&abbrev),
       		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
     --		OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),
      +		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
       		OPT_END()
       	};
     @@ t/t3012-ls-files-dedup.sh (new)
      +
      +. ./test-lib.sh
      +
     -+test_expect_success 'master branch setup and write expect1 expect2 and commit' '
     -+	touch a.txt &&
     -+	touch b.txt &&
     -+	touch delete.txt &&
     -+	cat <<-EOF >expect1 &&
     ++test_expect_success 'setup' '
     ++	> a.txt &&
     ++	> b.txt &&
     ++	> delete.txt &&
     ++	cat >expect1<<-\EOF &&
      +	M a.txt
      +	H b.txt
      +	H delete.txt
      +	H expect1
      +	H expect2
      +	EOF
     -+	cat <<-EOF >expect2 &&
     ++	cat >expect2<<-EOF &&
      +	C a.txt
      +	R delete.txt
      +	EOF
      +	git add a.txt b.txt delete.txt expect1 expect2 &&
     -+	git commit -m master:1
     -+'
     -+
     -+test_expect_success 'main commit again' '
     ++	git commit -m master:1 &&
      +	echo a>a.txt &&
      +	echo b>b.txt &&
     -+	echo delete>delete.txt &&
     ++	echo delete >delete.txt &&
      +	git add a.txt b.txt delete.txt &&
     -+	git commit -m master:2
     -+'
     -+
     -+test_expect_success 'dev commit' '
     ++	git commit -m master:2 &&
      +	git checkout HEAD~ &&
      +	git switch -c dev &&
     -+	echo change>a.txt &&
     ++	echo change >a.txt &&
      +	git add a.txt &&
     -+	git commit -m dev:1
     -+'
     -+
     -+test_expect_success 'dev merge master' '
     ++	git commit -m dev:1 &&
      +	test_must_fail git merge master &&
      +	git ls-files -t --dedup >actual1 &&
      +	test_cmp expect1 actual1 &&


 Documentation/git-ls-files.txt |  5 ++++
 builtin/ls-files.c             | 41 ++++++++++++++++++++++++--
 t/t3012-ls-files-dedup.sh      | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707

Comments

Junio C Hamano Jan. 15, 2021, 12:59 a.m. UTC | #1
"阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--dedup` option will suppress
> some duplicate options under some conditions.
>
> In a merge conflict, one item of "git ls-files" output may
> appear multiple times. For example,now the file `a.c` has
> a conflict,`a.c` will appear three times in the output of
> "git ls-files".We can use "git ls-files --dedup" to output
> `a.c` only one time.(unless `--stage` or `--unmerged` is
> used to view all the detailed information in the index)

Unlike these option names we see in the description, "dedup" is not
a full word.  Perhaps spell it fully "--deduplicate" while letting
parse-options machinery to accept unique prefix (including
"--dedup"?

> In addition, if you use both `--delete` and `--modify` in
> the same time, The `--dedup` option can also suppress modified

"at the same time", I think.

> entries output.

[let's call this point "point A"]

> `--dedup` option relevant descriptions in
> `Documentation/git-ls-files.txt`,

I am not sure what this means.

> the test script in `t/t3012-ls-files-dedup.sh`
> prove the correctness of the `--dedup` option.

No amount of tests "proves" any correctness, but that is OK.  I
think you meant to say "a few tests have been added to t3012 to
protect the new feature from future breakage" or something like
that.

In any case, I think everything after "point A" and before your sign
off does not belong to the log message.  The diffstat shows that
documentation and tests have been added already.

> +--dedup::
> +	Suppress duplicate entries when conflict happen

"conflict happen" -> "there are unmerged paths", as the term
"unmerged" is already shown to readers of "ls-files --help".

> +	or `--deleted` and `--modified` are combined.

I somehow thought that you refrained from deduping when you are
showing the stages with "ls-files -u" and "ls-files -s", or you are
showing status with "ls-files -t", because you will otherwise lose
information.  In other words, showing only one cache entry out of
many that share the same name makes sense only when we are showing
name and nothing else.

Has that been changed from the previous rounds?

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index c8eae899b82..bc4eded19ab 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  		for (i = 0; i < repo->index->cache_nr; i++) {
>  			const struct cache_entry *ce = repo->index->cache[i];
>  
> +			if (show_cached && delete_dup) {
> +				switch (ce_stage(ce)) {
> +				case 0:
> +				default:
> +					break;

This part looks somewhat strange for two reasons:

 - The code enumerates ALL the possible stage numbers from 0 to 3;
   if we were to have "default", I'd expect it would be a separate
   switch arm from the possible values that calls out an programming
   error, e.g. BUG("at stage #%d???", ce_stage(ce)).  Simply removing
   the "default" arm would be another way out of this strangeness.

 - When we see a stage #0 entry, we know we will not have higher
   stage entries with the same name.  Not clearing last_stage here
   feels wrong, as the primary reason why last_stage variable is
   used is to remember the last ce that was shown, so that other
   entries with the same name can be skipped.

By the way, "last_shown_ce" may be a much better name for the
variable, as you do not really care what stage number the ce you
showed last was at (you care about its name).

Also, I do not see a good reason why the last_shown_ce variable
should have lifetime longer than the block that contains this for()
loop (and the other for loop for deleted/modified codepath we see
later).  Especially since you initialize the variable that you made
visible to the entire function to NULL before entering the first for
loop, but you do not set it back to NULL before entering the second
for loop, it is inviting a subtle bug.  You may have been given
show_cached and show_modified at the same time, so you enter the
first loop and have shown the first stage of the last conflicted
path, whose cache entry is left in the last_stage variable.  Since
the variable has longer lifespan than necessary, when the second
loop is entered, it still points at the cache entry for the highest
stage of the last conflicted path.  That is because the code forgets
to clear it to NULL before entering the second for loop.

Having said all that, I suspect that we may be much better off if we
can somehow merge the two loops into one.  You may be dedup adjacent
entries in each loop separately with the approach taken by this
patch, but I do not think the patch would work to deduplicate across
two loops.  For example, what happens if you do this?

    $ git reset --hard
    $ echo >>builtin/ls-files.c
    $ git ls-files -c -m builtin/ls-files.c
    $ git ls-files -t -c -m builtin/ls-files.c

I think you see the path twice in the output, with or without your
--dedup option (remember what I said about proving, by the way? ;-)).

Once we successfully merged two loops into one, the part that shows
tracked paths in the function would have only one loop, and it would
become a lot cleaner to add the logic to "skip showing the ce if it
has the same name as the previously shown one, only when doing so
won't lose information", by doing something like this:

	static void show_files(....)
	{
		/* show_others || show_killed done here */
		...

		/* leave early if not showing anything */
		if (! (show_cached || show_stage || show_deleted || show_modified))
			return;

		last_shown_ce = NULL;
		for (i = 0; i < repo->index->cache_nr; i++) {
			const struct cache_entry *ce = repo->index->cache[i];

			if (skipping_duplicates && last_shown_ce)
				if (!strcmp(ce->name, last_shown_ce->name))
					continue;

			construct_fullname();

                        /* various reasons to skip the entry tested */
			if (showing ignored directory and ce is excluded)
				continue;
			if (show_unmerged && !ce_stage(ce))
				continue;
			if (ce->ce_flags & CE_UPDATE)
				continue;
			... other reasons may appear here ...

			/* now we are committed to show it */
			last_shown_ce = ce;

			... various different ways to show ce come here ... 
			show_ce(...);
		}
	}

where "skipping_duplicates" would be set when "--deduplicate" is
asked and we are not showing information other than the pathname
via various options e.g. the tags (-t) or stages (-s/-u).

> +			if (delete_dup && show_deleted && show_modified && err)
>  				show_ce(repo, dir, ce, fullname.buf, tag_removed);

I actually think the original code that is still shown here ...

> +			else {
> +				if (show_deleted && err)
> +					show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +				if (show_modified && ie_modified(repo->index, ce, &st, 0))

... about modified file is buggy.  If lstat() failed, then &st has
no useful information, so it is wrong to feed it to ie_modified().

Perhaps a three-patch series that is structured like this may be in
order?

 #1: bugfix for --deleted and --modified.

	err = lstat(fullname.buf, &st);
	if (err) {
		/* deleted? */
		if (errno != E_NOENT)
			error_errno("cannot lstat '%s'", fullname.buf);
		else {
			if (show_deleted)
                        	show_ce(..., tag_removed);
			if (show_modified)
                        	show_ce(..., tag_modified);
		}
	} else if (show_modified && ie_modified(...))
		show_ce(..., tag_modified);
    
     This hopefully should not change the semantics.  If you ask
     --deleted and --modified, a deleted path would be listed twice.

 #2: consolidate two for loops into one.

     The two loops have slightly different condition to skip a ce,
     and different logic on what tag each path is shown with.  When
     --cached and --modified or --deleted are asked for at the same
     time, we'd show them multiple times (this is done inside the
     loop for each ce)

	if (show_cached || show_stage)
		show_ce(... ce_stage(ce) ? tag_unmerged : ...);
	err = lstat(fullname.buf, &st);
	if (err) {
        	/* deleted? */
                ... code that corresponds to the
		... illustration in #1 above come here.
	} else if (...)
		show_ce(..., tag_modified);

     This changes the semantics.  The original iterates the index
     twice, so you may see the same entry from --cached once and
     then again from --modified.  The updated one still will show
     the same entry twice but next to each other.

 #3: optionally deduplicate.

     Once we have a single loop, deduplicationg based on names is
     trivial, as we seen before.


Hmm?
Eric Sunshine Jan. 16, 2021, 7:13 a.m. UTC | #2
On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--dedup` option will suppress
> some duplicate options under some conditions.
> [...]

I have a few very minor comments alongside Junio's review comments...

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
> diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> @@ -0,0 +1,54 @@
> +test_description='git ls-files --dedup test.
> +
> +This test prepares the following in the cache:
> +
> +    a.txt       - a file(base)
> +    a.txt      - a file(master)
> +    a.txt       - a file(dev)
> +    b.txt       - a file
> +    delete.txt  - a file
> +    expect1    - a file
> +    expect2    - a file
> +
> +'

This test script description is outdated now. Perhaps shorten it to:

    test_description='ls-files dedup tests'

Or, it might be suitable to simply add the new test to the existing
t3004-ls-files-basic.sh instead.

> +test_expect_success 'setup' '
> +       > a.txt &&
> +       > b.txt &&
> +       > delete.txt &&
> +       cat >expect1<<-\EOF &&

Style nits: no space after redirection operator and a space before
redirection operator:

    >a.txt &&
    >b.txt &&
    >delete.txt &&
    cat >expect1 <<-\EOF &&

> +       cat >expect2<<-EOF &&

Nit: missing the backslash (and wrong spacing):

    cat >expect2 <<-\EOF &&

> +       echo a>a.txt &&
> +       echo b>b.txt &&

Style:

    echo a >a.txt &&
    echo b >b.txt &&

> +       echo delete >delete.txt &&
> +       git add a.txt b.txt delete.txt &&
> +       git commit -m master:2 &&
> +       git checkout HEAD~ &&
> +       git switch -c dev &&

If someone adds a new test after this test, then that new test will
run in the "dev" branch, which might be unexpected or undesirable. It
often is a good idea to ensure that tests do certain types of cleanup
to avoid breaking subsequent tests. Here, it would be a good idea to
ensure that the test switches back to the original branch when it
finishes (regardless of whether it finishes successfully or
unsuccessfully).

    git switch -c dev &&
    test_when_finished "git switch master" &&

Or you could use `git switch -` if you don't want to hard-code the
name "master" in the test (since there has been effort lately to
remove that name from tests.

> +       echo change >a.txt &&
> +       git add a.txt &&
> +       git commit -m dev:1 &&
> +       test_must_fail git merge master &&
> +       git ls-files -t --dedup >actual1 &&
> +       test_cmp expect1 actual1 &&
> +       rm delete.txt &&
> +       git ls-files -d -m -t --dedup >actual2 &&
> +       test_cmp expect2 actual2

We usually don't bother giving temporary files unique names like
"actual1" and "actual2" unless those files must exist at the same
time. This is because unique names like this may confuse readers into
wondering if there is some hidden interdependency between the files.
In this case, the files don't need to exist at the same time, so it
may be better simply to use the names "actual" and "expect", like
this:

    ...other stuff...
    cat >expect <<-\EOF &&
    ...
    EOF
    git ls-files -t --dedup >actual &&
    test_cmp expect actual &&
    rm delete.txt &&
    cat >expect <<-\EOF &&
    ...
    EOF
    git ls-files -d -m -t --dedup >actual &&
    test_cmp expect actual

(It also has the benefit that the "expect" content is closer to the
place where it is actually used, which may make it a bit easier for a
person reading the test to understand what is supposed to be
produced.)
ZheNing Hu Jan. 17, 2021, 3:45 a.m. UTC | #3
Junio, thank you for your patience to review
my patch and guide me how to modify it.

Junio C Hamano <gitster@pobox.com> 于2021年1月15日周五 上午8:59写道:
>
> "阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> >
> > In a merge conflict, one item of "git ls-files" output may
> > appear multiple times. For example,now the file `a.c` has
> > a conflict,`a.c` will appear three times in the output of
> > "git ls-files".We can use "git ls-files --dedup" to output
> > `a.c` only one time.(unless `--stage` or `--unmerged` is
> > used to view all the detailed information in the index)
>
> Unlike these option names we see in the description, "dedup" is not
> a full word.  Perhaps spell it fully "--deduplicate" while letting
> parse-options machinery to accept unique prefix (including
> "--dedup"?
>
Ok i have modified "--dedup" to "--deduplicate".
> > In addition, if you use both `--delete` and `--modify` in
> > the same time, The `--dedup` option can also suppress modified
>
> "at the same time", I think.
>
My poor English grammar :-)
> > entries output.
>
> [let's call this point "point A"]
>
> > `--dedup` option relevant descriptions in
> > `Documentation/git-ls-files.txt`,
>
> I am not sure what this means.
>
> > the test script in `t/t3012-ls-files-dedup.sh`
> > prove the correctness of the `--dedup` option.
>
> No amount of tests "proves" any correctness, but that is OK.  I
> think you meant to say "a few tests have been added to t3012 to
> protect the new feature from future breakage" or something like
> that.
>
Alright, I understand!
> In any case, I think everything after "point A" and before your sign
> off does not belong to the log message.  The diffstat shows that
> documentation and tests have been added already.
>
> > +--dedup::
> > +     Suppress duplicate entries when conflict happen
>
> "conflict happen" -> "there are unmerged paths", as the term
> "unmerged" is already shown to readers of "ls-files --help".
>
Well, maybe I'm not good enough with these proper nouns.
> > +     or `--deleted` and `--modified` are combined.
>
> I somehow thought that you refrained from deduping when you are
> showing the stages with "ls-files -u" and "ls-files -s", or you are
> showing status with "ls-files -t", because you will otherwise lose
> information.  In other words, showing only one cache entry out of
> many that share the same name makes sense only when we are showing
> name and nothing else.
>
You are right, "--deduplicate" should only work on duplicate file names,
so "ls-files -t" also needs to be corrected.
Well,This is true a bug I haven't notice.
> Having said all that, I suspect that we may be much better off if we
> can somehow merge the two loops into one.  You may be dedup adjacent
> entries in each loop separately with the approach taken by this
> patch, but I do not think the patch would work to deduplicate across
> two loops.  For example, what happens if you do this?
>
>     $ git reset --hard
>     $ echo >>builtin/ls-files.c
>     $ git ls-files -c -m builtin/ls-files.c
>     $ git ls-files -t -c -m builtin/ls-files.c
>
> I think you see the path twice in the output, with or without your
> --dedup option (remember what I said about proving, by the way? ;-)).
>
Yeah,This is because I may have missed the -c option with other options
at the same time.
Here I may disagree with your point of view:
                 if (errno != E_NOENT)
                         error_errno("cannot lstat '%s'", fullname.buf);
With this sentence included, the patch will fail the test:
t/t3010-ls-files-killed-modified.sh.
the errno maybe ENOTDIR when you try to lstat a file`r` with `lstat("r/f",&st);`
So I temporarily removed the judgment of errno.
>  #2: consolidate two for loops into one.
>
>      The two loops have slightly different condition to skip a ce,
>      and different logic on what tag each path is shown with.  When
>      --cached and --modified or --deleted are asked for at the same
>      time, we'd show them multiple times (this is done inside the
>      loop for each ce)
>
>         if (show_cached || show_stage)
>                 show_ce(... ce_stage(ce) ? tag_unmerged : ...);
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 ... code that corresponds to the
>                 ... illustration in #1 above come here.
>         } else if (...)
>                 show_ce(..., tag_modified);
>
>      This changes the semantics.  The original iterates the index
>      twice, so you may see the same entry from --cached once and
>      then again from --modified.  The updated one still will show
>      the same entry twice but next to each other.
>
Well,This does change the semantics. I think people who used two
for loops before may want to separate different outputs.
Now, if you don’t use "--deduplicate", You may see six consecutive
items under a combination of multiple options.
>  #3: optionally deduplicate.
>
>      Once we have a single loop, deduplicationg based on names is
>      trivial, as we seen before.
>
>
Indeed so.
> Hmm?
THANKS.

Junio C Hamano <gitster@pobox.com> 于2021年1月15日周五 上午8:59写道:
>
> "阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> >
> > In a merge conflict, one item of "git ls-files" output may
> > appear multiple times. For example,now the file `a.c` has
> > a conflict,`a.c` will appear three times in the output of
> > "git ls-files".We can use "git ls-files --dedup" to output
> > `a.c` only one time.(unless `--stage` or `--unmerged` is
> > used to view all the detailed information in the index)
>
> Unlike these option names we see in the description, "dedup" is not
> a full word.  Perhaps spell it fully "--deduplicate" while letting
> parse-options machinery to accept unique prefix (including
> "--dedup"?
>
> > In addition, if you use both `--delete` and `--modify` in
> > the same time, The `--dedup` option can also suppress modified
>
> "at the same time", I think.
>
> > entries output.
>
> [let's call this point "point A"]
>
> > `--dedup` option relevant descriptions in
> > `Documentation/git-ls-files.txt`,
>
> I am not sure what this means.
>
> > the test script in `t/t3012-ls-files-dedup.sh`
> > prove the correctness of the `--dedup` option.
>
> No amount of tests "proves" any correctness, but that is OK.  I
> think you meant to say "a few tests have been added to t3012 to
> protect the new feature from future breakage" or something like
> that.
>
> In any case, I think everything after "point A" and before your sign
> off does not belong to the log message.  The diffstat shows that
> documentation and tests have been added already.
>
> > +--dedup::
> > +     Suppress duplicate entries when conflict happen
>
> "conflict happen" -> "there are unmerged paths", as the term
> "unmerged" is already shown to readers of "ls-files --help".
>
> > +     or `--deleted` and `--modified` are combined.
>
> I somehow thought that you refrained from deduping when you are
> showing the stages with "ls-files -u" and "ls-files -s", or you are
> showing status with "ls-files -t", because you will otherwise lose
> information.  In other words, showing only one cache entry out of
> many that share the same name makes sense only when we are showing
> name and nothing else.
>
> Has that been changed from the previous rounds?
>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index c8eae899b82..bc4eded19ab 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >               for (i = 0; i < repo->index->cache_nr; i++) {
> >                       const struct cache_entry *ce = repo->index->cache[i];
> >
> > +                     if (show_cached && delete_dup) {
> > +                             switch (ce_stage(ce)) {
> > +                             case 0:
> > +                             default:
> > +                                     break;
>
> This part looks somewhat strange for two reasons:
>
>  - The code enumerates ALL the possible stage numbers from 0 to 3;
>    if we were to have "default", I'd expect it would be a separate
>    switch arm from the possible values that calls out an programming
>    error, e.g. BUG("at stage #%d???", ce_stage(ce)).  Simply removing
>    the "default" arm would be another way out of this strangeness.
>
>  - When we see a stage #0 entry, we know we will not have higher
>    stage entries with the same name.  Not clearing last_stage here
>    feels wrong, as the primary reason why last_stage variable is
>    used is to remember the last ce that was shown, so that other
>    entries with the same name can be skipped.
>
> By the way, "last_shown_ce" may be a much better name for the
> variable, as you do not really care what stage number the ce you
> showed last was at (you care about its name).
>
> Also, I do not see a good reason why the last_shown_ce variable
> should have lifetime longer than the block that contains this for()
> loop (and the other for loop for deleted/modified codepath we see
> later).  Especially since you initialize the variable that you made
> visible to the entire function to NULL before entering the first for
> loop, but you do not set it back to NULL before entering the second
> for loop, it is inviting a subtle bug.  You may have been given
> show_cached and show_modified at the same time, so you enter the
> first loop and have shown the first stage of the last conflicted
> path, whose cache entry is left in the last_stage variable.  Since
> the variable has longer lifespan than necessary, when the second
> loop is entered, it still points at the cache entry for the highest
> stage of the last conflicted path.  That is because the code forgets
> to clear it to NULL before entering the second for loop.
>
> Having said all that, I suspect that we may be much better off if we
> can somehow merge the two loops into one.  You may be dedup adjacent
> entries in each loop separately with the approach taken by this
> patch, but I do not think the patch would work to deduplicate across
> two loops.  For example, what happens if you do this?
>
>     $ git reset --hard
>     $ echo >>builtin/ls-files.c
>     $ git ls-files -c -m builtin/ls-files.c
>     $ git ls-files -t -c -m builtin/ls-files.c
>
> I think you see the path twice in the output, with or without your
> --dedup option (remember what I said about proving, by the way? ;-)).
>
> Once we successfully merged two loops into one, the part that shows
> tracked paths in the function would have only one loop, and it would
> become a lot cleaner to add the logic to "skip showing the ce if it
> has the same name as the previously shown one, only when doing so
> won't lose information", by doing something like this:
>
>         static void show_files(....)
>         {
>                 /* show_others || show_killed done here */
>                 ...
>
>                 /* leave early if not showing anything */
>                 if (! (show_cached || show_stage || show_deleted || show_modified))
>                         return;
>
>                 last_shown_ce = NULL;
>                 for (i = 0; i < repo->index->cache_nr; i++) {
>                         const struct cache_entry *ce = repo->index->cache[i];
>
>                         if (skipping_duplicates && last_shown_ce)
>                                 if (!strcmp(ce->name, last_shown_ce->name))
>                                         continue;
>
>                         construct_fullname();
>
>                         /* various reasons to skip the entry tested */
>                         if (showing ignored directory and ce is excluded)
>                                 continue;
>                         if (show_unmerged && !ce_stage(ce))
>                                 continue;
>                         if (ce->ce_flags & CE_UPDATE)
>                                 continue;
>                         ... other reasons may appear here ...
>
>                         /* now we are committed to show it */
>                         last_shown_ce = ce;
>
>                         ... various different ways to show ce come here ...
>                         show_ce(...);
>                 }
>         }
>
> where "skipping_duplicates" would be set when "--deduplicate" is
> asked and we are not showing information other than the pathname
> via various options e.g. the tags (-t) or stages (-s/-u).
>
> > +                     if (delete_dup && show_deleted && show_modified && err)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_removed);
>
> I actually think the original code that is still shown here ...
>
> > +                     else {
> > +                             if (show_deleted && err)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified && ie_modified(repo->index, ce, &st, 0))
>
> ... about modified file is buggy.  If lstat() failed, then &st has
> no useful information, so it is wrong to feed it to ie_modified().
>
> Perhaps a three-patch series that is structured like this may be in
> order?
>
>  #1: bugfix for --deleted and --modified.
>
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 if (errno != E_NOENT)
>                         error_errno("cannot lstat '%s'", fullname.buf);
>                 else {
>                         if (show_deleted)
>                                 show_ce(..., tag_removed);
>                         if (show_modified)
>                                 show_ce(..., tag_modified);
>                 }
>         } else if (show_modified && ie_modified(...))
>                 show_ce(..., tag_modified);
>
>      This hopefully should not change the semantics.  If you ask
>      --deleted and --modified, a deleted path would be listed twice.
>
>  #2: consolidate two for loops into one.
>
>      The two loops have slightly different condition to skip a ce,
>      and different logic on what tag each path is shown with.  When
>      --cached and --modified or --deleted are asked for at the same
>      time, we'd show them multiple times (this is done inside the
>      loop for each ce)
>
>         if (show_cached || show_stage)
>                 show_ce(... ce_stage(ce) ? tag_unmerged : ...);
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 ... code that corresponds to the
>                 ... illustration in #1 above come here.
>         } else if (...)
>                 show_ce(..., tag_modified);
>
>      This changes the semantics.  The original iterates the index
>      twice, so you may see the same entry from --cached once and
>      then again from --modified.  The updated one still will show
>      the same entry twice but next to each other.
>
>  #3: optionally deduplicate.
>
>      Once we have a single loop, deduplicationg based on names is
>      trivial, as we seen before.
>
>
> Hmm?
ZheNing Hu Jan. 17, 2021, 3:49 a.m. UTC | #4
Eric,Thanks!
I have little confuse about I can use` test_when_finished "git switch master" `,
but I can't use` test_when_finished "git switch -" `,
why?

Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道:
>
> On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> > [...]
>
> I have a few very minor comments alongside Junio's review comments...
>
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> > @@ -0,0 +1,54 @@
> > +test_description='git ls-files --dedup test.
> > +
> > +This test prepares the following in the cache:
> > +
> > +    a.txt       - a file(base)
> > +    a.txt      - a file(master)
> > +    a.txt       - a file(dev)
> > +    b.txt       - a file
> > +    delete.txt  - a file
> > +    expect1    - a file
> > +    expect2    - a file
> > +
> > +'
>
> This test script description is outdated now. Perhaps shorten it to:
>
>     test_description='ls-files dedup tests'
>
> Or, it might be suitable to simply add the new test to the existing
> t3004-ls-files-basic.sh instead.
>
> > +test_expect_success 'setup' '
> > +       > a.txt &&
> > +       > b.txt &&
> > +       > delete.txt &&
> > +       cat >expect1<<-\EOF &&
>
> Style nits: no space after redirection operator and a space before
> redirection operator:
>
>     >a.txt &&
>     >b.txt &&
>     >delete.txt &&
>     cat >expect1 <<-\EOF &&
>
> > +       cat >expect2<<-EOF &&
>
> Nit: missing the backslash (and wrong spacing):
>
>     cat >expect2 <<-\EOF &&
>
> > +       echo a>a.txt &&
> > +       echo b>b.txt &&
>
> Style:
>
>     echo a >a.txt &&
>     echo b >b.txt &&
>
> > +       echo delete >delete.txt &&
> > +       git add a.txt b.txt delete.txt &&
> > +       git commit -m master:2 &&
> > +       git checkout HEAD~ &&
> > +       git switch -c dev &&
>
> If someone adds a new test after this test, then that new test will
> run in the "dev" branch, which might be unexpected or undesirable. It
> often is a good idea to ensure that tests do certain types of cleanup
> to avoid breaking subsequent tests. Here, it would be a good idea to
> ensure that the test switches back to the original branch when it
> finishes (regardless of whether it finishes successfully or
> unsuccessfully).
>
>     git switch -c dev &&
>     test_when_finished "git switch master" &&
>
> Or you could use `git switch -` if you don't want to hard-code the
> name "master" in the test (since there has been effort lately to
> remove that name from tests.
>
> > +       echo change >a.txt &&
> > +       git add a.txt &&
> > +       git commit -m dev:1 &&
> > +       test_must_fail git merge master &&
> > +       git ls-files -t --dedup >actual1 &&
> > +       test_cmp expect1 actual1 &&
> > +       rm delete.txt &&
> > +       git ls-files -d -m -t --dedup >actual2 &&
> > +       test_cmp expect2 actual2
>
> We usually don't bother giving temporary files unique names like
> "actual1" and "actual2" unless those files must exist at the same
> time. This is because unique names like this may confuse readers into
> wondering if there is some hidden interdependency between the files.
> In this case, the files don't need to exist at the same time, so it
> may be better simply to use the names "actual" and "expect", like
> this:
>
>     ...other stuff...
>     cat >expect <<-\EOF &&
>     ...
>     EOF
>     git ls-files -t --dedup >actual &&
>     test_cmp expect actual &&
>     rm delete.txt &&
>     cat >expect <<-\EOF &&
>     ...
>     EOF
>     git ls-files -d -m -t --dedup >actual &&
>     test_cmp expect actual
>
> (It also has the benefit that the "expect" content is closer to the
> place where it is actually used, which may make it a bit easier for a
> person reading the test to understand what is supposed to be
> produced.)
Junio C Hamano Jan. 17, 2021, 4:37 a.m. UTC | #5
胡哲宁 <adlternative@gmail.com> writes:

> Here I may disagree with your point of view:
>                  if (errno != E_NOENT)
>                          error_errno("cannot lstat '%s'", fullname.buf);
> With this sentence included, the patch will fail the test:
> t/t3010-ls-files-killed-modified.sh.
> the errno maybe ENOTDIR when you try to lstat a file`r` with `lstat("r/f",&st);`
> So I temporarily removed the judgment of errno.

I didn't mean to give you a solution that is ready to be
cut-and-pasted without thinking ;-) 

If NOTDIR needs to also be excluded, then you can exclude it just
like the above illustration excludes NOENT to solve the issue,
right?

>>  #2: consolidate two for loops into one.
>> ...
>>      This changes the semantics.  The original iterates the index
>>      twice, so you may see the same entry from --cached once and
>>      then again from --modified.  The updated one still will show
>>      the same entry twice but next to each other.
>>
> Well,This does change the semantics. I think people who used two
> for loops before may want to separate different outputs.
> Now, if you don’t use "--deduplicate", You may see six consecutive
> items under a combination of multiple options.

Yes, and that is intended and is a vast improvement from the current
behaviour, which shows 3 in the first loop, bunch of unrelated
entries from the rest of the first loop, yet another bunch of
unrelated entries from the early part of the second loop and then
finally shows 3 from the second loop.  With the "single loop"
update, at least, the entries are sorted by their path and it would
make it easier to see (if the user cares to trigger both --cached
and --modified, that is), no?
Eric Sunshine Jan. 17, 2021, 5:11 a.m. UTC | #6
On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道:
> > > +       git switch -c dev &&
> >
> > If someone adds a new test after this test, then that new test will
> > run in the "dev" branch, which might be unexpected or undesirable. It
> > often is a good idea to ensure that tests do certain types of cleanup
> > to avoid breaking subsequent tests. Here, it would be a good idea to
> > ensure that the test switches back to the original branch when it
> > finishes (regardless of whether it finishes successfully or
> > unsuccessfully).
> >
> >     git switch -c dev &&
> >     test_when_finished "git switch master" &&
> >
> > Or you could use `git switch -` if you don't want to hard-code the
> > name "master" in the test (since there has been effort lately to
> > remove that name from tests.
> >
> I have little confuse about I can use` test_when_finished "git switch master" `,
> but I can't use` test_when_finished "git switch -" `,
> why?

You may use either one. I presented both as alternative approaches.
Junio C Hamano Jan. 17, 2021, 11:04 p.m. UTC | #7
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
>> Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道:
>> > > +       git switch -c dev &&
>> >
>> > If someone adds a new test after this test, then that new test will
>> > run in the "dev" branch, which might be unexpected or undesirable. It
>> > often is a good idea to ensure that tests do certain types of cleanup
>> > to avoid breaking subsequent tests. Here, it would be a good idea to
>> > ensure that the test switches back to the original branch when it
>> > finishes (regardless of whether it finishes successfully or
>> > unsuccessfully).
>> >
>> >     git switch -c dev &&
>> >     test_when_finished "git switch master" &&
>> >
>> > Or you could use `git switch -` if you don't want to hard-code the
>> > name "master" in the test (since there has been effort lately to
>> > remove that name from tests.
>> >
>> I have little confuse about I can use` test_when_finished "git switch master" `,
>> but I can't use` test_when_finished "git switch -" `,
>> why?
>
> You may use either one. I presented both as alternative approaches.

I am sensing a bit of miscommunication here.  You sound like you
still believe either would work OK, but to me, it sounds like that
the author claims the one of them does not work for him/her.

I do not think the test framework itself mucks with the reflog to
make switching to "-" (or "@{-1}") break, so I find it implausible
that "switch -" form not to work, and unlike your "either is OK", I
have a moderately strong preference to use the "go back to the
previous one, whatever it is called" form.

Thanks.
Eric Sunshine Jan. 18, 2021, 2:59 p.m. UTC | #8
On Sun, Jan 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
> >> Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道:
> >> >     test_when_finished "git switch master" &&
> >> >
> >> > Or you could use `git switch -` if you don't want to hard-code the
> >> > name "master" in the test (since there has been effort lately to
> >> > remove that name from tests.
> >> >
> >> I have little confuse about I can use` test_when_finished "git switch master" `,
> >> but I can't use` test_when_finished "git switch -" `,
> >> why?
> >
> > You may use either one. I presented both as alternative approaches.
>
> I am sensing a bit of miscommunication here.  You sound like you
> still believe either would work OK, but to me, it sounds like that
> the author claims the one of them does not work for him/her.

That could be. My eye glided over the code too quickly to notice that
it was using `git checkout HEAD~` followed immediately by `git switch
-c dev`, which means that `git switch -` would indeed not return to
"master" (in fact, it errors out).
diff mbox series

Patch

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..0f8dbeeea20 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@  SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--dedup]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@  OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--dedup::
+	Suppress duplicate entries when conflict happen or `--deleted`
+	and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..bc4eded19ab 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@  static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int delete_dup;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_stage = NULL;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -316,6 +318,20 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
 
+			if (show_cached && delete_dup) {
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage = ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -337,6 +353,20 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 			struct stat st;
 			int err;
 
+			if (delete_dup) {
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage = ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -347,10 +377,14 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			if (delete_dup && show_deleted && show_modified && err)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else {
+				if (show_deleted && err)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified && ie_modified(repo->index, ce, &st, 0))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		}
 	}
 
@@ -578,6 +612,7 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..aec7d364235
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,54 @@ 
+#!/bin/sh
+
+test_description='git ls-files --dedup test.
+
+This test prepares the following in the cache:
+
+    a.txt       - a file(base)
+    a.txt	- a file(master)
+    a.txt       - a file(dev)
+    b.txt       - a file
+    delete.txt  - a file
+    expect1	- a file
+    expect2	- a file
+
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	> a.txt &&
+	> b.txt &&
+	> delete.txt &&
+	cat >expect1<<-\EOF &&
+	M a.txt
+	H b.txt
+	H delete.txt
+	H expect1
+	H expect2
+	EOF
+	cat >expect2<<-EOF &&
+	C a.txt
+	R delete.txt
+	EOF
+	git add a.txt b.txt delete.txt expect1 expect2 &&
+	git commit -m master:1 &&
+	echo a>a.txt &&
+	echo b>b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m master:2 &&
+	git checkout HEAD~ &&
+	git switch -c dev &&
+	echo change >a.txt &&
+	git add a.txt &&
+	git commit -m dev:1 &&
+	test_must_fail git merge master &&
+	git ls-files -t --dedup >actual1 &&
+	test_cmp expect1 actual1 &&
+	rm delete.txt &&
+	git ls-files -d -m -t --dedup >actual2 &&
+	test_cmp expect2 actual2
+'
+
+test_done