diff mbox series

[v6] ls-files.c: add --deduplicate option

Message ID 20210122154640.3791035-1-adlternative@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v6] ls-files.c: add --deduplicate option | expand

Commit Message

ZheNing Hu Jan. 22, 2021, 3:46 p.m. UTC
In order to provide users a better experience
when viewing information about files in the index
and the working tree, the `--deduplicate` option will suppress
some duplicate name under some conditions.

In a merge conflict, one file name of "git ls-files" output may
appear multiple times. For example,now there is an unmerged path
`a.c`,`a.c` will appear three times in the output of
"git ls-files".We can use "git ls-files --deduplicate" to output
`a.c` only one time.(unless `--stage` or `--unmerged` is
used to view all the detailed information in the index)

In addition, if you use both `--delete` and `--modify` at
the same time, The `--deduplicate` option
can also suppress file name output.

Additional instructions:
In order to display entries information,`deduplicate` suppresses
the output of duplicate file names, not the output of duplicate
entries information, so under the option of `-t`, `--stage`, `--unmerge`,
`--deduplicate` will have no effect.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 46 +++++++++++++++++-------
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 12 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

Comments

Junio C Hamano Jan. 22, 2021, 8:52 p.m. UTC | #1
ZheNing Hu <adlternative@gmail.com> writes:

> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--deduplicate` option will suppress
> some duplicate name under some conditions.

Now is it just a single patch squashing everything together?
That does not look like it.

> @@ -317,7 +318,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  	for (i = 0; i < repo->index->cache_nr; i++) {
>  		const struct cache_entry *ce = repo->index->cache[i];
>  		struct stat st;
> -		int err;
> +		int stat_err;
>  
>  		construct_fullname(&fullname, repo, ce);
>  
> @@ -326,25 +327,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  			continue;
>  		if (ce->ce_flags & CE_UPDATE)
>  			continue;
> -		if (show_cached || show_stage) {
> -			if (!show_unmerged || ce_stage(ce))
> +		if ((show_cached || show_stage) &&
> +			(!show_unmerged || ce_stage(ce))) {
>  				show_ce(repo, dir, ce, fullname.buf,
>  					ce_stage(ce) ? tag_unmerged :
>  					(ce_skip_worktree(ce) ? tag_skip_worktree :
>  						tag_cached));
> +			if (show_cached && skipping_duplicates)
> +				goto skip_to_next_name;

Why should this be so complex?  You are dropping skipping_duplicates
when the output is not name-only, so shouldn't this look more like

		if ((show_cached || show_stage) &&
		    (!show_unmerged || ce_stage(ce)) {
			show_ce(...);
                        if (skipping_duplicates)
                        	goto skip_to_next_name;
		}

It seems that this still depends on the 2/3 from the previous
iteration, against which I suggested to merge the conditions of
nested if statements into one.  That should be done in the updated
2/3, not in this step, no?

>  		}
> +		if (!show_deleted && !show_modified)
> +			continue;

And this one also belongs to the step 2/3 that consolidates the two
loops into one.

I think you'd need to start from the three patches in v5, "rebase -i"
not just [3/3] but at least [2/3], too.

Thanks.
ZheNing Hu Jan. 23, 2021, 8:27 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2021年1月23日周六 上午4:52写道:
>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--deduplicate` option will suppress
> > some duplicate name under some conditions.
>
> Now is it just a single patch squashing everything together?
> That does not look like it.
>
> > @@ -317,7 +318,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >       for (i = 0; i < repo->index->cache_nr; i++) {
> >               const struct cache_entry *ce = repo->index->cache[i];
> >               struct stat st;
> > -             int err;
> > +             int stat_err;
> >
> >               construct_fullname(&fullname, repo, ce);
> >
> > @@ -326,25 +327,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >                       continue;
> >               if (ce->ce_flags & CE_UPDATE)
> >                       continue;
> > -             if (show_cached || show_stage) {
> > -                     if (!show_unmerged || ce_stage(ce))
> > +             if ((show_cached || show_stage) &&
> > +                     (!show_unmerged || ce_stage(ce))) {
> >                               show_ce(repo, dir, ce, fullname.buf,
> >                                       ce_stage(ce) ? tag_unmerged :
> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >                                               tag_cached));
> > +                     if (show_cached && skipping_duplicates)
> > +                             goto skip_to_next_name;
>
> Why should this be so complex?  You are dropping skipping_duplicates
> when the output is not name-only, so shouldn't this look more like
>
Truly,I may have considered too much,if I have
"show_stage","skipping_duplicates"
must be false.
>                 if ((show_cached || show_stage) &&
>                     (!show_unmerged || ce_stage(ce)) {
>                         show_ce(...);
>                         if (skipping_duplicates)
>                                 goto skip_to_next_name;
>                 }
>
> It seems that this still depends on the 2/3 from the previous
> iteration, against which I suggested to merge the conditions of
> nested if statements into one.  That should be done in the updated
> 2/3, not in this step, no?
>
> >               }
> > +             if (!show_deleted && !show_modified)
> > +                     continue;
>
> And this one also belongs to the step 2/3 that consolidates the two
> loops into one.
>
> I think you'd need to start from the three patches in v5, "rebase -i"
> not just [3/3] but at least [2/3], too.
>
> Thanks.

Thanks,I am rewriting.
diff mbox series

Patch

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 0a3b5265b3..a05f063d3d 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@  SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -80,6 +81,10 @@  OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1454ab1ae6..e67dc1ff45 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@  static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -317,7 +318,7 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
-		int err;
+		int stat_err;
 
 		construct_fullname(&fullname, repo, ce);
 
@@ -326,25 +327,43 @@  static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
-		if (show_cached || show_stage) {
-			if (!show_unmerged || ce_stage(ce))
+		if ((show_cached || show_stage) &&
+			(!show_unmerged || ce_stage(ce))) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (show_cached && skipping_duplicates)
+				goto skip_to_next_name;
 		}
+		if (!show_deleted && !show_modified)
+			continue;
 		if (ce_skip_worktree(ce))
 			continue;
-		err = lstat(fullname.buf, &st);
-		if (err) {
-			if (errno != ENOENT && errno != ENOTDIR)
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (show_deleted)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified)
+		stat_err = lstat(fullname.buf, &st);
+		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+			error_errno("cannot lstat '%s'", fullname.buf);
+		if (stat_err && show_deleted) {
+			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		if (show_modified &&
+			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
-		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
-			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		continue;
+skip_to_next_name:
+		{
+			int j;
+			struct cache_entry **cache = repo->index->cache;
+			for (j = i + 1; j < repo->index->cache_nr; j++)
+				if (strcmp(ce->name, cache[j]->name))
+					break;
+			i = j - 1; /* compensate for outer for loop */
+		}
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 0000000000..2682b1f43a
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@ 
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done