Message ID | pull.832.v3.git.1610626942677.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] ls-files.c: add --dedup option | expand |
"阿德烈 via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: ZheNing Hu <adlternative@gmail.com> > > In order to provide users a better experience > when viewing information about files in the index > and the working tree, the `--dedup` option will suppress > some duplicate options under some conditions. > > In a merge conflict, one item of "git ls-files" output may > appear multiple times. For example,now the file `a.c` has > a conflict,`a.c` will appear three times in the output of > "git ls-files".We can use "git ls-files --dedup" to output > `a.c` only one time.(unless `--stage` or `--unmerged` is > used to view all the detailed information in the index) Unlike these option names we see in the description, "dedup" is not a full word. Perhaps spell it fully "--deduplicate" while letting parse-options machinery to accept unique prefix (including "--dedup"? > In addition, if you use both `--delete` and `--modify` in > the same time, The `--dedup` option can also suppress modified "at the same time", I think. > entries output. [let's call this point "point A"] > `--dedup` option relevant descriptions in > `Documentation/git-ls-files.txt`, I am not sure what this means. > the test script in `t/t3012-ls-files-dedup.sh` > prove the correctness of the `--dedup` option. No amount of tests "proves" any correctness, but that is OK. I think you meant to say "a few tests have been added to t3012 to protect the new feature from future breakage" or something like that. In any case, I think everything after "point A" and before your sign off does not belong to the log message. The diffstat shows that documentation and tests have been added already. > +--dedup:: > + Suppress duplicate entries when conflict happen "conflict happen" -> "there are unmerged paths", as the term "unmerged" is already shown to readers of "ls-files --help". > + or `--deleted` and `--modified` are combined. I somehow thought that you refrained from deduping when you are showing the stages with "ls-files -u" and "ls-files -s", or you are showing status with "ls-files -t", because you will otherwise lose information. In other words, showing only one cache entry out of many that share the same name makes sense only when we are showing name and nothing else. Has that been changed from the previous rounds? > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > index c8eae899b82..bc4eded19ab 100644 > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir) > for (i = 0; i < repo->index->cache_nr; i++) { > const struct cache_entry *ce = repo->index->cache[i]; > > + if (show_cached && delete_dup) { > + switch (ce_stage(ce)) { > + case 0: > + default: > + break; This part looks somewhat strange for two reasons: - The code enumerates ALL the possible stage numbers from 0 to 3; if we were to have "default", I'd expect it would be a separate switch arm from the possible values that calls out an programming error, e.g. BUG("at stage #%d???", ce_stage(ce)). Simply removing the "default" arm would be another way out of this strangeness. - When we see a stage #0 entry, we know we will not have higher stage entries with the same name. Not clearing last_stage here feels wrong, as the primary reason why last_stage variable is used is to remember the last ce that was shown, so that other entries with the same name can be skipped. By the way, "last_shown_ce" may be a much better name for the variable, as you do not really care what stage number the ce you showed last was at (you care about its name). Also, I do not see a good reason why the last_shown_ce variable should have lifetime longer than the block that contains this for() loop (and the other for loop for deleted/modified codepath we see later). Especially since you initialize the variable that you made visible to the entire function to NULL before entering the first for loop, but you do not set it back to NULL before entering the second for loop, it is inviting a subtle bug. You may have been given show_cached and show_modified at the same time, so you enter the first loop and have shown the first stage of the last conflicted path, whose cache entry is left in the last_stage variable. Since the variable has longer lifespan than necessary, when the second loop is entered, it still points at the cache entry for the highest stage of the last conflicted path. That is because the code forgets to clear it to NULL before entering the second for loop. Having said all that, I suspect that we may be much better off if we can somehow merge the two loops into one. You may be dedup adjacent entries in each loop separately with the approach taken by this patch, but I do not think the patch would work to deduplicate across two loops. For example, what happens if you do this? $ git reset --hard $ echo >>builtin/ls-files.c $ git ls-files -c -m builtin/ls-files.c $ git ls-files -t -c -m builtin/ls-files.c I think you see the path twice in the output, with or without your --dedup option (remember what I said about proving, by the way? ;-)). Once we successfully merged two loops into one, the part that shows tracked paths in the function would have only one loop, and it would become a lot cleaner to add the logic to "skip showing the ce if it has the same name as the previously shown one, only when doing so won't lose information", by doing something like this: static void show_files(....) { /* show_others || show_killed done here */ ... /* leave early if not showing anything */ if (! (show_cached || show_stage || show_deleted || show_modified)) return; last_shown_ce = NULL; for (i = 0; i < repo->index->cache_nr; i++) { const struct cache_entry *ce = repo->index->cache[i]; if (skipping_duplicates && last_shown_ce) if (!strcmp(ce->name, last_shown_ce->name)) continue; construct_fullname(); /* various reasons to skip the entry tested */ if (showing ignored directory and ce is excluded) continue; if (show_unmerged && !ce_stage(ce)) continue; if (ce->ce_flags & CE_UPDATE) continue; ... other reasons may appear here ... /* now we are committed to show it */ last_shown_ce = ce; ... various different ways to show ce come here ... show_ce(...); } } where "skipping_duplicates" would be set when "--deduplicate" is asked and we are not showing information other than the pathname via various options e.g. the tags (-t) or stages (-s/-u). > + if (delete_dup && show_deleted && show_modified && err) > show_ce(repo, dir, ce, fullname.buf, tag_removed); I actually think the original code that is still shown here ... > + else { > + if (show_deleted && err) > + show_ce(repo, dir, ce, fullname.buf, tag_removed); > + if (show_modified && ie_modified(repo->index, ce, &st, 0)) ... about modified file is buggy. If lstat() failed, then &st has no useful information, so it is wrong to feed it to ie_modified(). Perhaps a three-patch series that is structured like this may be in order? #1: bugfix for --deleted and --modified. err = lstat(fullname.buf, &st); if (err) { /* deleted? */ if (errno != E_NOENT) error_errno("cannot lstat '%s'", fullname.buf); else { if (show_deleted) show_ce(..., tag_removed); if (show_modified) show_ce(..., tag_modified); } } else if (show_modified && ie_modified(...)) show_ce(..., tag_modified); This hopefully should not change the semantics. If you ask --deleted and --modified, a deleted path would be listed twice. #2: consolidate two for loops into one. The two loops have slightly different condition to skip a ce, and different logic on what tag each path is shown with. When --cached and --modified or --deleted are asked for at the same time, we'd show them multiple times (this is done inside the loop for each ce) if (show_cached || show_stage) show_ce(... ce_stage(ce) ? tag_unmerged : ...); err = lstat(fullname.buf, &st); if (err) { /* deleted? */ ... code that corresponds to the ... illustration in #1 above come here. } else if (...) show_ce(..., tag_modified); This changes the semantics. The original iterates the index twice, so you may see the same entry from --cached once and then again from --modified. The updated one still will show the same entry twice but next to each other. #3: optionally deduplicate. Once we have a single loop, deduplicationg based on names is trivial, as we seen before. Hmm?
On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget <gitgitgadget@gmail.com> wrote: > In order to provide users a better experience > when viewing information about files in the index > and the working tree, the `--dedup` option will suppress > some duplicate options under some conditions. > [...] I have a few very minor comments alongside Junio's review comments... > Signed-off-by: ZheNing Hu <adlternative@gmail.com> > --- > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh > @@ -0,0 +1,54 @@ > +test_description='git ls-files --dedup test. > + > +This test prepares the following in the cache: > + > + a.txt - a file(base) > + a.txt - a file(master) > + a.txt - a file(dev) > + b.txt - a file > + delete.txt - a file > + expect1 - a file > + expect2 - a file > + > +' This test script description is outdated now. Perhaps shorten it to: test_description='ls-files dedup tests' Or, it might be suitable to simply add the new test to the existing t3004-ls-files-basic.sh instead. > +test_expect_success 'setup' ' > + > a.txt && > + > b.txt && > + > delete.txt && > + cat >expect1<<-\EOF && Style nits: no space after redirection operator and a space before redirection operator: >a.txt && >b.txt && >delete.txt && cat >expect1 <<-\EOF && > + cat >expect2<<-EOF && Nit: missing the backslash (and wrong spacing): cat >expect2 <<-\EOF && > + echo a>a.txt && > + echo b>b.txt && Style: echo a >a.txt && echo b >b.txt && > + echo delete >delete.txt && > + git add a.txt b.txt delete.txt && > + git commit -m master:2 && > + git checkout HEAD~ && > + git switch -c dev && If someone adds a new test after this test, then that new test will run in the "dev" branch, which might be unexpected or undesirable. It often is a good idea to ensure that tests do certain types of cleanup to avoid breaking subsequent tests. Here, it would be a good idea to ensure that the test switches back to the original branch when it finishes (regardless of whether it finishes successfully or unsuccessfully). git switch -c dev && test_when_finished "git switch master" && Or you could use `git switch -` if you don't want to hard-code the name "master" in the test (since there has been effort lately to remove that name from tests. > + echo change >a.txt && > + git add a.txt && > + git commit -m dev:1 && > + test_must_fail git merge master && > + git ls-files -t --dedup >actual1 && > + test_cmp expect1 actual1 && > + rm delete.txt && > + git ls-files -d -m -t --dedup >actual2 && > + test_cmp expect2 actual2 We usually don't bother giving temporary files unique names like "actual1" and "actual2" unless those files must exist at the same time. This is because unique names like this may confuse readers into wondering if there is some hidden interdependency between the files. In this case, the files don't need to exist at the same time, so it may be better simply to use the names "actual" and "expect", like this: ...other stuff... cat >expect <<-\EOF && ... EOF git ls-files -t --dedup >actual && test_cmp expect actual && rm delete.txt && cat >expect <<-\EOF && ... EOF git ls-files -d -m -t --dedup >actual && test_cmp expect actual (It also has the benefit that the "expect" content is closer to the place where it is actually used, which may make it a bit easier for a person reading the test to understand what is supposed to be produced.)
Junio, thank you for your patience to review my patch and guide me how to modify it. Junio C Hamano <gitster@pobox.com> 于2021年1月15日周五 上午8:59写道: > > "阿德烈 via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: ZheNing Hu <adlternative@gmail.com> > > > > In order to provide users a better experience > > when viewing information about files in the index > > and the working tree, the `--dedup` option will suppress > > some duplicate options under some conditions. > > > > In a merge conflict, one item of "git ls-files" output may > > appear multiple times. For example,now the file `a.c` has > > a conflict,`a.c` will appear three times in the output of > > "git ls-files".We can use "git ls-files --dedup" to output > > `a.c` only one time.(unless `--stage` or `--unmerged` is > > used to view all the detailed information in the index) > > Unlike these option names we see in the description, "dedup" is not > a full word. Perhaps spell it fully "--deduplicate" while letting > parse-options machinery to accept unique prefix (including > "--dedup"? > Ok i have modified "--dedup" to "--deduplicate". > > In addition, if you use both `--delete` and `--modify` in > > the same time, The `--dedup` option can also suppress modified > > "at the same time", I think. > My poor English grammar :-) > > entries output. > > [let's call this point "point A"] > > > `--dedup` option relevant descriptions in > > `Documentation/git-ls-files.txt`, > > I am not sure what this means. > > > the test script in `t/t3012-ls-files-dedup.sh` > > prove the correctness of the `--dedup` option. > > No amount of tests "proves" any correctness, but that is OK. I > think you meant to say "a few tests have been added to t3012 to > protect the new feature from future breakage" or something like > that. > Alright, I understand! > In any case, I think everything after "point A" and before your sign > off does not belong to the log message. The diffstat shows that > documentation and tests have been added already. > > > +--dedup:: > > + Suppress duplicate entries when conflict happen > > "conflict happen" -> "there are unmerged paths", as the term > "unmerged" is already shown to readers of "ls-files --help". > Well, maybe I'm not good enough with these proper nouns. > > + or `--deleted` and `--modified` are combined. > > I somehow thought that you refrained from deduping when you are > showing the stages with "ls-files -u" and "ls-files -s", or you are > showing status with "ls-files -t", because you will otherwise lose > information. In other words, showing only one cache entry out of > many that share the same name makes sense only when we are showing > name and nothing else. > You are right, "--deduplicate" should only work on duplicate file names, so "ls-files -t" also needs to be corrected. Well,This is true a bug I haven't notice. > Having said all that, I suspect that we may be much better off if we > can somehow merge the two loops into one. You may be dedup adjacent > entries in each loop separately with the approach taken by this > patch, but I do not think the patch would work to deduplicate across > two loops. For example, what happens if you do this? > > $ git reset --hard > $ echo >>builtin/ls-files.c > $ git ls-files -c -m builtin/ls-files.c > $ git ls-files -t -c -m builtin/ls-files.c > > I think you see the path twice in the output, with or without your > --dedup option (remember what I said about proving, by the way? ;-)). > Yeah,This is because I may have missed the -c option with other options at the same time. Here I may disagree with your point of view: if (errno != E_NOENT) error_errno("cannot lstat '%s'", fullname.buf); With this sentence included, the patch will fail the test: t/t3010-ls-files-killed-modified.sh. the errno maybe ENOTDIR when you try to lstat a file`r` with `lstat("r/f",&st);` So I temporarily removed the judgment of errno. > #2: consolidate two for loops into one. > > The two loops have slightly different condition to skip a ce, > and different logic on what tag each path is shown with. When > --cached and --modified or --deleted are asked for at the same > time, we'd show them multiple times (this is done inside the > loop for each ce) > > if (show_cached || show_stage) > show_ce(... ce_stage(ce) ? tag_unmerged : ...); > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > ... code that corresponds to the > ... illustration in #1 above come here. > } else if (...) > show_ce(..., tag_modified); > > This changes the semantics. The original iterates the index > twice, so you may see the same entry from --cached once and > then again from --modified. The updated one still will show > the same entry twice but next to each other. > Well,This does change the semantics. I think people who used two for loops before may want to separate different outputs. Now, if you don’t use "--deduplicate", You may see six consecutive items under a combination of multiple options. > #3: optionally deduplicate. > > Once we have a single loop, deduplicationg based on names is > trivial, as we seen before. > > Indeed so. > Hmm? THANKS. Junio C Hamano <gitster@pobox.com> 于2021年1月15日周五 上午8:59写道: > > "阿德烈 via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: ZheNing Hu <adlternative@gmail.com> > > > > In order to provide users a better experience > > when viewing information about files in the index > > and the working tree, the `--dedup` option will suppress > > some duplicate options under some conditions. > > > > In a merge conflict, one item of "git ls-files" output may > > appear multiple times. For example,now the file `a.c` has > > a conflict,`a.c` will appear three times in the output of > > "git ls-files".We can use "git ls-files --dedup" to output > > `a.c` only one time.(unless `--stage` or `--unmerged` is > > used to view all the detailed information in the index) > > Unlike these option names we see in the description, "dedup" is not > a full word. Perhaps spell it fully "--deduplicate" while letting > parse-options machinery to accept unique prefix (including > "--dedup"? > > > In addition, if you use both `--delete` and `--modify` in > > the same time, The `--dedup` option can also suppress modified > > "at the same time", I think. > > > entries output. > > [let's call this point "point A"] > > > `--dedup` option relevant descriptions in > > `Documentation/git-ls-files.txt`, > > I am not sure what this means. > > > the test script in `t/t3012-ls-files-dedup.sh` > > prove the correctness of the `--dedup` option. > > No amount of tests "proves" any correctness, but that is OK. I > think you meant to say "a few tests have been added to t3012 to > protect the new feature from future breakage" or something like > that. > > In any case, I think everything after "point A" and before your sign > off does not belong to the log message. The diffstat shows that > documentation and tests have been added already. > > > +--dedup:: > > + Suppress duplicate entries when conflict happen > > "conflict happen" -> "there are unmerged paths", as the term > "unmerged" is already shown to readers of "ls-files --help". > > > + or `--deleted` and `--modified` are combined. > > I somehow thought that you refrained from deduping when you are > showing the stages with "ls-files -u" and "ls-files -s", or you are > showing status with "ls-files -t", because you will otherwise lose > information. In other words, showing only one cache entry out of > many that share the same name makes sense only when we are showing > name and nothing else. > > Has that been changed from the previous rounds? > > > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > > index c8eae899b82..bc4eded19ab 100644 > > --- a/builtin/ls-files.c > > +++ b/builtin/ls-files.c > > @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir) > > for (i = 0; i < repo->index->cache_nr; i++) { > > const struct cache_entry *ce = repo->index->cache[i]; > > > > + if (show_cached && delete_dup) { > > + switch (ce_stage(ce)) { > > + case 0: > > + default: > > + break; > > This part looks somewhat strange for two reasons: > > - The code enumerates ALL the possible stage numbers from 0 to 3; > if we were to have "default", I'd expect it would be a separate > switch arm from the possible values that calls out an programming > error, e.g. BUG("at stage #%d???", ce_stage(ce)). Simply removing > the "default" arm would be another way out of this strangeness. > > - When we see a stage #0 entry, we know we will not have higher > stage entries with the same name. Not clearing last_stage here > feels wrong, as the primary reason why last_stage variable is > used is to remember the last ce that was shown, so that other > entries with the same name can be skipped. > > By the way, "last_shown_ce" may be a much better name for the > variable, as you do not really care what stage number the ce you > showed last was at (you care about its name). > > Also, I do not see a good reason why the last_shown_ce variable > should have lifetime longer than the block that contains this for() > loop (and the other for loop for deleted/modified codepath we see > later). Especially since you initialize the variable that you made > visible to the entire function to NULL before entering the first for > loop, but you do not set it back to NULL before entering the second > for loop, it is inviting a subtle bug. You may have been given > show_cached and show_modified at the same time, so you enter the > first loop and have shown the first stage of the last conflicted > path, whose cache entry is left in the last_stage variable. Since > the variable has longer lifespan than necessary, when the second > loop is entered, it still points at the cache entry for the highest > stage of the last conflicted path. That is because the code forgets > to clear it to NULL before entering the second for loop. > > Having said all that, I suspect that we may be much better off if we > can somehow merge the two loops into one. You may be dedup adjacent > entries in each loop separately with the approach taken by this > patch, but I do not think the patch would work to deduplicate across > two loops. For example, what happens if you do this? > > $ git reset --hard > $ echo >>builtin/ls-files.c > $ git ls-files -c -m builtin/ls-files.c > $ git ls-files -t -c -m builtin/ls-files.c > > I think you see the path twice in the output, with or without your > --dedup option (remember what I said about proving, by the way? ;-)). > > Once we successfully merged two loops into one, the part that shows > tracked paths in the function would have only one loop, and it would > become a lot cleaner to add the logic to "skip showing the ce if it > has the same name as the previously shown one, only when doing so > won't lose information", by doing something like this: > > static void show_files(....) > { > /* show_others || show_killed done here */ > ... > > /* leave early if not showing anything */ > if (! (show_cached || show_stage || show_deleted || show_modified)) > return; > > last_shown_ce = NULL; > for (i = 0; i < repo->index->cache_nr; i++) { > const struct cache_entry *ce = repo->index->cache[i]; > > if (skipping_duplicates && last_shown_ce) > if (!strcmp(ce->name, last_shown_ce->name)) > continue; > > construct_fullname(); > > /* various reasons to skip the entry tested */ > if (showing ignored directory and ce is excluded) > continue; > if (show_unmerged && !ce_stage(ce)) > continue; > if (ce->ce_flags & CE_UPDATE) > continue; > ... other reasons may appear here ... > > /* now we are committed to show it */ > last_shown_ce = ce; > > ... various different ways to show ce come here ... > show_ce(...); > } > } > > where "skipping_duplicates" would be set when "--deduplicate" is > asked and we are not showing information other than the pathname > via various options e.g. the tags (-t) or stages (-s/-u). > > > + if (delete_dup && show_deleted && show_modified && err) > > show_ce(repo, dir, ce, fullname.buf, tag_removed); > > I actually think the original code that is still shown here ... > > > + else { > > + if (show_deleted && err) > > + show_ce(repo, dir, ce, fullname.buf, tag_removed); > > + if (show_modified && ie_modified(repo->index, ce, &st, 0)) > > ... about modified file is buggy. If lstat() failed, then &st has > no useful information, so it is wrong to feed it to ie_modified(). > > Perhaps a three-patch series that is structured like this may be in > order? > > #1: bugfix for --deleted and --modified. > > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > if (errno != E_NOENT) > error_errno("cannot lstat '%s'", fullname.buf); > else { > if (show_deleted) > show_ce(..., tag_removed); > if (show_modified) > show_ce(..., tag_modified); > } > } else if (show_modified && ie_modified(...)) > show_ce(..., tag_modified); > > This hopefully should not change the semantics. If you ask > --deleted and --modified, a deleted path would be listed twice. > > #2: consolidate two for loops into one. > > The two loops have slightly different condition to skip a ce, > and different logic on what tag each path is shown with. When > --cached and --modified or --deleted are asked for at the same > time, we'd show them multiple times (this is done inside the > loop for each ce) > > if (show_cached || show_stage) > show_ce(... ce_stage(ce) ? tag_unmerged : ...); > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > ... code that corresponds to the > ... illustration in #1 above come here. > } else if (...) > show_ce(..., tag_modified); > > This changes the semantics. The original iterates the index > twice, so you may see the same entry from --cached once and > then again from --modified. The updated one still will show > the same entry twice but next to each other. > > #3: optionally deduplicate. > > Once we have a single loop, deduplicationg based on names is > trivial, as we seen before. > > > Hmm?
Eric,Thanks! I have little confuse about I can use` test_when_finished "git switch master" `, but I can't use` test_when_finished "git switch -" `, why? Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道: > > On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > In order to provide users a better experience > > when viewing information about files in the index > > and the working tree, the `--dedup` option will suppress > > some duplicate options under some conditions. > > [...] > > I have a few very minor comments alongside Junio's review comments... > > > Signed-off-by: ZheNing Hu <adlternative@gmail.com> > > --- > > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh > > @@ -0,0 +1,54 @@ > > +test_description='git ls-files --dedup test. > > + > > +This test prepares the following in the cache: > > + > > + a.txt - a file(base) > > + a.txt - a file(master) > > + a.txt - a file(dev) > > + b.txt - a file > > + delete.txt - a file > > + expect1 - a file > > + expect2 - a file > > + > > +' > > This test script description is outdated now. Perhaps shorten it to: > > test_description='ls-files dedup tests' > > Or, it might be suitable to simply add the new test to the existing > t3004-ls-files-basic.sh instead. > > > +test_expect_success 'setup' ' > > + > a.txt && > > + > b.txt && > > + > delete.txt && > > + cat >expect1<<-\EOF && > > Style nits: no space after redirection operator and a space before > redirection operator: > > >a.txt && > >b.txt && > >delete.txt && > cat >expect1 <<-\EOF && > > > + cat >expect2<<-EOF && > > Nit: missing the backslash (and wrong spacing): > > cat >expect2 <<-\EOF && > > > + echo a>a.txt && > > + echo b>b.txt && > > Style: > > echo a >a.txt && > echo b >b.txt && > > > + echo delete >delete.txt && > > + git add a.txt b.txt delete.txt && > > + git commit -m master:2 && > > + git checkout HEAD~ && > > + git switch -c dev && > > If someone adds a new test after this test, then that new test will > run in the "dev" branch, which might be unexpected or undesirable. It > often is a good idea to ensure that tests do certain types of cleanup > to avoid breaking subsequent tests. Here, it would be a good idea to > ensure that the test switches back to the original branch when it > finishes (regardless of whether it finishes successfully or > unsuccessfully). > > git switch -c dev && > test_when_finished "git switch master" && > > Or you could use `git switch -` if you don't want to hard-code the > name "master" in the test (since there has been effort lately to > remove that name from tests. > > > + echo change >a.txt && > > + git add a.txt && > > + git commit -m dev:1 && > > + test_must_fail git merge master && > > + git ls-files -t --dedup >actual1 && > > + test_cmp expect1 actual1 && > > + rm delete.txt && > > + git ls-files -d -m -t --dedup >actual2 && > > + test_cmp expect2 actual2 > > We usually don't bother giving temporary files unique names like > "actual1" and "actual2" unless those files must exist at the same > time. This is because unique names like this may confuse readers into > wondering if there is some hidden interdependency between the files. > In this case, the files don't need to exist at the same time, so it > may be better simply to use the names "actual" and "expect", like > this: > > ...other stuff... > cat >expect <<-\EOF && > ... > EOF > git ls-files -t --dedup >actual && > test_cmp expect actual && > rm delete.txt && > cat >expect <<-\EOF && > ... > EOF > git ls-files -d -m -t --dedup >actual && > test_cmp expect actual > > (It also has the benefit that the "expect" content is closer to the > place where it is actually used, which may make it a bit easier for a > person reading the test to understand what is supposed to be > produced.)
胡哲宁 <adlternative@gmail.com> writes: > Here I may disagree with your point of view: > if (errno != E_NOENT) > error_errno("cannot lstat '%s'", fullname.buf); > With this sentence included, the patch will fail the test: > t/t3010-ls-files-killed-modified.sh. > the errno maybe ENOTDIR when you try to lstat a file`r` with `lstat("r/f",&st);` > So I temporarily removed the judgment of errno. I didn't mean to give you a solution that is ready to be cut-and-pasted without thinking ;-) If NOTDIR needs to also be excluded, then you can exclude it just like the above illustration excludes NOENT to solve the issue, right? >> #2: consolidate two for loops into one. >> ... >> This changes the semantics. The original iterates the index >> twice, so you may see the same entry from --cached once and >> then again from --modified. The updated one still will show >> the same entry twice but next to each other. >> > Well,This does change the semantics. I think people who used two > for loops before may want to separate different outputs. > Now, if you don’t use "--deduplicate", You may see six consecutive > items under a combination of multiple options. Yes, and that is intended and is a vast improvement from the current behaviour, which shows 3 in the first loop, bunch of unrelated entries from the rest of the first loop, yet another bunch of unrelated entries from the early part of the second loop and then finally shows 3 from the second loop. With the "single loop" update, at least, the entries are sorted by their path and it would make it easier to see (if the user cares to trigger both --cached and --modified, that is), no?
On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道: > > > + git switch -c dev && > > > > If someone adds a new test after this test, then that new test will > > run in the "dev" branch, which might be unexpected or undesirable. It > > often is a good idea to ensure that tests do certain types of cleanup > > to avoid breaking subsequent tests. Here, it would be a good idea to > > ensure that the test switches back to the original branch when it > > finishes (regardless of whether it finishes successfully or > > unsuccessfully). > > > > git switch -c dev && > > test_when_finished "git switch master" && > > > > Or you could use `git switch -` if you don't want to hard-code the > > name "master" in the test (since there has been effort lately to > > remove that name from tests. > > > I have little confuse about I can use` test_when_finished "git switch master" `, > but I can't use` test_when_finished "git switch -" `, > why? You may use either one. I presented both as alternative approaches.
Eric Sunshine <sunshine@sunshineco.com> writes: > On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote: >> Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道: >> > > + git switch -c dev && >> > >> > If someone adds a new test after this test, then that new test will >> > run in the "dev" branch, which might be unexpected or undesirable. It >> > often is a good idea to ensure that tests do certain types of cleanup >> > to avoid breaking subsequent tests. Here, it would be a good idea to >> > ensure that the test switches back to the original branch when it >> > finishes (regardless of whether it finishes successfully or >> > unsuccessfully). >> > >> > git switch -c dev && >> > test_when_finished "git switch master" && >> > >> > Or you could use `git switch -` if you don't want to hard-code the >> > name "master" in the test (since there has been effort lately to >> > remove that name from tests. >> > >> I have little confuse about I can use` test_when_finished "git switch master" `, >> but I can't use` test_when_finished "git switch -" `, >> why? > > You may use either one. I presented both as alternative approaches. I am sensing a bit of miscommunication here. You sound like you still believe either would work OK, but to me, it sounds like that the author claims the one of them does not work for him/her. I do not think the test framework itself mucks with the reflog to make switching to "-" (or "@{-1}") break, so I find it implausible that "switch -" form not to work, and unlike your "either is OK", I have a moderately strong preference to use the "go back to the previous one, whatever it is called" form. Thanks.
On Sun, Jan 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote: > >> Eric Sunshine <sunshine@sunshineco.com> 于2021年1月16日周六 下午3:13写道: > >> > test_when_finished "git switch master" && > >> > > >> > Or you could use `git switch -` if you don't want to hard-code the > >> > name "master" in the test (since there has been effort lately to > >> > remove that name from tests. > >> > > >> I have little confuse about I can use` test_when_finished "git switch master" `, > >> but I can't use` test_when_finished "git switch -" `, > >> why? > > > > You may use either one. I presented both as alternative approaches. > > I am sensing a bit of miscommunication here. You sound like you > still believe either would work OK, but to me, it sounds like that > the author claims the one of them does not work for him/her. That could be. My eye glided over the code too quickly to notice that it was using `git checkout HEAD~` followed immediately by `git switch -c dev`, which means that `git switch -` would indeed not return to "master" (in fact, it errors out).
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index cbcf5263dd0..0f8dbeeea20 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -13,6 +13,7 @@ SYNOPSIS (--[cached|deleted|others|ignored|stage|unmerged|killed|modified])* (-[c|d|o|i|s|u|k|m])* [--eol] + [--dedup] [-x <pattern>|--exclude=<pattern>] [-X <file>|--exclude-from=<file>] [--exclude-per-directory=<file>] @@ -81,6 +82,10 @@ OPTIONS \0 line termination on output and do not quote filenames. See OUTPUT below for more information. +--dedup:: + Suppress duplicate entries when conflict happen or `--deleted` + and `--modified` are combined. + -x <pattern>:: --exclude=<pattern>:: Skip untracked files matching pattern. diff --git a/builtin/ls-files.c b/builtin/ls-files.c index c8eae899b82..bc4eded19ab 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -35,6 +35,7 @@ static int line_terminator = '\n'; static int debug_mode; static int show_eol; static int recurse_submodules; +static int delete_dup; static const char *prefix; static int max_prefix_len; @@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir) { int i; struct strbuf fullname = STRBUF_INIT; + const struct cache_entry *last_stage = NULL; /* For cached/deleted files we don't need to even do the readdir */ if (show_others || show_killed) { @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir) for (i = 0; i < repo->index->cache_nr; i++) { const struct cache_entry *ce = repo->index->cache[i]; + if (show_cached && delete_dup) { + switch (ce_stage(ce)) { + case 0: + default: + break; + case 1: + case 2: + case 3: + if (last_stage && + !strcmp(last_stage->name, ce->name)) + continue; + last_stage = ce; + } + } construct_fullname(&fullname, repo, ce); if ((dir->flags & DIR_SHOW_IGNORED) && @@ -337,6 +353,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir) struct stat st; int err; + if (delete_dup) { + switch (ce_stage(ce)) { + case 0: + default: + break; + case 1: + case 2: + case 3: + if (last_stage && + !strcmp(last_stage->name, ce->name)) + continue; + last_stage = ce; + } + } construct_fullname(&fullname, repo, ce); if ((dir->flags & DIR_SHOW_IGNORED) && @@ -347,10 +377,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir) if (ce_skip_worktree(ce)) continue; err = lstat(fullname.buf, &st); - if (show_deleted && err) + if (delete_dup && show_deleted && show_modified && err) show_ce(repo, dir, ce, fullname.buf, tag_removed); - if (show_modified && ie_modified(repo->index, ce, &st, 0)) - show_ce(repo, dir, ce, fullname.buf, tag_modified); + else { + if (show_deleted && err) + show_ce(repo, dir, ce, fullname.buf, tag_removed); + if (show_modified && ie_modified(repo->index, ce, &st, 0)) + show_ce(repo, dir, ce, fullname.buf, tag_modified); + } } } @@ -578,6 +612,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) N_("pretend that paths removed since <tree-ish> are still present")), OPT__ABBREV(&abbrev), OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")), + OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")), OPT_END() }; diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh new file mode 100755 index 00000000000..aec7d364235 --- /dev/null +++ b/t/t3012-ls-files-dedup.sh @@ -0,0 +1,54 @@ +#!/bin/sh + +test_description='git ls-files --dedup test. + +This test prepares the following in the cache: + + a.txt - a file(base) + a.txt - a file(master) + a.txt - a file(dev) + b.txt - a file + delete.txt - a file + expect1 - a file + expect2 - a file + +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + > a.txt && + > b.txt && + > delete.txt && + cat >expect1<<-\EOF && + M a.txt + H b.txt + H delete.txt + H expect1 + H expect2 + EOF + cat >expect2<<-EOF && + C a.txt + R delete.txt + EOF + git add a.txt b.txt delete.txt expect1 expect2 && + git commit -m master:1 && + echo a>a.txt && + echo b>b.txt && + echo delete >delete.txt && + git add a.txt b.txt delete.txt && + git commit -m master:2 && + git checkout HEAD~ && + git switch -c dev && + echo change >a.txt && + git add a.txt && + git commit -m dev:1 && + test_must_fail git merge master && + git ls-files -t --dedup >actual1 && + test_cmp expect1 actual1 && + rm delete.txt && + git ls-files -d -m -t --dedup >actual2 && + test_cmp expect2 actual2 +' + +test_done