Message ID | 20230322161820.3609-3-cheskaqiqi@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff-files: integrate with sparse index | expand |
Shuqi Liang wrote: > diff --git a/builtin/diff-files.c b/builtin/diff-files.c > index dc991f753b..db90592090 100644 > --- a/builtin/diff-files.c > +++ b/builtin/diff-files.c > @@ -27,6 +27,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) > usage(diff_files_usage); > > git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ > + > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + > repo_init_revisions(the_repository, &rev, prefix); > rev.abbrev = 0; > > @@ -80,6 +84,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) > result = -1; > goto cleanup; > } > + > + if (pathspec_needs_expanded_index(the_repository->index, &rev.diffopt.pathspec)) > + ensure_full_index(the_repository->index); After reviewing the 'diff-index' integration [1], I'm wondering whether we actually need pathspec expansion at all in this case. 'diff-files' compares the working tree and the index, and will output a difference if the file on disk differs from the index. But, if a file is SKIP_WORKTREE'd, that diff will (I think?) always be empty. So, why would we need to expand a sparse directory to match a pathspec to its contents if we *know* that the diff will be empty? [1] https://lore.kernel.org/git/20230408112342.404318-1-nanth.raghul@gmail.com/ > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index d23041e27a..152f3f752e 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -1401,6 +1401,30 @@ ensure_not_expanded () { > test_region ! index ensure_full_index trace2.txt > } > > +ensure_expanded () { > + rm -f trace2.txt && > + if test -z "$WITHOUT_UNTRACKED_TXT" > + then > + echo >>sparse-index/untracked.txt > + fi && > + > + if test "$1" = "!" > + then > + shift && > + test_must_fail env \ > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ > + git -C sparse-index "$@" \ > + >sparse-index-out \ > + 2>sparse-index-error || return 1 > + else > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ > + git -C sparse-index "$@" \ > + >sparse-index-out \ > + 2>sparse-index-error || return 1 > + fi && > + test_region index ensure_full_index trace2.txt > +} This implementation duplicates a lot of the code from 'ensure_not_expanded'. Can 'ensure_expanded' and 'ensure_not_expanded' be refactored to call a common helper function (which contains the common code) instead? > + > test_expect_success 'sparse-index is not expanded' ' > init_repos && > > @@ -2101,4 +2125,32 @@ test_expect_success 'diff-files with pathspec outside sparse definition' ' > test_all_match git diff-files "folder*/a" > ' > > +test_expect_success 'diff-files pathspec expands index when necessary' ' > + init_repos && > + > + write_script edit-contents <<-\EOF && > + echo text >>"$1" > + EOF > + > + run_on_all ../edit-contents deep/a && > + > + # pathspec that should expand index > + ensure_expanded diff-files "*/a" && > + ensure_expanded diff-files "**a" Similar to the comments in my 'diff-index' review [2]: - The '**' in the pathspec doesn't do anything special unless using an explicit ':(glob)' pathspec. To make it clear that you're not trying to use a glob pathspec, you can use '*a' instead. - Why are these pathspecs in quotes, but those in 'sparse index is not expanded: diff-files' are? [2] https://lore.kernel.org/git/62821012-4fc3-5ad8-695c-70f7ab14a8c9@github.com/ > +' > + > +test_expect_success 'sparse index is not expanded: diff-files' ' > + init_repos && > + > + write_script edit-contents <<-\EOF && > + echo text >>"$1" > + EOF > + > + run_on_all ../edit-contents deep/a && > + > + ensure_not_expanded diff-files && > + ensure_not_expanded diff-files deep/a && > + ensure_not_expanded diff-files deep/* > +' > + > test_done
Hi Victoria, Sorry for the late reply. I'm still in the middle of my final exams period. On Thu, Apr 13, 2023 at 5:55 PM Victoria Dye <vdye@github.com> wrote: > > Shuqi Liang wrote: > > diff --git a/builtin/diff-files.c b/builtin/diff-files.c > > index dc991f753b..db90592090 100644 > > --- a/builtin/diff-files.c > > +++ b/builtin/diff-files.c > > @@ -27,6 +27,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) > > usage(diff_files_usage); > > > > git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ > > + > > + prepare_repo_settings(the_repository); > > + the_repository->settings.command_requires_full_index = 0; > > + > > repo_init_revisions(the_repository, &rev, prefix); > > rev.abbrev = 0; > > > > @@ -80,6 +84,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) > > result = -1; > > goto cleanup; > > } > > + > > + if (pathspec_needs_expanded_index(the_repository->index, &rev.diffopt.pathspec)) > > + ensure_full_index(the_repository->index); > > After reviewing the 'diff-index' integration [1], I'm wondering whether we > actually need pathspec expansion at all in this case. 'diff-files' compares > the working tree and the index, and will output a difference if the file on > disk differs from the index. But, if a file is SKIP_WORKTREE'd, that diff > will (I think?) always be empty. So, why would we need to expand a sparse > directory to match a pathspec to its contents if we *know* that the diff > will be empty? > > [1] https://lore.kernel.org/git/20230408112342.404318-1-nanth.raghul@gmail.com/ It's true that in the case of 'diff-files', expanding the sparse directory to match a pathspec to its contents might not be necessary. If we don't use pathspec expansion in this case. It could optimize for performance better. However, there could be some edge cases. if a user manually modifies the contents of a SKIP_WORKTREE file in the working tree, the diff between the working tree and the index would no longer be empty. So I think, In this case, expanding the sparse directory might still be necessary to ensure the correct behavior of the 'diff-files' command. > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > > index d23041e27a..152f3f752e 100755 > > --- a/t/t1092-sparse-checkout-compatibility.sh > > +++ b/t/t1092-sparse-checkout-compatibility.sh > > @@ -1401,6 +1401,30 @@ ensure_not_expanded () { > > test_region ! index ensure_full_index trace2.txt > > } > > > > +ensure_expanded () { > > + rm -f trace2.txt && > > + if test -z "$WITHOUT_UNTRACKED_TXT" > > + then > > + echo >>sparse-index/untracked.txt > > + fi && > > + > > + if test "$1" = "!" > > + then > > + shift && > > + test_must_fail env \ > > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ > > + git -C sparse-index "$@" \ > > + >sparse-index-out \ > > + 2>sparse-index-error || return 1 > > + else > > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ > > + git -C sparse-index "$@" \ > > + >sparse-index-out \ > > + 2>sparse-index-error || return 1 > > + fi && > > + test_region index ensure_full_index trace2.txt > > +} > > This implementation duplicates a lot of the code from 'ensure_not_expanded'. > Can 'ensure_expanded' and 'ensure_not_expanded' be refactored to call a > common helper function (which contains the common code) instead? Will do! > > + > > test_expect_success 'sparse-index is not expanded' ' > > init_repos && > > > > @@ -2101,4 +2125,32 @@ test_expect_success 'diff-files with pathspec outside sparse definition' ' > > test_all_match git diff-files "folder*/a" > > ' > > > > +test_expect_success 'diff-files pathspec expands index when necessary' ' > > + init_repos && > > + > > + write_script edit-contents <<-\EOF && > > + echo text >>"$1" > > + EOF > > + > > + run_on_all ../edit-contents deep/a && > > + > > + # pathspec that should expand index > > + ensure_expanded diff-files "*/a" && > > + ensure_expanded diff-files "**a" > > Similar to the comments in my 'diff-index' review [2]: > > - The '**' in the pathspec doesn't do anything special unless using an > explicit ':(glob)' pathspec. To make it clear that you're not trying to > use a glob pathspec, you can use '*a' instead. Will do ! > - Why are these pathspecs in quotes, but those in 'sparse index is not > expanded: diff-files' are? > > [2] https://lore.kernel.org/git/62821012-4fc3-5ad8-695c-70f7ab14a8c9@github.com/ I quote around the pathspec to prevent shell expansion of the pathspec patterns by the shell before they are passed to the git command. "*a" and "*/a" have special characters ' * '. I use quotes to tell the shell to treat them as regular characters. In 'sparse index is not expanded: diff-files''deep/a' does not contain any special characters that the shell would try to expand. So I use it without double quotes. And I think I need to add a double quote to 'deep/*'. Thanks, Shuqi
Shuqi Liang wrote: > Hi Victoria, > > Sorry for the late reply. I'm still in the middle of my final exams period. No problem at all, thanks for following up! > It's true that in the case of 'diff-files', expanding the sparse directory to > match a pathspec to its contents might not be necessary. If we don't use > pathspec expansion in this case. It could optimize for performance better. > > However, there could be some edge cases. if a user manually modifies the > contents of a SKIP_WORKTREE file in the working tree, the diff between > the working tree and the index would no longer be empty. So I think, In this > case, expanding the sparse directory might still be necessary to ensure the > correct behavior of the 'diff-files' command. If a user manually modifies a SKIP_WORKTREE file, SKIP_WORKTREE will be removed from the file and the index expanded automatically [1]. If that mechanism is working properly, there would be no need to manually check the pathspec and expand the index. Have you tried removing the 'pathspec_needs_expanded_index()' and running the tests? If so, is 'diff-files' producing incorrect results? [1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
Hi Victoria, > If a user manually modifies a SKIP_WORKTREE file, SKIP_WORKTREE will be > removed from the file and the index expanded automatically [1]. If that > mechanism is working properly, there would be no need to manually check the > pathspec and expand the index. > > Have you tried removing the 'pathspec_needs_expanded_index()' and running > the tests? If so, is 'diff-files' producing incorrect results? > > [1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ As per your suggestion, I tried removing pathspec_needs_expanded_index() from the code, and 'diff-files pathspec expands index when necessary' test failed. So, I'm thinking about keeping it to ensure everything works properly. I'd like to know your thoughts on this. Should we keep it or explore other options? Thanks, Shuqi
Shuqi Liang wrote: > Hi Victoria, > >> If a user manually modifies a SKIP_WORKTREE file, SKIP_WORKTREE will be >> removed from the file and the index expanded automatically [1]. If that >> mechanism is working properly, there would be no need to manually check the >> pathspec and expand the index. >> >> Have you tried removing the 'pathspec_needs_expanded_index()' and running >> the tests? If so, is 'diff-files' producing incorrect results? >> >> [1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ > > As per your suggestion, I tried removing pathspec_needs_expanded_index() > from the code, and 'diff-files pathspec expands index when necessary' > test failed. > > So, I'm thinking about keeping it to ensure everything works properly. > I'd like to know your thoughts on this. Should we keep it or explore > other options? Did the test fail because the index wasn't expanded in a case where you previously expended it to be expanded? Or because of the returned results of 'diff-files' are invalid? Only the latter represents incorrect behavior. If we're aren't expanding the index for a case that was causing index expansion before *and* the user-facing behavior is as-expected, that's the best-case scenario for a sparse index integration! Taking a step back, it's important to remember that the overarching goal of the project is not just to switch 'command_requires_full_index' to '0' everywhere, but to find all of the places where Git is working with the index and make sure that work can be done on a sparse directory. In most cases, it's possible to adapt an index-related operation to work with sparse directories (albeit with varying levels of complexity). The use of 'ensure_full_index()' is reserved for cases where it is _impossible_ to make Git perform a given action on a sparse directory - expanding the index completely eliminates the performance gains had by using a sparse index, so it should be avoided at all costs. I hope that helps! > > Thanks, > Shuqi
Hi Victoria, On Fri, Apr 21, 2023 at 5:26 PM Victoria Dye <vdye@github.com> wrote: > Only the latter represents incorrect behavior. If we're aren't expanding the > index for a case that was causing index expansion before *and* the > user-facing behavior is as-expected, that's the best-case scenario for a > sparse index integration! > > Taking a step back, it's important to remember that the overarching goal of > the project is not just to switch 'command_requires_full_index' to '0' > everywhere, but to find all of the places where Git is working with the > index and make sure that work can be done on a sparse directory. > > In most cases, it's possible to adapt an index-related operation to work > with sparse directories (albeit with varying levels of complexity). The use > of 'ensure_full_index()' is reserved for cases where it is _impossible_ to > make Git perform a given action on a sparse directory - expanding the index > completely eliminates the performance gains had by using a sparse index, so > it should be avoided at all costs. > > I hope that helps! Thanks for reminding me about the ultimate goal of sparse index integration! I've learned a lot from it. After looking into the test failure, it seems that the index didn't expand in cases where I expected it to. I'll go ahead and update my patch. Thanks, Shuqi
diff --git a/builtin/diff-files.c b/builtin/diff-files.c index dc991f753b..db90592090 100644 --- a/builtin/diff-files.c +++ b/builtin/diff-files.c @@ -27,6 +27,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) usage(diff_files_usage); git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ + + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + repo_init_revisions(the_repository, &rev, prefix); rev.abbrev = 0; @@ -80,6 +84,10 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix) result = -1; goto cleanup; } + + if (pathspec_needs_expanded_index(the_repository->index, &rev.diffopt.pathspec)) + ensure_full_index(the_repository->index); + result = run_diff_files(&rev, options); result = diff_result_code(&rev.diffopt, result); cleanup: diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 3242cfe91a..82751f2ca3 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -125,5 +125,7 @@ test_perf_on_all git checkout-index -f --all test_perf_on_all git update-index --add --remove $SPARSE_CONE/a test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a" test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*" +test_perf_on_all git diff-files +test_perf_on_all git diff-files $SPARSE_CONE/a test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index d23041e27a..152f3f752e 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1401,6 +1401,30 @@ ensure_not_expanded () { test_region ! index ensure_full_index trace2.txt } +ensure_expanded () { + rm -f trace2.txt && + if test -z "$WITHOUT_UNTRACKED_TXT" + then + echo >>sparse-index/untracked.txt + fi && + + if test "$1" = "!" + then + shift && + test_must_fail env \ + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C sparse-index "$@" \ + >sparse-index-out \ + 2>sparse-index-error || return 1 + else + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C sparse-index "$@" \ + >sparse-index-out \ + 2>sparse-index-error || return 1 + fi && + test_region index ensure_full_index trace2.txt +} + test_expect_success 'sparse-index is not expanded' ' init_repos && @@ -2101,4 +2125,32 @@ test_expect_success 'diff-files with pathspec outside sparse definition' ' test_all_match git diff-files "folder*/a" ' +test_expect_success 'diff-files pathspec expands index when necessary' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>"$1" + EOF + + run_on_all ../edit-contents deep/a && + + # pathspec that should expand index + ensure_expanded diff-files "*/a" && + ensure_expanded diff-files "**a" +' + +test_expect_success 'sparse index is not expanded: diff-files' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>"$1" + EOF + + run_on_all ../edit-contents deep/a && + + ensure_not_expanded diff-files && + ensure_not_expanded diff-files deep/a && + ensure_not_expanded diff-files deep/* +' + test_done
Originally, diff-files a pathspec that is out-of-cone in a sparse-index environment, Git dies with "pathspec '<x>' did not match any files", mainly because it does not expand the index so nothing is matched. Expand the index when the <pathspec> needs an expanded index, i.e. the <pathspec> contains wildcard that may need a full-index or the <pathspec> is simply outside of sparse-checkout definition. Remove full index requirement for `git diff-files`.Create an 'ensure_expanded' to handle silent failures. Add test to ensure the index only expanded when necessary in `git diff-files`. The `p2000` tests demonstrate a ~96% execution time reduction for 'git diff-files' and a ~97% execution time reduction for 'git diff-files' for a file using a sparse index: Test before after ----------------------------------------------------------------- 2000.78: git diff-files (full-v3) 0.09 0.08 -11.1% 2000.79: git diff-files (full-v4) 0.09 0.09 +0.0% 2000.80: git diff-files (sparse-v3) 0.52 0.02 -96.2% 2000.81: git diff-files (sparse-v4) 0.51 0.02 -96.1% 2000.82: git diff-files f2/f4/a (full-v3) 0.06 0.07 +16.7% 2000.83: git diff-files f2/f4/a (full-v4) 0.08 0.08 +0.0% 2000.84: git diff-files f2/f4/a (sparse-v3) 0.46 0.01 -97.8% 2000.85: git diff-files f2/f4/a (sparse-v4) 0.51 0.02 -96.1% Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> --- builtin/diff-files.c | 8 ++++ t/perf/p2000-sparse-operations.sh | 2 + t/t1092-sparse-checkout-compatibility.sh | 52 ++++++++++++++++++++++++ 3 files changed, 62 insertions(+)