Message ID | 20230408112342.404318-1-nanth.raghul@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GSOC,v2] diff-index: enable sparse index | expand |
Raghul Nanth A wrote: > diff-index uses the run_diff_index() function to generate its diff. This > function has been made sparse-index aware in the series that led to > 8d2c3732 (Merge branch 'ld/sparse-diff-blame', 2021-12-21). Hence we can > just set the requires-full-index to false for "diff-index". > > Performance metrics > > Test HEAD~1 HEAD > ---------------------------------------------------------------------------------------------- > 2000.2: echo >>a && git diff-index HEAD (full-v3) 0.09(0.06+0.04) 0.09(0.07+0.03) +0.0% > 2000.3: echo >>a && git diff-index HEAD (full-v4) 0.09(0.06+0.04) 0.09(0.05+0.05) +0.0% > 2000.4: echo >>a && git diff-index HEAD (sparse-v3) 0.37(0.31+0.06) 0.01(0.02+0.03) -97.3% > 2000.5: echo >>a && git diff-index HEAD (sparse-v4) 0.30(0.26+0.05) 0.01(0.01+0.04) -96.7% > 2000.6: git diff-index HEAD **a (full-v3) 0.06(0.05+0.01) 0.06(0.06+0.01) +0.0% > 2000.7: git diff-index HEAD **a (full-v4) 0.06(0.05+0.01) 0.06(0.04+0.02) +0.0% > 2000.8: git diff-index HEAD **a (sparse-v3) 0.29(0.25+0.03) 0.01(0.01+0.00) -96.6% > 2000.9: git diff-index HEAD **a (sparse-v4) 0.37(0.34+0.02) 0.01(0.01+0.00) -97.3% > 2000.10: git diff-index --cached HEAD (full-v3) 0.05(0.03+0.01) 0.05(0.03+0.02) +0.0% > 2000.11: git diff-index --cached HEAD (full-v4) 0.05(0.03+0.01) 0.05(0.02+0.02) +0.0% > 2000.12: git diff-index --cached HEAD (sparse-v3) 0.35(0.33+0.01) 0.01(0.00+0.00) -97.1% > 2000.13: git diff-index --cached HEAD (sparse-v4) 0.35(0.32+0.02) 0.01(0.00+0.00) -97.1% > --- > > Sorry for the late reply. Got caught up in school work > * Fixed commit message > * Added check to expand index if needed (based on diff-files) > * Added pathspec based tests Please include the range-diff comparing the previous version to the new one in your future iterations & patch series in general. GitGitGadget adds it by default, but if you're using 'send-email' you should be able to use the '--range-diff' option to generate it (see MyFirstContribution [1] for more information). [1] https://git-scm.com/docs/MyFirstContribution#v2-git-send-email > > builtin/diff-index.c | 9 +++++ > t/perf/p2000-sparse-operations.sh | 3 ++ > t/t1092-sparse-checkout-compatibility.sh | 44 ++++++++++++++++++++++++ > 3 files changed, 56 insertions(+) > > diff --git a/builtin/diff-index.c b/builtin/diff-index.c > index 35dc9b23ee..e67cf5a1db 100644 > --- a/builtin/diff-index.c > +++ b/builtin/diff-index.c > @@ -24,6 +24,14 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) > usage(diff_cache_usage); > > git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ > + > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + > + if (pathspec_needs_expanded_index(the_repository->index, > + &rev.diffopt.pathspec)) > + ensure_full_index(the_repository->index); Re: my last review [2] - did you look into the behavior of 'diff' with pathspecs and whether this 'pathspec_needs_expanded_index()' could be centralized (in e.g. 'run_diff_index()')? What did you find? [2] https://lore.kernel.org/git/91d3fd23-8120-db65-481a-e9f56017bb04@github.com/ > + > repo_init_revisions(the_repository, &rev, prefix); > rev.abbrev = 0; > prefix = precompose_argv_prefix(argc, argv, prefix); > @@ -69,6 +77,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) > perror("repo_read_index"); > return -1; > } > + > result = run_diff_index(&rev, option); > result = diff_result_code(&rev.diffopt, result); > release_revisions(&rev); > diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh > index 3242cfe91a..62499d3aa8 100755 > --- a/t/perf/p2000-sparse-operations.sh > +++ b/t/perf/p2000-sparse-operations.sh > @@ -125,5 +125,8 @@ test_perf_on_all git checkout-index -f --all > test_perf_on_all git update-index --add --remove $SPARSE_CONE/a > test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a" > test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*" > +test_perf_on_all 'echo >>a && git diff-index HEAD' > +test_perf_on_all git diff-index HEAD "**a" > +test_perf_on_all git diff-index --cached HEAD > > test_done > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 801919009e..24bc716c48 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -1996,6 +1996,50 @@ test_expect_success 'sparse index is not expanded: rm' ' > ensure_not_expanded rm -r deep > ' > > +test_expect_success 'sparse index is not expanded: diff-index' ' > + init_repos && > + > + echo "new" >>sparse-index/g && > + git -C sparse-index add g && > + git -C sparse-index commit -m "dummy" && > + ensure_not_expanded diff-index HEAD~1 && > + > + echo "text" >>sparse-index/deep/a && > + > + ensure_not_expanded diff-index HEAD deep/a && > + ensure_not_expanded diff-index HEAD deep/* > +' nit: please add a newline here (after the 'sparse index is not expanded: diff-index' test) to stay consistent with the other tests in the file. > +test_expect_success 'diff-index pathspec expands index when necessary' ' > + init_repos && > + > + echo "text" >>sparse-index/deep/a && > + > + # pathspec that should expand index > + ! ensure_not_expanded diff-index "*/a" && Using '! ensure_not_expanded' will fail if the command expands the index _or_ if the command fails altogether, which could inadvertently make these tests pass even when there's a breakage in 'diff-index'. An 'ensure_expanded' function was created in [3] to test these types of cases; you can use that here if you base your branch on 'sl/diff-files-sparse' (see SubmittingPatches for more information [4]). [3] https://lore.kernel.org/git/20230322161820.3609-3-cheskaqiqi@gmail.com/ [4] https://git-scm.com/docs/SubmittingPatches#base-branch > + ! ensure_not_expanded diff-index "**a" Git pathspec syntax [5] does not follow glob rules (without the ':(glob)' prefix, at least), so the '**' doesn't do anything special here that a single '*' wouldn't do. So, to make it clear that you aren't using glob patterns, it might be better to use '*a' instead. Also, why are the wildcard pathspecs here in double-quotes, but the ones in the previous test ('sparse index is not expanded: diff-index') aren't? [5] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec > +' > + > +test_expect_success 'diff-index with pathspec outside sparse definition' ' > + init_repos && > + > + test_sparse_match test_must_fail git diff-index HEAD folder2/a > +' > + > +test_expect_success 'match all: diff-index' ' > + init_repos && > + > + test_all_match git diff-index HEAD && > + write_script edit-contents <<-\EOF && > + echo text >>$1 > + EOF > + run_on_all ../edit-contents g && > + run_on_all git add g && > + run_on_all git commit -m "two" && > + run_on_all rm g && > + test_all_match git diff-index HEAD && > + test_all_match git diff-index --cached HEAD~1 > +' > + > test_expect_success 'grep with and --cached' ' > init_repos && >
On Fri, Apr 14, 2023 at 2:44 AM Victoria Dye <vdye@github.com> wrote: > > Please include the range-diff comparing the previous version to the new one > in your future iterations & patch series in general. GitGitGadget adds it by > default, but if you're using 'send-email' you should be able to use the > '--range-diff' option to generate it (see MyFirstContribution [1] for more > information). > Yeah, I will keep this in mind. Sorry about that > Re: my last review [2] - did you look into the behavior of 'diff' with > pathspecs and whether this 'pathspec_needs_expanded_index()' could be > centralized (in e.g. 'run_diff_index()')? What did you find? I hadn't understood the review properly. I just thought you wanted to make sure the function was added to diiff-index itself. I have read through some of it, but I am still not 100% sure of the behaviour. Will run through it more to get more definitive answers > Using '! ensure_not_expanded' will fail if the command expands the index > _or_ if the command fails altogether, which could inadvertently make these > tests pass even when there's a breakage in 'diff-index'. An > 'ensure_expanded' function was created in [3] to test these types of cases; > you can use that here if you base your branch on 'sl/diff-files-sparse' (see > SubmittingPatches for more information [4]). > > [3] https://lore.kernel.org/git/20230322161820.3609-3-cheskaqiqi@gmail.com/ > [4] https://git-scm.com/docs/SubmittingPatches#base-branch > > > + ! ensure_not_expanded diff-index "**a" Yeah, I saw this function, but since this wasn't integrated into master, I wasn't sure how I would go about using it. I will base my work off of the mentioned branch for now then. As for making sure the function doesn't give false positives, it should be fine in this current case, since I did try to manually run through the commands just as a guarantee, and that seemed to run fine, but yes, I will make sure to make those updates > Git pathspec syntax [5] does not follow glob rules (without the ':(glob)' > prefix, at least), so the '**' doesn't do anything special here that a > single '*' wouldn't do. So, to make it clear that you aren't using glob > patterns, it might be better to use '*a' instead. > > Also, why are the wildcard pathspecs here in double-quotes, but the ones in > the previous test ('sparse index is not expanded: diff-index') aren't? The double quotes were just to use the glob provided by pathspec. As for why the previous ones don't have them, they are just using regular pathspecs. I will make the necessary changes as mentioned here. Thank you, Raghul
>> Re: my last review [2] - did you look into the behavior of 'diff' with >> pathspecs and whether this 'pathspec_needs_expanded_index()' could be >> centralized (in e.g. 'run_diff_index()')? What did you find? >I hadn't understood the review properly. I just thought you wanted to >make sure the function was added to diiff-index itself. I have read >through some of it, but I am still not 100% sure of the behaviour. >Will run through it more to get more definitive answers Hello Raghul! I hope this email finds you well. I recently came across your patch and noticed that you might be facing some difficulties with a specific issue. I've reviewed your patch and thought I'd share a few suggestions that might help you overcome the issue.The code below I've already test it. But there must have many detail I did not handle. In the builtin/diff.c file, the cmd_diff() function can call either 'run_diff_files()' or 'run_diff_index()' depending on the situation. when you run 'git diff',run_diff_files() is called to find differences between the working directory and the index. when you run 'git diff --cached'. 'run_diff_index()' is called to find difference between the indexand the commit. Both the "diff-index" and "diff" commands share the "run_diff_index" function. So, we can handling of pathspecs in one place(run_diff_index). Doing this we can simplify the codebase and make it easier to maintain. 1.add test for diff in t1092. We will find the test will fail. test_expect_success 'git diff with pathspec expands index when necessary' ' init_repos && echo "new" >>sparse-index/deep/a && git -C sparse-index add deep/a && # pathspec that should expand index ensure_expanded diff --cached "*/a" && write_script edit-conflict <<-\EOF && echo test >>"$1" EOF run_on_all ../edit-contents deep/a && ensure_expanded diff HEAD "*/a" ' 2."run_diff_index" is in 'diff-lib.c'.We can add 'pathspec_needs_expanded_index' in front of 'do_diff_cache()'(process the index before the start of the diff process). int run_diff_index(struct rev_info *revs, unsigned int option) { ...... ...... if (merge_base) { diff_get_merge_base(revs, &oid); name = oid_to_hex_r(merge_base_hex, &oid); } else { oidcpy(&oid, &ent->item->oid); name = ent->name; } if (pathspec_needs_expanded_index(revs->diffopt.repo->index, &revs->diffopt.pathspec)) ensure_full_index(revs->diffopt.repo->index); if (diff_cache(revs, &oid, name, cached)) exit(128); ....... ...... ....... } 3.Delete 'the pathspec_needs_expanded_index' function you have in your 'builtin/diff-index.c' in last patch. 4.Run the test again, then the test for 'git diff' and your test for 'git diff-index'will all pass! I hope these suggestions prove helpful to you. If you have any questions or would like to discuss further, please don't hesitate to reach out. ----------------------------------------------------------------------- Best, Shuqi
Hey, Thanks for the info. Your explanations make sense and I will make the appropriate changes. I had two questions I had two questions regarding this: I have been trying to base my changes off the 'sl/diff-files-sparse' branch, but I am not sure how I would go about doing that. I thought I would be just pulling changes from some remote repo but I couldn't find one. So, could you let me know how I could do that? Also, I don't seem to have been CC'd on this email. Just wanted to point that out, so that I don't accidentally miss conversations. Thanks, Raghul
Raghul Nanth A wrote: > Hey, > Thanks for the info. Your explanations make sense and I will make the > appropriate changes. I had two questions I had two questions regarding > this: > I have been trying to base my changes off the 'sl/diff-files-sparse' > branch, but I am not sure how I would go about doing that. I thought I > would be just pulling changes from some remote repo but I couldn't find > one. So, could you let me know how I could do that? You should be able to find that branch in the https://github.com/gitster/git remote. > Also, I don't seem to have been CC'd on this email. Just wanted to point > that out, so that I don't accidentally miss conversations. > > Thanks, > Raghul
diff --git a/builtin/diff-index.c b/builtin/diff-index.c index 35dc9b23ee..e67cf5a1db 100644 --- a/builtin/diff-index.c +++ b/builtin/diff-index.c @@ -24,6 +24,14 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) usage(diff_cache_usage); git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ + + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (pathspec_needs_expanded_index(the_repository->index, + &rev.diffopt.pathspec)) + ensure_full_index(the_repository->index); + repo_init_revisions(the_repository, &rev, prefix); rev.abbrev = 0; prefix = precompose_argv_prefix(argc, argv, prefix); @@ -69,6 +77,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) perror("repo_read_index"); return -1; } + result = run_diff_index(&rev, option); result = diff_result_code(&rev.diffopt, result); release_revisions(&rev); diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 3242cfe91a..62499d3aa8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -125,5 +125,8 @@ test_perf_on_all git checkout-index -f --all test_perf_on_all git update-index --add --remove $SPARSE_CONE/a test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a" test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*" +test_perf_on_all 'echo >>a && git diff-index HEAD' +test_perf_on_all git diff-index HEAD "**a" +test_perf_on_all git diff-index --cached HEAD test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 801919009e..24bc716c48 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1996,6 +1996,50 @@ test_expect_success 'sparse index is not expanded: rm' ' ensure_not_expanded rm -r deep ' +test_expect_success 'sparse index is not expanded: diff-index' ' + init_repos && + + echo "new" >>sparse-index/g && + git -C sparse-index add g && + git -C sparse-index commit -m "dummy" && + ensure_not_expanded diff-index HEAD~1 && + + echo "text" >>sparse-index/deep/a && + + ensure_not_expanded diff-index HEAD deep/a && + ensure_not_expanded diff-index HEAD deep/* +' +test_expect_success 'diff-index pathspec expands index when necessary' ' + init_repos && + + echo "text" >>sparse-index/deep/a && + + # pathspec that should expand index + ! ensure_not_expanded diff-index "*/a" && + ! ensure_not_expanded diff-index "**a" +' + +test_expect_success 'diff-index with pathspec outside sparse definition' ' + init_repos && + + test_sparse_match test_must_fail git diff-index HEAD folder2/a +' + +test_expect_success 'match all: diff-index' ' + init_repos && + + test_all_match git diff-index HEAD && + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + run_on_all ../edit-contents g && + run_on_all git add g && + run_on_all git commit -m "two" && + run_on_all rm g && + test_all_match git diff-index HEAD && + test_all_match git diff-index --cached HEAD~1 +' + test_expect_success 'grep with and --cached' ' init_repos &&