diff mbox series

[GSOC,v2] diff-index: enable sparse index

Message ID 20230408112342.404318-1-nanth.raghul@gmail.com (mailing list archive)
State New, archived
Headers show
Series [GSOC,v2] diff-index: enable sparse index | expand

Commit Message

Raghul Nanth A April 8, 2023, 11:23 a.m. UTC
diff-index uses the run_diff_index() function to generate its diff. This
function has been made sparse-index aware in the series that led to
8d2c3732 (Merge branch 'ld/sparse-diff-blame', 2021-12-21). Hence we can
just set the requires-full-index to false for "diff-index".

Performance metrics

  Test                                                  HEAD~1            HEAD
  ----------------------------------------------------------------------------------------------
  2000.2: echo >>a && git diff-index HEAD (full-v3)     0.09(0.06+0.04)   0.09(0.07+0.03) +0.0%
  2000.3: echo >>a && git diff-index HEAD (full-v4)     0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
  2000.4: echo >>a && git diff-index HEAD (sparse-v3)   0.37(0.31+0.06)   0.01(0.02+0.03) -97.3%
  2000.5: echo >>a && git diff-index HEAD (sparse-v4)   0.30(0.26+0.05)   0.01(0.01+0.04) -96.7%
  2000.6: git diff-index HEAD **a (full-v3)             0.06(0.05+0.01)   0.06(0.06+0.01) +0.0%
  2000.7: git diff-index HEAD **a (full-v4)             0.06(0.05+0.01)   0.06(0.04+0.02) +0.0%
  2000.8: git diff-index HEAD **a (sparse-v3)           0.29(0.25+0.03)   0.01(0.01+0.00) -96.6%
  2000.9: git diff-index HEAD **a (sparse-v4)           0.37(0.34+0.02)   0.01(0.01+0.00) -97.3%
  2000.10: git diff-index --cached HEAD (full-v3)       0.05(0.03+0.01)   0.05(0.03+0.02) +0.0%
  2000.11: git diff-index --cached HEAD (full-v4)       0.05(0.03+0.01)   0.05(0.02+0.02) +0.0%
  2000.12: git diff-index --cached HEAD (sparse-v3)     0.35(0.33+0.01)   0.01(0.00+0.00) -97.1%
  2000.13: git diff-index --cached HEAD (sparse-v4)     0.35(0.32+0.02)   0.01(0.00+0.00) -97.1%
---

Sorry for the late reply. Got caught up in school work
  * Fixed commit message
  * Added check to expand index if needed (based on diff-files)
  * Added pathspec based tests

 builtin/diff-index.c                     |  9 +++++
 t/perf/p2000-sparse-operations.sh        |  3 ++
 t/t1092-sparse-checkout-compatibility.sh | 44 ++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

Comments

Victoria Dye April 13, 2023, 9:14 p.m. UTC | #1
Raghul Nanth A wrote:
> diff-index uses the run_diff_index() function to generate its diff. This
> function has been made sparse-index aware in the series that led to
> 8d2c3732 (Merge branch 'ld/sparse-diff-blame', 2021-12-21). Hence we can
> just set the requires-full-index to false for "diff-index".
> 
> Performance metrics
> 
>   Test                                                  HEAD~1            HEAD
>   ----------------------------------------------------------------------------------------------
>   2000.2: echo >>a && git diff-index HEAD (full-v3)     0.09(0.06+0.04)   0.09(0.07+0.03) +0.0%
>   2000.3: echo >>a && git diff-index HEAD (full-v4)     0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
>   2000.4: echo >>a && git diff-index HEAD (sparse-v3)   0.37(0.31+0.06)   0.01(0.02+0.03) -97.3%
>   2000.5: echo >>a && git diff-index HEAD (sparse-v4)   0.30(0.26+0.05)   0.01(0.01+0.04) -96.7%
>   2000.6: git diff-index HEAD **a (full-v3)             0.06(0.05+0.01)   0.06(0.06+0.01) +0.0%
>   2000.7: git diff-index HEAD **a (full-v4)             0.06(0.05+0.01)   0.06(0.04+0.02) +0.0%
>   2000.8: git diff-index HEAD **a (sparse-v3)           0.29(0.25+0.03)   0.01(0.01+0.00) -96.6%
>   2000.9: git diff-index HEAD **a (sparse-v4)           0.37(0.34+0.02)   0.01(0.01+0.00) -97.3%
>   2000.10: git diff-index --cached HEAD (full-v3)       0.05(0.03+0.01)   0.05(0.03+0.02) +0.0%
>   2000.11: git diff-index --cached HEAD (full-v4)       0.05(0.03+0.01)   0.05(0.02+0.02) +0.0%
>   2000.12: git diff-index --cached HEAD (sparse-v3)     0.35(0.33+0.01)   0.01(0.00+0.00) -97.1%
>   2000.13: git diff-index --cached HEAD (sparse-v4)     0.35(0.32+0.02)   0.01(0.00+0.00) -97.1%
> ---
> 
> Sorry for the late reply. Got caught up in school work
>   * Fixed commit message
>   * Added check to expand index if needed (based on diff-files)
>   * Added pathspec based tests

Please include the range-diff comparing the previous version to the new one
in your future iterations & patch series in general. GitGitGadget adds it by
default, but if you're using 'send-email' you should be able to use the
'--range-diff' option to generate it (see MyFirstContribution [1] for more
information).

[1] https://git-scm.com/docs/MyFirstContribution#v2-git-send-email

> 
>  builtin/diff-index.c                     |  9 +++++
>  t/perf/p2000-sparse-operations.sh        |  3 ++
>  t/t1092-sparse-checkout-compatibility.sh | 44 ++++++++++++++++++++++++
>  3 files changed, 56 insertions(+)
> 
> diff --git a/builtin/diff-index.c b/builtin/diff-index.c
> index 35dc9b23ee..e67cf5a1db 100644
> --- a/builtin/diff-index.c
> +++ b/builtin/diff-index.c
> @@ -24,6 +24,14 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
>  		usage(diff_cache_usage);
>  
>  	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
> +
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +
> +	if (pathspec_needs_expanded_index(the_repository->index,
> +					  &rev.diffopt.pathspec))
> +		ensure_full_index(the_repository->index);

Re: my last review [2] - did you look into the behavior of 'diff' with
pathspecs and whether this 'pathspec_needs_expanded_index()' could be
centralized (in e.g. 'run_diff_index()')? What did you find?

[2] https://lore.kernel.org/git/91d3fd23-8120-db65-481a-e9f56017bb04@github.com/

> +
>  	repo_init_revisions(the_repository, &rev, prefix);
>  	rev.abbrev = 0;
>  	prefix = precompose_argv_prefix(argc, argv, prefix);
> @@ -69,6 +77,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
>  		perror("repo_read_index");
>  		return -1;
>  	}
> +
>  	result = run_diff_index(&rev, option);
>  	result = diff_result_code(&rev.diffopt, result);
>  	release_revisions(&rev);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 3242cfe91a..62499d3aa8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -125,5 +125,8 @@ test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>  test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
>  test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*"
> +test_perf_on_all 'echo >>a && git diff-index HEAD'
> +test_perf_on_all git diff-index HEAD "**a"
> +test_perf_on_all git diff-index --cached HEAD
>  
>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 801919009e..24bc716c48 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1996,6 +1996,50 @@ test_expect_success 'sparse index is not expanded: rm' '
>  	ensure_not_expanded rm -r deep
>  '
>  
> +test_expect_success 'sparse index is not expanded: diff-index' '
> +	init_repos &&
> +
> +	echo "new" >>sparse-index/g &&
> +	git -C sparse-index add g &&
> +	git -C sparse-index commit -m "dummy" &&
> +	ensure_not_expanded diff-index HEAD~1 &&
> +
> +	echo "text" >>sparse-index/deep/a &&
> +
> +	ensure_not_expanded diff-index HEAD deep/a &&
> +	ensure_not_expanded diff-index HEAD deep/*
> +'

nit: please add a newline here (after the 'sparse index is not expanded:
diff-index' test) to stay consistent with the other tests in the file.

> +test_expect_success 'diff-index pathspec expands index when necessary' '
> +	init_repos &&
> +
> +	echo "text" >>sparse-index/deep/a &&
> +
> +	# pathspec that should expand index
> +	! ensure_not_expanded diff-index "*/a" &&

Using '! ensure_not_expanded' will fail if the command expands the index
_or_ if the command fails altogether, which could inadvertently make these
tests pass even when there's a breakage in 'diff-index'. An
'ensure_expanded' function was created in [3] to test these types of cases;
you can use that here if you base your branch on 'sl/diff-files-sparse' (see
SubmittingPatches for more information [4]).

[3] https://lore.kernel.org/git/20230322161820.3609-3-cheskaqiqi@gmail.com/
[4] https://git-scm.com/docs/SubmittingPatches#base-branch

> +	! ensure_not_expanded diff-index "**a"

Git pathspec syntax [5] does not follow glob rules (without the ':(glob)'
prefix, at least), so the '**' doesn't do anything special here that a
single '*' wouldn't do. So, to make it clear that you aren't using glob
patterns, it might be better to use '*a' instead. 

Also, why are the wildcard pathspecs here in double-quotes, but the ones in
the previous test ('sparse index is not expanded: diff-index') aren't?

[5] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

> +'
> +
> +test_expect_success 'diff-index with pathspec outside sparse definition' '
> +	init_repos &&
> +
> +	test_sparse_match test_must_fail git diff-index HEAD folder2/a
> +'
> +
> +test_expect_success 'match all: diff-index' '
> +	init_repos &&
> +
> +	test_all_match git diff-index HEAD &&
> +	write_script edit-contents <<-\EOF &&
> +	echo text >>$1
> +	EOF
> +	run_on_all ../edit-contents g &&
> +	run_on_all git add g &&
> +	run_on_all git commit -m "two" &&
> +	run_on_all rm g &&
> +	test_all_match git diff-index HEAD &&
> +	test_all_match git diff-index --cached HEAD~1
> +'
> +
>  test_expect_success 'grep with and --cached' '
>  	init_repos &&
>
Raghul Nanth A April 19, 2023, 3:15 p.m. UTC | #2
On Fri, Apr 14, 2023 at 2:44 AM Victoria Dye <vdye@github.com> wrote:
>
> Please include the range-diff comparing the previous version to the new one
> in your future iterations & patch series in general. GitGitGadget adds it by
> default, but if you're using 'send-email' you should be able to use the
> '--range-diff' option to generate it (see MyFirstContribution [1] for more
> information).
>

Yeah, I will keep this in mind. Sorry about that

> Re: my last review [2] - did you look into the behavior of 'diff' with
> pathspecs and whether this 'pathspec_needs_expanded_index()' could be
> centralized (in e.g. 'run_diff_index()')? What did you find?

I hadn't understood the review properly. I just thought you wanted to
make sure the function was added to diiff-index itself. I have read
through some of it, but I am still not 100% sure of the behaviour.
Will run through it more to get more definitive answers

> Using '! ensure_not_expanded' will fail if the command expands the index
> _or_ if the command fails altogether, which could inadvertently make these
> tests pass even when there's a breakage in 'diff-index'. An
> 'ensure_expanded' function was created in [3] to test these types of cases;
> you can use that here if you base your branch on 'sl/diff-files-sparse' (see
> SubmittingPatches for more information [4]).
>
> [3] https://lore.kernel.org/git/20230322161820.3609-3-cheskaqiqi@gmail.com/
> [4] https://git-scm.com/docs/SubmittingPatches#base-branch
>
> > +     ! ensure_not_expanded diff-index "**a"

Yeah, I saw this function, but since this wasn't integrated into
master, I wasn't sure how I would go about using it. I will base my
work off of the mentioned branch for now then. As for making sure the
function doesn't give false positives, it should be fine in this
current case, since I did try to manually run through the commands
just as a guarantee, and that seemed to run fine, but yes, I will make
sure to make those updates


> Git pathspec syntax [5] does not follow glob rules (without the ':(glob)'
> prefix, at least), so the '**' doesn't do anything special here that a
> single '*' wouldn't do. So, to make it clear that you aren't using glob
> patterns, it might be better to use '*a' instead.
>
> Also, why are the wildcard pathspecs here in double-quotes, but the ones in
> the previous test ('sparse index is not expanded: diff-index') aren't?


The double quotes were just to use the glob provided by pathspec. As
for why the previous ones don't have them, they are just using regular
pathspecs.

I will make the necessary changes as mentioned here.

Thank you,
Raghul
Shuqi Liang April 22, 2023, 9:25 p.m. UTC | #3
>> Re: my last review [2] - did you look into the behavior of 'diff' with
>> pathspecs and whether this 'pathspec_needs_expanded_index()' could be
>> centralized (in e.g. 'run_diff_index()')? What did you find?

>I hadn't understood the review properly. I just thought you wanted to
>make sure the function was added to diiff-index itself. I have read
>through some of it, but I am still not 100% sure of the behaviour.
>Will run through it more to get more definitive answers


Hello Raghul!

I hope this email finds you well. I recently came across your patch and 
noticed that you might be facing some difficulties with a specific issue.
I've reviewed your patch and thought I'd share a few suggestions that 
might help you overcome the issue.The code below I've already test it.
But there must have many detail I did not handle.

In the builtin/diff.c file, the cmd_diff() function can call either 
'run_diff_files()' or 'run_diff_index()' depending on the situation.
when you run 'git diff',run_diff_files() is called to find differences
between the working directory and the index. when you run 
'git diff --cached'. 'run_diff_index()' is called to find difference 
between the indexand the commit.

Both the "diff-index" and "diff" commands share the "run_diff_index" 
function. So, we can handling of pathspecs in one place(run_diff_index).
Doing this we can simplify the codebase and make it easier to maintain.

1.add test for diff in t1092. We will find the test will fail.

test_expect_success 'git diff with pathspec expands index when necessary' '
	init_repos &&

	echo "new" >>sparse-index/deep/a &&
	git -C sparse-index add deep/a &&

	# pathspec that should expand index
	ensure_expanded diff --cached "*/a" &&

	write_script edit-conflict <<-\EOF &&
	echo test >>"$1"
	EOF

	run_on_all ../edit-contents deep/a &&
	ensure_expanded diff HEAD "*/a"
'


2."run_diff_index" is in 'diff-lib.c'.We can add 
'pathspec_needs_expanded_index' in front of 'do_diff_cache()'(process
the index before the start of the diff process).

int run_diff_index(struct rev_info *revs, unsigned int option)
{
	......
	......
	if (merge_base) {
		diff_get_merge_base(revs, &oid);
		name = oid_to_hex_r(merge_base_hex, &oid);
	} else {
		oidcpy(&oid, &ent->item->oid);
		name = ent->name;
	}


	if (pathspec_needs_expanded_index(revs->diffopt.repo->index, &revs->diffopt.pathspec))
		ensure_full_index(revs->diffopt.repo->index);


	if (diff_cache(revs, &oid, name, cached))
		exit(128);

	.......
	......
	.......
}


3.Delete 'the pathspec_needs_expanded_index' function you have in your 
'builtin/diff-index.c' in last patch.

4.Run the test again, then the test for 'git diff' and your test for 
'git diff-index'will all pass!

I hope these suggestions prove helpful to you. If you have any questions
or would like to discuss further, please don't hesitate to reach out.

-----------------------------------------------------------------------
Best,
Shuqi
Raghul Nanth A May 2, 2023, 9:46 a.m. UTC | #4
Hey,
  Thanks for the info. Your explanations make sense and I will make the appropriate changes. I had two questions I had two questions regarding this: 
  I have been trying to base my changes off the 'sl/diff-files-sparse' branch, but I am not sure how I would go about doing that. I thought I would be just pulling changes from some remote repo but I couldn't find one. So, could you let me know how I could do that?
  Also, I don't seem to have been CC'd on this email. Just wanted to point that out, so that I don't accidentally miss conversations.

Thanks,
Raghul
Victoria Dye May 2, 2023, 5:35 p.m. UTC | #5
Raghul Nanth A wrote:
> Hey,
>   Thanks for the info. Your explanations make sense and I will make the
>   appropriate changes. I had two questions I had two questions regarding
>   this: 
>   I have been trying to base my changes off the 'sl/diff-files-sparse'
>   branch, but I am not sure how I would go about doing that. I thought I
>   would be just pulling changes from some remote repo but I couldn't find
>   one. So, could you let me know how I could do that?

You should be able to find that branch in the https://github.com/gitster/git
remote.

>   Also, I don't seem to have been CC'd on this email. Just wanted to point
>   that out, so that I don't accidentally miss conversations.
> 
> Thanks,
> Raghul
diff mbox series

Patch

diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 35dc9b23ee..e67cf5a1db 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -24,6 +24,14 @@  int cmd_diff_index(int argc, const char **argv, const char *prefix)
 		usage(diff_cache_usage);
 
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
+
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (pathspec_needs_expanded_index(the_repository->index,
+					  &rev.diffopt.pathspec))
+		ensure_full_index(the_repository->index);
+
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
 	prefix = precompose_argv_prefix(argc, argv, prefix);
@@ -69,6 +77,7 @@  int cmd_diff_index(int argc, const char **argv, const char *prefix)
 		perror("repo_read_index");
 		return -1;
 	}
+
 	result = run_diff_index(&rev, option);
 	result = diff_result_code(&rev.diffopt, result);
 	release_revisions(&rev);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 3242cfe91a..62499d3aa8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -125,5 +125,8 @@  test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
 test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*"
+test_perf_on_all 'echo >>a && git diff-index HEAD'
+test_perf_on_all git diff-index HEAD "**a"
+test_perf_on_all git diff-index --cached HEAD
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 801919009e..24bc716c48 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1996,6 +1996,50 @@  test_expect_success 'sparse index is not expanded: rm' '
 	ensure_not_expanded rm -r deep
 '
 
+test_expect_success 'sparse index is not expanded: diff-index' '
+	init_repos &&
+
+	echo "new" >>sparse-index/g &&
+	git -C sparse-index add g &&
+	git -C sparse-index commit -m "dummy" &&
+	ensure_not_expanded diff-index HEAD~1 &&
+
+	echo "text" >>sparse-index/deep/a &&
+
+	ensure_not_expanded diff-index HEAD deep/a &&
+	ensure_not_expanded diff-index HEAD deep/*
+'
+test_expect_success 'diff-index pathspec expands index when necessary' '
+	init_repos &&
+
+	echo "text" >>sparse-index/deep/a &&
+
+	# pathspec that should expand index
+	! ensure_not_expanded diff-index "*/a" &&
+	! ensure_not_expanded diff-index "**a"
+'
+
+test_expect_success 'diff-index with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_sparse_match test_must_fail git diff-index HEAD folder2/a
+'
+
+test_expect_success 'match all: diff-index' '
+	init_repos &&
+
+	test_all_match git diff-index HEAD &&
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all ../edit-contents g &&
+	run_on_all git add g &&
+	run_on_all git commit -m "two" &&
+	run_on_all rm g &&
+	test_all_match git diff-index HEAD &&
+	test_all_match git diff-index --cached HEAD~1
+'
+
 test_expect_success 'grep with and --cached' '
 	init_repos &&