Message ID | a0b6a152c754862323e9a5b89ad43ab34b6548f7.1634332836.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | Sparse Index: diff and blame builtins |
On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index for the 'git blame' command. The index was already
> not expanded with this command, so the most interesting thing to do is to
> add tests that verify that 'git blame' behaves correctly when the sparse
> index is enabled and that its performance improves. More specifically, these
> cases are:
>
> 1. The index is not expanded for 'blame' when given paths in the sparse
> checkout cone at multiple levels.
>
> 2. Performance measurably improves for 'blame' with sparse index when given
> paths in the sparse checkout cone at multiple levels.
>
> The `p2000` tests demonstrate a ~60% execution time reduction when running
> 'blame' for a file two levels deep and a ~30% execution time reduction
> for a file three levels deep.

Eek. What's eating up the other 30% when we have to open up another
layer of trees?

>
> Test                                         before  after
> ----------------------------------------------------------------
> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>
> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
> fatal: no such path '<path outside sparse definition>' in HEAD.

Small nit; this error message should be indented with a couple of space
characters to indicate that it's the output of running Git instead of
part of your patch message.
Not worth a reroll on its own, but something to keep in mind for your
many future patches :).

>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/blame.c                          |  3 +++
>  t/perf/p2000-sparse-operations.sh        |  2 ++
>  t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>  3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..af3d81e2bd4 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  	long anchor;
>  	const int hexsz = the_hash_algo->hexsz;
>
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

By now we're quite used to seeing this ;). Makes sense to me.

>  	setup_default_color_by_age();
>  	git_config(git_blame_config, &output_option);
>  	repo_init_revisions(the_repository, &revs, NULL);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index bff93f16e93..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
>  test_perf_on_all git diff --staged
> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a

Good.

>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e5d15be9d45..960ccf2d150 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>  	test_all_match git blame deep/deeper1/deepest/a
>  '
>
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.
> +test_expect_success 'blame with pathspec outside sparse definition' '
>  	init_repos &&
>
> -	test_all_match git blame folder1/a &&
> -	test_all_match git blame folder2/a &&
> -	test_all_match git blame deep/deeper2/a &&
> -	test_all_match git blame deep/deeper2/deepest/a
> +	test_sparse_match git sparse-checkout set &&
> +	test_sparse_match test_must_fail git blame folder1/a &&
> +	test_sparse_match test_must_fail git blame folder2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>  '

test_must_fail used to allow for segfaults, but doesn't these days. So
this is a good test of "it should fail in sparse checkouts but not
crash", although I think it would be good to ensure that it's failing in
the way you expect (i.e., by checking that stderr contains "no such path
<xyz> in HEAD").

>
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>  	)
>  '
>
> +test_expect_success 'sparse index is not expanded: blame' '
> +	init_repos &&
> +
> +	ensure_not_expanded blame a &&
> +	ensure_not_expanded blame deep/a &&
> +	ensure_not_expanded blame deep/deeper1/a &&
> +	ensure_not_expanded blame deep/deeper1/deepest/a
> +'

Makes sense. Probably just one of these is necessary, but I haven't
looked into init_repos (or the "setup" test) enough to know for sure.
Either way, not worth changing.

Thanks,
Taylor
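Taylor's suggestion boils down to asserting two things at once: the command fails without dying from a signal, and its stderr gives the expected reason. The following is a minimal standalone sketch of that pattern in POSIX shell, not Git's test library: `must_fail_with` and `fake_blame` are hypothetical names invented for illustration, the signal check only loosely mirrors what `test_must_fail` rejects, and `fake_blame` merely imitates the error quoted in the commit message (exiting 128, as Git's `die()` does) instead of running Git.

```shell
#!/bin/sh
# must_fail_with PATTERN CMD [ARGS...]
# Hypothetical helper: succeed only if CMD exits non-zero, was not killed
# by a signal, and wrote a line matching PATTERN to stderr.
must_fail_with () {
	pattern=$1
	shift
	errfile=$(mktemp) || return 1
	"$@" 2>"$errfile"
	status=$?
	if test "$status" -eq 0
	then
		echo "unexpected success: $*" >&2
		rm -f "$errfile"
		return 1
	elif test "$status" -gt 129 && test "$status" -le 192
	then
		# 128+N means "killed by signal N"; 129 is left alone because
		# Git also uses it as the exit code for usage errors.
		echo "died by signal $((status - 128)): $*" >&2
		rm -f "$errfile"
		return 1
	fi
	if ! grep -q -e "$pattern" "$errfile"
	then
		echo "stderr did not match '$pattern':" >&2
		cat "$errfile" >&2
		rm -f "$errfile"
		return 1
	fi
	rm -f "$errfile"
}

# Hypothetical stand-in for 'git blame <path>' outside the sparse definition.
fake_blame () {
	echo "fatal: no such path '$1' in HEAD" >&2
	return 128
}

must_fail_with "no such path 'folder1/a' in HEAD" fake_blame folder1/a &&
echo "failed in the expected way"
```

Inside Git's own test suite the same idea would more naturally be written with the existing `test_must_fail` plus a `grep` on captured stderr, rather than with a new helper like this one.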
On 10/25/21 1:53 PM, Taylor Blau wrote:
> On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index for the 'git blame' command. The index was already
>> not expanded with this command, so the most interesting thing to do is to
>> add tests that verify that 'git blame' behaves correctly when the sparse
>> index is enabled and that its performance improves. More specifically, these
>> cases are:
>>
>> 1. The index is not expanded for 'blame' when given paths in the sparse
>> checkout cone at multiple levels.
>>
>> 2. Performance measurably improves for 'blame' with sparse index when given
>> paths in the sparse checkout cone at multiple levels.
>>
>> The `p2000` tests demonstrate a ~60% execution time reduction when running
>> 'blame' for a file two levels deep and a ~30% execution time reduction
>> for a file three levels deep.
>
> Eek. What's eating up the other 30% when we have to open up another
> layer of trees?
>
I'm not sure to be totally honest. However, given these are both pretty
good time reductions I don't think we should be terribly concerned.
>>
>> Test                                         before  after
>> ----------------------------------------------------------------
>> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
>> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
>> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
>> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
>> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
>> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
>> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
>> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>>
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>> fatal: no such path '<path outside sparse definition>' in HEAD.
>
> Small nit; this error message should be indented with a couple of space
> characters to indicate that it's the output of running Git instead of
> part of your patch message. Not worth a reroll on its own, but something
> to keep in mind for your many future patches :).
>
Eh, I'm making some changes based on your suggestions anyway, so I'm
including this in v3. Thanks for letting me know!
>>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>  builtin/blame.c                          |  3 +++
>>  t/perf/p2000-sparse-operations.sh        |  2 ++
>>  t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>>  3 files changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..af3d81e2bd4 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>  	long anchor;
>>  	const int hexsz = the_hash_algo->hexsz;
>>
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
>
> By now we're quite used to seeing this ;). Makes sense to me.
>
>>  	setup_default_color_by_age();
>>  	git_config(git_blame_config, &output_option);
>>  	repo_init_revisions(the_repository, &revs, NULL);
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index bff93f16e93..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>>  test_perf_on_all git reset -- does-not-exist
>>  test_perf_on_all git diff
>>  test_perf_on_all git diff --staged
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>
> Good.
>
>>  test_done
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e5d15be9d45..960ccf2d150 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>>  	test_all_match git blame deep/deeper1/deepest/a
>>  '
>>
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>  	init_repos &&
>>
>> -	test_all_match git blame folder1/a &&
>> -	test_all_match git blame folder2/a &&
>> -	test_all_match git blame deep/deeper2/a &&
>> -	test_all_match git blame deep/deeper2/deepest/a
>> +	test_sparse_match git sparse-checkout set &&
>> +	test_sparse_match test_must_fail git blame folder1/a &&
>> +	test_sparse_match test_must_fail git blame folder2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>>  '
>
> test_must_fail used to allow for segfaults, but doesn't these days. So
> this is a good test of "it should fail in sparse checkouts but not
> crash", although I think it would be good to ensure that it's failing in
> the way you expect (i.e., by checking that stderr contains "no such path
> <xyz> in HEAD").

Good suggestion, coming in v3!
>>
>>  test_expect_success 'checkout and reset (mixed)' '
>> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>>  	)
>>  '
>>
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +	init_repos &&
>> +
>> +	ensure_not_expanded blame a &&
>> +	ensure_not_expanded blame deep/a &&
>> +	ensure_not_expanded blame deep/deeper1/a &&
>> +	ensure_not_expanded blame deep/deeper1/deepest/a
>> +'
>
> Makes sense. Probably just one of these is necessary, but I haven't
> looked into init_repos (or the "setup" test) enough to know for sure.
> Either way, not worth changing.
>
> Thanks,
> Taylor
>
On Tue, Oct 26, 2021 at 9:17 AM Lessley Dennington <lessleydennington@gmail.com> wrote:
>
> On 10/25/21 1:53 PM, Taylor Blau wrote:
> > On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> >> From: Lessley Dennington <lessleydennington@gmail.com>
> >>
> >> Enable the sparse index for the 'git blame' command. The index was already
> >> not expanded with this command, so the most interesting thing to do is to
> >> add tests that verify that 'git blame' behaves correctly when the sparse
> >> index is enabled and that its performance improves. More specifically, these
> >> cases are:
> >>
> >> 1. The index is not expanded for 'blame' when given paths in the sparse
> >> checkout cone at multiple levels.
> >>
> >> 2. Performance measurably improves for 'blame' with sparse index when given
> >> paths in the sparse checkout cone at multiple levels.
> >>
> >> The `p2000` tests demonstrate a ~60% execution time reduction when running
> >> 'blame' for a file two levels deep and a ~30% execution time reduction
> >> for a file three levels deep.
> >
> > Eek. What's eating up the other 30% when we have to open up another
> > layer of trees?
> >
> I'm not sure to be totally honest. However, given these are both pretty
> good time reductions I don't think we should be terribly concerned.

It's not something eating up more time in the sparse-index code; let's
look a bit closer...
> >>
> >> Test                                         before  after
> >> ----------------------------------------------------------------
> >> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> >> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> >> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> >> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> >> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> >> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> >> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> >> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

Time was ~0.55s for the full at two levels deep, and dropped by just over
0.3s in sparse-index.

Time was ~1.05s for the full at three levels deep, and dropped by just
over 0.3s in sparse-index.

So, the sparse-index enabling saves us the same amount of time, it's just
that the overall execution time for the non-sparse-index comparison point
goes up.

Saving the same amount of time for the two cases seems intuitive to me;
both cases get to avoid looking at the same number of index entries
outside the sparsity paths.
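Elijah's point, that the absolute saving stays constant while the percentage shrinks, can be checked mechanically. The sketch below feeds the four sparse-index rows of the perf table (before/after seconds copied from the table, labels shortened) to `awk` and prints the absolute saving and the relative change for each; `savings` is a hypothetical helper name, and its three-column input format is an assumption of this sketch, not something p2000 emits.

```shell
#!/bin/sh
# savings: read "<label> <before> <after>" lines (seconds) on stdin and
# print the absolute time saved and the relative change for each line.
savings () {
	awk '{
		saved = $2 - $3
		printf "%-22s saved %.2fs (%+.1f%%)\n", $1, saved, -100 * saved / $2
	}'
}

# The sparse-index rows of the p2000 table (labels shortened).
savings <<'EOF'
f2/f4/a:sparse-v3 0.55 0.23
f2/f4/a:sparse-v4 0.57 0.23
f2/f4/f3/a:sparse-v3 1.07 0.72
f2/f4/f3/a:sparse-v4 1.05 0.73
EOF
```

All four rows save between 0.32s and 0.35s; only the "before" time (the denominator) roughly doubles for the deeper path, which is why the relative reduction drops from about 58-60% to about 30-33%.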
diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e5d15be9d45..960ccf2d150 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
 	test_all_match git blame deep/deeper1/deepest/a
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	test_sparse_match git sparse-checkout set &&
+	test_sparse_match test_must_fail git blame folder1/a &&
+	test_sparse_match test_must_fail git blame folder2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	ensure_not_expanded blame a &&
+	ensure_not_expanded blame deep/a &&
+	ensure_not_expanded blame deep/deeper1/a &&
+	ensure_not_expanded blame deep/deeper1/deepest/a
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '