Message ID | pull.1452.v3.git.git.1676665285.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Teach diff to honor diff algorithms set through git attributes | expand |
On Fri, Feb 17, 2023 at 12:21 PM John Cai via GitGitGadget <gitgitgadget@gmail.com> wrote: > > When a repository contains different kinds of files, it may be desirable to > use different algorithms based on file type. This is currently not feasible > through the command line or using git configs. However, we can leverage the > fact that gitattributes are path aware. > > Teach the diff machinery to check gitattributes when diffing files by using > the existing diff. scheme, and add an "algorithm" type to the external > driver config. [...] > To address some of the performance concerns in the previous series, a > benchmark shows that a performance penalty is no longer incurred, now that > we are no longer adding an additional attributes parsing call: > > $ hyperfine -r 5 -L a bin-wrappers/git,git '{a} diff v2.0.0 v2.28.0' > Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 Time (mean ± σ): 1.072 s ± > 0.289 s [User: 0.626 s, System: 0.081 s] Range (min … max): 0.772 s … 1.537 > s 5 runs > > Benchmark 2: git diff v2.0.0 v2.28.0 Time (mean ± σ): 1.003 s ± 0.065 s > [User: 0.684 s, System: 0.067 s] Range (min … max): 0.914 s … 1.091 s 5 runs > > Summary 'git diff v2.0.0 v2.28.0' ran 1.07 ± 0.30 times faster than > 'git-bin-wrapper diff v2.0.0 v2.28.0' I'm sorry, I don't understand this. What are you measuring? I presume bin-wrappers/git refers to the version of git built with your changes, but what version of git does "git" refer to? Also, do you have any .gitattributes or .git/config changes present when you are testing to trigger the new functionality you have written? Also, doesn't this benchmark demonstrate the opposite of your claim? You said there was no performance penalty, but the benchmark shows a 7% slowdown. We've battled hard to get smaller improvements than that, so this is still worrisome, even if it's no longer a factor of 2 or whatever it was. But, again, I'm not sure what is being measured. If the difference is because patience diff was used for some files, then it's not an apples-to-apples comparison, and a 7% slowdown would be no cause for concern. Since I was curious, I compiled both a version of git from directly before your series, and directly after, then added a '*.[ch] diff=other' line to the end of .gitattributes, then ran: $ hyperfine -L a ./older-git,./newer-git '{a} -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' Benchmark 1: ./older-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0 Time (mean ± σ): 870.2 ms ± 4.4 ms [User: 755.2 ms, System: 109.8 ms] Range (min … max): 861.0 ms … 876.8 ms 10 runs Benchmark 2: ./newer-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0 Time (mean ± σ): 876.9 ms ± 4.8 ms [User: 758.0 ms, System: 113.1 ms] Range (min … max): 870.7 ms … 884.1 ms 10 runs Summary './older-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' ran 1.01 ± 0.01 times faster than './newer-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' I specifically specified 'myers' to match what we'd get from the default anyway, so I would only be testing the slowdown from the .gitattribute parsing. So, I think the performance overhead comes out to just 1% rather than 7% (and further that's when I make it only print overall stats about the diff rather than the full diff, since I know that's faster. If I didn't do that, the perf hit might appear to be less than 1%).
Hi Elijah, On 17 Feb 2023, at 20:16, Elijah Newren wrote: > On Fri, Feb 17, 2023 at 12:21 PM John Cai via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> When a repository contains different kinds of files, it may be desirable to >> use different algorithms based on file type. This is currently not feasible >> through the command line or using git configs. However, we can leverage the >> fact that gitattributes are path aware. >> >> Teach the diff machinery to check gitattributes when diffing files by using >> the existing diff. scheme, and add an "algorithm" type to the external >> driver config. > [...] >> To address some of the performance concerns in the previous series, a >> benchmark shows that a performance penalty is no longer incurred, now that >> we are no longer adding an additional attributes parsing call: >> >> $ hyperfine -r 5 -L a bin-wrappers/git,git '{a} diff v2.0.0 v2.28.0' >> Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 Time (mean ± σ): 1.072 s ± >> 0.289 s [User: 0.626 s, System: 0.081 s] Range (min … max): 0.772 s … 1.537 >> s 5 runs >> >> Benchmark 2: git diff v2.0.0 v2.28.0 Time (mean ± σ): 1.003 s ± 0.065 s >> [User: 0.684 s, System: 0.067 s] Range (min … max): 0.914 s … 1.091 s 5 runs >> >> Summary 'git diff v2.0.0 v2.28.0' ran 1.07 ± 0.30 times faster than >> 'git-bin-wrapper diff v2.0.0 v2.28.0' > > I'm sorry, I don't understand this. What are you measuring? I > presume bin-wrappers/git refers to the version of git built with your > changes, but what version of git does "git" refer to? Also, do you > have any .gitattributes or .git/config changes present when you are > testing to trigger the new functionality you have written? > > Also, doesn't this benchmark demonstrate the opposite of your claim? > You said there was no performance penalty, but the benchmark shows a > 7% slowdown. We've battled hard to get smaller improvements than > that, so this is still worrisome, even if it's no longer a factor of 2 > or whatever it was. But, again, I'm not sure what is being measured. > If the difference is because patience diff was used for some files, > then it's not an apples-to-apples comparison, and a 7% slowdown would > be no cause for concern. > > Since I was curious, I compiled both a version of git from directly > before your series, and directly after, then added a '*.[ch] > diff=other' line to the end of .gitattributes, then ran: > > $ hyperfine -L a ./older-git,./newer-git '{a} -c > diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' > Benchmark 1: ./older-git -c diff.other.algorithm=myers diff --numstat > v2.0.0 v2.28.0 > Time (mean ± σ): 870.2 ms ± 4.4 ms [User: 755.2 ms, System: 109.8 ms] > Range (min … max): 861.0 ms … 876.8 ms 10 runs > > Benchmark 2: ./newer-git -c diff.other.algorithm=myers diff --numstat > v2.0.0 v2.28.0 > Time (mean ± σ): 876.9 ms ± 4.8 ms [User: 758.0 ms, System: 113.1 ms] > Range (min … max): 870.7 ms … 884.1 ms 10 runs > > Summary > './older-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' ran > 1.01 ± 0.01 times faster than './newer-git -c > diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' > > I specifically specified 'myers' to match what we'd get from the > default anyway, so I would only be testing the slowdown from the > .gitattribute parsing. So, I think the performance overhead comes out > to just 1% rather than 7% (and further that's when I make it only > print overall stats about the diff rather than the full diff, since I > know that's faster. If I didn't do that, the perf hit might appear to > be less than 1%). Thanks for taking the time to do this! I should have been a bit more careful about this benchmark, and more explicit about what it was benchmarking. I just ran it again and made sure that the same algorithm was used, and I got results similar to you. Will update the cover letter, thanks!