[v4,0/6] Optimization batch 7: use file basenames to guide rename detection

Message ID pull.843.v4.git.1613031350.gitgitgadget@gmail.com (mailing list archive)

Message

Elijah Newren via GitGitGadget Feb. 11, 2021, 8:15 a.m. UTC
This series depends on ort-perf-batch-6[1].

This series uses file basenames (portion of the path after final '/',
including extension) in a basic fashion to guide rename detection.
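
As a rough illustration of what "basename" means here, the following
minimal sketch extracts the portion of a path after the final '/'.
The helper name path_basename() is hypothetical and is not taken from
the patches themselves:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not git's actual code): return a pointer to the
 * part of `path` after the final '/', or to the whole path when it
 * contains no '/'.  The extension is kept as part of the basename.
 */
static const char *path_basename(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}
```

Under this definition, docs/config/ext.txt and docs/ext.txt share the
basename ext.txt, while docs/ext.md does not.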

Changes since v3:

 * update documentation as suggested by Junio
 * NEW: add another patch at the end, to simplify patch series that will be
   submitted later (please review!)

[1] https://lore.kernel.org/git/xmqqlfc4byt6.fsf@gitster.c.googlers.com/

Elijah Newren (6):
  t4001: add a test comparing basename similarity and content similarity
  diffcore-rename: compute basenames of all source and dest candidates
  diffcore-rename: complete find_basename_matches()
  diffcore-rename: guide inexact rename detection based on basenames
  gitdiffcore doc: mention new preliminary step for rename detection
  merge-ort: call diffcore_rename() directly

 Documentation/gitdiffcore.txt |  20 ++++
 diffcore-rename.c             | 202 +++++++++++++++++++++++++++++++++-
 merge-ort.c                   |  66 +++++++++--
 t/t4001-diff-rename.sh        |  24 ++++
 4 files changed, 301 insertions(+), 11 deletions(-)


base-commit: 7ae9460d3dba84122c2674b46e4339b9d42bdedd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-843%2Fnewren%2Fort-perf-batch-7-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-843/newren/ort-perf-batch-7-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/843

Range-diff vs v3:

 1:  3e6af929d135 = 1:  3e6af929d135 t4001: add a test comparing basename similarity and content similarity
 2:  4fff9b1ff57b = 2:  4fff9b1ff57b diffcore-rename: compute basenames of all source and dest candidates
 3:  dc26881e4ed3 = 3:  dc26881e4ed3 diffcore-rename: complete find_basename_matches()
 4:  2493f4b2f55d = 4:  2493f4b2f55d diffcore-rename: guide inexact rename detection based on basenames
 5:  fc72d24a3358 ! 5:  4e86ed3f29d4 gitdiffcore doc: mention new preliminary step for rename detection
     @@ Documentation/gitdiffcore.txt: a similarity score different from the default of
      +deleted from a different directory, it will mark them as renames and
      +exclude them from the later quadratic step (the one that pairwise
      +compares all unmatched files to find the "best" matches, determined by
     -+the highest content similarity).  So, for example, if
     -+docs/extensions.txt and docs/config/extensions.txt have similar
     -+content, then they will be marked as a rename even if it turns out
     -+that docs/extensions.txt was more similar to src/extension-checks.c.
     -+At most, one comparison is done per file in this preliminary pass; so
     -+if there are several extensions.txt files throughout the directory
     -+hierarchy that were added and deleted, this preliminary step will be
     -+skipped for those files.
     ++the highest content similarity).  So, for example, if a deleted
     ++docs/ext.txt and an added docs/config/ext.txt are similar enough, they
     ++will be marked as a rename and prevent an added docs/ext.md that may
     ++be even more similar to the deleted docs/ext.txt from being considered
     ++as the rename destination in the later step.  For this reason, the
     ++preliminary "match same filename" step uses a bit higher threshold to
     ++mark a file pair as a rename and stop considering other candidates for
     ++better matches.  At most, one comparison is done per file in this
     ++preliminary pass; so if there are several ext.txt files throughout the
     ++directory hierarchy that were added and deleted, this preliminary step
     ++will be skipped for those files.
      +
       Note.  When the "-C" option is used with `--find-copies-harder`
       option, 'git diff-{asterisk}' commands feed unmodified filepairs to
 -:  ------------ > 6:  fedb3d323d94 merge-ort: call diffcore_rename() directly
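
The pairing rule described in the documentation hunk above can be
sketched roughly as follows.  This is only an illustration under
simplifying assumptions: the names base_of(), count_base() and
pair_unique_basenames() are hypothetical, linear scans stand in for
whatever lookup structure the real code uses, and the follow-up
content-similarity check against the higher threshold is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: portion of the path after the final '/'. */
static const char *base_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Count the entries of paths[] whose basename equals `base`, and
 * record the index of the last such entry in *where.
 */
static size_t count_base(const char **paths, size_t nr,
			 const char *base, size_t *where)
{
	size_t i, hits = 0;

	for (i = 0; i < nr; i++) {
		if (!strcmp(base_of(paths[i]), base)) {
			hits++;
			*where = i;
		}
	}
	return hits;
}

/*
 * For each deleted path, set match[i] to the index of the single added
 * path with the same basename, provided that basename occurs exactly
 * once among the deleted paths and exactly once among the added paths.
 * Otherwise set match[i] to -1, meaning the preliminary step is skipped
 * for that file.  A real implementation would still compare contents
 * before marking the pair as a rename.
 */
static void pair_unique_basenames(const char **deleted, size_t nr_deleted,
				  const char **added, size_t nr_added,
				  long *match)
{
	size_t i, unused, dst;

	assert(match != NULL);
	for (i = 0; i < nr_deleted; i++) {
		const char *base = base_of(deleted[i]);

		match[i] = -1;
		if (count_base(deleted, nr_deleted, base, &unused) != 1)
			continue;
		if (count_base(added, nr_added, base, &dst) != 1)
			continue;
		match[i] = (long)dst;
	}
}
```

With a deleted docs/ext.txt and added docs/config/ext.txt and
docs/ext.md, only docs/config/ext.txt is a candidate in this pass;
docs/ext.md would be considered only in the later pairwise step, and
only if the basename pair is not accepted as a rename first.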

Comments

Junio C Hamano Feb. 13, 2021, 1:53 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series depends on ort-perf-batch-6[1].
>
> This series uses file basenames (portion of the path after final '/',
> including extension) in a basic fashion to guide rename detection.
>
> Changes since v3:
>
>  * update documentation as suggested by Junio
>  * NEW: add another patch at the end, to simplify patch series that will be
>    submitted later (please review!)

Sorry, by mistake I somehow read v3 and sent some comments on it,
but as the above says, they are on the part that hadn't changed at
all, and should still be relevant.

Thanks.