mbox series

[v4,00/10] Optimization batch 8: use file basenames even more

Message ID pull.844.v4.git.1614385849.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Optimization batch 8: use file basenames even more | expand

Message

Derrick Stolee via GitGitGadget Feb. 27, 2021, 12:30 a.m. UTC
This series depends on en/diffcore-rename (a concatenation of what I was
calling ort-perf-batch-6 and ort-perf-batch-7).

Changes since v3:

 * Update the commit messages (one was out of date after the rearrangement),
   and include Stolee's Reviewed-by

Elijah Newren (10):
  diffcore-rename: use directory rename guided basename comparisons
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: add a mapping of destination names to their indices
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add function for clearing dir_rename_count
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_guess from dir_rename_counts

 Documentation/gitdiffcore.txt |   2 +-
 diffcore-rename.c             | 449 ++++++++++++++++++++++++++++++++--
 diffcore.h                    |   7 +
 merge-ort.c                   | 144 +----------
 4 files changed, 449 insertions(+), 153 deletions(-)


base-commit: aeca14f748afc7fb5b65bca56ea2ebd970729814
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-844%2Fnewren%2Fort-perf-batch-8-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-844/newren/ort-perf-batch-8-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/844

Range-diff vs v3:

  1:  6afa9add40b9 !  1:  823d07532e00 diffcore-rename: use directory rename guided basename comparisons
     @@ Commit message
          min_basename_score threshold required for marking the two files as
          renames.
      
     -    This commit introduces an idx_possible_rename() function which will give
     +    This commit introduces an idx_possible_rename() function which will
          do this directory rename detection for us and give us the index within
          rename_dst of the resulting filename.  For now, this function is
          hardcoded to return -1 (not found) and just hooks up how its results
          would be used once we have a more complete implementation in place.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## Documentation/gitdiffcore.txt ##
  2:  40f57bcc2055 !  2:  2dde621d7de5 diffcore-rename: add a new idx_possible_rename function
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    diffcore-rename: add a new idx_possible_rename function
     +    diffcore-rename: provide basic implementation of idx_possible_rename()
      
     -    find_basename_matches() is great when both the remaining set of possible
     -    rename sources and the remaining set of possible rename destinations
     -    have exactly one file each with a given basename.  It allows us to match
     -    up files that have been moved to different directories without changing
     -    filenames.
     +    Add a new struct dir_rename_info with various values we need inside our
     +    idx_possible_rename() function introduced in the previous commit.  Add a
     +    basic implementation for this function showing how we plan to use the
     +    variables, but which will just return early with a value of -1 (not
     +    found) when those variables are not set up.
      
     -    When basenames are not unique, though, we want to be able to guess which
     -    directories the source files have been moved to.  Since this is the job
     -    of directory rename detection, we employ it.  However, since it is a
     -    directory rename detection idea, we also limit it to cases where we know
     -    there could have been a directory rename, i.e. where the source
     -    directory has been removed.  This has to be signalled by dirs_removed
     -    being non-NULL and containing an entry for the relevant directory.
     -    Since merge-ort.c is the only caller that currently does so, this
     -    optimization is only effective for merge-ort right now.  In the future,
     -    this condition could be reconsidered or we could modify other callers to
     -    pass the necessary strset.
     -
     -    Anyway, that's a lot of background so that we can actually describe the
     -    new function.  Add an idx_possible_rename() function which combines the
     -    recently added dir_rename_guess and idx_map fields to provide the index
     -    within rename_dst of a potential match for a given file.
     -
     -    Future commits will add checks after calling this function to compare
     -    the resulting 'likely rename' candidates to see if the two files meet
     -    the elevated min_basename_score threshold for marking them as actual
     -    renames.
     +    Future commits will do the work necessary to set up those other
     +    variables so that idx_possible_rename() does not always return -1.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  3:  0e14961574ea !  3:  21b9cf1da30e diffcore-rename: add a mapping of destination names to their indices
     @@ Commit message
          dir_rename_guess; these will be more fully populated in subsequent
          commits.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  4:  9b9d5b207b03 !  4:  3617b0209cc4 Move computation of dir_rename_count from merge-ort to diffcore-rename
     @@ Commit message
          preliminary computation of dir_rename_count after exact rename
          detection, followed by some updates after inexact rename detection.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  5:  f286e89464ea !  5:  2baf39d82f3e diffcore-rename: add function for clearing dir_rename_count
     @@ Commit message
          for clearing, or partially clearing it out.  Add a
          partial_clear_dir_rename_count() function for this purpose.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  6:  ab353f2e75eb !  6:  02f1f7c02d32 diffcore-rename: move dir_rename_counts into dir_rename_info struct
     @@ Commit message
          dir_rename_info struct.  Future commits will then make dir_rename_counts
          be computed in stages, and add computation of dir_rename_guess.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  7:  bd50d9e53804 !  7:  9c3436840534 diffcore-rename: extend cleanup_dir_rename_info()
     @@ Commit message
          Extend cleanup_dir_rename_info() to handle these two different cases,
          cleaning up the relevant bits of information for each case.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  8:  44cfae6505f2 !  8:  6bd398d3707e diffcore-rename: compute dir_rename_counts in stages
     @@ Commit message
          augment the counts via calling update_dir_rename_counts() after each
          basename-guide and inexact rename detection match is found.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  9:  752aff3a7995 !  9:  46304aaebf5a diffcore-rename: limit dir_rename_counts computation to relevant dirs
     @@ Commit message
          info->relevant_source_dirs variable for this purpose, even though at
          this stage we will only set it to dirs_removed for simplicity.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
 10:  65f7bfb735f2 ! 10:  4be565c47208 diffcore-rename: compute dir_rename_guess from dir_rename_counts
     @@ Commit message
              mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
              just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##

Comments

Derrick Stolee March 9, 2021, 9:52 p.m. UTC | #1
On 2/26/2021 7:30 PM, Elijah Newren via GitGitGadget wrote:
> This series depends on en/diffcore-rename (a concatenation of what I was
> calling ort-perf-batch-6 and ort-perf-batch-7).
> 
> Changes since v3:

I'm very late in doing this, but I reviewed the range-diff and found
this version to be satisfactory. Thanks!

-Stolee