Message ID | 7ae9460d3dba84122c2674b46e4339b9d42bdedd.1612382628.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Optimization batch 6: make full use of exact renames | expand |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > + return; /* culling incompatbile with break detection */ incompatible. > /* > - * Calculate how many renames are left (but all the source > - * files still remain as options for rename/copies!) > + * Calculate how many renames are left > */ This no longer needs to be a multi-line comment, does it?
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > @@ -578,8 +624,7 @@ void diffcore_rename(struct diff_options *options) > struct diff_filespec *one = rename_src[j].p->one; > struct diff_score this_src; > > - if (one->rename_used && !want_copies) > - continue; > + assert(!one->rename_used || want_copies || break_idx); Doesn't assert becomes a no-op in production builds? Shouldn't this be a BUG() instead?
On Fri, Feb 12, 2021 at 5:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > + return; /* culling incompatbile with break detection */ > > incompatible. Thanks. > > /* > > - * Calculate how many renames are left (but all the source > > - * files still remain as options for rename/copies!) > > + * Calculate how many renames are left > > */ > > This no longer needs to be a multi-line comment, does it? Indeed. Will fix both.
On Fri, Feb 12, 2021 at 5:06 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > @@ -578,8 +624,7 @@ void diffcore_rename(struct diff_options *options) > > struct diff_filespec *one = rename_src[j].p->one; > > struct diff_score this_src; > > > > - if (one->rename_used && !want_copies) > > - continue; > > + assert(!one->rename_used || want_copies || break_idx); > > Doesn't assert becomes a no-op in production builds? Shouldn't this > be a BUG() instead? I guess it depends on the reason for the line. If we're just trying to help others understand the code by documenting conditions that are true (and perhaps help them when they are refactoring), then comments like /* The following is true at this point: !one->rename_used || want_copies || break_idx */ could also be used, but it's kind of verbose. The form assert(!one->rename_used || want_copies || break_idx); is shorter, clearer, and is likely to be up-to-date because even if most builds and users turn off assertions, some folks will build and run with assertions on. If it's a refactoring-aid, then the latter form is also more likely to help the future developer (who may be future me). If, however, the purpose is to check for bad input or worries about programming logic possibly have edge cases, and we're afraid that such conditions might cause severe and hard to debug problems later in the code, then we absolutely would rather use a BUG(). Most of my uses of assert() fall in the former category. I use BUG() when it's a line meant for the latter category. There are a few that perhaps fall in between, where I just have to make a judgement call. I'd like to say that I'm more likely to use BUG() for those, but the lack of a BUG_ON() does tend to make it pretty annoying to use and certainly discourages it. Granted, that's the way I look at it. I'm curious if others have a different view. It certainly seems like the project likes to use both forms nearly equally: $ git grep assert\( origin/master | wc -l 468 $ git grep BUG\( origin/master | wc -l 506 so I'm curious if there are other factors that make folks pick one or the other.
diff --git a/diffcore-rename.c b/diffcore-rename.c index 8b118628b4e..74930716e70 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -454,6 +454,54 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i return count; } +static void remove_unneeded_paths_from_src(int detecting_copies) +{ + int i, new_num_src; + + if (detecting_copies) + return; /* nothing to remove */ + if (break_idx) + return; /* culling incompatbile with break detection */ + + /* + * Note on reasons why we cull unneeded sources but not destinations: + * 1) Pairings are stored in rename_dst (not rename_src), which we + * need to keep around. So, we just can't cull rename_dst even + * if we wanted to. But doing so wouldn't help because... + * + * 2) There is a matrix pairwise comparison that follows the + * "Performing inexact rename detection" progress message. + * Iterating over the destinations is done in the outer loop, + * hence we only iterate over each of those once and we can + * easily skip the outer loop early if the destination isn't + * relevant. That's only one check per destination path to + * skip. + * + * By contrast, the sources are iterated in the inner loop; if + * we check whether a source can be skipped, then we'll be + * checking it N separate times, once for each destination. + * We don't want to have to iterate over known-not-needed + * sources N times each, so avoid that by removing the sources + * from rename_src here. + */ + for (i = 0, new_num_src = 0; i < rename_src_nr; i++) { + /* + * renames are stored in rename_dst, so if a rename has + * already been detected using this source, we can just + * remove the source knowing rename_dst has its info. + */ + if (rename_src[i].p->one->rename_used) + continue; + + if (new_num_src < i) + memcpy(&rename_src[new_num_src], &rename_src[i], + sizeof(struct diff_rename_src)); + new_num_src++; + } + + rename_src_nr = new_num_src; +} + void diffcore_rename(struct diff_options *options) { int detect_rename = options->detect_rename; @@ -530,13 +578,11 @@ void diffcore_rename(struct diff_options *options) goto cleanup; /* - * Calculate how many renames are left (but all the source - * files still remain as options for rename/copies!) + * Calculate how many renames are left */ num_destinations = (rename_dst_nr - rename_count); + remove_unneeded_paths_from_src(want_copies); num_sources = rename_src_nr; - if (!want_copies) - num_sources -= rename_count; /* All done? */ if (!num_destinations || !num_sources) @@ -578,8 +624,7 @@ void diffcore_rename(struct diff_options *options) struct diff_filespec *one = rename_src[j].p->one; struct diff_score this_src; - if (one->rename_used && !want_copies) - continue; + assert(!one->rename_used || want_copies || break_idx); if (skip_unmodified && diff_unmodified_pair(rename_src[j].p))