Message ID | 20250213090040.16133-3-meetsoni3017@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | merge-recursive: optimize time complexity | expand |
On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote: > > Previously, `get_unmerged()` used `string_list_insert()`, which has an > O(n^2) complexity due to shifting elements on each insertion. It also > called `string_list_lookup()` before insertion, which performs a binary > search in O(log n). Okay. > This combination made insertion costly, especially > for large index states, as each new entry required both a search and > potentially shifting many elements. Why does the combination make it costly? O(log n) + O(n^2) is still O(n^2), so I don't see why it matters to mention the combination. Could you clarify? Also, does it actually make it costly, or do you only suspect that it does? O(n^2) worst case sometimes behaves O(n) or O(n log n) in some cases. Since your commit message says "made insertion costly" instead of "might make insertion costly", I think that would suggest you have some performance numbers to back this up on some interesting real world repository. Do you? Can you share them? > Replace `string_list_insert()` with `string_list_append()` to achieve > O(n) insertion. After all entries are added, sort the list in O(n log n) > and remove duplicates in O(n), reducing the overall complexity to > O(n log n). Okay. > This improves performance significantly for large datasets That's a big claim; it may be true, but without evidence I don't believe it for three reasons : (1) n here is the number of conflicts, not the number of files in the repo or the number of lines being merged. Thus, n is typically small. (2) Other O(n^2) behavior in merge-recursive likely drowns this particular codepath out, so any gains here just aren't going to be noticed, (3) After looking at the code and knowing the specialized structure of the index, I think that while string_list_insert() for n items in general is going to be O(n^2), it will likely functionally be O(n log n) for this particular code path, meaning you haven't actually improved the performance. > while maintaining correctness. More on that below. > Signed-off-by: Meet Soni <meetsoni3017@gmail.com> > --- > merge-recursive.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/merge-recursive.c b/merge-recursive.c > index 884ccf99a5..6165993429 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate) > if (!ce_stage(ce)) > continue; > > - item = string_list_lookup(unmerged, ce->name); > - if (!item) { > - item = string_list_insert(unmerged, ce->name); > - item->util = xcalloc(1, sizeof(struct stage_data)); > - } > + item = string_list_append(unmerged, ce->name); > + item->util = xcalloc(1, sizeof(struct stage_data)); > + > e = item->util; > e->stages[ce_stage(ce)].mode = ce->ce_mode; > oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid); Did you run any tests? I'm not sure you maintained correctness here. > } > + string_list_sort(unmerged); > + string_list_remove_duplicates(unmerged, 1); > > return unmerged; > } > -- > 2.34.1 (As a side note, due to the specialized structure of the input, I suspect this code could be modified to run in O(n), i.e. we could skip the string_list_lookup and the string_list_sort and the string_list_remove_duplicates... But, it'd make the code trickier, so it'd need to be carefully commented, the change would need to be justified, and it'd need to be carefully tested. Even if we weren't planning to delete this entire file, I suspect it's not possible to find a case justifying such a change without optimizing several other things in merge-recursive first, but optimizing those things probably results in a significant rewrite...which we've already done with merge-ort.)
Elijah Newren <newren@gmail.com> writes: > (As a side note, due to the specialized structure of the input, I > suspect this code could be modified to run in O(n), i.e. we could skip > the string_list_lookup and the string_list_sort and the > string_list_remove_duplicates... Are you talking about the input being already sorted so we can just walk the multiple input and merge them into a single stream? In the cost analysis you did earlier in the message I am responding to, being able to go down to O(n) sounds really like a great thing ;-) > But, it'd make the code trickier, so > it'd need to be carefully commented, the change would need to be > justified, and it'd need to be carefully tested. ... and measured. > Even if we weren't > planning to delete this entire file, I suspect it's not possible to > find a case justifying such a change without optimizing several other > things in merge-recursive first, but optimizing those things probably > results in a significant rewrite...which we've already done with > merge-ort.) Sounds like unless the performance issues are shared between the two, it may not be worth to spend too much brain cycles only on the "recursive" one? Thanks.
On Thu, Feb 13, 2025 at 10:30 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > (As a side note, due to the specialized structure of the input, I > > suspect this code could be modified to run in O(n), i.e. we could skip > > the string_list_lookup and the string_list_sort and the > > string_list_remove_duplicates... > > Are you talking about the input being already sorted so we can just > walk the multiple input and merge them into a single stream? In the I'm not sure what you mean by "merge them into a single stream". I think you have the right idea that we are creating a string list of information about unmerged entries, and since we're taking information from the index which is already sorted, we can just either modify the last entry in the list if it matches or append a new entry to it; no need to walk, insert, or binary search the list at all. > cost analysis you did earlier in the message I am responding to, > being able to go down to O(n) sounds really like a great thing ;-) Note first that we aren't going from O(n^2) -> O(n), we're only going from O(n log n) -> O(n). That's still great, but: * n is typically pretty small (number of unmerged files) * there's things in merge-recursive that are O(m^2), where typically m >> n (number of files in repo, or number of lines in big files in the repo) * merge-recursive is used by almost no one * we are planning to delete merge-recursive So, although O(n) is great.... > > But, it'd make the code trickier, so > > it'd need to be carefully commented, the change would need to be > > justified, and it'd need to be carefully tested. > > ... and measured. +1 > > Even if we weren't > > planning to delete this entire file, I suspect it's not possible to > > find a case justifying such a change without optimizing several other > > things in merge-recursive first, but optimizing those things probably > > results in a significant rewrite...which we've already done with > > merge-ort.) > > Sounds like unless the performance issues are shared between the > two, it may not be worth to spend too much brain cycles only on the > "recursive" one? ...yep, exactly, and this is not a performance issue shared with the ort backend; it's unique to the recursive one.
On Thu, 13 Feb 2025 at 22:41, Elijah Newren <newren@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote: > > > > Previously, `get_unmerged()` used `string_list_insert()`, which has an > > O(n^2) complexity due to shifting elements on each insertion. It also > > called `string_list_lookup()` before insertion, which performs a binary > > search in O(log n). > > Okay. > > > This combination made insertion costly, especially > > for large index states, as each new entry required both a search and > > potentially shifting many elements. > > Why does the combination make it costly? O(log n) + O(n^2) is still > O(n^2), so I don't see why it matters to mention the combination. > Could you clarify? > > Also, does it actually make it costly, or do you only suspect that it > does? O(n^2) worst case sometimes behaves O(n) or O(n log n) in some > cases. Since your commit message says "made insertion costly" instead > of "might make insertion costly", I think that would suggest you have > some performance numbers to back this up on some interesting real > world repository. Do you? Can you share them? > Sorry, I should've specified, this patch is purely theoretical, I was aiming for a trial and error kind of approach. > > Replace `string_list_insert()` with `string_list_append()` to achieve > > O(n) insertion. After all entries are added, sort the list in O(n log n) > > and remove duplicates in O(n), reducing the overall complexity to > > O(n log n). > > Okay. > > > This improves performance significantly for large datasets > > That's a big claim; it may be true, but without evidence I don't > believe it for three reasons : (1) n here is the number of conflicts, > not the number of files in the repo or the number of lines being > merged. Thus, n is typically small. (2) Other O(n^2) behavior in > merge-recursive likely drowns this particular codepath out, so any > gains here just aren't going to be noticed, (3) After looking at the > code and knowing the specialized structure of the index, I think that > while string_list_insert() for n items in general is going to be > O(n^2), it will likely functionally be O(n log n) for this particular > code path, meaning you haven't actually improved the performance. > > > while maintaining correctness. > > More on that below. > > > > Signed-off-by: Meet Soni <meetsoni3017@gmail.com> > > --- > > merge-recursive.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/merge-recursive.c b/merge-recursive.c > > index 884ccf99a5..6165993429 100644 > > --- a/merge-recursive.c > > +++ b/merge-recursive.c > > @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate) > > if (!ce_stage(ce)) > > continue; > > > > - item = string_list_lookup(unmerged, ce->name); > > - if (!item) { > > - item = string_list_insert(unmerged, ce->name); > > - item->util = xcalloc(1, sizeof(struct stage_data)); > > - } > > + item = string_list_append(unmerged, ce->name); > > + item->util = xcalloc(1, sizeof(struct stage_data)); > > + > > e = item->util; > > e->stages[ce_stage(ce)].mode = ce->ce_mode; > > oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid); > > Did you run any tests? I'm not sure you maintained correctness here. I didn't run any tests -- I wanted to, but I wasn’t sure how to do it for this change. Since you suggested dropping this patch from the series, I’ll do that. But for similar changes in the future, how should I go about testing them? > > > } > > + string_list_sort(unmerged); > > + string_list_remove_duplicates(unmerged, 1); > > > > return unmerged; > > } > > -- > > 2.34.1 > > (As a side note, due to the specialized structure of the input, I > suspect this code could be modified to run in O(n), i.e. we could skip > the string_list_lookup and the string_list_sort and the > string_list_remove_duplicates... But, it'd make the code trickier, so > it'd need to be carefully commented, the change would need to be > justified, and it'd need to be carefully tested. Even if we weren't > planning to delete this entire file, I suspect it's not possible to > find a case justifying such a change without optimizing several other > things in merge-recursive first, but optimizing those things probably > results in a significant rewrite...which we've already done with > merge-ort.) Makes sense. Thankyou for reviewing, Meet
On Thu, Feb 13, 2025 at 8:28 PM Meet Soni <meetsoni3017@gmail.com> wrote: > > On Thu, 13 Feb 2025 at 22:41, Elijah Newren <newren@gmail.com> wrote: > > > > On Thu, Feb 13, 2025 at 1:01 AM Meet Soni <meetsoni3017@gmail.com> wrote: ... > > > diff --git a/merge-recursive.c b/merge-recursive.c > > > index 884ccf99a5..6165993429 100644 > > > --- a/merge-recursive.c > > > +++ b/merge-recursive.c > > > @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate) > > > if (!ce_stage(ce)) > > > continue; > > > > > > - item = string_list_lookup(unmerged, ce->name); > > > - if (!item) { > > > - item = string_list_insert(unmerged, ce->name); > > > - item->util = xcalloc(1, sizeof(struct stage_data)); > > > - } > > > + item = string_list_append(unmerged, ce->name); > > > + item->util = xcalloc(1, sizeof(struct stage_data)); > > > + > > > e = item->util; > > > e->stages[ce_stage(ce)].mode = ce->ce_mode; > > > oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid); > > > > Did you run any tests? I'm not sure you maintained correctness here. > > I didn't run any tests -- I wanted to, but I wasn’t sure how to do it > for this change. Since you suggested dropping this patch from the > series, I’ll do that. But for similar changes in the future, how should I go > about testing them? As per Documentation/CodingGuidelines: "After any code change, make sure that the entire test suite passes." You can do that by running: cd t && make (You probably want to also run that before making any changes, just to verify that they all pass for you. Then, if any test fails after you make changes, you know it's because of your changes rather than because you missed something in building or setting up the tests.) And although it doesn't matter since we're dropping this patch, the issue I noticed was that if there were, say, three unmerged entries with the same path, the original code would create one entry in the string list and modify it 3 times (each with a different ce_stage(ce). Your modification would create three different entries (each with only information from one stage) and drop two of them, meaning we no longer have a single string_list_item that contains information from all 3 unmerged entries for the same path. I'm pretty sure running the existing tests would catch that kind of bug, which is what raised the question.
On Fri, 14 Feb 2025 at 11:35, Elijah Newren <newren@gmail.com> wrote: > > > > Did you run any tests? I'm not sure you maintained correctness here. > > > > I didn't run any tests -- I wanted to, but I wasn’t sure how to do it > > for this change. Since you suggested dropping this patch from the > > series, I’ll do that. But for similar changes in the future, how should I go > > about testing them? > > As per Documentation/CodingGuidelines: "After any code change, make > sure that the entire test suite passes." You can do that by running: > cd t && make > (You probably want to also run that before making any changes, just to > verify that they all pass for you. Then, if any test fails after you > make changes, you know it's because of your changes rather than > because you missed something in building or setting up the tests.) > > > And although it doesn't matter since we're dropping this patch, the > issue I noticed was that if there were, say, three unmerged entries > with the same path, the original code would create one entry in the > string list and modify it 3 times (each with a different ce_stage(ce). > Your modification would create three different entries (each with only > information from one stage) and drop two of them, meaning we no longer > have a single string_list_item that contains information from all 3 > unmerged entries for the same path. I'm pretty sure running the > existing tests would catch that kind of bug, which is what raised the > question. That's the thing -- I did run make in the t/ directory, and it passed. I was just wondering if there's any other way to test this in isolation, in case I want to verify such changes more directly in the future. Thanks for the clarification! Meet
On Fri, Feb 14, 2025 at 12:24 AM Meet Soni <meetsoni3017@gmail.com> wrote: > > On Fri, 14 Feb 2025 at 11:35, Elijah Newren <newren@gmail.com> wrote: > > > > > > Did you run any tests? I'm not sure you maintained correctness here. > > > > > > I didn't run any tests -- I wanted to, but I wasn’t sure how to do it > > > for this change. Since you suggested dropping this patch from the > > > series, I’ll do that. But for similar changes in the future, how should I go > > > about testing them? > > > > As per Documentation/CodingGuidelines: "After any code change, make > > sure that the entire test suite passes." You can do that by running: > > cd t && make > > (You probably want to also run that before making any changes, just to > > verify that they all pass for you. Then, if any test fails after you > > make changes, you know it's because of your changes rather than > > because you missed something in building or setting up the tests.) > > > > > > And although it doesn't matter since we're dropping this patch, the > > issue I noticed was that if there were, say, three unmerged entries > > with the same path, the original code would create one entry in the > > string list and modify it 3 times (each with a different ce_stage(ce). > > Your modification would create three different entries (each with only > > information from one stage) and drop two of them, meaning we no longer > > have a single string_list_item that contains information from all 3 > > unmerged entries for the same path. I'm pretty sure running the > > existing tests would catch that kind of bug, which is what raised the > > question. > > That's the thing -- I did run make in the t/ directory, and it passed. I was > just wondering if there's any other way to test this in isolation, in case > I want to verify such changes more directly in the future. Really? Did you rebuild the code, after making your changes? You may have been running with a pre-changes version of the code. I just applied your changes and ran the tests. I see it fail as soon as it gets to t1004. $ cd t && make test [... lots of output snipped ...] *** t1004-read-tree-m-u-wf.sh *** ok 1 - two-way setup ok 2 - two-way not clobbering ok 3 - two-way with incorrect --exclude-per-directory (1) ok 4 - two-way with incorrect --exclude-per-directory (2) ok 5 - two-way clobbering a ignored file ok 6 - three-way not complaining on an untracked path in both ok 7 - three-way not clobbering a working tree file ok 8 - three-way not complaining on an untracked file ok 9 - 3-way not overwriting local changes (setup) ok 10 - 3-way not overwriting local changes (our side) ok 11 - 3-way not overwriting local changes (their side) ok 12 - funny symlink in work tree ok 13 - funny symlink in work tree, un-unlink-able ok 14 - D/F setup ok 15 - D/F ok 16 - D/F resolve not ok 17 - D/F recursive # # # git reset --hard && # git checkout side-b && # git merge-recursive branch-point -- side-b side-a # # # failed 1 among 17 test(s) 1..17 make[1]: *** [Makefile:77: t1004-read-tree-m-u-wf.sh] Error 1 make[1]: Leaving directory '/home/newren/floss/git/t' make: *** [Makefile:63: test] Error 2 ...and if go to the toplevel directory and run under prove so I can see all the failures (and run the test suites in parallel), I see: $ cd .. && make DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS='--timer --state failed,slow,save --jobs 12' test [... lots of output snipped ...] Test Summary Report ------------------- t3424-rebase-empty.sh (Wstat: 256 Tests: 20 Failed: 18) Failed tests: 3-20 Non-zero exit status: 1 t3436-rebase-more-options.sh (Wstat: 256 Tests: 19 Failed: 17) Failed tests: 2-18 Non-zero exit status: 1 t4151-am-abort.sh (Wstat: 256 Tests: 20 Failed: 12) Failed tests: 5-9, 12-16, 19-20 Non-zero exit status: 1 t3407-rebase-abort.sh (Wstat: 256 Tests: 17 Failed: 8) Failed tests: 2-9 Non-zero exit status: 1 t3428-rebase-signoff.sh (Wstat: 256 Tests: 7 Failed: 5) Failed tests: 2, 4-7 Non-zero exit status: 1 t6409-merge-subtree.sh (Wstat: 256 Tests: 12 Failed: 5) Failed tests: 2-6 Non-zero exit status: 1 t7102-reset.sh (Wstat: 256 Tests: 38 Failed: 7) Failed tests: 14-20 Non-zero exit status: 1 t6432-merge-recursive-space-options.sh (Wstat: 256 Tests: 11 Failed: 4) Failed tests: 2, 7-8, 11 Non-zero exit status: 1 t6430-merge-recursive.sh (Wstat: 256 Tests: 37 Failed: 15) Failed tests: 10-11, 13-20, 22-24, 28-29 Non-zero exit status: 1 t3406-rebase-message.sh (Wstat: 256 Tests: 32 Failed: 8) Failed tests: 22, 24-27, 29-31 Non-zero exit status: 1 t4200-rerere.sh (Wstat: 256 Tests: 36 Failed: 5) Failed tests: 24-28 Non-zero exit status: 1 t7201-co.sh (Wstat: 256 Tests: 46 Failed: 5) Failed tests: 5-9 Non-zero exit status: 1 t3418-rebase-continue.sh (Wstat: 256 Tests: 29 Failed: 7) Failed tests: 4, 6, 10-12, 26-27 Non-zero exit status: 1 t3403-rebase-skip.sh (Wstat: 256 Tests: 20 Failed: 3) Failed tests: 2, 4, 9 Non-zero exit status: 1 t4253-am-keep-cr-dos.sh (Wstat: 256 Tests: 7 Failed: 2) Failed tests: 6-7 Non-zero exit status: 1 t9903-bash-prompt.sh (Wstat: 256 Tests: 67 Failed: 39) Failed tests: 16-31, 33-35, 37, 40-44, 46-52, 55-58, 60 62, 67 Non-zero exit status: 1 t3503-cherry-pick-root.sh (Wstat: 256 Tests: 6 Failed: 2) Failed tests: 5-6 Non-zero exit status: 1 t3401-rebase-and-am-rename.sh (Wstat: 256 Tests: 10 Failed: 2) Failed tests: 4, 10 Non-zero exit status: 1 t2407-worktree-heads.sh (Wstat: 256 Tests: 12 Failed: 2) Failed tests: 4-5 Non-zero exit status: 1 t5407-post-rewrite-hook.sh (Wstat: 256 Tests: 17 Failed: 3) Failed tests: 4-6 Non-zero exit status: 1 t2500-untracked-overwriting.sh (Wstat: 256 Tests: 10 Failed: 2) Failed tests: 9-10 Non-zero exit status: 1 t4153-am-resume-override-opts.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 3 Non-zero exit status: 1 t1015-read-index-unmerged.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 6 Non-zero exit status: 1 t3509-cherry-pick-merge-df.sh (Wstat: 256 Tests: 9 Failed: 1) Failed test: 9 Non-zero exit status: 1 t2023-checkout-m.sh (Wstat: 256 Tests: 5 Failed: 1) Failed test: 5 Non-zero exit status: 1 t7615-diff-algo-with-mergy-operations.sh (Wstat: 256 Tests: 7 Failed: 1) Failed test: 2 Non-zero exit status: 1 t6427-diff3-conflict-markers.sh (Wstat: 256 Tests: 9 Failed: 1) Failed test: 8 Non-zero exit status: 1 t1004-read-tree-m-u-wf.sh (Wstat: 256 Tests: 17 Failed: 1) Failed test: 17 Non-zero exit status: 1 t3420-rebase-autostash.sh (Wstat: 256 Tests: 52 Failed: 10) Failed tests: 11-17, 21-23 Non-zero exit status: 1 t4150-am.sh (Wstat: 256 Tests: 87 Failed: 33) Failed tests: 34-40, 42-46, 48, 50-54, 57-62, 64-65, 67-71 75, 87 Non-zero exit status: 1 t7512-status-help.sh (Wstat: 256 Tests: 46 Failed: 3) Failed tests: 5-6, 29 Non-zero exit status: 1 t3400-rebase.sh (Wstat: 256 Tests: 39 Failed: 1) Failed test: 30 Non-zero exit status: 1 t3404-rebase-interactive.sh (Wstat: 256 Tests: 131 Failed: 1) Failed test: 80 Non-zero exit status: 1 Files=1031, Tests=30662, 70 wallclock secs ( 8.33 usr 2.13 sys + 248.60 cusr 516.60 csys = 775.66 CPU) Result: FAIL make[1]: *** [Makefile:73: prove] Error 1 make[1]: Leaving directory '/home/newren/floss/git/t' make: *** [Makefile:3237: test] Error 2 I suspect this is a case where it was testing a version of git that you built before making the changes.
On Sat, 15 Feb 2025 at 00:30, Elijah Newren <newren@gmail.com> wrote: > > On Fri, Feb 14, 2025 at 12:24 AM Meet Soni <meetsoni3017@gmail.com> wrote: > > That's the thing -- I did run make in the t/ directory, and it passed. I was > > just wondering if there's any other way to test this in isolation, in case > > I want to verify such changes more directly in the future. > > Really? Did you rebuild the code, after making your changes? You may > have been running with a pre-changes version of the code. > > I just applied your changes and ran the tests. I see it fail as soon > as it gets to t1004. > > $ cd t && make test > [... lots of output snipped ...] > *** t1004-read-tree-m-u-wf.sh *** > ok 1 - two-way setup > ok 2 - two-way not clobbering > ok 3 - two-way with incorrect --exclude-per-directory (1) > ok 4 - two-way with incorrect --exclude-per-directory (2) > ok 5 - two-way clobbering a ignored file > ok 6 - three-way not complaining on an untracked path in both > ok 7 - three-way not clobbering a working tree file > ok 8 - three-way not complaining on an untracked file > ok 9 - 3-way not overwriting local changes (setup) > ok 10 - 3-way not overwriting local changes (our side) > ok 11 - 3-way not overwriting local changes (their side) > ok 12 - funny symlink in work tree > ok 13 - funny symlink in work tree, un-unlink-able > ok 14 - D/F setup > ok 15 - D/F > ok 16 - D/F resolve > not ok 17 - D/F recursive > # > # > # git reset --hard && > # git checkout side-b && > # git merge-recursive branch-point -- side-b side-a > # > # > # failed 1 among 17 test(s) > 1..17 > make[1]: *** [Makefile:77: t1004-read-tree-m-u-wf.sh] Error 1 > make[1]: Leaving directory '/home/newren/floss/git/t' > make: *** [Makefile:63: test] Error 2 > > > ...and if go to the toplevel directory and run under prove so I can > see all the failures (and run the test suites in parallel), I see: > > $ cd .. && make DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS='--timer > --state failed,slow,save --jobs 12' test > [... lots of output snipped ...] > Test Summary Report > ------------------- > t3424-rebase-empty.sh (Wstat: 256 Tests: 20 > Failed: 18) > Failed tests: 3-20 > Non-zero exit status: 1 > t3436-rebase-more-options.sh (Wstat: 256 Tests: 19 > Failed: 17) > Failed tests: 2-18 > Non-zero exit status: 1 > t4151-am-abort.sh (Wstat: 256 Tests: 20 > Failed: 12) > Failed tests: 5-9, 12-16, 19-20 > Non-zero exit status: 1 > t3407-rebase-abort.sh (Wstat: 256 Tests: 17 > Failed: 8) > Failed tests: 2-9 > Non-zero exit status: 1 > t3428-rebase-signoff.sh (Wstat: 256 Tests: 7 Failed: 5) > Failed tests: 2, 4-7 > Non-zero exit status: 1 > t6409-merge-subtree.sh (Wstat: 256 Tests: 12 > Failed: 5) > Failed tests: 2-6 > Non-zero exit status: 1 > t7102-reset.sh (Wstat: 256 Tests: 38 > Failed: 7) > Failed tests: 14-20 > Non-zero exit status: 1 > t6432-merge-recursive-space-options.sh (Wstat: 256 Tests: 11 > Failed: 4) > Failed tests: 2, 7-8, 11 > Non-zero exit status: 1 > t6430-merge-recursive.sh (Wstat: 256 Tests: 37 > Failed: 15) > Failed tests: 10-11, 13-20, 22-24, 28-29 > Non-zero exit status: 1 > t3406-rebase-message.sh (Wstat: 256 Tests: 32 > Failed: 8) > Failed tests: 22, 24-27, 29-31 > Non-zero exit status: 1 > t4200-rerere.sh (Wstat: 256 Tests: 36 > Failed: 5) > Failed tests: 24-28 > Non-zero exit status: 1 > t7201-co.sh (Wstat: 256 Tests: 46 > Failed: 5) > Failed tests: 5-9 > Non-zero exit status: 1 > t3418-rebase-continue.sh (Wstat: 256 Tests: 29 > Failed: 7) > Failed tests: 4, 6, 10-12, 26-27 > Non-zero exit status: 1 > t3403-rebase-skip.sh (Wstat: 256 Tests: 20 > Failed: 3) > Failed tests: 2, 4, 9 > Non-zero exit status: 1 > t4253-am-keep-cr-dos.sh (Wstat: 256 Tests: 7 Failed: 2) > Failed tests: 6-7 > Non-zero exit status: 1 > t9903-bash-prompt.sh (Wstat: 256 Tests: 67 > Failed: 39) > Failed tests: 16-31, 33-35, 37, 40-44, 46-52, 55-58, 60 > 62, 67 > Non-zero exit status: 1 > t3503-cherry-pick-root.sh (Wstat: 256 Tests: 6 Failed: 2) > Failed tests: 5-6 > Non-zero exit status: 1 > t3401-rebase-and-am-rename.sh (Wstat: 256 Tests: 10 > Failed: 2) > Failed tests: 4, 10 > Non-zero exit status: 1 > t2407-worktree-heads.sh (Wstat: 256 Tests: 12 > Failed: 2) > Failed tests: 4-5 > Non-zero exit status: 1 > t5407-post-rewrite-hook.sh (Wstat: 256 Tests: 17 > Failed: 3) > Failed tests: 4-6 > Non-zero exit status: 1 > t2500-untracked-overwriting.sh (Wstat: 256 Tests: 10 > Failed: 2) > Failed tests: 9-10 > Non-zero exit status: 1 > t4153-am-resume-override-opts.sh (Wstat: 256 Tests: 6 Failed: 1) > Failed test: 3 > Non-zero exit status: 1 > t1015-read-index-unmerged.sh (Wstat: 256 Tests: 6 Failed: 1) > Failed test: 6 > Non-zero exit status: 1 > t3509-cherry-pick-merge-df.sh (Wstat: 256 Tests: 9 Failed: 1) > Failed test: 9 > Non-zero exit status: 1 > t2023-checkout-m.sh (Wstat: 256 Tests: 5 Failed: 1) > Failed test: 5 > Non-zero exit status: 1 > t7615-diff-algo-with-mergy-operations.sh (Wstat: 256 Tests: 7 Failed: 1) > Failed test: 2 > Non-zero exit status: 1 > t6427-diff3-conflict-markers.sh (Wstat: 256 Tests: 9 Failed: 1) > Failed test: 8 > Non-zero exit status: 1 > t1004-read-tree-m-u-wf.sh (Wstat: 256 Tests: 17 > Failed: 1) > Failed test: 17 > Non-zero exit status: 1 > t3420-rebase-autostash.sh (Wstat: 256 Tests: 52 > Failed: 10) > Failed tests: 11-17, 21-23 > Non-zero exit status: 1 > t4150-am.sh (Wstat: 256 Tests: 87 > Failed: 33) > Failed tests: 34-40, 42-46, 48, 50-54, 57-62, 64-65, 67-71 > 75, 87 > Non-zero exit status: 1 > t7512-status-help.sh (Wstat: 256 Tests: 46 > Failed: 3) > Failed tests: 5-6, 29 > Non-zero exit status: 1 > t3400-rebase.sh (Wstat: 256 Tests: 39 > Failed: 1) > Failed test: 30 > Non-zero exit status: 1 > t3404-rebase-interactive.sh (Wstat: 256 Tests: > 131 Failed: 1) > Failed test: 80 > Non-zero exit status: 1 > Files=1031, Tests=30662, 70 wallclock secs ( 8.33 usr 2.13 sys + > 248.60 cusr 516.60 csys = 775.66 CPU) > Result: FAIL > make[1]: *** [Makefile:73: prove] Error 1 > make[1]: Leaving directory '/home/newren/floss/git/t' > make: *** [Makefile:3237: test] Error 2 > > I suspect this is a case where it was testing a version of git that > you built before making the changes. Thanks! You're right. I ran the tests before running make. After running make and testing again, it failed. Meet
diff --git a/merge-recursive.c b/merge-recursive.c index 884ccf99a5..6165993429 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -547,15 +547,15 @@ static struct string_list *get_unmerged(struct index_state *istate) if (!ce_stage(ce)) continue; - item = string_list_lookup(unmerged, ce->name); - if (!item) { - item = string_list_insert(unmerged, ce->name); - item->util = xcalloc(1, sizeof(struct stage_data)); - } + item = string_list_append(unmerged, ce->name); + item->util = xcalloc(1, sizeof(struct stage_data)); + e = item->util; e->stages[ce_stage(ce)].mode = ce->ce_mode; oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid); } + string_list_sort(unmerged); + string_list_remove_duplicates(unmerged, 1); return unmerged; }
Previously, `get_unmerged()` used `string_list_insert()`, which has an O(n^2) complexity due to shifting elements on each insertion. It also called `string_list_lookup()` before insertion, which performs a binary search in O(log n). This combination made insertion costly, especially for large index states, as each new entry required both a search and potentially shifting many elements. Replace `string_list_insert()` with `string_list_append()` to achieve O(n) insertion. After all entries are added, sort the list in O(n log n) and remove duplicates in O(n), reducing the overall complexity to O(n log n). This improves performance significantly for large datasets while maintaining correctness. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> --- merge-recursive.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)