From patchwork Thu Jan 28 16:24:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12054035 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23EACC433E0 for ; Thu, 28 Jan 2021 16:26:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6E0164DED for ; Thu, 28 Jan 2021 16:26:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232600AbhA1Q0A (ORCPT ); Thu, 28 Jan 2021 11:26:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232490AbhA1QZi (ORCPT ); Thu, 28 Jan 2021 11:25:38 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71BD2C061574 for ; Thu, 28 Jan 2021 08:24:58 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id c12so6011267wrc.7 for ; Thu, 28 Jan 2021 08:24:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=HLFpZg7uyOxrf2psQbC56MyjR+W5113nsSJjBR38XRU=; b=K+Hvv+Hy1isT0qWPvByOy9VvGNhY5BY0yVevfCXu9HBqy3ClQcQgmh3XQVLuGOHG45 FtRf+BBU9bOidnr62OOeyS0zI1CTA38o1PrzsUhzb3yW/fhCGWoVp2DLMHmozCU+FBNX N5UYNaTAGic3rwtmGJf8jBRetKly8zS7DZQv2q5coHimea5IREsEp0WZyQJ+OoaCv9U4 7DFxq7rTff/oJPHHCvJ2P0wYRutabEu7cZ19cswrVg7/sEIB7kcvWyJgXoM9BYqh6Ma9 jZG/CBvUiP8qZdUt5qfQxX1SoaEaEqX6sxgK6L9AbLZH9cDr4QzCBBG9R5nm51ekqbPo 2T6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=HLFpZg7uyOxrf2psQbC56MyjR+W5113nsSJjBR38XRU=; b=dGXAzINvTIuV/0POiohSpRFvukU9DNv+9Ry+mBUxNKOpAFC2DEEJ4StRJPG9rx4cYJ ynq/rrC/BYd3DzwJKqv4ZNXbvczCQtUU2b/9JgeVPev2zd1pYH0JE47Aylf7oqturmCJ faDGtGb8o3DB2EYSEa7QJyadaIBQnADvQn/xKAJcZRGhbqelwUNkjcqy5MrvlAKHvsaS YV4iZpObqTks7zxLYck7xaPZ4Vmja5dV61YxjHRqNM8f6tY0PFaB+HwM5kVBcxPxTh3T Ttq+u7e84wLWbGmmgnovQZ5Uv4l/ioB2a47wNkToQRTkXfc/DrUt0kmZ8ccKkqPeb85o A5SA== X-Gm-Message-State: AOAM532P7qDqsySpla2PydvVAWGCfWNdQc/uMSnlsgqu2XDm71N0YeQn Sci+iy7YvHASAbIvZOw9RScDs3/LAsc= X-Google-Smtp-Source: ABdhPJwFQca6/6a0TQcspdLb5rDn8+BC7vjT8Wm58TmDuWpenfeX19aVXdfJQ1i3Oy3qiB6th49SYQ== X-Received: by 2002:adf:a40e:: with SMTP id d14mr17201494wra.144.1611851096912; Thu, 28 Jan 2021 08:24:56 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h16sm7563721wrq.29.2021.01.28.08.24.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jan 2021 08:24:56 -0800 (PST) Message-Id: <3fe74e339fc5b7083398f2df51baae5a4a008060.1611851095.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 28 Jan 2021 16:24:52 +0000 Subject: [PATCH 1/3] commit-reach: use one walk in remove_redundant() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Michael Haggerty , me@ttaylorr.com, peff@peff.net, gitster@pobox.net, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The current implementation of remove_redundant() uses several calls to paint_down_to_common() to determine that commits are independent of each other. This leads to quadratic behavior when many inputs are passed to commands such as 'git merge-base'. For example, in the Linux kernel repository, I tested the performance by passing all tags: git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)") (Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do not point to commits.) Here is the performance improvement introduced by this change: Before: 16.4s After: 1.1s The basic approach is to do one commit walk instead of many. First, scan all commits in the list and mark their _parents_ with the STALE flag. This flag will indicate commits that are reachable from one of the inputs, except not including themselves. Then, walk commits until covering all commits up to the minimum generation number pushing the STALE flag throughout. At the end of the walk, commits in the input list that have the STALE flag are reachable from a _different_ commit in the list. These should be moved to the end of the array while the others are shifted to the front. This logic is covered by tests in t6600-test-reach.sh, so the behavior does not change. Signed-off-by: Derrick Stolee --- commit-reach.c | 108 +++++++++++++++++++++++++++++-------------------- 1 file changed, 65 insertions(+), 43 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index e38771ca5a1..677f6f7c3f3 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -164,58 +164,80 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt * the array, and return the number of commits that * are independent from each other. */ - struct commit **work; - unsigned char *redundant; - int *filled_index; - int i, j, filled; + int i, count_non_stale = 0; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; + struct commit **dup; + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; - work = xcalloc(cnt, sizeof(*work)); - redundant = xcalloc(cnt, 1); - ALLOC_ARRAY(filled_index, cnt - 1); + /* Mark all parents of the input as STALE */ + for (i = 0; i < cnt; i++) { + struct commit_list *parents; + timestamp_t generation; - for (i = 0; i < cnt; i++) repo_parse_commit(r, array[i]); - for (i = 0; i < cnt; i++) { - struct commit_list *common; - timestamp_t min_generation = commit_graph_generation(array[i]); + parents = array[i]->parents; + + while (parents) { + repo_parse_commit(r, parents->item); + if (!(parents->item->object.flags & STALE)) { + parents->item->object.flags |= STALE; + prio_queue_put(&queue, parents->item); + } + parents = parents->next; + } + + generation = commit_graph_generation(array[i]); + + if (generation < min_generation) + min_generation = generation; + } + + /* push the STALE bits up to min generation */ + while (queue.nr) { + struct commit_list *parents; + struct commit *c = prio_queue_get(&queue); + + repo_parse_commit(r, c); - if (redundant[i]) + if (commit_graph_generation(c) < min_generation) continue; - for (j = filled = 0; j < cnt; j++) { - timestamp_t curr_generation; - if (i == j || redundant[j]) - continue; - filled_index[filled] = j; - work[filled++] = array[j]; - curr_generation = commit_graph_generation(array[j]); - if (curr_generation < min_generation) - min_generation = curr_generation; + parents = c->parents; + while (parents) { + if (!(parents->item->object.flags & STALE)) { + parents->item->object.flags |= STALE; + prio_queue_put(&queue, parents->item); + } + parents = parents->next; + } + } + + /* rearrange array */ + dup = xcalloc(cnt, sizeof(struct commit *)); + COPY_ARRAY(dup, array, cnt); + for (i = 0; i < cnt; i++) { + if (dup[i]->object.flags & STALE) { + int insert = cnt - 1 - (i - count_non_stale); + array[insert] = dup[i]; + } else { + array[count_non_stale] = dup[i]; + count_non_stale++; + } + } + free(dup); + + /* clear marks */ + for (i = 0; i < cnt; i++) { + struct commit_list *parents; + parents = array[i]->parents; + + while (parents) { + clear_commit_marks(parents->item, STALE); + parents = parents->next; } - common = paint_down_to_common(r, array[i], filled, - work, min_generation); - if (array[i]->object.flags & PARENT2) - redundant[i] = 1; - for (j = 0; j < filled; j++) - if (work[j]->object.flags & PARENT1) - redundant[filled_index[j]] = 1; - clear_commit_marks(array[i], all_flags); - clear_commit_marks_many(filled, work, all_flags); - free_commit_list(common); } - /* Now collect the result */ - COPY_ARRAY(work, array, cnt); - for (i = filled = 0; i < cnt; i++) - if (!redundant[i]) - array[filled++] = work[i]; - for (j = filled, i = 0; i < cnt; i++) - if (redundant[i]) - array[j++] = work[i]; - free(work); - free(redundant); - free(filled_index); - return filled; + return count_non_stale; } static struct commit_list *get_merge_bases_many_0(struct repository *r, From patchwork Thu Jan 28 16:24:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12054033 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52D30C433E9 for ; Thu, 28 Jan 2021 16:25:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 21F5D64DED for ; Thu, 28 Jan 2021 16:25:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232596AbhA1QZx (ORCPT ); Thu, 28 Jan 2021 11:25:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232540AbhA1QZj (ORCPT ); Thu, 28 Jan 2021 11:25:39 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3EBA3C061756 for ; Thu, 28 Jan 2021 08:24:59 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id c4so3305317wru.9 for ; Thu, 28 Jan 2021 08:24:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=oyI67X50ojzhIpnbpuXT9S/gKWGtAIYucyjHS6Dr95E=; b=cPOU8+9TwFoe/MYx9OZCpklsw3Y8bUmo4ZMBnyv16qgOk4sPuqOh8393khpTm7utW+ 1yOSq9A3RPlF0LnAOvKOKzF4kWdtFBtGrL4c75BiBKvTskcezbhI+4Ht3JKtqgzxFlC1 86VoxGBAEIxITY7R6rTzGNmuV5ci7ewi9p4znHRTelcelPJLjML4eFxbdrkmKQDX/r77 ciDJsOLWt5jjymwsR+nhyl4wP3aW2L4Mxc9B7gnKu+Gwc1PtKcFygCNbjdy+3NBLiyAs NpWYdeIG0AeF4eMlmqXL3eaE7/2BQgJxvdPA2IhbUbO5Pg8omCIatadldLN5HVZOrL5Y 6Qtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=oyI67X50ojzhIpnbpuXT9S/gKWGtAIYucyjHS6Dr95E=; b=LRa0L3eub3B4jo0LofJb4/5FwLkwhHwULCjk8Cn2U6N9CizJXmkRqApygyDPVwybkq 5k3IQ3x7R9zCZ1JGw6jUX/nNvGnwWiKZrfCwUCO14D+a/8Y1AIwrqC5Sjozall4uyDtq qZuqUBhNxpcrHXJKM8r4gH43Gvi/UaUWXDuKT0RDMy7Q8crrtEcjODSyjHBi7fnSToF3 ELI6asT1t3JWaRBvpy9c1ESFcBfHHD4YFaGTogmuUlonG1CPhtz0OXTFaLIs+14iFhF0 EF4j2one3lzptHqoz9NhfShE3tWPuUBSRoGZqGu+F2rcktiTQGxogcSHt952pUtPW3cQ MweQ== X-Gm-Message-State: AOAM533bJns0MvEtzUqwuDLNN3XWJtVVoMGUyjxhXlmwLwQ028n68cDh 5yOBTtYA18YEWNgxYZlHwhqZ2Lhh8m4= X-Google-Smtp-Source: ABdhPJwy4tUEnrlIp6wcavpVfraMqZ++TWDT8gcvroxJA0/SMev01swxHiRfPMT9yHkGCQsa8X8Kww== X-Received: by 2002:a5d:6c6b:: with SMTP id r11mr17361132wrz.38.1611851097829; Thu, 28 Jan 2021 08:24:57 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s4sm6439948wme.38.2021.01.28.08.24.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jan 2021 08:24:57 -0800 (PST) Message-Id: <4c58877a7095f3509df0dda52b4110758aaf3201.1611851095.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 28 Jan 2021 16:24:53 +0000 Subject: [PATCH 2/3] commit-reach: move compare_commits_by_gen Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Michael Haggerty , me@ttaylorr.com, peff@peff.net, gitster@pobox.net, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Move this earlier in the file so it can be used by more methods. Signed-off-by: Derrick Stolee --- commit-reach.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 677f6f7c3f3..783c604a405 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -17,6 +17,21 @@ static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT); +static int compare_commits_by_gen(const void *_a, const void *_b) +{ + const struct commit *a = *(const struct commit * const *)_a; + const struct commit *b = *(const struct commit * const *)_b; + + timestamp_t generation_a = commit_graph_generation(a); + timestamp_t generation_b = commit_graph_generation(b); + + if (generation_a < generation_b) + return -1; + if (generation_a > generation_b) + return 1; + return 0; +} + static int queue_has_nonstale(struct prio_queue *queue) { int i; @@ -583,21 +598,6 @@ int commit_contains(struct ref_filter *filter, struct commit *commit, return repo_is_descendant_of(the_repository, commit, list); } -static int compare_commits_by_gen(const void *_a, const void *_b) -{ - const struct commit *a = *(const struct commit * const *)_a; - const struct commit *b = *(const struct commit * const *)_b; - - timestamp_t generation_a = commit_graph_generation(a); - timestamp_t generation_b = commit_graph_generation(b); - - if (generation_a < generation_b) - return -1; - if (generation_a > generation_b) - return 1; - return 0; -} - int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, From patchwork Thu Jan 28 16:24:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12054037 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DA01C433E9 for ; Thu, 28 Jan 2021 16:26:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C52864DF6 for ; Thu, 28 Jan 2021 16:26:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232446AbhA1Q0L (ORCPT ); Thu, 28 Jan 2021 11:26:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231856AbhA1QZl (ORCPT ); Thu, 28 Jan 2021 11:25:41 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8908DC0613D6 for ; Thu, 28 Jan 2021 08:25:00 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id s7so3023124wru.5 for ; Thu, 28 Jan 2021 08:25:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=YrpT/69il7yDuXCPWJ3jVL8tIaNytb7pQst2wWxhzB4=; b=XDEu4Jbr6RBWfdY/oqlVU3DQL/4TIVR92MM8c0SpkSmdksjDAh1xgLPqXNEixaVbfX ZL3xBWgMuVYVo71IgWKYgMKYQQRpBvhYAHuA2ZSoHt2wZu4utt5zmlmh+V0VTa6whybl KbWXKRjQ6mEqeRKeo2fxqd8BajfnI4n4m9hh+fydxB8k+hm48/hVNl0bK64g+MdD2z4M cJY/MdQq5YrEs5dsIC8EanVEc/rfAaCek+W0NhAfSnNN1qrNF8m74n17VQRnSAYwSlV1 DlHBrW2WhOXFXWrLpRRIqrf16VkDNA8SLgrN8EEakbo/MgJjivaI9vbhKD+Gx/g65xFQ nIOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=YrpT/69il7yDuXCPWJ3jVL8tIaNytb7pQst2wWxhzB4=; b=eq8TUQFdHvgMfhlAzcfuemLIYsrgVG4t1KN11RdtxTvqt1eEfuHpeHMYJNlCtbItkN 7fVWOUylCLFWyv3NMZWelU8I5oSMoXhVBVXYKa3otjG7EOKX8arxBlaSSYHJn4/PFNEl W/jVHxy4eFZ4cE9H4+MWToUQjlvZclF6sKUS3nFicArUIDiGZox5z8BdQVsaRIFWzSaz Xe/2JrSMlFVax6Uo7WJu6+Jer2SiAp2RgU4jCBC4mrwhH3xQrKZWXzzB2uh8FlSHLPqG MhM2fISTkrgNopvNGJHN9zXrEtU9MhRbU3BaV2oSSX2CC46CVJkgdns80F50/W0RyTfV p79w== X-Gm-Message-State: AOAM530lAPyAdE5i9fWt0NeSdsxFyXs7wiLzYcoaIpVWgk/W98Q+O/Fc 36XMroqHZPoYk4/Gmv4bFqEnuS60tDc= X-Google-Smtp-Source: ABdhPJwkP/FgE2DWQRbDNuPLcsIc7hm8Y4cQn1JAosIL5fRg+PysavI4akvsCA55zs4Lllwy3W1ihQ== X-Received: by 2002:a5d:69ce:: with SMTP id s14mr16744147wrw.206.1611851099018; Thu, 28 Jan 2021 08:24:59 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p9sm8091496wrj.11.2021.01.28.08.24.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jan 2021 08:24:58 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 28 Jan 2021 16:24:54 +0000 Subject: [PATCH 3/3] commit-reach: use heuristic in remove_redundant() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Michael Haggerty , me@ttaylorr.com, peff@peff.net, gitster@pobox.net, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Reachability algorithms in commit-reach.c frequently benefit from using the first-parent history as a heuristic for satisfying reachability queries. The most obvious example was implemented in 4fbcca4e (commit-reach: make can_all_from_reach... linear, 2018-07-20). Update the walk in remove_redundant() to use this same heuristic. Here, we are walking starting at the parents of the input commits. Sort those parents and walk from the highest generation to lower. Each time, use the heuristic of searching the first parent history before continuing to expand the walk. The important piece is to ensure we short-circuit the walk when we find that there is a single non-redundant commit. This happens frequently when looking for merge-bases or comparing several tags with 'git merge-base --independent'. Use a new count 'count_still_independent' and if that hits 1 we can stop walking. To update 'count_still_independent' properly, we add use of the RESULT flag on the input commits. Then we can detect when we reach one of these commits and decrease the count. We need to remove the RESULT flag at that moment because we might re-visit that commit when popping the stack. We use the STALE flag to mark parents that have been added to the new walk_start list, but we need to clear that flag before we start walking so those flags don't halt our depth-first-search walk. On my copy of the Linux kernel repository, the performance of 'git merge-base --independent ' goes from 1.1 seconds to 0.11 seconds. Signed-off-by: Derrick Stolee --- commit-reach.c | 72 +++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 56 insertions(+), 16 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 783c604a405..6032624282e 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -179,10 +179,13 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt * the array, and return the number of commits that * are independent from each other. */ - int i, count_non_stale = 0; + int i, count_non_stale = 0, count_still_independent = cnt; timestamp_t min_generation = GENERATION_NUMBER_INFINITY; struct commit **dup; - struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct commit **walk_start; + size_t walk_start_nr = 0, walk_start_alloc = cnt; + + ALLOC_ARRAY(walk_start, walk_start_alloc); /* Mark all parents of the input as STALE */ for (i = 0; i < cnt; i++) { @@ -190,13 +193,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt timestamp_t generation; repo_parse_commit(r, array[i]); + array[i]->object.flags |= RESULT; parents = array[i]->parents; while (parents) { repo_parse_commit(r, parents->item); if (!(parents->item->object.flags & STALE)) { parents->item->object.flags |= STALE; - prio_queue_put(&queue, parents->item); + ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc); + walk_start[walk_start_nr++] = parents->item; } parents = parents->next; } @@ -207,30 +212,65 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt min_generation = generation; } - /* push the STALE bits up to min generation */ - while (queue.nr) { - struct commit_list *parents; - struct commit *c = prio_queue_get(&queue); + QSORT(walk_start, walk_start_nr, compare_commits_by_gen); - repo_parse_commit(r, c); + /* remove STALE bit for now to allow walking through parents */ + for (i = 0; i < walk_start_nr; i++) + walk_start[i]->object.flags &= ~STALE; - if (commit_graph_generation(c) < min_generation) - continue; + /* + * Start walking from the highest generation. Hopefully, it will + * find all other items during the first-parent walk, and we can + * terminate early. Otherwise, we will do the same amount of work + * as before. + */ + for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) { + /* push the STALE bits up to min generation */ + struct commit_list *stack = NULL; - parents = c->parents; - while (parents) { - if (!(parents->item->object.flags & STALE)) { - parents->item->object.flags |= STALE; - prio_queue_put(&queue, parents->item); + commit_list_insert(walk_start[i], &stack); + walk_start[i]->object.flags |= STALE; + + while (stack) { + struct commit_list *parents; + struct commit *c = stack->item; + + repo_parse_commit(r, c); + + if (c->object.flags & RESULT) { + c->object.flags &= ~RESULT; + if (--count_still_independent <= 1) + break; } - parents = parents->next; + + if (commit_graph_generation(c) < min_generation) { + pop_commit(&stack); + continue; + } + + parents = c->parents; + while (parents) { + if (!(parents->item->object.flags & STALE)) { + parents->item->object.flags |= STALE; + commit_list_insert(parents->item, &stack); + break; + } + parents = parents->next; + } + + /* pop if all parents have been visited already */ + if (!parents) + pop_commit(&stack); } + free_commit_list(stack); } + free(walk_start); /* rearrange array */ dup = xcalloc(cnt, sizeof(struct commit *)); COPY_ARRAY(dup, array, cnt); for (i = 0; i < cnt; i++) { + dup[i]->object.flags &= ~RESULT; if (dup[i]->object.flags & STALE) { int insert = cnt - 1 - (i - count_non_stale); array[insert] = dup[i];