From patchwork Mon Jun 28 05:33:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12347249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EA3DC2B9F4 for ; Mon, 28 Jun 2021 05:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EC6176199B for ; Mon, 28 Jun 2021 05:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232103AbhF1Ffj (ORCPT ); Mon, 28 Jun 2021 01:35:39 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:50685 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232124AbhF1Fff (ORCPT ); Mon, 28 Jun 2021 01:35:35 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 80FA75C0126; Mon, 28 Jun 2021 01:33:10 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Mon, 28 Jun 2021 01:33:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=Yg+Y2IwpLqve9DrlP+sDQQouuLL S7Bm+AM2yW7+D4rA=; b=UgJJ6eBMTydp43GfP4VRaQqAhkuUFo3wuCCpFgzPMgT e2uY6vyPHvVlZ5hwgsy7vdStqWwyZeonax/D/TE0/d769xPTvAuEiM2lbXyuUZg4 ZxxQlpkpEqwuhM8aeqMUnTDV9RZD7Hut1PpYaZpF0iW8JMrGatMEUILHe9NWiZVf cgJO/oP7vFeMerrTTD1//FOD/M/Fg8MRN7+ki3+gHIK3t9YSEJxKNfhi4oftM/0z 0aeV9YwKLHXIo21oAzZiUdMaBLCya30dEndkhHYYe4V6Uv+LU95GqeUkBNqDM3fY HMJiR3swlwbdA93T52pf2QLg+8cHv7guS6idmUJRQEQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=Yg+Y2I wpLqve9DrlP+sDQQouuLLS7Bm+AM2yW7+D4rA=; b=hTDfqSkGMSKj1gZ6uI25QH It9PFtwgc7MCcYzCuEcjajU1K2OiuirGhSKt9yVerI46olVVXJ4c2UGxL9lnfD7D P9J/7VnYwam256iP0x1mKYtx3YElb1KWBVlBtQXvjAhPzu1vbjcJzHuuvh8IK92t FCwnH80858+kV14M2fS+BgHI1dFjj0QstY9aHqhcQKx6Z5Ai7pmaby/gswC/Tu+g Lwf4VpdlW8ITl/8zobQHIlGK2zyW5dcrPO4qoJW4KCRx4kTijBZIWFxEueo5zMQ7 SKALuvBVFyJfc9gVlCuB0RsCdZtjg2C/ViMRYUEZtOypqlTxWGcbB21A1ZwsmTyA == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfeehfedgledvucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeehgefhtdefueffheekgfffudelffejtdfhvdejkedthfehvdelgfetgfdvtedthfen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 28 Jun 2021 01:33:08 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id e8e9907d (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Jun 2021 05:33:08 +0000 (UTC) Date: Mon, 28 Jun 2021 07:33:07 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano Subject: [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Message-ID: <2f6c4e3d102e71104d7d00c78631b149b880609a.1624858240.git.ps@pks.im> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We'll the connectivity check logic for git-receive-pack(1) in the following commits to make it perform better. As a preparatory step, add some benchmarks such that we can measure these changes. Signed-off-by: Patrick Steinhardt --- t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) create mode 100755 t/perf/p5400-receive-pack.sh diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh new file mode 100755 index 0000000000..a945e014a3 --- /dev/null +++ b/t/perf/p5400-receive-pack.sh @@ -0,0 +1,97 @@ +#!/bin/sh + +test_description="Tests performance of receive-pack" + +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success 'setup' ' + # Create a main branch such that we do not have to rely on any specific + # branch to exist in the perf repository. + git switch --force-create main && + + # Set up a pre-receive hook such that no refs will ever be changed. + # This easily allows multiple perf runs, but still exercises + # server-side reference negotiation and checking for consistency. + mkdir hooks && + write_script hooks/pre-receive <<-EOF && + #!/bin/sh + echo "failed in pre-receive hook" + exit 1 + EOF + cat >config <<-EOF && + [core] + hooksPath=$(pwd)/hooks + EOF + GIT_CONFIG_GLOBAL="$(pwd)/config" && + export GIT_CONFIG_GLOBAL && + + git switch --create updated && + test_commit --no-tag updated +' + +setup_empty() { + git init --bare "$2" +} + +setup_clone() { + git clone --bare --no-local --branch main "$1" "$2" +} + +setup_clone_bitmap() { + git clone --bare --no-local --branch main "$1" "$2" && + git -C "$2" repack -Adb +} + +# Create a reference for each commit in the target repository with extra-refs. +# While this may be an atypical setup, biggish repositories easily end up with +# hundreds of thousands of refs, and this is a good enough approximation. +setup_extrarefs() { + git clone --bare --no-local --branch main "$1" "$2" && + git -C "$2" log --all --format="tformat:create refs/commit/%h %H" | + git -C "$2" update-ref --stdin +} + +# Create a reference for each commit in the target repository with extra-refs. +# While this may be an atypical setup, biggish repositories easily end up with +# hundreds of thousands of refs, and this is a good enough approximation. +setup_extrarefs_bitmap() { + git clone --bare --no-local --branch main "$1" "$2" && + git -C "$2" log --all --format="tformat:create refs/commit/%h %H" | + git -C "$2" update-ref --stdin && + git -C "$2" repack -Adb +} + +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap +do + test_expect_success "$repo setup" ' + rm -rf target.git && + setup_$repo "$(pwd)" target.git + ' + + # If the target repository is the empty one, then the only thing we can + # do is to create a new branch. + case "$repo" in + empty) + refspecs="updated:new";; + *) + refspecs="updated:new updated:main main~10:main :main";; + esac + + for refspec in $refspecs + do + test_expect_success "$repo seed $refspec" " + test_must_fail git push --force target.git '$refspec' \ + --receive-pack='tee pack | git receive-pack' 2>err && + grep 'failed in pre-receive hook' err + " + + test_perf "$repo receive-pack $refspec" " + git receive-pack target.git negotiation && + grep 'pre-receive hook declined' negotiation + " + done +done + +test_done From patchwork Mon Jun 28 05:33:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12347253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BB2FC49EAF for ; Mon, 28 Jun 2021 05:33:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 34BE561C29 for ; Mon, 28 Jun 2021 05:33:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232119AbhF1Ffm (ORCPT ); Mon, 28 Jun 2021 01:35:42 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:52853 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232108AbhF1Ffk (ORCPT ); Mon, 28 Jun 2021 01:35:40 -0400 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id DB8D95C00BC; Mon, 28 Jun 2021 01:33:14 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 28 Jun 2021 01:33:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=zn+uMerVuuQkXaCrcow4DAa4iIC o/FcB46h+SWFoLrY=; b=C8c29wBPycCR+zBJSkR7YSFRoAu8SZwS7XF6mEKEuMk egJoZ1y7pm699ps2NsQdCMMAgopUh8TLCwOKB0V/6DUVWDwdnfo0wZ6imZM75Z3S 9mHyc1J2+ih1F5+d1YFFYI+rH2wz4oKvASopalbvlRe6vlgjjgFs3Dezp/1iYDJI 8GhWtN6pcriFFzyH3p1lJpPNQJAfhCbm1dmHJCyM+ea9uXhMHk0FyLzFxQz+6Awl fwUxx6PkCZ4Mz6TSUww+PTJwmItrteg7gYg74DtVOzw0VMPsAMfbzQU9QeY93Qml fw9tGn/wYEKFi2PSS9+tbQk5oClxj4wlbe/BPA2Ghpg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=zn+uMe rVuuQkXaCrcow4DAa4iICo/FcB46h+SWFoLrY=; b=gq5XZpz5MHAlJt+3lMPMQD VrNmOpSj839jwhVkXZyEgAsWSrMrdrBJ5Nq+JBn9cNrCjAF83gbVtEd/whzRMCe7 xPYnGjUAiMrNrm79Sv1c9++QMgQYcwxv/K4DbcyCfkLUK1cbz+FRSQJkY8dK/18Q DFReFFahjMhvWf5+o+fwVGagg3VihOCbVhg2jLBUB8AyE/axrHfNExoymS+/9Wvv Vw8LaceLfVt22PzJ2E54VFyHKd33i5yhx5owyytMOf3aYGJvNJ67FFrutpB0DHgk D0eU9hoMxZHB+HS1qax8whe8vxoMpstOfoya4K2Am31O8AXoLF5pbpSbAgjVh/jg == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfeehfedgledvucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeehgefhtdefueffheekgfffudelffejtdfhvdejkedthfehvdelgfetgfdvtedthfen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 28 Jun 2021 01:33:13 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id b2510573 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Jun 2021 05:33:12 +0000 (UTC) Date: Mon, 28 Jun 2021 07:33:11 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano Subject: [PATCH v2 2/3] receive-pack: skip connectivity checks on delete-only commands Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the case where git-receive-pack(1) receives only commands which delete references, then per technical specification the client MUST NOT send a packfile. As a result, we know that no new objects have been received, which makes it a moot point to check whether all received objects are fully connected. Fix this by not doing a connectivity check in case there were no pushed objects. Given that git-rev-walk(1) with only negative references will not do any graph walk, no performance improvements are to be expected. Conceptionally, it is still the right thing to do though. The following tests were executed on linux.git and back up above expectation: Test origin/master HEAD --------------------------------------------------------------------------------------------------------- 5400.4: empty receive-pack updated:new 178.36(428.22+164.36) 177.62(421.33+164.48) -0.4% 5400.7: clone receive-pack updated:new 0.10(0.08+0.02) 0.10(0.08+0.02) +0.0% 5400.9: clone receive-pack updated:main 0.10(0.08+0.02) 0.11(0.08+0.02) +10.0% 5400.11: clone receive-pack main~10:main 0.15(0.11+0.04) 0.15(0.10+0.05) +0.0% 5400.13: clone receive-pack :main 0.01(0.00+0.01) 0.01(0.01+0.00) +0.0% 5400.16: clone_bitmap receive-pack updated:new 0.10(0.07+0.02) 0.09(0.06+0.02) -10.0% 5400.18: clone_bitmap receive-pack updated:main 0.10(0.07+0.02) 0.10(0.08+0.02) +0.0% 5400.20: clone_bitmap receive-pack main~10:main 0.15(0.11+0.03) 0.15(0.12+0.03) +0.0% 5400.22: clone_bitmap receive-pack :main 0.02(0.01+0.01) 0.01(0.00+0.00) -50.0% 5400.25: extrarefs receive-pack updated:new 32.34(20.72+11.86) 32.56(20.82+11.95) +0.7% 5400.27: extrarefs receive-pack updated:main 32.42(21.02+11.61) 32.52(20.64+12.10) +0.3% 5400.29: extrarefs receive-pack main~10:main 32.53(20.74+12.01) 32.39(20.63+11.97) -0.4% 5400.31: extrarefs receive-pack :main 7.13(3.53+3.59) 7.15(3.80+3.34) +0.3% 5400.34: extrarefs_bitmap receive-pack updated:new 32.55(20.72+12.04) 32.65(20.68+12.18) +0.3% 5400.36: extrarefs_bitmap receive-pack updated:main 32.50(20.90+11.86) 32.67(20.93+11.94) +0.5% 5400.38: extrarefs_bitmap receive-pack main~10:main 32.43(20.88+11.75) 32.35(20.68+11.89) -0.2% 5400.40: extrarefs_bitmap receive-pack :main 7.21(3.58+3.63) 7.18(3.61+3.57) -0.4% Signed-off-by: Patrick Steinhardt --- builtin/receive-pack.c | 49 ++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 19 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index a34742513a..b9263cec15 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1918,11 +1918,8 @@ static void execute_commands(struct command *commands, struct shallow_info *si, const struct string_list *push_options) { - struct check_connected_options opt = CHECK_CONNECTED_INIT; struct command *cmd; struct iterate_data data; - struct async muxer; - int err_fd = 0; int run_proc_receive = 0; if (unpacker_error) { @@ -1931,25 +1928,39 @@ static void execute_commands(struct command *commands, return; } - if (use_sideband) { - memset(&muxer, 0, sizeof(muxer)); - muxer.proc = copy_to_sideband; - muxer.in = -1; - if (!start_async(&muxer)) - err_fd = muxer.in; - /* ...else, continue without relaying sideband */ - } - data.cmds = commands; data.si = si; - opt.err_fd = err_fd; - opt.progress = err_fd && !quiet; - opt.env = tmp_objdir_env(tmp_objdir); - if (check_connected(iterate_receive_command_list, &data, &opt)) - set_connectivity_errors(commands, si); - if (use_sideband) - finish_async(&muxer); + /* + * If received commands only consist of deletions, then the client MUST + * NOT send a packfile because there cannot be any new objects in the + * first place. As a result, we do not set up a quarantine environment + * because we know no new objects will be received. And that in turn + * means that we can skip connectivity checks here. + */ + if (tmp_objdir) { + struct check_connected_options opt = CHECK_CONNECTED_INIT; + struct async muxer; + int err_fd = 0; + + if (use_sideband) { + memset(&muxer, 0, sizeof(muxer)); + muxer.proc = copy_to_sideband; + muxer.in = -1; + if (!start_async(&muxer)) + err_fd = muxer.in; + /* ...else, continue without relaying sideband */ + } + + opt.err_fd = err_fd; + opt.progress = err_fd && !quiet; + opt.env = tmp_objdir_env(tmp_objdir); + if (check_connected(iterate_receive_command_list, &data, &opt)) + set_connectivity_errors(commands, si); + + if (use_sideband) + finish_async(&muxer); + } reject_updates_to_hidden(commands); From patchwork Mon Jun 28 05:33:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12347255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FF01C2B9F4 for ; Mon, 28 Jun 2021 05:33:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E0C661A1D for ; Mon, 28 Jun 2021 05:33:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232124AbhF1Ffy (ORCPT ); Mon, 28 Jun 2021 01:35:54 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:45259 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232123AbhF1Ffo (ORCPT ); Mon, 28 Jun 2021 01:35:44 -0400 Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.nyi.internal (Postfix) with ESMTP id 3AC745C009A; Mon, 28 Jun 2021 01:33:19 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Mon, 28 Jun 2021 01:33:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=dyDH9Mp1qN04zjse7YSeWzQOTAM NNWPJMNZ/eUDr3pU=; b=g8lNQsLf1rX2cZz3mlCmxn265VVkP8zJ/ubs6PuLcqv z/HN+Gau3ZBbArFv4bKjQNjuZQdwDPrj7Z8PhJBsABXrEhfsMPHaiJBtlVP7QCBp d/2lqWaamAH22N0UUhCPQ4iDs6kH0W4e257T5Hy7f3F7kWpeqRELcj3PzZ/s+3dG ddvGw8nDdDjvKWyW5Ct1DiI8ktHOM1C+HlytOU1pr1GDGAaj/TboktEf0tTUPZcO fSNcsmXAurA2HfB5FbDSWuh+0CojG9qxfpqY3kgvobiDSys0caryWirUO/l4QsyQ i4VUT3icT8j8pNqoexBKXmfMDr7m9uMm4r8nuUDTqpQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=dyDH9M p1qN04zjse7YSeWzQOTAMNNWPJMNZ/eUDr3pU=; b=Ue9AJwIDvoX40c5ysnZ8aa NEhwBDRdVBf2vq9G7vX6RdAj+CAC5Qs7hu0m5uqZ7jMu8YFn4/pDLJWgecY9ECc7 e8mYEsCbz3KgkQfHqdY8fa6FD99iH7b49O3B+FrTTyZ7rALqNDmirEFHmbtjsdg4 WbGTklbqZitHOghfn3hzxPRtCQvpApEfVURKWA6+xrLiMnZxRGMi1NpVL7DGhWqQ ftwuExMS/iI6QA1E069NgfDh2OoYDAO3OTyLmGRfYQ294BC+ieVAtHJ2tdc4Cl83 5KVNxdvQ/vSGk4mvDQbIqVvQAr8cdm1JSBSsJBiK6asaefQ1fCkHQy3JXgEFl8Zw == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfeehfedgledvucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeehgefhtdefueffheekgfffudelffejtdfhvdejkedthfehvdelgfetgfdvtedthfen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 28 Jun 2021 01:33:18 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id 6a6fec77 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 28 Jun 2021 05:33:16 +0000 (UTC) Date: Mon, 28 Jun 2021 07:33:15 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano Subject: [PATCH v2 3/3] connected: implement connectivity check using bitmaps Message-ID: <7687dedd4722c39b5ecef2c2165147c25d16b8d9.1624858240.git.ps@pks.im> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The connectivity checks are currently implemented via git-rev-list(1): we simply ignore any objects which are reachable from preexisting refs via `--not --all`, and pass all new refs which are to be checked via its stdin. While this works well enough for repositories which do not have a lot of references, it's clear that `--not --all` will linearly scale with the number of refs which do exist: for each reference, we'll walk through its commit as well as its five parent commits (defined via `SLOP`). Given that many major hosting sites which use a pull/merge request workflow keep refs to the request's HEAD, this effectively means that `--not --all` will do a nearly complete graph walk. This commit implements an alternate strategy if the target repository has bitmaps. Objects referenced by a bitmap are by definition always fully connected, so they form a natural boundary between old revisions and new revisions. With this alternate strategy, we walk all given new object IDs. Whenever we hit any object which is covered by the bitmap, we stop the walk. The logic only kicks in in case we have a bitmap in the repository. If not, we wouldn't be able to efficiently abort the walk because we cannot easily tell whether an object already exists in the target repository and, if it does, whether it's fully connected. As a result, we'd have to perform a always do graph walk in this case. Naturally, we could do the same thing the previous git-rev-list(1) implementation did in that case and just use preexisting branch tips as boundaries. But for now, we just keep the old implementation for simplicity's sake given that performance characteristics are likely not significantly different. An easier solution may have been to simply add `--use-bitmap-index` to the git-rev-list(1) invocation, but benchmarks have shown that this is not effective: performance characteristics remain the same except for some cases where the bitmap walks performs much worse compared to the non-bitmap walk The following benchmarks have been performed in linux.git: Test origin/master HEAD --------------------------------------------------------------------------------------------------------- 5400.4: empty receive-pack updated:new 176.02(387.28+175.12) 176.86(388.75+175.51) +0.5% 5400.7: clone receive-pack updated:new 0.10(0.09+0.01) 0.08(0.06+0.03) -20.0% 5400.9: clone receive-pack updated:main 0.09(0.08+0.01) 0.09(0.06+0.03) +0.0% 5400.11: clone receive-pack main~10:main 0.14(0.11+0.03) 0.13(0.11+0.02) -7.1% 5400.13: clone receive-pack :main 0.01(0.01+0.00) 0.02(0.01+0.00) +100.0% 5400.16: clone_bitmap receive-pack updated:new 0.10(0.09+0.01) 0.28(0.13+0.15) +180.0% 5400.18: clone_bitmap receive-pack updated:main 0.10(0.08+0.02) 0.28(0.12+0.16) +180.0% 5400.20: clone_bitmap receive-pack main~10:main 0.13(0.11+0.02) 0.26(0.12+0.14) +100.0% 5400.22: clone_bitmap receive-pack :main 0.01(0.01+0.00) 0.01(0.01+0.00) +0.0% 5400.25: extrarefs receive-pack updated:new 32.14(20.76+11.59) 32.35(20.52+12.03) +0.7% 5400.27: extrarefs receive-pack updated:main 32.08(20.54+11.75) 32.10(20.78+11.53) +0.1% 5400.29: extrarefs receive-pack main~10:main 32.14(20.66+11.68) 32.27(20.65+11.83) +0.4% 5400.31: extrarefs receive-pack :main 7.09(3.56+3.53) 7.10(3.70+3.40) +0.1% 5400.34: extrarefs_bitmap receive-pack updated:new 32.41(20.59+12.02) 7.36(3.76+3.60) -77.3% 5400.36: extrarefs_bitmap receive-pack updated:main 32.26(20.53+11.95) 7.34(3.56+3.78) -77.2% 5400.38: extrarefs_bitmap receive-pack main~10:main 32.44(20.77+11.90) 7.40(3.70+3.70) -77.2% 5400.40: extrarefs_bitmap receive-pack :main 7.09(3.62+3.46) 7.17(3.79+3.38) +1.1% As expected, performance doesn't change in cases where we do not have a bitmap available given that the old code path still kicks in. In case we do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1) is slower in a "normal" clone of linux.git, it is significantly faster for a clone with lots of references. The slowness can potentially be explained by the overhead of loading the bitmap. On the other hand, the new code is faster as expected in repos which have lots of references given that we do not have to mark all negative references anymore. Signed-off-by: Patrick Steinhardt --- connected.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++ pack-bitmap.c | 4 +- pack-bitmap.h | 2 + 3 files changed, 140 insertions(+), 2 deletions(-) diff --git a/connected.c b/connected.c index b18299fdf0..474275a52f 100644 --- a/connected.c +++ b/connected.c @@ -6,6 +6,127 @@ #include "transport.h" #include "packfile.h" #include "promisor-remote.h" +#include "object.h" +#include "tree-walk.h" +#include "commit.h" +#include "tag.h" +#include "progress.h" +#include "oidset.h" +#include "pack-bitmap.h" + +#define QUEUED (1u<<0) + +static int queue_object(struct repository *repo, + struct bitmap_index *bitmap, + struct object_array *pending, + const struct object_id *oid) +{ + struct object *object; + + /* + * If the object ID is part of the bitmap, then we know that it must by + * definition be reachable in the target repository and be fully + * connected. We can thus skip checking the objects' referenced + * objects. + */ + if (bitmap_position(bitmap, oid) >= 0) + return 0; + + /* Otherwise the object is queued up for a connectivity check. */ + object = parse_object(repo, oid); + if (!object) { + /* Promisor objects do not need to be traversed. */ + if (is_promisor_object(oid)) + return 0; + return -1; + } + + /* + * If the object has been queued before already, then we don't need to + * do so again. + */ + if (object->flags & QUEUED) + return 0; + object->flags |= QUEUED; + + /* + * We do not need to queue up blobs given that they don't reference any + * other objects anyway. + */ + if (object->type == OBJ_BLOB) + return 0; + + add_object_array(object, NULL, pending); + + return 0; +} + +static int check_object( + struct repository *repo, + struct bitmap_index *bitmap, + struct object_array *pending, + const struct object *object) +{ + int ret = 0; + + if (object->type == OBJ_TREE) { + struct tree *tree = (struct tree *)object; + struct tree_desc tree_desc; + struct name_entry entry; + + if (init_tree_desc_gently(&tree_desc, tree->buffer, tree->size)) + return error(_("cannot parse tree entry")); + while (tree_entry_gently(&tree_desc, &entry)) + ret |= queue_object(repo, bitmap, pending, &entry.oid); + + free_tree_buffer(tree); + } else if (object->type == OBJ_COMMIT) { + struct commit *commit = (struct commit *) object; + struct commit_list *parents; + + for (parents = commit->parents; parents; parents = parents->next) + ret |= queue_object(repo, bitmap, pending, &parents->item->object.oid); + + free_commit_buffer(repo->parsed_objects, commit); + } else if (object->type == OBJ_TAG) { + ret |= queue_object(repo, bitmap, pending, get_tagged_oid((struct tag *) object)); + } else { + return error(_("unexpected object type")); + } + + return ret; +} + +/* + * If the target repository has a bitmap, then we treat all objects reachable + * via the bitmap as fully connected. Bitmapped objects thus serve as the + * boundary between old and new objects. + */ +static int check_connected_with_bitmap(struct repository *repo, + struct bitmap_index *bitmap, + oid_iterate_fn fn, void *cb_data, + struct check_connected_options *opt) +{ + struct object_array pending = OBJECT_ARRAY_INIT; + struct progress *progress = NULL; + size_t checked_objects = 0; + struct object_id oid; + int ret = 0; + + if (opt->progress) + progress = start_delayed_progress("Checking connectivity", 0); + + while (!fn(cb_data, &oid)) + ret |= queue_object(repo, bitmap, &pending, &oid); + while (pending.nr) { + ret |= check_object(repo, bitmap, &pending, object_array_pop(&pending)); + display_progress(progress, ++checked_objects); + } + + stop_progress(&progress); + object_array_clear(&pending); + return ret; +} /* * If we feed all the commits we want to verify to this command @@ -28,12 +149,27 @@ int check_connected(oid_iterate_fn fn, void *cb_data, int err = 0; struct packed_git *new_pack = NULL; struct transport *transport; + struct bitmap_index *bitmap; size_t base_len; if (!opt) opt = &defaults; transport = opt->transport; + bitmap = prepare_bitmap_git(the_repository); + if (bitmap) { + /* + * If we have a bitmap, then we can reuse the bitmap as + * boundary between old and new objects. + */ + err = check_connected_with_bitmap(the_repository, bitmap, + fn, cb_data, opt); + if (opt->err_fd) + close(opt->err_fd); + free_bitmap_index(bitmap); + return err; + } + if (fn(cb_data, &oid)) { if (opt->err_fd) close(opt->err_fd); diff --git a/pack-bitmap.c b/pack-bitmap.c index d90e1d9d8c..d88a882ee1 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -418,8 +418,8 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git, return pos; } -static int bitmap_position(struct bitmap_index *bitmap_git, - const struct object_id *oid) +int bitmap_position(struct bitmap_index *bitmap_git, + const struct object_id *oid) { int pos = bitmap_position_packfile(bitmap_git, oid); return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid); diff --git a/pack-bitmap.h b/pack-bitmap.h index 99d733eb26..7b4b386107 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -63,6 +63,8 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping void free_bitmap_index(struct bitmap_index *); int bitmap_walk_contains(struct bitmap_index *, struct bitmap *bitmap, const struct object_id *oid); +int bitmap_position(struct bitmap_index *bitmap_git, + const struct object_id *oid); /* * After a traversal has been performed by prepare_bitmap_walk(), this can be