From patchwork Wed Feb 23 12:35:31 2022
X-Patchwork-Submitter: Patrick Steinhardt
X-Patchwork-Id: 12756867
Date: Wed, 23 Feb 2022 13:35:31 +0100
From: Patrick Steinhardt
To: git@vger.kernel.org
Subject: [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD
Message-ID: <80f993dddd521133154a751aeaab86adee409eea.1645619224.git.ps@pks.im>
X-Mailing-List: git@vger.kernel.org

When fetching from a remote repository we will by default write what has
been fetched into the special FETCH_HEAD reference.
The order in which references are written depends on whether the
reference is for merge or not, which, besides some other conditions, is
also determined based on whether the old object ID the reference is
being updated from actually exists in the repository.

To write FETCH_HEAD we thus loop through all references thrice: once for
the references that are about to be merged, once for the references that
are not for merge, and finally for all references that are ignored. For
every iteration, we then look up the old object ID to determine whether
the referenced object exists so that we can label it as "not-for-merge"
if it doesn't exist. It goes without saying that this can be expensive
in the case where we are fetching a lot of references.

While this is hard to avoid in the case where we're writing FETCH_HEAD,
users can in fact ask us to skip this work via `--no-write-fetch-head`.
In that case, we do not care about the result of those lookups at all
because we don't have to order writes to FETCH_HEAD in the first place.

Skip this busywork in case we're not writing to FETCH_HEAD.
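
To make the cost argument concrete, here is a minimal standalone sketch
of the three-pass loop described above. It is not the code from
builtin/fetch.c: every name prefixed with "sketch_" is hypothetical, the
"have_object" flag merely stands in for the object database, and the
lookup counter exists only to show how much work is skipped when
FETCH_HEAD is not written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three FETCH_HEAD statuses. */
enum sketch_status {
	SKETCH_MERGE = 0,
	SKETCH_NOT_FOR_MERGE,
	SKETCH_IGNORE
};

/* Hypothetical, simplified ref; "have_object" fakes the object database. */
struct sketch_ref {
	enum sketch_status status;
	bool have_object;
	struct sketch_ref *next;
};

/* Counts how often the (in real git, expensive) lookup was performed. */
static size_t sketch_lookups;

static bool sketch_object_exists(const struct sketch_ref *ref)
{
	assert(ref != NULL);
	sketch_lookups++;
	return ref->have_object;
}

/*
 * Walk the list once per status. The object lookup only happens when
 * FETCH_HEAD is going to be written; otherwise the classification is
 * skipped. Returns the number of refs handled across all passes.
 */
static size_t sketch_store_refs(struct sketch_ref *refs, bool write_fetch_head)
{
	size_t handled = 0;
	int want;

	for (want = SKETCH_MERGE; want <= SKETCH_IGNORE; want++) {
		struct sketch_ref *ref;

		for (ref = refs; ref; ref = ref->next) {
			if (write_fetch_head && !sketch_object_exists(ref))
				ref->status = SKETCH_NOT_FOR_MERGE;

			/* Only act on refs that belong to the current pass. */
			if (ref->status != (enum sketch_status)want)
				continue;
			handled++;
		}
	}
	return handled;
}
```

In this sketch, n references cost 3 * n lookups when write_fetch_head
is true and none when it is false, while each reference is still
handled exactly once in either mode.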
The following benchmark performs a mirror-fetch in a repository with
about two million references:

    Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
      Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
      Range (min … max):   73.184 s … 76.845 s    3 runs

    Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
      Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
      Range (min … max):   68.864 s … 70.659 s    3 runs

    Summary
      'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
        1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'

Signed-off-by: Patrick Steinhardt
---
 builtin/fetch.c | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e8305b6662..4d12c2fd4d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1146,7 +1146,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	     want_status <= FETCH_HEAD_IGNORE;
 	     want_status++) {
 		for (rm = ref_map; rm; rm = rm->next) {
-			struct commit *commit = NULL;
 			struct ref *ref = NULL;
 
 			if (rm->status == REF_STATUS_REJECT_SHALLOW) {
@@ -1157,21 +1156,34 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 			}
 
 			/*
-			 * References in "refs/tags/" are often going to point
-			 * to annotated tags, which are not part of the
-			 * commit-graph. We thus only try to look up refs in
-			 * the graph which are not in that namespace to not
-			 * regress performance in repositories with many
-			 * annotated tags.
+			 * When writing FETCH_HEAD we need to determine whether
+			 * we already have the commit or not. If not, then the
+			 * reference is not for merge and needs to be written
+			 * to the reflog after other commits which we already
+			 * have. We're not interested in this property though
+			 * in case FETCH_HEAD is not to be updated, so we can
+			 * skip the classification in that case.
 			 */
-			if (!starts_with(rm->name, "refs/tags/"))
-				commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
-			if (!commit) {
-				commit = lookup_commit_reference_gently(the_repository,
-									&rm->old_oid,
-									1);
-				if (!commit)
-					rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
+			if (fetch_head->fp) {
+				struct commit *commit = NULL;
+
+				/*
+				 * References in "refs/tags/" are often going to point
+				 * to annotated tags, which are not part of the
+				 * commit-graph. We thus only try to look up refs in
+				 * the graph which are not in that namespace to not
+				 * regress performance in repositories with many
+				 * annotated tags.
+				 */
+				if (!starts_with(rm->name, "refs/tags/"))
+					commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
+				if (!commit) {
+					commit = lookup_commit_reference_gently(the_repository,
+										&rm->old_oid,
+										1);
+					if (!commit)
+						rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
+				}
 			}
 
 			if (rm->fetch_head_status != want_status)