From patchwork Sun Jan 12 04:15:24 2020
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 11329053
Date: Sat, 11 Jan 2020 20:15:24 -0800
Message-Id: <6a4f704e475fe1669e63731333fce9ed09d17d0c.1578802317.git.jonathantanmy@google.com>
X-Mailer: git-send-email 2.25.0.rc1.283.g88dfdc4193-goog
Subject: [PATCH 1/2] connected: verify promisor-ness of partial clone
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan

Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder
---
 builtin/clone.c |  5 +++--
 connected.c     | 19 ++++++++++++++-----
 connected.h     | 11 ++++++-----
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 0fc89ae2b9..0516181052 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -673,7 +673,7 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *msg,
 			       struct transport *transport,
 			       int check_connectivity,
-			       int check_refs_only)
+			       int check_refs_are_promisor_objects_only)
 {
 	const struct ref *rm = mapped_refs;
 
@@ -682,7 +682,8 @@ static void update_remote_refs(const struct ref *refs,
 
 		opt.transport = transport;
 		opt.progress = transport->progress;
-		opt.check_refs_only = !!check_refs_only;
+		opt.check_refs_are_promisor_objects_only =
+			!!check_refs_are_promisor_objects_only;
 
 		if (check_connected(iterate_ref_map, &rm, &opt))
 			die(_("remote did not send all necessary objects"));
diff --git a/connected.c b/connected.c
index c337f5f7f4..7e9bd1bc62 100644
--- a/connected.c
+++ b/connected.c
@@ -52,19 +52,28 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		strbuf_release(&idx_file);
 	}
 
-	if (opt->check_refs_only) {
+	if (opt->check_refs_are_promisor_objects_only) {
 		/*
 		 * For partial clones, we don't want to have to do a regular
 		 * connectivity check because we have to enumerate and exclude
 		 * all promisor objects (slow), and then the connectivity check
 		 * itself becomes a no-op because in a partial clone every
 		 * object is a promisor object. Instead, just make sure we
-		 * received the objects pointed to by each wanted ref.
+		 * received, in a promisor packfile, the objects pointed to by
+		 * each wanted ref.
 		 */
 		do {
-			if (!repo_has_object_file_with_flags(the_repository, &oid,
-							     OBJECT_INFO_SKIP_FETCH_OBJECT))
-				return 1;
+			struct packed_git *p;
+
+			for (p = get_all_packs(the_repository); p; p = p->next) {
+				if (!p->pack_promisor)
+					continue;
+				if (find_pack_entry_one(oid.hash, p))
+					goto promisor_pack_found;
+			}
+			return 1;
+promisor_pack_found:
+			;
 		} while (!fn(cb_data, &oid));
 		return 0;
 	}
diff --git a/connected.h b/connected.h
index ce2e7d8f2e..eba5c261ba 100644
--- a/connected.h
+++ b/connected.h
@@ -48,12 +48,13 @@ struct check_connected_options {
 	unsigned is_deepening_fetch : 1;
 
 	/*
-	 * If non-zero, only check the top-level objects referenced by the
-	 * wanted refs (passed in as cb_data). This is useful for partial
-	 * clones, where enumerating and excluding all promisor objects is very
-	 * slow and the commit-walk itself becomes a no-op.
+	 * If non-zero, only check that the top-level objects referenced by the
+	 * wanted refs (passed in as cb_data) are promisor objects. This is
+	 * useful for partial clones, where enumerating and excluding all
+	 * promisor objects is very slow and the commit-walk itself becomes a
+	 * no-op.
 	 */
-	unsigned check_refs_only : 1;
+	unsigned check_refs_are_promisor_objects_only : 1;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }

From patchwork Sun Jan 12 04:15:25 2020
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 11329055
Date: Sat, 11 Jan 2020 20:15:25 -0800
X-Mailer: git-send-email 2.25.0.rc1.283.g88dfdc4193-goog
Subject: [PATCH 2/2] fetch: forgo full connectivity check if --filter
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan

If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.

This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder
---
For example, a local fetch was sped up from 6.63s to 3.39s.
The bulk of the remaining time is spent in yet another connectivity
check (fetch_refs -> check_exist_and_connected) prior to the fetch - that
will hopefully be done in a subsequent patch.
---
 builtin/fetch.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b4c6d921d0..6fb50320eb 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -906,8 +906,17 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	if (!connectivity_checked) {
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+		if (filter_options.choice)
+			/*
+			 * Since a filter is specified, objects indirectly
+			 * referenced by refs are allowed to be absent.
+			 */
+			opt.check_refs_are_promisor_objects_only = 1;
+
 		rm = ref_map;
-		if (check_connected(iterate_ref_map, &rm, NULL)) {
+		if (check_connected(iterate_ref_map, &rm, &opt)) {
 			rc = error(_("%s did not send all necessary objects\n"), url);
 			goto abort;
 		}
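
As an aside for readers of this series, the promisor-pack test that patch
1/2 introduces in connected.c can be modeled outside Git roughly as
follows. This is only a sketch under stated assumptions: struct toy_pack,
toy_pack_has() and toy_check_refs_are_promisor_objects() are hypothetical
names invented for illustration, object IDs are plain strings, and the
pack lookup is a linear scan rather than Git's find_pack_entry_one().
Only the control flow is taken from the patch: skip non-promisor packs,
and fail as soon as one ref tip is found in none of the promisor packs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a packfile: its object IDs plus the promisor flag. */
struct toy_pack {
	bool pack_promisor;
	const char *const *oids;
	size_t nr;
	const struct toy_pack *next;
};

/* Return true if oid is listed in pack p (linear scan, illustration only). */
static bool toy_pack_has(const struct toy_pack *p, const char *oid)
{
	for (size_t i = 0; i < p->nr; i++)
		if (!strcmp(p->oids[i], oid))
			return true;
	return false;
}

/*
 * Return 0 if every ref tip appears in at least one promisor pack and 1
 * otherwise, following the return convention of check_connected().
 */
static int toy_check_refs_are_promisor_objects(const struct toy_pack *packs,
					       const char *const *tips,
					       size_t nr_tips)
{
	assert(tips || !nr_tips);

	for (size_t i = 0; i < nr_tips; i++) {
		const struct toy_pack *p;
		bool found = false;

		for (p = packs; p && !found; p = p->next)
			if (p->pack_promisor && toy_pack_has(p, tips[i]))
				found = true;
		if (!found)
			return 1; /* tip is missing or only in a non-promisor pack */
	}
	return 0;
}
```

With one promisor pack holding 1111 and 2222 and one ordinary pack
holding 3333, the tips {1111, 2222} pass, while {1111, 3333} fail: 3333
exists locally but not in a promisor pack, which is the case an
existence-only check would not catch.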