From patchwork Wed Jan 16 18:25:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Linus Arver via GitGitGadget X-Patchwork-Id: 10766705 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 92EC813BF for ; Wed, 16 Jan 2019 18:26:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 82E3B2F3E1 for ; Wed, 16 Jan 2019 18:26:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7736F2F3F5; Wed, 16 Jan 2019 18:26:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 18E202F3E1 for ; Wed, 16 Jan 2019 18:26:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728723AbfAPS0N (ORCPT ); Wed, 16 Jan 2019 13:26:13 -0500 Received: from mail-ed1-f65.google.com ([209.85.208.65]:44800 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728671AbfAPS0A (ORCPT ); Wed, 16 Jan 2019 13:26:00 -0500 Received: by mail-ed1-f65.google.com with SMTP id y56so6195051edd.11 for ; Wed, 16 Jan 2019 10:25:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:message-id:in-reply-to:references:from:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=sKqpJiddQsIhJyLblShqc1itnXeinOHQo+qEu4z8V7Y=; b=X1SQZdpuIknBCUofEzSvDf26T2KqEb3vfN8lYoFzabbVLPQF3iiKEhjEx274927p6P ODXvGozwSEUpor9rxOp2r/IFpbhPagmm98j5rJBHsw7jIY29f9JUb5Dpn+lHb14/7BKq BCCO308f1OmhSHlppltPZr7kH8i4mJTjMVcSkSSdq7QAqbouL0WjUhQXokr1C267E/TL EQN1TcgVLoG9RFurdPMeuDMFubWRV6sYgT0w+BdBTflx3fGoae+GRpdED1MHiuM6RIBz 8Dbo45eYDx5Gs/+37N6hiqYbUGAeT7TGx2fkxrfbY9sEglwEjE3O5gNG+lEOyHrPBBgL yPFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:in-reply-to:references:from :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=sKqpJiddQsIhJyLblShqc1itnXeinOHQo+qEu4z8V7Y=; b=pBIdhJ2tI5f2liUUiFFV5vJRVXHX9L01slBNiv/kDq+Up1xurILS5QnEUJSTXU/1yr Ab03WiiMe6HBXo4nDKShzxLOtJlII9xxtOpoYXTW7tB4xzbB7BNC9pGIw9x9ZI+nDx8O UdryjvM2pJ8EFcf19+lWCqgRGUYjmb3F+xPMPW8KJ7RtmD6b345FlEyxptuOAfGLtaMX F58o/opLTr/Xc8KSgyiJ0rOxeFWU5cDUWbwm2aiPcZFWIYh8DOi0Pb79IIMV7TTzySgz KHKt/D+ynctsAGqRGL0R8GlV8OS1THqpDIZjE7/3D5u5csLxb8jMcbab76Wdl9vqG6g7 Xk6w== X-Gm-Message-State: AJcUukfucFZ3myOnJb/a7lSjiGGf74rGsM+uBp5HRQ1JV4YQWlW9QxuF 0tL6xhILgBtp2KKlT/CdjHYnJys4 X-Google-Smtp-Source: ALg8bN4UFtTxvd9/Zs5IoFcORXQTjiHJlH6OaVcOMU11cSQC4q0FECxzq/T3b6AF99QoiyhAFwMMWg== X-Received: by 2002:a50:ad0b:: with SMTP id y11mr8228601edc.113.1547663158582; Wed, 16 Jan 2019 10:25:58 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n10sm5597101edq.33.2019.01.16.10.25.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Jan 2019 10:25:58 -0800 (PST) Date: Wed, 16 Jan 2019 10:25:58 -0800 (PST) X-Google-Original-Date: Wed, 16 Jan 2019 18:25:51 GMT Message-Id: <02ef702884df2ad96be25486837b4009fb997c97.1547663156.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Derrick Stolee via GitGitGadget" Subject: [PATCH v5 1/5] revision: add mark_tree_uninteresting_sparse Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, jrnieder@gmail.com, ramsay@ramsayjones.plus.com, Junio C Hamano , Derrick Stolee Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Derrick Stolee In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to recursively walk the trees and blobs. This will walk the same number of trees as the old mechanism. To ensure that mark_tree_uninteresting walks the tree, we need to remove the UNINTERESTING flag before calling the method. This implementation will be replaced entirely in a later commit. There is one new assumption in this approach: we are also given the oids of the interesting trees. This implementation does not use those trees at the moment, but we will use them in a later rewrite of this method. Signed-off-by: Derrick Stolee --- revision.c | 25 +++++++++++++++++++++++++ revision.h | 2 ++ 2 files changed, 27 insertions(+) diff --git a/revision.c b/revision.c index 13e0519c02..60421f3a10 100644 --- a/revision.c +++ b/revision.c @@ -99,6 +99,31 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree) mark_tree_contents_uninteresting(r, tree); } +void mark_trees_uninteresting_sparse(struct repository *r, + struct oidset *trees) +{ + struct object_id *oid; + struct oidset_iter iter; + + oidset_iter_init(trees, &iter); + while ((oid = oidset_iter_next(&iter))) { + struct tree *tree = lookup_tree(r, oid); + + if (!tree) + continue; + + if (tree->object.flags & UNINTERESTING) { + /* + * Remove the flag so the next call + * is not a no-op. The flag is added + * in mark_tree_unintersting(). + */ + tree->object.flags ^= UNINTERESTING; + mark_tree_uninteresting(r, tree); + } + } +} + struct commit_stack { struct commit **items; size_t nr, alloc; diff --git a/revision.h b/revision.h index 7987bfcd2e..df684701b9 100644 --- a/revision.h +++ b/revision.h @@ -67,6 +67,7 @@ struct rev_cmdline_info { #define REVISION_WALK_NO_WALK_SORTED 1 #define REVISION_WALK_NO_WALK_UNSORTED 2 +struct oidset; struct topo_walk_info; struct rev_info { @@ -327,6 +328,7 @@ void put_revision_mark(const struct rev_info *revs, void mark_parents_uninteresting(struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); +void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); void show_object_with_name(FILE *, struct object *, const char *); From patchwork Wed Jan 16 18:25:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Linus Arver via GitGitGadget X-Patchwork-Id: 10766691 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99C6513BF for ; Wed, 16 Jan 2019 18:26:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 881A62F3E1 for ; Wed, 16 Jan 2019 18:26:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7C1B92F3F4; Wed, 16 Jan 2019 18:26:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 709632F3E1 for ; Wed, 16 Jan 2019 18:26:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728706AbfAPS0D (ORCPT ); Wed, 16 Jan 2019 13:26:03 -0500 Received: from mail-ed1-f68.google.com ([209.85.208.68]:39182 "EHLO mail-ed1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728687AbfAPS0C (ORCPT ); Wed, 16 Jan 2019 13:26:02 -0500 Received: by mail-ed1-f68.google.com with SMTP id b14so6221834edt.6 for ; Wed, 16 Jan 2019 10:26:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:message-id:in-reply-to:references:from:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=4DEBqdCRVwjep6I57BbIk3f4QzA+V2WCte1kUBq3GNE=; b=rug9X6rp4CiOzbuAYMoG5EhzC/Hl3dMWYwrMRZ4bGDArgqDwv6GpS8Ruuema6oBJkY Se/Uwm8RUh/veXc7kI4t2fcHTKwzh8Q9pYU+wgSKunbHgWDGJyZzQRzlmOCfF7SrKG2k zgadGeMVPfxH5R5VXUrnP1wv97Qs4Cdta9YoUSf2TAwqqQpJ0THYK+SU+o94GPTu8U6d hp2kpuZyfoGqNyCKJI1afQUZbM7vajOVFhivsvHdTiOFgBjesa75DGEduqGstHYGzS9M 38tAGg0dxH/5QD+rf8jSDq8MHXWSqcubzqGC9NxqjpesN5KaXEj6s+Tvl/PU7Zg1gjnI WvGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:in-reply-to:references:from :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=4DEBqdCRVwjep6I57BbIk3f4QzA+V2WCte1kUBq3GNE=; b=sku0XAMt6v9iuklSdyzrxDqicqALdWkrEMFa95XpjX3vltIwaea1QP31LP3kZ10NMt 1BO83Cesuzwqh4OcvmfGkXvz6oWjlRSBBunDpA9nOXVaCiTNzv5EUDsoBXSEO+Q4FWhR vzqq/C8UfZBpftjcX9SnoKzbm0hpgZyoSHhDHu8ky7uUSfnI18t/jNOUfN1HVysuHaVP OWnk/yUqOorAZtSTh/i6Dz/BVS5kdCKTeq6Rc7+kLuGmu+By/giJ8/jbcFsABn3YlfSr YJepCbdAaAE4Cahf5RI6ry3eCv5KAbSQtkYUcXLlfqWefwTu0Q7N3wOzLlbyfKdE6oDN QfdA== X-Gm-Message-State: AJcUukc5sfll0z0niWaciv8mFovRh0B+zQKQKDj09nSMChf5RTc6ZzlA j72Wvtxw3YE6EtW4perSOHYqF1TY X-Google-Smtp-Source: ALg8bN5k4V7hSo0i+or8ISAtrWLZ1Mnj15ujrEGXxStQLLEXMx4gfDYNdQl41yORqQ9gSoF4CjVnfg== X-Received: by 2002:a17:906:7f89:: with SMTP id f9-v6mr7590703ejr.51.1547663159530; Wed, 16 Jan 2019 10:25:59 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n29sm5873461edd.40.2019.01.16.10.25.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Jan 2019 10:25:58 -0800 (PST) Date: Wed, 16 Jan 2019 10:25:58 -0800 (PST) X-Google-Original-Date: Wed, 16 Jan 2019 18:25:52 GMT Message-Id: <0747f7494ef2be5e61bad610d2fe0b7db617bb49.1547663156.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Derrick Stolee via GitGitGadget" Subject: [PATCH v5 2/5] list-objects: consume sparse tree walk Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, jrnieder@gmail.com, ramsay@ramsayjones.plus.com, Junio C Hamano , Derrick Stolee Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Derrick Stolee When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new objects to send to the server as a thin pack. We walk these commits until we discover a frontier of commits such that every commit walk starting at interesting commits ends in a root commit or unintersting commit. We then need to discover which non-commit objects are reachable from uninteresting commits. This commit walk is not changing during this series. The mark_edges_uninteresting() method in list-objects.c iterates on the commit list and does the following: * If the commit is UNINTERSTING, then mark its root tree and every object it can reach as UNINTERESTING. * If the commit is interesting, then mark the root tree of every UNINTERSTING parent (and all objects that tree can reach) as UNINTERSTING. At the very end, we repeat the process on every commit directly given to the revision walk from stdin. This helps ensure we properly cover shallow commits that otherwise were not included in the frontier. The logic to recursively follow trees is in the mark_tree_uninteresting() method in revision.c. The algorithm avoids duplicate work by not recursing into trees that are already marked UNINTERSTING. Add a new 'sparse' option to the mark_edges_uninteresting() method that performs this logic in a slightly different way. As we iterate over the commits, we add all of the root trees to an oidset. Then, call mark_trees_uninteresting_sparse() on that oidset. Note that we include interesting trees in this process. The current implementation of mark_trees_unintersting_sparse() will walk the same trees as the old logic, but this will be replaced in a later change. Add a '--sparse' flag in 'git pack-objects' to call this new logic. Add a new test script t/t5322-pack-objects-sparse.sh that tests this option. The tests currently demonstrate that the resulting object list is the same as the old algorithm. This includes a case where both algorithms pack an object that is not needed by a remote due to limits on the explored set of trees. When the sparse algorithm is changed in a later commit, we will add a test that demonstrates a change of behavior in some cases. Signed-off-by: Derrick Stolee --- Documentation/git-pack-objects.txt | 11 ++- bisect.c | 2 +- builtin/pack-objects.c | 5 +- builtin/rev-list.c | 2 +- http-push.c | 2 +- list-objects.c | 70 +++++++++++++++--- list-objects.h | 4 +- t/t5322-pack-objects-sparse.sh | 113 +++++++++++++++++++++++++++++ 8 files changed, 192 insertions(+), 17 deletions(-) create mode 100755 t/t5322-pack-objects-sparse.sh diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index 40c825c381..e45f3e680d 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -14,7 +14,7 @@ SYNOPSIS [--local] [--incremental] [--window=] [--depth=] [--revs [--unpacked | --all]] [--keep-pack=] [--stdout [--filter=] | base-name] - [--shallow] [--keep-true-parents] < object-list + [--shallow] [--keep-true-parents] [--sparse] < object-list DESCRIPTION @@ -196,6 +196,15 @@ depth is 4095. Add --no-reuse-object if you want to force a uniform compression level on all data no matter the source. +--sparse:: + Use the "sparse" algorithm to determine which objects to include in + the pack, when combined with the "--revs" option. This algorithm + only walks trees that appear in paths that introduce new objects. + This can have significant performance benefits when computing + a pack to send a small change. However, it is possible that extra + objects are added to the pack-file if the included commits contain + certain types of direct renames. + --thin:: Create a "thin" pack by omitting the common objects between a sender and a receiver in order to reduce network transfer. This diff --git a/bisect.c b/bisect.c index 487675c672..842f8b4b8f 100644 --- a/bisect.c +++ b/bisect.c @@ -656,7 +656,7 @@ static void bisect_common(struct rev_info *revs) if (prepare_revision_walk(revs)) die("revision walk setup failed"); if (revs->tree_objects) - mark_edges_uninteresting(revs, NULL); + mark_edges_uninteresting(revs, NULL, 0); } static void exit_if_skipped_commits(struct commit_list *tried, diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 411aefd687..7d5b0735e3 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -84,6 +84,7 @@ static unsigned long pack_size_limit; static int depth = 50; static int delta_search_threads; static int pack_to_stdout; +static int sparse; static int thin; static int num_preferred_base; static struct progress *progress_state; @@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av) if (prepare_revision_walk(&revs)) die(_("revision walk setup failed")); - mark_edges_uninteresting(&revs, show_edge); + mark_edges_uninteresting(&revs, show_edge, sparse); if (!fn_show_object) fn_show_object = show_object; @@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) { OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"), N_("unpack unreachable objects newer than