From patchwork Thu Feb 4 03:58:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12066255
Date: Wed, 3 Feb 2021 22:58:50 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()'
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

Future callers will want a function to fill a 'struct pack_entry' for a given object id but _only_ from its position in any kept pack(s).
In particular, a new 'git repack' mode which ensures the resulting packs form a geometric progression by object count will mark packs that it does not want to repack as "kept in-core", and it will want to halt a reachability traversal as soon as it visits an object in any of the kept packs. But, it does not want to halt the traversal at non-kept, or .keep packs.

The obvious alternative is 'find_pack_entry()', but this doesn't quite suffice since it only returns the first pack it finds, which may or may not be kept (and the mru cache makes it unpredictable which one you'll get if there are options). Short of that, you could walk over all packs looking for the object in each one, but that scales with the number of packs, which may be prohibitive.

Introduce 'find_kept_pack_entry()', a function which is like 'find_pack_entry()', but only fills in objects in the kept packs.

Handle packs which have .keep files, as well as in-core kept packs separately, since certain callers will want to distinguish one from the other. (Though on-disk and in-core kept packs share the adjective "kept", it is best to think of the two sets as independent.)

There is a gotcha when looking up objects that are duplicated in kept and non-kept packs, particularly when the MIDX stores the non-kept version and the caller asked for kept objects only. This could be resolved by teaching the MIDX to resolve duplicates by always favoring the kept pack (if one exists), but this breaks an assumption in existing MIDXs, and so it would require a format change.

The benefit to changing the MIDX in this way is marginal, so we instead have a more thorough check here which is explained with a comment.

Callers will be added in subsequent patches.
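The flag check described above can be sketched in isolation. This is a toy model, not the code from the patch: 'struct toy_pack', 'toy_pack_matches()', 'toy_find_entry()' and the sample data are hypothetical names, and objects are plain integers instead of object ids. Only the three flag values and the shape of the keep test are taken from the patch itself. Object 2 lives in both a non-kept and a .keep pack, which is the duplicate case the commit message warns about.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the flags this patch adds to packfile.h. */
#define ON_DISK_KEEP_PACKS 1
#define IN_CORE_KEEP_PACKS 2
#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)

/* Hypothetical stand-in for 'struct packed_git'; objects are plain ints. */
struct toy_pack {
	const char *name;
	int pack_keep;         /* pack has a .keep file on disk */
	int pack_keep_in_core; /* pack was marked kept by the running process */
	const int *objects;
	size_t nr;
};

/*
 * With no flags every pack qualifies (the plain lookup); otherwise the
 * pack must be kept in at least one of the requested ways.
 */
static int toy_pack_matches(const struct toy_pack *p, unsigned flags)
{
	if (!flags)
		return 1;
	return ((flags & ON_DISK_KEEP_PACKS) && p->pack_keep) ||
	       ((flags & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core);
}

static int toy_pack_has(const struct toy_pack *p, int id)
{
	size_t i;

	for (i = 0; i < p->nr; i++)
		if (p->objects[i] == id)
			return 1;
	return 0;
}

/* Return the first qualifying pack that contains 'id', or NULL. */
static const struct toy_pack *toy_find_entry(const struct toy_pack *packs,
					     size_t nr_packs, int id,
					     unsigned flags)
{
	size_t i;

	assert(packs != NULL);
	for (i = 0; i < nr_packs; i++)
		if (toy_pack_matches(&packs[i], flags) &&
		    toy_pack_has(&packs[i], id))
			return &packs[i];
	return NULL;
}

/* Sample data: object 2 is duplicated in a non-kept and a .keep pack. */
static const int plain_objs[] = { 1, 2 };
static const int disk_objs[] = { 2, 3 };
static const int core_objs[] = { 4 };
static const struct toy_pack toy_packs[] = {
	{ "plain",   0, 0, plain_objs, 2 },
	{ "on-disk", 1, 0, disk_objs,  2 },
	{ "in-core", 0, 1, core_objs,  1 },
};
```

In this model, looking up object 2 with no flags stops at the non-kept pack, while the same lookup with ON_DISK_KEEP_PACKS has to skip past it to reach the .keep pack. That is why a lookup restricted to kept packs cannot simply trust whichever copy is found first.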
Co-authored-by: Jeff King Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- packfile.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++----- packfile.h | 6 +++++ 2 files changed, 65 insertions(+), 5 deletions(-) diff --git a/packfile.c b/packfile.c index 4b938b4372..5f35cfe788 100644 --- a/packfile.c +++ b/packfile.c @@ -2031,7 +2031,10 @@ static int fill_pack_entry(const struct object_id *oid, return 1; } -int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) +static int find_one_pack_entry(struct repository *r, + const struct object_id *oid, + struct pack_entry *e, + int kept_only) { struct list_head *pos; struct multi_pack_index *m; @@ -2041,26 +2044,77 @@ int find_pack_entry(struct repository *r, const struct object_id *oid, struct pa return 0; for (m = r->objects->multi_pack_index; m; m = m->next) { - if (fill_midx_entry(r, oid, e, m)) + if (!fill_midx_entry(r, oid, e, m)) + continue; + + if (!kept_only) + return 1; + + if (((kept_only & ON_DISK_KEEP_PACKS) && e->p->pack_keep) || + ((kept_only & IN_CORE_KEEP_PACKS) && e->p->pack_keep_in_core)) return 1; } list_for_each(pos, &r->objects->packed_git_mru) { struct packed_git *p = list_entry(pos, struct packed_git, mru); - if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { - list_move(&p->mru, &r->objects->packed_git_mru); - return 1; + if (p->multi_pack_index && !kept_only) { + /* + * If this pack is covered by the MIDX, we'd have found + * the object already in the loop above if it was here, + * so don't bother looking. + * + * The exception is if we are looking only at kept + * packs. An object can be present in two packs covered + * by the MIDX, one kept and one not-kept. And as the + * MIDX points to only one copy of each object, it might + * have returned only the non-kept version above. We + * have to check again to be thorough. 
+ */ + continue; + } + if (!kept_only || + (((kept_only & ON_DISK_KEEP_PACKS) && p->pack_keep) || + ((kept_only & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core))) { + if (fill_pack_entry(oid, e, p)) { + list_move(&p->mru, &r->objects->packed_git_mru); + return 1; + } } } return 0; } +int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) +{ + return find_one_pack_entry(r, oid, e, 0); +} + +int find_kept_pack_entry(struct repository *r, + const struct object_id *oid, + unsigned flags, + struct pack_entry *e) +{ + /* + * Load all packs, including midx packs, since our "kept" strategy + * relies on that. We're relying on the side effect of it setting up + * r->objects->packed_git, which is a little ugly. + */ + get_all_packs(r); + return find_one_pack_entry(r, oid, e, flags); +} + int has_object_pack(const struct object_id *oid) { struct pack_entry e; return find_pack_entry(the_repository, oid, &e); } +int has_object_kept_pack(const struct object_id *oid, unsigned flags) +{ + struct pack_entry e; + return find_kept_pack_entry(the_repository, oid, flags, &e); +} + int has_pack_index(const unsigned char *sha1) { struct stat st; diff --git a/packfile.h b/packfile.h index a58fc738e0..624327f64d 100644 --- a/packfile.h +++ b/packfile.h @@ -161,13 +161,19 @@ int packed_object_info(struct repository *r, void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1); +#define ON_DISK_KEEP_PACKS 1 +#define IN_CORE_KEEP_PACKS 2 +#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS) + /* * Iff a pack file in the given repository contains the object named by sha1, * return true and store its location to e. 
*/ int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e); +int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsigned flags, struct pack_entry *e); int has_object_pack(const struct object_id *oid); +int has_object_kept_pack(const struct object_id *oid, unsigned flags); int has_pack_index(const unsigned char *sha1);

From patchwork Thu Feb 4 03:58:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12066253
Date: Wed, 3 Feb 2021 22:58:57 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH v2 2/8] revision: learn '--no-kept-objects'
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

A future caller will want to be able to perform a reachability traversal which terminates when visiting an object found in a kept pack. The closest existing option is '--honor-pack-keep', but this isn't quite what we want. Instead of halting the traversal midway through, a full traversal is always performed, and the results are only trimmed afterwards.

Besides needing to introduce a new flag (since culling results post-facto can be different than halting the traversal as it's happening), there is an additional wrinkle handling the distinction between in-core and on-disk kept packs. That is: what kinds of kept pack should stop the traversal?

Introduce '--no-kept-objects[=]' to specify which kinds of kept packs, if any, should stop a traversal. This can be useful for callers that want to perform a reachability analysis, but want to leave certain packs alone (e.g., when doing a geometric repack that has some "large" packs which are kept in-core that it wants to leave alone).

Signed-off-by: Taylor Blau --- Documentation/rev-list-options.txt | 7 +++ list-objects.c | 7 +++ revision.c | 15 +++++++ revision.h | 4 ++ t/t6114-keep-packs.sh | 69 ++++++++++++++++++++++++++++++ 5 files changed, 102 insertions(+) create mode 100755 t/t6114-keep-packs.sh diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 96cc89d157..f611832277 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -861,6 +861,13 @@ ifdef::git-rev-list[] Only useful with `--objects`; print the object IDs that are not in packs. +--no-kept-objects[=]:: + Halts the traversal as soon as an object in a kept pack is + found. If `` is `on-disk`, only packs with a corresponding + `*.keep` file are ignored. If `` is `in-core`, only packs + with their in-core kept state set are ignored. Otherwise, both + kinds of kept packs are ignored. 
+ --object-names:: Only useful with `--objects`; print the names of the object IDs that are found. This is the default behavior. diff --git a/list-objects.c b/list-objects.c index e19589baa0..b06c3bfeba 100644 --- a/list-objects.c +++ b/list-objects.c @@ -338,6 +338,13 @@ static void traverse_trees_and_blobs(struct traversal_context *ctx, ctx->show_object(obj, name, ctx->show_data); continue; } + if (ctx->revs->no_kept_objects) { + struct pack_entry e; + if (find_kept_pack_entry(ctx->revs->repo, &obj->oid, + ctx->revs->keep_pack_cache_flags, + &e)) + continue; + } if (!path) path = ""; if (obj->type == OBJ_TREE) { diff --git a/revision.c b/revision.c index fbc3e607fd..4c5adb90b1 100644 --- a/revision.c +++ b/revision.c @@ -2336,6 +2336,16 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->unpacked = 1; } else if (starts_with(arg, "--unpacked=")) { die(_("--unpacked= no longer supported")); + } else if (!strcmp(arg, "--no-kept-objects")) { + revs->no_kept_objects = 1; + revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS; + } else if (skip_prefix(arg, "--no-kept-objects=", &optarg)) { + revs->no_kept_objects = 1; + if (!strcmp(optarg, "in-core")) + revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + if (!strcmp(optarg, "on-disk")) + revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS; } else if (!strcmp(arg, "-r")) { revs->diff = 1; revs->diffopt.flags.recursive = 1; @@ -3797,6 +3807,11 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi return commit_ignore; if (revs->unpacked && has_object_pack(&commit->object.oid)) return commit_ignore; + if (revs->no_kept_objects) { + if (has_object_kept_pack(&commit->object.oid, + revs->keep_pack_cache_flags)) + return commit_ignore; + } if (commit->object.flags & UNINTERESTING) return commit_ignore; if (revs->line_level_traverse && !want_ancestry(revs)) { diff --git a/revision.h b/revision.h index 
e6be3c845e..a20a530d52 100644 --- a/revision.h +++ b/revision.h @@ -148,6 +148,7 @@ struct rev_info { edge_hint_aggressive:1, limited:1, unpacked:1, + no_kept_objects:1, boundary:2, count:1, left_right:1, @@ -317,6 +318,9 @@ struct rev_info { * This is loaded from the commit-graph being used. */ struct bloom_filter_settings *bloom_filter_settings; + + /* misc. flags related to '--no-kept-objects' */ + unsigned keep_pack_cache_flags; }; int ref_excluded(struct string_list *, const char *path); diff --git a/t/t6114-keep-packs.sh b/t/t6114-keep-packs.sh new file mode 100755 index 0000000000..9239d8aa46 --- /dev/null +++ b/t/t6114-keep-packs.sh @@ -0,0 +1,69 @@ +#!/bin/sh + +test_description='rev-list with .keep packs' +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit loose && + test_commit packed && + test_commit kept && + + KEPT_PACK=$(git pack-objects --revs .git/objects/pack/pack <<-EOF + refs/tags/kept + ^refs/tags/packed + EOF + ) && + MISC_PACK=$(git pack-objects --revs .git/objects/pack/pack <<-EOF + refs/tags/packed + ^refs/tags/loose + EOF + ) && + + touch .git/objects/pack/pack-$KEPT_PACK.keep +' + +rev_list_objects () { + git rev-list "$@" >out && + sort out +} + +idx_objects () { + git show-index <$1 >expect-idx && + cut -d" " -f2 kept && + rev_list_objects --objects --all --no-object-names --no-kept-objects >no-kept && + + idx_objects .git/objects/pack/pack-$KEPT_PACK.idx >expect && + comm -3 kept no-kept >actual && + + test_cmp expect actual +' + +test_expect_success '--no-kept-objects excludes kept non-MIDX object' ' + test_config core.multiPackIndex true && + + # Create a pack with just the commit object in pack, and do not mark it + # as kept (even though it appears in $KEPT_PACK, which does have a .keep + # file). 
+ MIDX_PACK=$(git pack-objects .git/objects/pack/pack <<-EOF + $(git rev-parse kept) + EOF + ) && + + # Write a MIDX containing all packs, but use the version of the commit + # at "kept" in a non-kept pack by touching $MIDX_PACK. + touch .git/objects/pack/pack-$MIDX_PACK.pack && + git multi-pack-index write && + + rev_list_objects --objects --no-object-names --no-kept-objects HEAD >actual && + ( + idx_objects .git/objects/pack/pack-$MISC_PACK.idx && + git rev-list --objects --no-object-names refs/tags/loose + ) | sort >expect && + test_cmp expect actual +' + +test_done

From patchwork Thu Feb 4 03:59:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12066251
Date: Wed, 3 Feb 2021 22:59:03 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option
MIME-Version: 1.0
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

In an upcoming commit, 'git repack' will want to create a pack comprised of all of the objects in some packs (the included packs) excluding any objects in some other packs (the excluded packs).

This caller could iterate those packs themselves and feed the objects it finds to 'git pack-objects' directly over stdin, but this approach has a few downsides:

  - It requires every caller that wants to drive 'git pack-objects' in this way to implement pack iteration themselves. This forces the caller to think about details like what order objects are fed to pack-objects, which callers would likely rather not do.

  - If the set of objects in included packs is large, it requires sending a lot of data over a pipe, which is inefficient.

  - The caller is forced to keep track of the excluded objects, too, and make sure that it doesn't send any objects that appear in both included and excluded packs.

But the biggest downside is the lack of a reachability traversal. Because the caller passes in a list of objects directly, those objects don't get a namehash assigned to them, which can have a negative impact on the delta selection process, causing 'git pack-objects' to fail to find good deltas even when they exist.

The caller could formulate a reachability traversal themselves, but the only way to drive 'git pack-objects' in this way is to do a full traversal, and then remove objects in the excluded packs after the traversal is complete. This can be detrimental to callers who care about performance, especially in repositories with many objects.

Introduce 'git pack-objects --stdin-packs' which remedies these four concerns.
'git pack-objects --stdin-packs' expects a list of pack names on stdin, where 'pack-xyz.pack' denotes that pack as included, and '^pack-xyz.pack' denotes it as excluded. The resulting pack includes all objects that are present in at least one included pack, and aren't present in any excluded pack.

To address the delta selection problem, 'git pack-objects --stdin-packs' works as follows. First, it assembles a list of objects that it is going to pack, as above. Then, a reachability traversal is started, whose tips are any commits mentioned in included packs. Upon visiting an object, we find its corresponding object_entry in the to_pack list, and set its namehash parameter appropriately.

To avoid the traversal visiting more objects than it needs to, the traversal is halted upon encountering an object which can be found in an excluded pack (by marking the excluded packs as kept in-core, and passing --no-kept-objects=in-core to the revision machinery).

This can cause the traversal to halt early, for example if an object in an included pack is an ancestor of ones in excluded packs. But stopping early is OK, since filling in the namehash fields of objects in the to_pack list is only additive (i.e., having it helps the delta selection process, but leaving it blank doesn't impact the correctness of the resulting pack).

Even still, it is unlikely that this hurts us much in practice, since the 'git repack --geometric' caller (which is introduced in a later commit) marks small packs as included, and large ones as excluded. During ordinary use, the small packs usually represent pushes after a large repack, and so are unlikely to be ancestors of objects that already exist in the repository.

(I found it convenient while developing this patch to have 'git pack-objects' report the number of objects which were visited and got their namehash fields filled in during traversal. This is also included in the below patch via trace2 data lines).
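The input convention and the selection rule described above can be sketched as a small standalone model. It is not the implementation in this patch: 'toy_classify_line()', 'toy_pack_name()', 'toy_want_object()', 'struct toy_listed_pack' and the sample packs are hypothetical, and objects are plain integers. The only things taken from the patch are the '^' prefix for excluded packs, the skipping of empty lines, and the rule "in at least one included pack and in no excluded pack".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_pack_role { TOY_SKIP, TOY_INCLUDE, TOY_EXCLUDE };

/* Classify one line of --stdin-packs input; empty lines are skipped. */
static enum toy_pack_role toy_classify_line(const char *line)
{
	assert(line != NULL);
	if (!*line)
		return TOY_SKIP;
	return *line == '^' ? TOY_EXCLUDE : TOY_INCLUDE;
}

/* The pack name is the line itself, minus a leading '^' if present. */
static const char *toy_pack_name(const char *line)
{
	assert(line != NULL);
	return *line == '^' ? line + 1 : line;
}

/* Hypothetical model of a pack that was named on stdin. */
struct toy_listed_pack {
	enum toy_pack_role role;
	const int *objects;
	size_t nr;
};

/*
 * An object is wanted if it appears in at least one included pack and
 * in no excluded pack.
 */
static int toy_want_object(const struct toy_listed_pack *packs,
			   size_t nr_packs, int id)
{
	size_t i, j;
	int included = 0;

	for (i = 0; i < nr_packs; i++) {
		for (j = 0; j < packs[i].nr; j++) {
			if (packs[i].objects[j] != id)
				continue;
			if (packs[i].role == TOY_EXCLUDE)
				return 0;
			if (packs[i].role == TOY_INCLUDE)
				included = 1;
		}
	}
	return included;
}

/* Sample input equivalent to the lines "A", "^B", "C". */
static const int objs_a[] = { 1, 2 };
static const int objs_b[] = { 2, 3 };
static const int objs_c[] = { 4 };
static const struct toy_listed_pack toy_listed[] = {
	{ TOY_INCLUDE, objs_a, 2 },
	{ TOY_EXCLUDE, objs_b, 2 },
	{ TOY_INCLUDE, objs_c, 1 },
};
```

Here object 2 is dropped even though an included pack contains it, because the excluded pack contains it too. The model checks exclusion per object at lookup time; the patch gets the same effect by marking excluded packs as kept in-core before any included pack is iterated.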
Suggested-by: Jeff King Signed-off-by: Taylor Blau --- Documentation/git-pack-objects.txt | 10 ++ builtin/pack-objects.c | 176 ++++++++++++++++++++++++++++- t/t5300-pack-object.sh | 97 ++++++++++++++++ 3 files changed, 281 insertions(+), 2 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index 54d715ead1..92733f6bf5 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -85,6 +85,16 @@ base-name:: reference was included in the resulting packfile. This can be useful to send new tags to native Git clients. +--stdin-packs:: + Read the basenames of packfiles from the standard input, instead + of object names or revision arguments. The resulting pack + contains all objects listed in the included packs (those not + beginning with `^`), excluding any objects listed in the + excluded packs (beginning with `^`). ++ +Incompatible with `--revs`, or options that imply `--revs` (such as +`--all`), with the exception of `--unpacked`, which is compatible. 
+ --window=:: --depth=:: These two options affect how the objects contained in diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 13cde5896a..6d19eb000a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2979,6 +2979,164 @@ static int git_pack_config(const char *k, const char *v, void *cb) return git_default_config(k, v, cb); } +static int stdin_packs_found_nr; +static int stdin_packs_hints_nr; + +static int add_object_entry_from_pack(const struct object_id *oid, + struct packed_git *p, + uint32_t pos, + void *_data) +{ + struct rev_info *revs = _data; + struct object_info oi = OBJECT_INFO_INIT; + off_t ofs; + enum object_type type; + + display_progress(progress_state, ++nr_seen); + + ofs = nth_packed_object_offset(p, pos); + + oi.typep = &type; + if (packed_object_info(the_repository, p, ofs, &oi) < 0) + die(_("could not get type of object %s in pack %s"), + oid_to_hex(oid), p->pack_name); + else if (type == OBJ_COMMIT) { + /* + * commits in included packs are used as starting points for the + * subsequent revision walk + */ + add_pending_oid(revs, NULL, oid, 0); + } + + if (have_duplicate_entry(oid, 0)) + return 0; + + if (!want_object_in_pack(oid, 0, &p, &ofs)) + return 0; + + stdin_packs_found_nr++; + + create_object_entry(oid, type, 0, 0, 0, p, ofs); + + return 0; +} + +static void show_commit_pack_hint(struct commit *commit, void *_data) +{ +} + +static void show_object_pack_hint(struct object *object, const char *name, + void *_data) +{ + struct object_entry *oe = packlist_find(&to_pack, &object->oid); + if (!oe) + return; + + /* + * Our 'to_pack' list was constructed by iterating all objects packed in + * included packs, and so doesn't have a non-zero hash field that you + * would typically pick up during a reachability traversal. + * + * Make a best-effort attempt to fill in the ->hash and ->no_try_delta + * here using a now in order to perhaps improve the delta selection + * process. 
+ */ + oe->hash = pack_name_hash(name); + oe->no_try_delta = name && no_try_delta(name); + + stdin_packs_hints_nr++; +} + +static void read_packs_list_from_stdin(void) +{ + struct strbuf buf = STRBUF_INIT; + struct string_list include_packs = STRING_LIST_INIT_DUP; + struct string_list exclude_packs = STRING_LIST_INIT_DUP; + struct string_list_item *item = NULL; + + struct packed_git *p; + struct rev_info revs; + + repo_init_revisions(the_repository, &revs, NULL); + /* + * Use a revision walk to fill in the namehash of objects in the include + * packs. To save time, we'll avoid traversing through objects that are + * in excluded packs. + * + * That may cause us to avoid populating all of the namehash fields of + * all included objects, but our goal is best-effort, since this is only + * an optimization during delta selection. + */ + revs.no_kept_objects = 1; + revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + revs.blob_objects = 1; + revs.tree_objects = 1; + revs.tag_objects = 1; + + while (strbuf_getline(&buf, stdin) != EOF) { + if (!buf.len) + continue; + + if (*buf.buf == '^') + string_list_append(&exclude_packs, buf.buf + 1); + else + string_list_append(&include_packs, buf.buf); + + strbuf_reset(&buf); + } + + string_list_sort(&include_packs); + string_list_sort(&exclude_packs); + + for (p = get_all_packs(the_repository); p; p = p->next) { + const char *pack_name = pack_basename(p); + + item = string_list_lookup(&include_packs, pack_name); + if (!item) + item = string_list_lookup(&exclude_packs, pack_name); + + if (item) + item->util = p; + } + + /* + * First handle all of the excluded packs, marking them as kept in-core + * so that later calls to add_object_entry() discards any objects that + * are also found in excluded packs. 
+ */ + for_each_string_list_item(item, &exclude_packs) { + struct packed_git *p = item->util; + if (!p) + die(_("could not find pack '%s'"), item->string); + p->pack_keep_in_core = 1; + } + for_each_string_list_item(item, &include_packs) { + struct packed_git *p = item->util; + if (!p) + die(_("could not find pack '%s'"), item->string); + for_each_object_in_pack(p, + add_object_entry_from_pack, + &revs, + FOR_EACH_OBJECT_PACK_ORDER); + } + + if (prepare_revision_walk(&revs)) + die(_("revision walk setup failed")); + traverse_commit_list(&revs, + show_commit_pack_hint, + show_object_pack_hint, + NULL); + + trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found", + stdin_packs_found_nr); + trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints", + stdin_packs_hints_nr); + + strbuf_release(&buf); + string_list_clear(&include_packs, 0); + string_list_clear(&exclude_packs, 0); +} + static void read_object_list_from_stdin(void) { char line[GIT_MAX_HEXSZ + 1 + PATH_MAX + 2]; @@ -3482,6 +3640,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) struct strvec rp = STRVEC_INIT; int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0; int rev_list_index = 0; + int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -3532,6 +3691,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) OPT_SET_INT_F(0, "indexed-objects", &rev_list_index, N_("include objects referred to by the index"), 1, PARSE_OPT_NONEG), + OPT_BOOL(0, "stdin-packs", &stdin_packs, + N_("read packs from stdin")), OPT_BOOL(0, "stdout", &pack_to_stdout, N_("output pack to stdout")), OPT_BOOL(0, "include-tag", &include_tag, @@ -3636,7 +3797,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) use_internal_rev_list = 1; strvec_push(&rp, "--indexed-objects"); } - if (rev_list_unpacked) { + if (rev_list_unpacked 
&& !stdin_packs) { use_internal_rev_list = 1; strvec_push(&rp, "--unpacked"); } @@ -3681,8 +3842,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (filter_options.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); + if (stdin_packs) + die(_("cannot use --filter with --stdin-packs")); } + if (stdin_packs && use_internal_rev_list) + die(_("cannot use internal rev list with --stdin-packs")); + /* * "soft" reasons not to use bitmaps - for on-disk repack by default we want * @@ -3741,7 +3907,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (progress) progress_state = start_progress(_("Enumerating objects"), 0); - if (!use_internal_rev_list) + if (stdin_packs) { + /* avoids adding objects in excluded packs */ + ignore_packed_keep_in_core = 1; + read_packs_list_from_stdin(); + if (rev_list_unpacked) + add_unreachable_loose_objects(); + } else if (!use_internal_rev_list) read_object_list_from_stdin(); else { get_object_list(rp.nr, rp.v); diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 392201cabd..7138a54595 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -532,4 +532,101 @@ test_expect_success 'prefetch objects' ' test_line_count = 1 donelines ' +test_expect_success 'setup for --stdin-packs tests' ' + git init stdin-packs && + ( + cd stdin-packs && + + test_commit A && + test_commit B && + test_commit C && + + for id in A B C + do + git pack-objects .git/objects/pack/pack-$id \ + --incremental --revs <<-EOF + refs/tags/$id + EOF + done && + + ls -la .git/objects/pack + ) +' + +test_expect_success '--stdin-packs with excluded packs' ' + ( + cd stdin-packs && + + PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" && + PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" && + PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" && + + git pack-objects test --stdin-packs <<-EOF && + $PACK_A + ^$PACK_B + $PACK_C + EOF + + ( + git show-index <$(ls 
.git/objects/pack/pack-A-*.idx) && + git show-index <$(ls .git/objects/pack/pack-C-*.idx) + ) >expect.raw && + git show-index <$(ls test-*.idx) >actual.raw && + + cut -d" " -f2 <expect.raw | sort >expect && + cut -d" " -f2 <actual.raw | sort >actual && + test_cmp expect actual + ) +' + +test_expect_success '--stdin-packs is incompatible with --filter' ' + ( + cd stdin-packs && + test_must_fail git pack-objects --stdin-packs --stdout \ + --filter=blob:none </dev/null 2>err && + test_i18ngrep "cannot use --filter with --stdin-packs" err + ) +' + +test_expect_success '--stdin-packs is incompatible with --revs' ' + ( + cd stdin-packs && + test_must_fail git pack-objects --stdin-packs --revs out \ + </dev/null 2>err && + test_i18ngrep "cannot use internal rev list with --stdin-packs" err + ) +' + +test_expect_success '--stdin-packs with loose objects' ' + ( + cd stdin-packs && + + PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" && + PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" && + PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" && + + test_commit D && # loose + + git pack-objects test2 --stdin-packs --unpacked <<-EOF && + $PACK_A + ^$PACK_B + $PACK_C + EOF + + ( + git show-index <$(ls .git/objects/pack/pack-A-*.idx) && + git show-index <$(ls .git/objects/pack/pack-C-*.idx) && + git rev-list --objects --no-object-names \ + refs/tags/C..refs/tags/D + + ) >expect.raw && + ls -la . 
&& + git show-index <$(ls test2-*.idx) >actual.raw && + + cut -d" " -f2 <expect.raw | sort >expect && + cut -d" " -f2 <actual.raw | sort >actual && + test_cmp expect actual + ) +' + test_done From patchwork Thu Feb 4 03:59:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12066249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB605C433E0 for ; Thu, 4 Feb 2021 04:00:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A733664F6A for ; Thu, 4 Feb 2021 04:00:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232383AbhBDEAA (ORCPT ); Wed, 3 Feb 2021 23:00:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231284AbhBDD7x (ORCPT ); Wed, 3 Feb 2021 22:59:53 -0500 Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAD7EC061788 for ; Wed, 3 Feb 2021 19:59:12 -0800 (PST) Received: by mail-qt1-x82f.google.com with SMTP id o18so1488253qtp.10 for ; Wed, 03 Feb 2021 19:59:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=x73HYCTvK80b0qs1/doa9iy8Ofke50OU26ioocYcYBk=; 
b=QnwYXzffsC6fxiPJXu9Ie4Aybj7QQ4bJtoqQHGqieDLlhAu4J42YTmkEiwe+fDWztA XuiuUeQXTD3Jjj3nPuNs+kJqHsl2F64GKsqR9ggT1FtVKzdby6JhfUzAxIQapqWKuV3y 7DdNqYXaFQZ+i8JCpt3oNm7CNVzby5gRD4Ltz4WhAZUZqbrgPMkES0He2XlW8AaXahqm Yl+3xn1gjr8NeK4/Eb6fME2fnMEQA+r/dkJOcF4/eVUHViQ6sqPGglQJ+YAKonTg8xTn 4RSi8QzfJxFxu5wkFrm4/XQAQbfXHcgIbKX8MrBSiiGnkEqTacHvFWenPN+DoirFR/va F+xQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=x73HYCTvK80b0qs1/doa9iy8Ofke50OU26ioocYcYBk=; b=m0Siu82M8+mSPIKNV8XYt0Rf37bj75UDamEHKircn/+p3rp8mGEgaMbNQ9seDjDIhR aIAOKQ37IExGnMtH+yH4by1hsWqGhpZBdGY9BKmMoIr9/3QvcZUt0tAO3FICSbPim2Rz e5696oziV69UT2unzrJdmWh7gIwKZe39w8fg+yR7GlGPwSImtTgK/JtlXcTYxozj/MMx J5PkxzoQVQoaySskILygLlOj8ugMR8wkU7IdEAh1ak9D0YQLet6n4ZtCoNlyseVrDqAE JyAuTY+LIzyqcrXagSXAuu4eMzB/CKL5Us4tIOHS5u6E0yDP0sa9VmV0fb1HaDGCkBIi tUrA== X-Gm-Message-State: AOAM533Bme7/ATwD9oaXBpQFR3DpuO1xq3+NUA0oMsOGrAVm2XdzKbZ0 We0RMjIVJOnh5bUPEAPRLz4D/GjoFSooZA== X-Google-Smtp-Source: ABdhPJz/XpEbjnUvZH0C8b1WN3BWRIRKSZPXcNsV/U8popnENYgwruFM/MZP+HqPk6wdRfBy6zdMgw== X-Received: by 2002:ac8:3683:: with SMTP id a3mr5490318qtc.367.1612411151700; Wed, 03 Feb 2021 19:59:11 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:3a5f:649:7bf7:4ac8]) by smtp.gmail.com with ESMTPSA id x134sm3994339qka.1.2021.02.03.19.59.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Feb 2021 19:59:11 -0800 (PST) Date: Wed, 3 Feb 2021 22:59:09 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH v2 4/8] p5303: add missing &&-chains Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King These are in a helper function, so the usual chain-lint doesn't notice them. 
This function is still not perfect, as it has some git invocations on the left-hand-side of the pipe, but its primary purpose is timing, not finding bugs or correctness issues. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/perf/p5303-many-packs.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh index ce0c42cc9f..d90d714923 100755 --- a/t/perf/p5303-many-packs.sh +++ b/t/perf/p5303-many-packs.sh @@ -28,11 +28,11 @@ repack_into_n () { push @commits, $_ if $. % 5 == 1; } print reverse @commits; - ' "$1" >pushes + ' "$1" >pushes && # create base packfile head -n 1 pushes | - git pack-objects --delta-base-offset --revs staging/pack + git pack-objects --delta-base-offset --revs staging/pack && # and then incrementals between each pair of commits last= && From patchwork Thu Feb 4 03:59:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12066257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52321C433DB for ; Thu, 4 Feb 2021 04:01:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EDC6364F68 for ; Thu, 4 Feb 2021 04:01:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232593AbhBDEA7 (ORCPT ); Wed, 3 Feb 2021 23:00:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60660 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232159AbhBDEAa (ORCPT ); Wed, 3 Feb 2021 23:00:30 -0500 Received: from mail-qk1-x736.google.com (mail-qk1-x736.google.com [IPv6:2607:f8b0:4864:20::736]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9F44C06178A for ; Wed, 3 Feb 2021 19:59:16 -0800 (PST) Received: by mail-qk1-x736.google.com with SMTP id a19so2172842qka.2 for ; Wed, 03 Feb 2021 19:59:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=p87MUy52mah7gn9Cd/dxh8jj1AQD9AiXxzxugefES70=; b=V4b8C7x1g7vk2UuozJpSP3lKbTzVzcKaIi5iEkpoMmFhMyYGq8YvHHe2nBsZAEZAyQ 97OsgV93OJ8Htjy2z16o+MZOdCV+eS+tid9FQP19+x23H5s9woR0yTG7BoJ4VEjzBWX3 ly1APhMZBx1dzI9TW6IX+yJr5OWwnhpDzLs6PdQeQM/VY+4T60Cz0QiynE6Q2zc5Hyf5 9fMvzmrwHGIixxU+0AQKnQM3Vl0giA52herZyLTSshekWvqxUWgAdlYQS90KorMG4BqG 0E1DTrFFAn4TJCdBnUufXnp8JMQ4OewtU2lD04+c/TjJbTRGBy7YQiUFDtoXAazccTtF orFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=p87MUy52mah7gn9Cd/dxh8jj1AQD9AiXxzxugefES70=; b=ixxaC2qj9gtw45abdFNqmnP7sF1rBfFLQoKCs9arCrnpJKUO5ceGI8kFcpHGHJ/Moa dRa+MnovT0kU6Srg4+TF1wKrMnd2MseROH8X3e7SrXuLNI/OsXOggjXmWRT5ebNwHxK0 BRKqaiU4a7xsRNpJyUU9D8AzazRlsDwgzn8r8cKOyTVS05fFU6fLLUqv5ypUy/QsaNkZ SV5aIdT/cXpN7wWICif5xj7F88FMDHxMjzB4fpd0xa+pgNZLFRPPui2rOTSL0oni9yAj mJOMvXOntQxLhP2modv0diSkFkkMlpEvTN4ZeJD8Pbm/DMkxWncj7Uf1oYWxWyrjNpqT GjOw== X-Gm-Message-State: AOAM531fw1SKOc/wtLfPaIW1REbzGeAhZ1o2LYHPS9EID98qKlewMEMJ KaMjjj0yVuaSjTMlbdgYslM8EfgUtvWV9w== X-Google-Smtp-Source: ABdhPJxrz6U0Wr1fJc6vy11+jLkhJZKj+3afy5eqpCYq+kgb2iSptGMQgINATrcSdZzzBvCL9xy8nA== X-Received: by 2002:ae9:eb95:: with SMTP id b143mr5888637qkg.442.1612411155809; Wed, 03 Feb 2021 19:59:15 -0800 (PST) Received: from localhost 
([2605:9480:22e:ff10:3a5f:649:7bf7:4ac8]) by smtp.gmail.com with ESMTPSA id 17sm4228255qtu.23.2021.02.03.19.59.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Feb 2021 19:59:15 -0800 (PST) Date: Wed, 3 Feb 2021 22:59:13 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH v2 5/8] p5303: measure time to repack with keep Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King This is the same as the regular repack test, except that we mark the single base pack as "kept" and use --assume-kept-packs-closed. The theory is that this should be faster than the normal repack, because we'll have fewer objects to traverse and process. Here are some timings on a recent clone of the kernel. In the single-pack case, there is nothing to do since there are no non-excluded packs: 5303.5: repack (1) 57.42(54.88+10.64) 5303.6: repack with --stdin-packs (1) 0.01(0.01+0.00) and in the 50-pack case, it is much faster to use `--stdin-packs`, since we avoid having to consider any objects in the excluded pack: 5303.10: repack (50) 71.26(88.24+4.96) 5303.11: repack with --stdin-packs (50) 3.49(11.82+0.28) but our improvements vanish as we approach 1000 packs. 5303.15: repack (1000) 215.64(491.33+14.80) 5303.16: repack with --stdin-packs (1000) 198.79(380.51+7.97) That's because the code paths around handling .keep files are known to scale badly; they look in every single pack file to find each object. Our solution to that was to notice that most repos don't have keep files, and to make that case a fast path. But as soon as you add a single .keep, that part of pack-objects slows down again (even if we have fewer objects total to look at). 
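To make the scaling problem concrete, here is a minimal toy model in C. It is not git's code: `toy_pack`, `toy_pack_has()` and `in_kept_pack_scan()` are hypothetical names invented for this illustration, and object ids are plain integers. The sketch assumes that deciding whether an object lives in a kept pack means probing packs one at a time, and it counts those probes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pack; this is not git's struct packed_git. */
struct toy_pack {
	int kept;            /* non-zero if the pack is marked as kept */
	const int *objects;  /* toy object ids stored in this pack */
	size_t nr;           /* number of entries in objects[] */
};

/* Probe a single pack for an object id, counting the probe. */
static int toy_pack_has(const struct toy_pack *p, int id, size_t *probes)
{
	size_t i;

	assert(probes != NULL);
	(*probes)++;
	for (i = 0; i < p->nr; i++)
		if (p->objects[i] == id)
			return 1;
	return 0;
}

/*
 * Return 1 if `id` is found in a kept pack, 0 otherwise. A copy in a
 * non-kept pack does not settle the question, so the scan continues.
 */
static int in_kept_pack_scan(const struct toy_pack *packs, size_t nr_packs,
			     int id, size_t *probes)
{
	size_t i;

	for (i = 0; i < nr_packs; i++)
		if (toy_pack_has(&packs[i], id, probes) && packs[i].kept)
			return 1;
	return 0;
}
```

In this model an object with no kept copy costs one probe per pack, so total work grows with (objects x packs). That is consistent with the 1000-pack timing above, where looking at fewer objects is offset by the per-object scan over many packs.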
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/perf/p5303-many-packs.sh | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh index d90d714923..b76a6efe00 100755 --- a/t/perf/p5303-many-packs.sh +++ b/t/perf/p5303-many-packs.sh @@ -31,8 +31,11 @@ repack_into_n () { ' "$1" >pushes && # create base packfile - head -n 1 pushes | - git pack-objects --delta-base-offset --revs staging/pack && + base_pack=$( + head -n 1 pushes | + git pack-objects --delta-base-offset --revs staging/pack + ) && + test_export base_pack && # and then incrementals between each pair of commits last= && @@ -49,6 +52,12 @@ repack_into_n () { last=$rev done stdin.packs + # and install the whole thing rm -f .git/objects/pack/* && mv staging/* .git/objects/pack/ @@ -91,6 +100,15 @@ do --reflog --indexed-objects --delta-base-offset \ --stdout /dev/null ' + + test_perf "repack with --stdin-packs ($nr_packs)" ' + git pack-objects \ + --keep-true-parents \ + --stdin-packs \ + --non-empty \ + --delta-base-offset \ + --stdout /dev/null + ' done # Measure pack loading with 10,000 packs. 
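The `stdin.packs` file built by the perf script uses the input format read by `read_packs_list_from_stdin()` earlier in this series: one pack name per line, where a leading `^` marks the pack as excluded and empty lines are skipped. A minimal sketch of that per-line classification follows, in plain C without git's strbuf or string_list helpers. `classify_pack_line()` and the `pack_line_kind` enum are hypothetical names for illustration only, not part of git.

```c
#include <stddef.h>

/* Hypothetical classification of one line of --stdin-packs input. */
enum pack_line_kind {
	PACK_LINE_SKIP,     /* empty line, ignored */
	PACK_LINE_INCLUDE,  /* pack whose objects should be packed */
	PACK_LINE_EXCLUDE   /* pack whose objects should be left out */
};

/*
 * Classify `line` and point `*name` at the pack name inside it
 * (past the '^' for excluded packs), or at NULL for skipped lines.
 */
static enum pack_line_kind classify_pack_line(const char *line,
					      const char **name)
{
	*name = NULL;
	if (!line || !*line)
		return PACK_LINE_SKIP;
	if (*line == '^') {
		*name = line + 1;
		return PACK_LINE_EXCLUDE;
	}
	*name = line;
	return PACK_LINE_INCLUDE;
}
```

The real function goes further than this sketch: it sorts both lists, matches each name against the repository's packs, and dies if a named pack cannot be found.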
From patchwork Thu Feb 4 03:59:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12066271 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FB1FC433DB for ; Thu, 4 Feb 2021 04:02:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DB38A64DA5 for ; Thu, 4 Feb 2021 04:02:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232151AbhBDEB1 (ORCPT ); Wed, 3 Feb 2021 23:01:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232372AbhBDEAa (ORCPT ); Wed, 3 Feb 2021 23:00:30 -0500 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 022C4C06178B for ; Wed, 3 Feb 2021 19:59:21 -0800 (PST) Received: by mail-qt1-x82c.google.com with SMTP id e15so1497877qte.9 for ; Wed, 03 Feb 2021 19:59:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=kltz0CfZM+ZPDVqad9MFV27j48nFy4kC61jUEBQm07Y=; b=UODXQdrn/+PI+kK2U4QGIhoEPUHGqu5Kmbs9L01QqaFByAqc80GxG3IMkdQ1pgtSm3 a5kEldzm5aB5ovDjrCOPybH9UNh+FdgAWsmh2kPrf46x8qqJkPpT1YwkTX+P/6a7JZJG 
Tydk1BdBjqrbwdfJAOUcAFYo64tU0Vtq+Tz9qLGOZkShHzVZpI1GWlBgCWzEPEviCKzw pPkwJeCh6Ka+5kx5j91qgN7R800p8M2Mugj7X7a7IvSaIlTZsmZoyvEe+7/l7P8hLRmr Ci60k44OIgfsLZ/RlOO/E7SrAZrhjoYsh0da6/8LJ4LeAQ/Yt2Tt/xXEvP0oXNIaHSc3 uA0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=kltz0CfZM+ZPDVqad9MFV27j48nFy4kC61jUEBQm07Y=; b=RFt9aaLhm+iaazApGNX9FEPzlBG7/wDgM76EhY0b0PHKlYHVQ1R9EK17sLtFWedu/Y jNdmkvqWAT8yAyw5MGDJWHIsdOQPDML7Xj4O6JZQGvBKg4iZ9OnZ7HKKEmIqw2migTvE 0W+BExMGVa3gW+8crHMCYEJ9ybs1EtyjoS5F33Gs0M9KM+Og/3mP12sFjTiXLrXn+1J7 S5QtM/7fCtZgJb/hvmfyhfHPxBTPuNOLWnobrVVY//NwdO9FL9BLdDBRwIgQ+cjoVEjW HN4HQ+JEGLtXvulgSGSKgVYrPp+y4GpKzd02ip9+4I9Q86Qts8oNpn90xg59T3upbGzQ /sFg== X-Gm-Message-State: AOAM530WnQkwbtkxTlcbg+5A688r3XEhxbiWESRFkjHFS+UtxD5k7dMV zyXs/qb/RaIt5pIpmeSrShL5sBkxzpLGPQ== X-Google-Smtp-Source: ABdhPJy6LcSfVdNLJaKUUCAZqGAwlKr7lobZeWosBGhsIE/BAmgvpsqGuwO+E0xo8H36JXxWAMQuXA== X-Received: by 2002:ac8:598e:: with SMTP id e14mr5753857qte.346.1612411159945; Wed, 03 Feb 2021 19:59:19 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:3a5f:649:7bf7:4ac8]) by smtp.gmail.com with ESMTPSA id m64sm4063119qkb.90.2021.02.03.19.59.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Feb 2021 19:59:19 -0800 (PST) Date: Wed, 3 Feb 2021 22:59:17 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King Now that we have find_kept_pack_entry(), we don't have to manually keep hunting through every pack to find a possible "kept" duplicate of the object. 
This should be faster, assuming only a portion of your total packs are actually kept. Note that we have to re-order the logic a bit here; we can deal with the "kept" situation completely, and then just fall back to the "--local" question. It might be worth having a similar optimized function to look at only local packs. Here are the results from p5303 (measurements again taken on the kernel): Test HEAD^ HEAD ----------------------------------------------------------------------------------------------- 5303.5: repack (1) 57.42(54.88+10.64) 57.44(54.71+10.78) +0.0% 5303.6: repack with --stdin-packs (1) 0.01(0.01+0.00) 0.01(0.00+0.01) +0.0% 5303.10: repack (50) 71.26(88.24+4.96) 71.32(88.38+4.90) +0.1% 5303.11: repack with --stdin-packs (50) 3.49(11.82+0.28) 3.43(11.81+0.22) -1.7% 5303.15: repack (1000) 215.64(491.33+14.80) 215.59(493.75+14.62) -0.0% 5303.16: repack with --stdin-packs (1000) 198.79(380.51+7.97) 131.44(314.24+8.11) -33.9% So our --stdin-packs case with many packs is now finally faster than the non-keep case (because it gets the speed benefit of looking at fewer objects, but not as big a penalty for looking at many packs). Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 125 ++++++++++++++++++++++++----------------- 1 file changed, 73 insertions(+), 52 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6d19eb000a..fbd7b54d70 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1188,7 +1188,8 @@ static int have_duplicate_entry(const struct object_id *oid, return 1; } -static int want_found_object(int exclude, struct packed_git *p) +static int want_found_object(const struct object_id *oid, int exclude, + struct packed_git *p) { if (exclude) return 1; @@ -1209,22 +1210,73 @@ static int want_found_object(int exclude, struct packed_git *p) * Otherwise, we signal "-1" at the end to tell the caller that we do * not know either way, and it needs to check more packs. 
*/ - if (!ignore_packed_keep_on_disk && - !ignore_packed_keep_in_core && - (!local || !have_non_local_packs)) + + /* + * Handle .keep first, as we have a fast(er) path there. + */ + if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) { + /* + * Set the flags for the kept-pack cache to be the ones we want + * to ignore. + * + * That is, if we are ignoring objects in on-disk keep packs, + * then we want to search through the on-disk keep and ignore + * the in-core ones. + */ + unsigned flags = 0; + if (ignore_packed_keep_on_disk) + flags |= ON_DISK_KEEP_PACKS; + if (ignore_packed_keep_in_core) + flags |= IN_CORE_KEEP_PACKS; + + if (ignore_packed_keep_on_disk && p->pack_keep) + return 0; + if (ignore_packed_keep_in_core && p->pack_keep_in_core) + return 0; + if (has_object_kept_pack(oid, flags)) + return 0; + } + + /* + * At this point we know definitively that either we don't care about + * keep-packs, or the object is not in one. Keep checking other + * conditions... + */ + + if (!local || !have_non_local_packs) return 1; - if (local && !p->pack_local) return 0; - if (p->pack_local && - ((ignore_packed_keep_on_disk && p->pack_keep) || - (ignore_packed_keep_in_core && p->pack_keep_in_core))) - return 0; /* we don't know yet; keep looking for more packs */ return -1; } +static int want_object_in_pack_one(struct packed_git *p, + const struct object_id *oid, + int exclude, + struct packed_git **found_pack, + off_t *found_offset) +{ + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(oid->hash, p); + + if (offset) { + if (!*found_pack) { + if (!is_pack_valid(p)) + return -1; + *found_offset = offset; + *found_pack = p; + } + return want_found_object(oid, exclude, p); + } + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). 
@@ -1252,7 +1304,7 @@ static int want_object_in_pack(const struct object_id *oid, * are present we will determine the answer right now. */ if (*found_pack) { - want = want_found_object(exclude, *found_pack); + want = want_found_object(oid, exclude, *found_pack); if (want != -1) return want; } @@ -1260,53 +1312,22 @@ static int want_object_in_pack(const struct object_id *oid, for (m = get_multi_pack_index(the_repository); m; m = m->next) { struct pack_entry e; if (fill_midx_entry(the_repository, oid, &e, m)) { - struct packed_git *p = e.p; - off_t offset; - - if (p == *found_pack) - offset = *found_offset; - else - offset = find_pack_entry_one(oid->hash, p); - - if (offset) { - if (!*found_pack) { - if (!is_pack_valid(p)) - continue; - *found_offset = offset; - *found_pack = p; - } - want = want_found_object(exclude, p); - if (want != -1) - return want; - } - } - } - - list_for_each(pos, get_packed_git_mru(the_repository)) { - struct packed_git *p = list_entry(pos, struct packed_git, mru); - off_t offset; - - if (p == *found_pack) - offset = *found_offset; - else - offset = find_pack_entry_one(oid->hash, p); - - if (offset) { - if (!*found_pack) { - if (!is_pack_valid(p)) - continue; - *found_offset = offset; - *found_pack = p; - } - want = want_found_object(exclude, p); - if (!exclude && want > 0) - list_move(&p->mru, - get_packed_git_mru(the_repository)); + want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset); if (want != -1) return want; } } + list_for_each(pos, get_packed_git_mru(the_repository)) { + struct packed_git *p = list_entry(pos, struct packed_git, mru); + want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset); + if (!exclude && want > 0) + list_move(&p->mru, + get_packed_git_mru(the_repository)); + if (want != -1) + return want; + } + if (uri_protocols.nr) { struct configured_exclusion *ex = oidmap_get(&configured_exclusions, oid); From patchwork Thu Feb 4 03:59:21 2021 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12066259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 816D1C433E6 for ; Thu, 4 Feb 2021 04:01:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 42B1464E31 for ; Thu, 4 Feb 2021 04:01:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232699AbhBDEBF (ORCPT ); Wed, 3 Feb 2021 23:01:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232483AbhBDEAf (ORCPT ); Wed, 3 Feb 2021 23:00:35 -0500 Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E725EC06178C for ; Wed, 3 Feb 2021 19:59:24 -0800 (PST) Received: by mail-qk1-x735.google.com with SMTP id v126so2096577qkd.11 for ; Wed, 03 Feb 2021 19:59:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=I+sQdE5BlUObmq5V9rJQa+VriLSwvSTanLLOnHF0c24=; b=a5Y0yJ1CJBR1C9c3VXWpXYBu2LgzETeA5S84yNbPCBoLXRGyrDYMdHSSIAJfF4H/2O B4G1qTeHOChkAuwStR8mktbOpLv7b0cZancq+Si0a0FBoUKSnpdHoohhWZEpxVrfbO+y r+7v/i3ahwEu3Dk84cfeIlk0Dr38lpQgYRz0qyIxrTU7I5eUElKIt9Eg0E7Mfl6bgOcx 
z62fvQ+PeUoYzj7hX6HxcwIkm2s7dy98qP9t+nXBc4VDb0d6xyHm09VyMffCf7eWSSQu Vv1oH1mPowg9VmarI7egc6OEU03nuq40ZVBrxUB3OAUBG7hsLn/irA9rmonojpaAU8qQ LPNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=I+sQdE5BlUObmq5V9rJQa+VriLSwvSTanLLOnHF0c24=; b=hZ5ll1rZAPbOJmx3V7I8cPKMWdyJHC84t1LLr3rS8Lftcja6FlPELmZs7DYu/bFZPG SfbVWBTzKidPKkAOhnAHOYM6fFSRMsoB2YBqWpVIgwfFe1DZLzxnhl0U6fytpLAU/6Df ghgwdqE2ut8IhYcKm7MKtpP7JThXJFkHbvGGoikuefqEX507WZxAzB/vEE/NTNKxwFrT T/xKPHqkbivNizgtQ34n17elytQPBldUdZ+lqvssiexR8gznNMShTycO8bWIZZ8qVH8o YWI5ObKgOJ1ZoxA9ZL2rxd5mi2p0HCDMYgUanbnfBkum7pLgzDZZfd7IgFOyk/e7iZmv N84g== X-Gm-Message-State: AOAM5330EViBcKs3E9s/xbdyF4cSJ5wGwpUf8OEw+bcPAz9uAk07WNgt 0OJnCsYI5Zee93aAL7Z1xsct4GaTeIUvcw== X-Google-Smtp-Source: ABdhPJyy4SsCue451xsZCxSwcbpIngHN+dk4KJAkP1nY9sC4OYfB+UYx6uOLdvDVPtZ6atu9P0huDw== X-Received: by 2002:ae9:ed52:: with SMTP id c79mr5945646qkg.352.1612411163827; Wed, 03 Feb 2021 19:59:23 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:3a5f:649:7bf7:4ac8]) by smtp.gmail.com with ESMTPSA id a203sm4116891qkb.31.2021.02.03.19.59.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Feb 2021 19:59:23 -0800 (PST) Date: Wed, 3 Feb 2021 22:59:21 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King In a recent patch we added a function 'find_kept_pack_entry()' to look for an object only among kept packs. While this function avoids doing any lookup work in non-kept packs, it is still linear in the number of packs, since we have to traverse the linked list of packs once per object. 
Let's cache a reduced version of that list to save us time. Note that this cache will last the lifetime of the program. We could invalidate it on reprepare_packed_git(), but there's not much point in being rigorous here: - we might already fail to notice new .keep packs showing up after the program starts. We only reprepare_packed_git() when we fail to find an object. But adding a new pack won't cause that to happen. Somebody repacking could add a new pack and delete an old one, but most of the time we'd have a descriptor or mmap open to the old pack anyway, so we might not even notice. - in pack-objects we already cache the .keep state at startup, since 56dfeb6263 (pack-objects: compute local/ignore_pack_keep early, 2016-07-29). So this is just extending that concept further. - we don't have to worry about any packed_git being removed; we always keep the old structs around, even after reprepare_packed_git() Here are p5303 results (as always, measured against the kernel): Test HEAD^ HEAD ---------------------------------------------------------------------------------------------- 5303.5: repack (1) 57.44(54.71+10.78) 57.06(54.29+10.96) -0.7% 5303.6: repack with --stdin-packs (1) 0.01(0.00+0.01) 0.01(0.01+0.00) +0.0% 5303.10: repack (50) 71.32(88.38+4.90) 71.47(88.60+5.04) +0.2% 5303.11: repack with --stdin-packs (50) 3.43(11.81+0.22) 3.49(12.21+0.26) +1.7% 5303.15: repack (1000) 215.59(493.75+14.62) 217.41(495.36+14.85) +0.8% 5303.16: repack with --stdin-packs (1000) 131.44(314.24+8.11) 126.75(309.88+8.09) -3.6% Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 6 +-- object-store.h | 10 ++++ packfile.c | 103 +++++++++++++++++++++++------------------ packfile.h | 4 -- revision.c | 8 ++-- 5 files changed, 76 insertions(+), 55 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index fbd7b54d70..b2ba5aa14f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1225,9 +1225,9 @@ static int 
want_found_object(const struct object_id *oid, int exclude, */ unsigned flags = 0; if (ignore_packed_keep_on_disk) - flags |= ON_DISK_KEEP_PACKS; + flags |= CACHE_ON_DISK_KEEP_PACKS; if (ignore_packed_keep_in_core) - flags |= IN_CORE_KEEP_PACKS; + flags |= CACHE_IN_CORE_KEEP_PACKS; if (ignore_packed_keep_on_disk && p->pack_keep) return 0; @@ -3089,7 +3089,7 @@ static void read_packs_list_from_stdin(void) * an optimization during delta selection. */ revs.no_kept_objects = 1; - revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + revs.keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS; revs.blob_objects = 1; revs.tree_objects = 1; revs.tag_objects = 1; diff --git a/object-store.h b/object-store.h index c4fc9dd74e..4cbe8eae3c 100644 --- a/object-store.h +++ b/object-store.h @@ -105,6 +105,14 @@ static inline int pack_map_entry_cmp(const void *unused_cmp_data, return strcmp(pg1->pack_name, key ? key : pg2->pack_name); } +#define CACHE_ON_DISK_KEEP_PACKS 1 +#define CACHE_IN_CORE_KEEP_PACKS 2 + +struct kept_pack_cache { + struct packed_git **packs; + unsigned flags; +}; + struct raw_object_store { /* * Set of all object directories; the main directory is first (and @@ -150,6 +158,8 @@ struct raw_object_store { /* A most-recently-used ordered version of the packed_git list. */ struct list_head packed_git_mru; + struct kept_pack_cache *kept_pack_cache; + /* * A map of packfiles to packed_git structs for tracking which * packs have been loaded already. 
diff --git a/packfile.c b/packfile.c index 5f35cfe788..2a139c907b 100644 --- a/packfile.c +++ b/packfile.c @@ -2031,10 +2031,7 @@ static int fill_pack_entry(const struct object_id *oid, return 1; } -static int find_one_pack_entry(struct repository *r, - const struct object_id *oid, - struct pack_entry *e, - int kept_only) +int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) { struct list_head *pos; struct multi_pack_index *m; @@ -2044,49 +2041,64 @@ static int find_one_pack_entry(struct repository *r, return 0; for (m = r->objects->multi_pack_index; m; m = m->next) { - if (!fill_midx_entry(r, oid, e, m)) - continue; - - if (!kept_only) - return 1; - - if (((kept_only & ON_DISK_KEEP_PACKS) && e->p->pack_keep) || - ((kept_only & IN_CORE_KEEP_PACKS) && e->p->pack_keep_in_core)) + if (fill_midx_entry(r, oid, e, m)) return 1; } list_for_each(pos, &r->objects->packed_git_mru) { struct packed_git *p = list_entry(pos, struct packed_git, mru); - if (p->multi_pack_index && !kept_only) { - /* - * If this pack is covered by the MIDX, we'd have found - * the object already in the loop above if it was here, - * so don't bother looking. - * - * The exception is if we are looking only at kept - * packs. An object can be present in two packs covered - * by the MIDX, one kept and one not-kept. And as the - * MIDX points to only one copy of each object, it might - * have returned only the non-kept version above. We - * have to check again to be thorough. 
- */ - continue; - } - if (!kept_only || - (((kept_only & ON_DISK_KEEP_PACKS) && p->pack_keep) || - ((kept_only & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core))) { - if (fill_pack_entry(oid, e, p)) { - list_move(&p->mru, &r->objects->packed_git_mru); - return 1; - } + if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { + list_move(&p->mru, &r->objects->packed_git_mru); + return 1; } } return 0; } -int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) +static void maybe_invalidate_kept_pack_cache(struct repository *r, + unsigned flags) { - return find_one_pack_entry(r, oid, e, 0); + if (!r->objects->kept_pack_cache) + return; + if (r->objects->kept_pack_cache->flags == flags) + return; + free(r->objects->kept_pack_cache->packs); + FREE_AND_NULL(r->objects->kept_pack_cache); +} + +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags) +{ + maybe_invalidate_kept_pack_cache(r, flags); + + if (!r->objects->kept_pack_cache) { + struct packed_git **packs = NULL; + size_t nr = 0, alloc = 0; + struct packed_git *p; + + /* + * We want "all" packs here, because we need to cover ones that + * are used by a midx, as well. We need to look in every one of + * them (instead of the midx itself) to cover duplicates. It's + * possible that an object is found in two packs that the midx + * covers, one kept and one not kept, but the midx returns only + * the non-kept version. 
+ */ + for (p = get_all_packs(r); p; p = p->next) { + if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) || + (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) { + ALLOC_GROW(packs, nr + 1, alloc); + packs[nr++] = p; + } + } + ALLOC_GROW(packs, nr + 1, alloc); + packs[nr] = NULL; + + r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache)); + r->objects->kept_pack_cache->packs = packs; + r->objects->kept_pack_cache->flags = flags; + } + + return r->objects->kept_pack_cache->packs; } int find_kept_pack_entry(struct repository *r, @@ -2094,13 +2106,15 @@ int find_kept_pack_entry(struct repository *r, unsigned flags, struct pack_entry *e) { - /* - * Load all packs, including midx packs, since our "kept" strategy - * relies on that. We're relying on the side effect of it setting up - * r->objects->packed_git, which is a little ugly. - */ - get_all_packs(r); - return find_one_pack_entry(r, oid, e, flags); + struct packed_git **cache; + + for (cache = kept_pack_cache(r, flags); *cache; cache++) { + struct packed_git *p = *cache; + if (fill_pack_entry(oid, e, p)) + return 1; + } + + return 0; } int has_object_pack(const struct object_id *oid) @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid) return find_pack_entry(the_repository, oid, &e); } -int has_object_kept_pack(const struct object_id *oid, unsigned flags) +int has_object_kept_pack(const struct object_id *oid, + unsigned flags) { struct pack_entry e; return find_kept_pack_entry(the_repository, oid, flags, &e); diff --git a/packfile.h b/packfile.h index 624327f64d..eb56db2a7b 100644 --- a/packfile.h +++ b/packfile.h @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r, void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1); -#define ON_DISK_KEEP_PACKS 1 -#define IN_CORE_KEEP_PACKS 2 -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | 
IN_CORE_KEEP_PACKS)
-
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
  * return true and store its location to e.
diff --git a/revision.c b/revision.c
index 4c5adb90b1..41c0478705 100644
--- a/revision.c
+++ b/revision.c
@@ -2338,14 +2338,14 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		die(_("--unpacked= no longer supported"));
 	} else if (!strcmp(arg, "--no-kept-objects")) {
 		revs->no_kept_objects = 1;
-		revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-		revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+		revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
+		revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
 	} else if (skip_prefix(arg, "--no-kept-objects=", &optarg)) {
 		revs->no_kept_objects = 1;
 		if (!strcmp(optarg, "in-core"))
-			revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+			revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
 		if (!strcmp(optarg, "on-disk"))
-			revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+			revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
 	} else if (!strcmp(arg, "-r")) {
 		revs->diff = 1;
 		revs->diffopt.flags.recursive = 1;

From patchwork Thu Feb 4 03:59:25 2021
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12066269
Date: Wed, 3 Feb 2021 22:59:25 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH v2 8/8] builtin/repack.c: add '--geometric' option
X-Mailing-List: git@vger.kernel.org

Often it is useful to both:

  - have relatively few packfiles in a repository, and

  - avoid having so few packfiles in a repository that we repack its
    entire contents regularly

This patch implements a '--geometric=' option in 'git repack'. This
allows the caller to specify that they would like each pack to be at
least a factor times as large as the previous largest pack (by object
count).

Concretely, say that a repository has 'n' packfiles, labeled P1, P2,
..., up to Pn. Each packfile Pi has an object count equal to
'objects(Pi)'. With a geometric factor of 'r', it should be that:

  objects(Pi) >= r*objects(P(i-1))

for all i in [2, n], where the packs are sorted by

  objects(P1) <= objects(P2) <= ... <= objects(Pn).

Since finding a true optimal repacking is NP-hard, we approximate it
along two directions:

  1. We assume that there is a cutoff of packs _before starting the
     repack_ where everything to the right of that cut-off already forms
     a geometric progression (or no cutoff exists and everything must be
     repacked).

  2. We assume that everything smaller than the cutoff count must be
     repacked.
     This forms our base assumption, but it can also cause even the
     "heavy" packs to get repacked. For example, if we have 6 packs
     containing the following number of objects:

       1, 1, 1, 2, 4, 32

     then we would place the cutoff between '1, 1' and '1, 2, 4, 32',
     rolling up the first two packs into a pack with 2 objects. That
     breaks our progression and leaves us:

       2, 1, 2, 4, 32
         ^

     (where the '^' indicates the position of our split). To restore a
     progression, we move the split forward (towards larger packs)
     joining each pack into our new pack until a geometric progression
     is restored. Here, that looks like:

       2, 1, 2, 4, 32  ~>  3, 2, 4, 32  ~>  5, 4, 32  ~> ... ~> 9, 32
         ^                   ^                ^                   ^

This has the advantage of not repacking the heavy-side of packs too
often while also only creating one new pack at a time. Another wrinkle
is that we assume that loose, indexed, and reflog'd objects are
insignificant, and lump them into any new pack that we create. This can
lead to non-idempotent results.

Suggested-by: Derrick Stolee
Signed-off-by: Taylor Blau
---
 Documentation/git-repack.txt |  11 +++
 builtin/repack.c             | 187 ++++++++++++++++++++++++++++++++++-
 t/t7703-repack-geometric.sh  | 137 +++++++++++++++++++++++++
 3 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100755 t/t7703-repack-geometric.sh

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 92f146d27d..b1ffcfd974 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -165,6 +165,17 @@ depth is 4095.
 	Pass the `--delta-islands` option to `git-pack-objects`, see
 	linkgit:git-pack-objects[1].
 
+-g=<factor>::
+--geometric=<factor>::
+	Arrange resulting pack structure so that each successive pack
+	contains at least `<factor>` times the number of objects as the
+	next-largest pack.
++
+`git repack` ensures this by determining a "cut" of packfiles that need to be
+repacked into one in order to ensure a geometric progression.
It picks the +smallest set of packfiles such that as many of the larger packfiles (by count of +objects contained in that pack) may be left intact. + Configuration ------------- diff --git a/builtin/repack.c b/builtin/repack.c index 2158b48f4c..b4e0e69661 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -296,6 +296,124 @@ static void repack_promisor_objects(const struct pack_objects_args *args, #define ALL_INTO_ONE 1 #define LOOSEN_UNREACHABLE 2 +struct pack_geometry { + struct packed_git **pack; + uint32_t pack_nr, pack_alloc; + uint32_t split; +}; + +static uint32_t geometry_pack_weight(struct packed_git *p) +{ + if (open_pack_index(p)) + die(_("cannot open index for %s"), p->pack_name); + return p->num_objects; +} + +static int geometry_cmp(const void *va, const void *vb) +{ + uint32_t aw = geometry_pack_weight(*(struct packed_git **)va), + bw = geometry_pack_weight(*(struct packed_git **)vb); + + if (aw < bw) + return -1; + if (aw > bw) + return 1; + return 0; +} + +static void init_pack_geometry(struct pack_geometry **geometry_p) +{ + struct packed_git *p; + struct pack_geometry *geometry; + + *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); + geometry = *geometry_p; + + for (p = get_all_packs(the_repository); p; p = p->next) { + if (!pack_kept_objects && p->pack_keep) + continue; + + ALLOC_GROW(geometry->pack, + geometry->pack_nr + 1, + geometry->pack_alloc); + + geometry->pack[geometry->pack_nr] = p; + geometry->pack_nr++; + } + + QSORT(geometry->pack, geometry->pack_nr, geometry_cmp); +} + +static void split_pack_geometry(struct pack_geometry *geometry, int factor) +{ + uint32_t i; + uint32_t split; + off_t total_size = 0; + + if (geometry->pack_nr <= 1) { + geometry->split = geometry->pack_nr; + return; + } + + split = geometry->pack_nr - 1; + + /* + * First, count the number of packs (in descending order of size) which + * already form a geometric progression. 
+ */ + for (i = geometry->pack_nr - 1; i > 0; i--) { + struct packed_git *ours = geometry->pack[i]; + struct packed_git *prev = geometry->pack[i - 1]; + if (geometry_pack_weight(ours) >= factor * geometry_pack_weight(prev)) + split--; + else + break; + } + + if (split) { + /* + * Move the split one to the right, since the top element in the + * last-compared pair can't be in the progression. Only do this + * when we split in the middle of the array (otherwise if we got + * to the end, then the split is in the right place). + */ + split++; + } + + /* + * Then, anything to the left of 'split' must be in a new pack. But, + * creating that new pack may cause packs in the heavy half to no longer + * form a geometric progression. + * + * Compute an expected size of the new pack, and then determine how many + * packs in the heavy half need to be joined into it (if any) to restore + * the geometric progression. + */ + for (i = 0; i < split; i++) + total_size += geometry_pack_weight(geometry->pack[i]); + for (i = split; i < geometry->pack_nr; i++) { + struct packed_git *ours = geometry->pack[i]; + if (geometry_pack_weight(ours) < factor * total_size) { + split++; + total_size += geometry_pack_weight(ours); + } else + break; + } + + geometry->split = split; +} + +static void clear_pack_geometry(struct pack_geometry *geometry) +{ + if (!geometry) + return; + + free(geometry->pack); + geometry->pack_nr = 0; + geometry->pack_alloc = 0; + geometry->split = 0; +} + int cmd_repack(int argc, const char **argv, const char *prefix) { struct child_process cmd = CHILD_PROCESS_INIT; @@ -303,6 +421,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) struct string_list names = STRING_LIST_INIT_DUP; struct string_list rollback = STRING_LIST_INIT_NODUP; struct string_list existing_packs = STRING_LIST_INIT_DUP; + struct pack_geometry *geometry = NULL; struct strbuf line = STRBUF_INIT; int i, ext, ret; FILE *out; @@ -315,6 +434,7 @@ int cmd_repack(int argc, const char **argv, 
const char *prefix)
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	int no_update_server_info = 0;
 	struct pack_objects_args po_args = {NULL};
+	int geometric_factor = 0;
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -355,6 +475,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("repack objects in packs marked with .keep")),
 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
 				N_("do not repack this pack")),
+		OPT_INTEGER('g', "geometric", &geometric_factor,
+			    N_("find a geometric progression with factor <N>")),
 		OPT_END()
 	};
 
@@ -381,6 +503,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
 		die(_(incremental_bitmap_conflict_error));
 
+	if (geometric_factor) {
+		if (pack_everything)
+			die(_("--geometric is incompatible with -A, -a"));
+		init_pack_geometry(&geometry);
+		split_pack_geometry(geometry, geometric_factor);
+	}
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
@@ -395,9 +524,19 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		strvec_pushf(&cmd.args, "--keep-pack=%s",
 			     keep_pack_list.items[i].string);
 	strvec_push(&cmd.args, "--non-empty");
-	strvec_push(&cmd.args, "--all");
-	strvec_push(&cmd.args, "--reflog");
-	strvec_push(&cmd.args, "--indexed-objects");
+	if (!geometry) {
+		/*
+		 * With '--geometric', 'git pack-objects' will pick up all
+		 * objects, loose or packed (either rolling them up or leaving
+		 * them alone), so these options are only passed otherwise.
+		 *
+		 * The implementation of 'git pack-objects --stdin-packs'
+		 * makes them redundant (and the two are incompatible).
+ */ + strvec_push(&cmd.args, "--all"); + strvec_push(&cmd.args, "--reflog"); + strvec_push(&cmd.args, "--indexed-objects"); + } if (has_promisor_remote()) strvec_push(&cmd.args, "--exclude-promisor-objects"); if (write_bitmaps > 0) @@ -428,17 +567,37 @@ int cmd_repack(int argc, const char **argv, const char *prefix) strvec_push(&cmd.env_array, "GIT_REF_PARANOIA=1"); } } + } else if (geometry) { + strvec_push(&cmd.args, "--stdin-packs"); + strvec_push(&cmd.args, "--unpacked"); } else { strvec_push(&cmd.args, "--unpacked"); strvec_push(&cmd.args, "--incremental"); } - cmd.no_stdin = 1; + if (geometry) + cmd.in = -1; + else + cmd.no_stdin = 1; ret = start_command(&cmd); if (ret) return ret; + if (geometry) { + FILE *in = xfdopen(cmd.in, "w"); + /* + * The resulting pack should contain all objects in packs that + * are going to be rolled up, but exclude objects in packs which + * are being left alone. + */ + for (i = 0; i < geometry->split; i++) + fprintf(in, "%s\n", pack_basename(geometry->pack[i])); + for (i = geometry->split; i < geometry->pack_nr; i++) + fprintf(in, "^%s\n", pack_basename(geometry->pack[i])); + fclose(in); + } + out = xfdopen(cmd.out, "r"); while (strbuf_getline_lf(&line, out) != EOF) { if (line.len != the_hash_algo->hexsz) @@ -506,6 +665,25 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (!string_list_has_string(&names, sha1)) remove_redundant_pack(packdir, item->string); } + + if (geometry) { + struct strbuf buf = STRBUF_INIT; + + uint32_t i; + for (i = 0; i < geometry->split; i++) { + struct packed_git *p = geometry->pack[i]; + if (string_list_has_string(&names, + hash_to_hex(p->hash))) + continue; + + strbuf_reset(&buf); + strbuf_addstr(&buf, pack_basename(p)); + strbuf_strip_suffix(&buf, ".pack"); + + remove_redundant_pack(packdir, buf.buf); + } + strbuf_release(&buf); + } if (!po_args.quiet && isatty(2)) opts |= PRUNE_PACKED_VERBOSE; prune_packed_objects(opts); @@ -527,6 +705,7 @@ int cmd_repack(int argc, const char 
**argv, const char *prefix) string_list_clear(&names, 0); string_list_clear(&rollback, 0); string_list_clear(&existing_packs, 0); + clear_pack_geometry(geometry); strbuf_release(&line); return 0; diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh new file mode 100755 index 0000000000..96917fc163 --- /dev/null +++ b/t/t7703-repack-geometric.sh @@ -0,0 +1,137 @@ +#!/bin/sh + +test_description='git repack --geometric works correctly' + +. ./test-lib.sh + +GIT_TEST_MULTI_PACK_INDEX=0 + +objdir=.git/objects +midx=$objdir/pack/multi-pack-index + +test_expect_success '--geometric with no packs' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + git repack --geometric 2 >out && + test_i18ngrep "Nothing new to pack" out + ) +' + +test_expect_success '--geometric with an intact progression' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + # These packs already form a geometric progression. + test_commit_bulk --start=1 1 && # 3 objects + test_commit_bulk --start=2 2 && # 6 objects + test_commit_bulk --start=4 4 && # 12 objects + + find $objdir/pack -name "*.pack" | sort >expect && + git repack --geometric 2 -d && + find $objdir/pack -name "*.pack" | sort >actual && + + test_cmp expect actual + ) +' + +test_expect_success '--geometric with small-pack rollup' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + test_commit_bulk --start=1 1 && # 3 objects + test_commit_bulk --start=2 1 && # 3 objects + find $objdir/pack -name "*.pack" | sort >small && + test_commit_bulk --start=3 4 && # 12 objects + test_commit_bulk --start=7 8 && # 24 objects + find $objdir/pack -name "*.pack" | sort >before && + + git repack --geometric 2 -d && + + # Three packs in total; two of the existing large ones, and one + # new one. 
+ find $objdir/pack -name "*.pack" | sort >after && + test_line_count = 3 after && + comm -3 small before | tr -d "\t" >large && + grep -qFf large after + ) +' + +test_expect_success '--geometric with small- and large-pack rollup' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + # size(small1) + size(small2) > size(medium) / 2 + test_commit_bulk --start=1 1 && # 3 objects + test_commit_bulk --start=2 1 && # 3 objects + test_commit_bulk --start=2 3 && # 7 objects + test_commit_bulk --start=6 9 && # 27 objects && + + find $objdir/pack -name "*.pack" | sort >before && + + git repack --geometric 2 -d && + + find $objdir/pack -name "*.pack" | sort >after && + comm -12 before after >untouched && + + # Two packs in total; the largest pack from before running "git + # repack", and one new one. + test_line_count = 1 untouched && + test_line_count = 2 after + ) +' + +test_expect_success '--geometric ignores kept packs' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + test_commit kept && # 3 objects + test_commit pack && # 3 objects + + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF + refs/tags/kept + EOF + ) && + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF + refs/tags/pack + ^refs/tags/kept + EOF + ) && + + # neither pack contains more than twice the number of objects in + # the other, so they should be combined. but, marking one as + # .kept on disk will "freeze" it, so the pack structure should + # remain unchanged. 
+ touch $objdir/pack/pack-$KEPT.keep && + + find $objdir/pack -name "*.pack" | sort >before && + git repack --geometric 2 -d && + find $objdir/pack -name "*.pack" | sort >after && + + # both packs should still exist + test_path_is_file $objdir/pack/pack-$KEPT.pack && + test_path_is_file $objdir/pack/pack-$PACK.pack && + + # and no new packs should be created + test_cmp before after && + + # Passing --pack-kept-objects causes packs with a .keep file to + # be repacked, too. + git repack --geometric 2 -d --pack-kept-objects && + + find $objdir/pack -name "*.pack" >after && + test_line_count = 1 after + ) +' + +test_done
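
For readers who want to experiment with the split heuristic outside of
git, here is a minimal stand-alone sketch of it. It is not part of the
patch: geometric_split() and weight_cmp() are hypothetical names, the
function works on a plain array of object counts instead of
'struct packed_git', and it accumulates the rolled-up size in a uint64_t
rather than an off_t. The two loops are meant to mirror the ones in
split_pack_geometry() above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order pack weights (object counts) from smallest to largest. */
static int weight_cmp(const void *va, const void *vb)
{
	uint32_t a = *(const uint32_t *)va;
	uint32_t b = *(const uint32_t *)vb;

	return (a > b) - (a < b);
}

/*
 * Hypothetical stand-alone analogue of split_pack_geometry(): sorts
 * 'weights' in place and returns how many of the smallest packs would
 * be rolled up into one new pack. If 'rolled_up' is non-NULL, it
 * receives the combined object count of those packs.
 */
static size_t geometric_split(uint32_t *weights, size_t nr, uint32_t factor,
			      uint64_t *rolled_up)
{
	size_t i, split;
	uint64_t total = 0;

	assert(weights || !nr);

	if (nr)
		qsort(weights, nr, sizeof(*weights), weight_cmp);

	if (nr <= 1) {
		/* Same early-out as the patch: the split sits at the end. */
		split = nr;
	} else {
		/* Walk down from the largest pack while the progression holds. */
		for (i = nr - 1; i > 0; i--)
			if (weights[i] < (uint64_t)factor * weights[i - 1])
				break;
		/*
		 * i == 0 means every pack is already in the progression.
		 * Otherwise, like the patch, move the split one to the right
		 * of the failing pair, so packs 0..i are rolled up.
		 */
		split = i ? i + 1 : 0;
	}

	/* Expected size of the new pack. */
	for (i = 0; i < split; i++)
		total += weights[i];

	/* Absorb heavier packs until the new pack fits the progression. */
	for (i = split; i < nr; i++) {
		if (weights[i] >= (uint64_t)factor * total)
			break;
		total += weights[i];
		split++;
	}

	if (rolled_up)
		*rolled_up = total;
	return split;
}
```

With the six packs from the commit message (1, 1, 1, 2, 4, 32) and a
factor of 2, this returns 5 with a combined count of 9, which is the
final "9, 32" layout described there. Note that the initial cut in this
sketch (as in the patch's loop) already includes the third single-object
pack, so the intermediate states differ slightly from the narrative even
though the end result is the same. The object counts used in t7703
(3, 6, 12; 3, 3, 12, 24; and 3, 3, 7, 27) give splits of 0, 2, and 3
respectively.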